DDPG Agent: Not stabilizing, creating an unstable model

Question

Rajesh Siraskar 2019-12-16


Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/496825-ddpg-agent-not-stabilizing-creating-an-unstable-model

Commented: Emmanouil Tzorakoleftherakis, 2020-1-27

[Attached image: V.9.94.4_MATLAB_16-Dec-2019.jpg (Episode Manager training plot)]

Dear MATLAB,

I am training a DDPG agent on randomly set straight lines (levels) and then testing it on a benchmark waveform. Shouldn't the training stabilize over time and produce a stable model? The agent saved at 960 episodes seems to perform better than the one saved at 2180 episodes. Both agents were saved when the average reward over 50 episodes exceeded 25K. Also, the difference between the models saved at 940 and 960 episodes seems drastic.
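The save rule described above (keep an agent whenever the average reward over the last 50 episodes is above 25K) is a threshold on a noisy moving average. In the Reinforcement Learning Toolbox this presumably corresponds to `SaveAgentCriteria = "AverageReward"` together with `SaveAgentValue` and `ScoreAveragingWindowLength` in `rlTrainingOptions`, but the post does not show the actual options, so treat that mapping as an assumption. The Python below is a minimal sketch of the rule, not the toolbox's implementation; `moving_average_saves` is a hypothetical helper, and averaging over a partially filled window at the start is also an assumption.

```python
from collections import deque
from typing import Iterable, List


def moving_average_saves(
    episode_rewards: Iterable[float],
    window: int = 50,
    threshold: float = 25_000.0,
) -> List[int]:
    """Return 1-based episode numbers whose trailing average reward exceeds threshold.

    Hypothetical helper: the average is taken over the last `window` episodes,
    or over fewer episodes while the window is still filling (an assumption).
    """
    recent: deque = deque(maxlen=window)  # oldest reward drops out automatically
    saved: List[int] = []
    for episode, reward in enumerate(episode_rewards, start=1):
        recent.append(reward)
        avg = sum(recent) / len(recent)
        if avg > threshold:
            saved.append(episode)  # this episode's agent would be saved
    return saved
```

For example, with the synthetic (made-up) reward sequence `[20_000] * 60 + [30_000] * 60 + [10_000] * 60`, episodes 86 through 132 would all be saved. A threshold rule keeps every agent that clears the bar, so a later checkpoint is not guaranteed to be better than an earlier one.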

The picture below shows the Episode Manager, with the average reward (over 50 episodes) going up and down several times. Shouldn't it look like the dark green line, stabilizing over time? What change can I make to create a stable model?

Action space: 1.0 to 10.0, continuous

Test waveform: 2000 seconds long

Training sample time and simulation length: Ts = 1, Tf = 250

Hyperparameters: learning rates Critic = 1e-03, Actor = 1e-04 | Gamma (discount) = 0.95, Batch size = 64

Neurons: observation path FC1 = 64, FC2 = 24; actor path FC1 = 24

DDPG noise: Variance = 0.1, VarianceDecayRate = 1e-5 (I have also tried Variance = 0.45 and decay rates of 1e-3, 1e-4, etc.)
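To put the settings above in perspective, here is a rough back-of-envelope calculation in Python. It assumes one agent step per sample time (so about Tf/Ts = 250 steps per episode), uses the common rule of thumb that a discount factor gamma gives an effective horizon of about 1/(1 - gamma) steps, and assumes the noise variance is multiplied by (1 - VarianceDecayRate) once per step, which is how we understand the toolbox documentation to describe `VarianceDecayRate`. All helper names are hypothetical.

```python
import math


def steps_per_episode(tf: float, ts: float) -> int:
    """Approximate number of agent steps in one episode (Tf / Ts)."""
    return int(round(tf / ts))


def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb planning horizon, in steps, for a discount factor gamma."""
    return 1.0 / (1.0 - gamma)


def decayed_variance(initial: float, decay_rate: float, steps: int) -> float:
    """Noise variance after `steps` updates of Variance *= (1 - decay_rate) (assumed rule)."""
    return initial * (1.0 - decay_rate) ** steps


def half_life_steps(decay_rate: float) -> float:
    """Number of steps for the variance to halve under the decay rule above."""
    return math.log(0.5) / math.log(1.0 - decay_rate)


if __name__ == "__main__":
    n = steps_per_episode(tf=250, ts=1)
    print(f"steps per episode: {n}")
    print(f"effective horizon for gamma=0.95: {effective_horizon(0.95):.0f} steps")
    for rate in (1e-5, 1e-4, 1e-3):
        hl = half_life_steps(rate)
        print(f"decay {rate:g}: variance halves every {hl:.0f} steps (~{hl / n:.1f} episodes)")
        for episode in (960, 2180):
            v = decayed_variance(0.1, rate, episode * n)
            print(f"  variance after episode {episode}: {v:.2e}")
```

Under these assumptions, gamma = 0.95 gives a horizon of roughly 20 steps (20 s at Ts = 1), much shorter than both the 250 s training episodes and the 2000 s test waveform. With a decay rate of 1e-5 the variance halves about every 277 episodes, while with 1e-4 or 1e-3 it halves roughly every 28 or 3 episodes, so exploration noise would be essentially gone long before episode 960.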

(For a higher-resolution image, please see the attachment.)


Answer 1

Rajesh Siraskar 2019-12-20


Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/496825-ddpg-agent-not-stabilizing-creating-an-unstable-model#answer_407189

Based on several rounds of training, my personal observation is that the RL agent converges early on to its optimal expected value.

Training beyond that point does not seem to help. I think it is important to stop training once we realize that the agent has reached the optimum.
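The advice in this answer (stop once the agent has reached its best level, and keep that agent rather than the last one) is essentially early stopping. A minimal sketch, assuming per-episode rewards are available as a sequence: track the best 50-episode average and stop if it has not improved for `patience` episodes. `early_stop`, `patience`, and `min_delta` are hypothetical names for illustration, not Reinforcement Learning Toolbox options; in a real run you would save the agent each time a new best is recorded.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EarlyStopResult:
    best_episode: Optional[int]  # episode whose trailing average was highest
    best_average: float          # that trailing average
    stopped_at: int              # last episode consumed before stopping


def early_stop(
    episode_rewards: Iterable[float],
    window: int = 50,
    patience: int = 200,
    min_delta: float = 0.0,
) -> EarlyStopResult:
    """Hypothetical early-stopping rule on the trailing average reward.

    Stops once the average has not improved by more than `min_delta` for
    `patience` episodes. The agent to keep is the one from `best_episode`.
    """
    recent: deque = deque(maxlen=window)
    best_avg = float("-inf")
    best_ep: Optional[int] = None
    episode = 0
    for episode, reward in enumerate(episode_rewards, start=1):
        recent.append(reward)
        if len(recent) < window:
            continue  # wait until the averaging window is full
        avg = sum(recent) / window
        if avg > best_avg + min_delta:
            best_avg, best_ep = avg, episode  # new best: checkpoint to keep
        elif best_ep is not None and episode - best_ep >= patience:
            break  # no improvement for `patience` episodes
    return EarlyStopResult(best_ep, best_avg, episode)
```

For instance, `early_stop(list(range(100)) + [0] * 500, window=10, patience=50)` reports episode 100 as the best (average 94.5) and stops at episode 150. Because episode rewards are noisy, `patience` and `min_delta` trade off stopping too early against spending episodes after the policy has peaked.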

1 Comment

Emmanouil Tzorakoleftherakis 2020-1-27

+1 on that. For example, you may reach a point in training where you have a decent policy, but the agent's exploration then leads the search somewhere else (the pros and cons of sample-based gradients).
