Reinforcement learning DDPG action fluctuations

Question

Tech Logg Ding 2020-11-17

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/650183-reinforcement-learning-ddpg-action-fluctuations

Commented: Karim Darwich 2024-7-1

When I tried to train the path following control example in MATLAB, the training process produced the behaviour shown in the picture.

The steering angle is constantly fluctuating.
The acceleration is also constantly fluctuating.
The reward convergence is very noisy and seems to jump between a high reward and low reward.

The example from here shows that it should have converged already and the actions should be smooth.

What could be causing this issue? It also happened in other projects I worked on. One method I tried was to penalise the fluctuation in the reward function using this term, inspired by a paper published by Wang et al.:

10 * [ d/dt(current_action) * d/dt(previous_action) < 0 ]

Please let me know how to avoid this problem. Thank you very much!
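
A minimal sketch of the penalty term above, written in plain Python rather than MATLAB so it stays independent of any toolbox API. It assumes the bracket is an indicator (1 when the condition holds, 0 otherwise) and that the time derivatives are approximated by finite differences over the last three actions. The function name `fluctuation_penalty`, the sample time `dt`, and the choice to subtract the result from the reward are all assumptions for illustration, not details taken from the paper.

```python
def fluctuation_penalty(actions, dt, weight=10.0):
    """Return `weight` if the action's rate of change reversed sign, else 0.

    actions: the three most recent action values, oldest first,
             i.e. (a[t-2], a[t-1], a[t]).
    dt:      sample time used for the finite-difference derivative (assumed).
    """
    a_prev2, a_prev1, a_curr = actions
    rate_prev = (a_prev1 - a_prev2) / dt  # approximates d/dt(previous_action)
    rate_curr = (a_curr - a_prev1) / dt   # approximates d/dt(current_action)
    # Indicator: the product is negative only when the two rates have opposite signs.
    return weight if rate_curr * rate_prev < 0 else 0.0


# Example: the action rose and then fell, so the rate changed sign.
base_reward = 1.0  # hypothetical reward from the rest of the reward function
reward = base_reward - fluctuation_penalty((0.0, 1.0, 0.5), dt=0.1)
```

Because the term is a fixed step of size `weight`, it penalises every sign reversal equally, regardless of how large the oscillation is.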

2 Comments

Emmanouil Tzorakoleftherakis 2020-11-17

Hello,

One clarification - the scope signals you are showing on the right, are you getting these during training or after training?

Tech Logg Ding 2020-11-17

Hi,

Thank you for the reply.

It was during training. However, upon completion, it still fluctuates with a smaller magnitude and frequency. I did not save the image so I can't post it here. The example in the link also shows fluctuations in the steering angle.

Answer 1

Emmanouil Tzorakoleftherakis 2020-11-22

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/650183-reinforcement-learning-ddpg-action-fluctuations#answer_552583

Hello,

During training, DDPG explores the action space by adding noise to the output of the actor (see step 1 here). That explains the variance during training.
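
To illustrate the point about exploration noise, here is a rough Python sketch, not the toolbox implementation. It assumes Ornstein-Uhlenbeck noise, as used in the original DDPG paper. The `actor` function is a hypothetical stand-in for a trained deterministic policy, and the noise parameters (`theta`, `sigma`, `dt`) and action limits are arbitrary illustrative values.

```python
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated, mean-reverting noise."""

    def __init__(self, theta=0.15, sigma=0.2, dt=0.1, seed=0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = random.Random(seed)
        self.state = 0.0

    def sample(self):
        # Pull the state back toward zero, then add a Gaussian increment.
        self.state += (-self.theta * self.state * self.dt
                       + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0))
        return self.state


def actor(observation):
    # Hypothetical deterministic policy standing in for the trained actor network.
    return -0.5 * observation


def select_action(observation, noise=None, low=-1.0, high=1.0):
    action = actor(observation)
    if noise is not None:
        action += noise.sample()  # exploration: perturb the actor output
    return max(low, min(high, action))  # clip to the action limits


obs = 0.2
# Without noise, the same observation always gives the same action.
greedy = [select_action(obs) for _ in range(5)]
# With noise, the action fluctuates even though the observation is fixed.
exploring = [select_action(obs, OUNoise()) if i == 0 else None for i in range(1)]
explore_noise = OUNoise()
exploring = [select_action(obs, explore_noise) for _ in range(5)]
```

With the noise source passed in, repeated calls on the same observation return different actions, which is the kind of fluctuation a scope would show during training. Calling `select_action` without noise returns the same value each time.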

Even after training, you may see small variations in the actor output for observations that are different but close to each other. After all, you are effectively using a function approximator to approximate a nonlinear relationship between inputs (observations) and outputs (actions). If you want the policy to be more accurate near the setpoint, you could consider training further near the values of interest.

Also, the result you get on your machine may differ from the one posted in the documentation. Please see this post for an explanation.

Hope that helps

2 Comments

sungho park 2022-2-23

For me, after training, the actor output is always constant. Can you explain why?

Karim Darwich 2024-7-1

@sungho park I have the same problem. Did you fix it?
