DDPG training curve always remains flat

The training curve of my agent always has a shape that looks like the one in the image: flat, with the Episode Q0 curve tending toward the average reward.
I tried to change as many parameters as I could, but nothing changes. I changed the exploration noise variance and lowered the variance decay rate, and I also set a higher learn rate and varied the parameters of the reward function, but nothing ever changed.
What could be the possible reasons for a learning curve to always have this flat shape?

Answer (1)

Milan Bansal on 25 Jan 2024
Edited: Milan Bansal on 25 Jan 2024
Hi Luca,
I understand that you want to know the possible reasons for the flat DDPG training curve despite varying the parameters and hyperparameters.
Following are some possible reasons for a flat training curve:
  • Reward Function: The reward function may not be providing a meaningful learning signal for the agent to improve its policy.
  • Normalization Issues: Lack of normalization for states, actions, or rewards could lead to training instability.
  • Gradient Problems: There could be vanishing or exploding gradients within the actor or critic networks.
  • Reward Scaling: The rewards may not be scaled properly, leading to insignificant updates to the policy.
  • Learning Rates: The learning rates for the actor and critic might be inappropriate, possibly too low or too high.
  • Target Network Update Rate: The target networks for the actor and critic may not be updating at a suitable rate.
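Because the question specifically mentions tuning the noise variance and its decay rate, one quick check that complements the list above is whether the exploration noise is sized sensibly for the action range. The MATLAB documentation for Ornstein-Uhlenbeck action noise suggests keeping Variance*sqrt(SampleTime) at roughly 1% to 10% of the action range. A minimal sketch of that check follows; the action limits, sample time, and variance below are placeholder values to replace with your own:

% Quick check of exploration-noise scaling (all values are placeholders;
% substitute the limits and sample time of your own environment).
actionMin = -1;  actionMax = 1;          % assumed action limits
Ts = 0.05;                               % assumed agent sample time
actionRange = actionMax - actionMin;

variance = 0.3;                          % current noise variance
effectiveNoise = variance*sqrt(Ts);
fprintf("Effective noise is %.1f%% of the action range\n", ...
    100*effectiveNoise/actionRange);
% Far below 1% of the range means the agent barely explores, which can
% leave the training curve flat; far above 10% can drown out the policy.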
Following are the possible ways to diagnose and resolve the issue:
  • Analyze Reward Function: Ensure the reward function provides a clear gradient for the agent to learn effectively.
  • Hyperparameter Optimization: Experiment with different hyperparameters, including learning rates and the discount factor.
  • Network Architecture Review: Check the actor and critic network architectures to ensure they are suitable for the complexity of the task.
  • Agent Options: Try varying the parameters in "rlDDPGAgent" and "rlDDPGAgentOptions" (see the configuration sketch after this list).
  • Target Update Method: Change the method of updating the target networks.
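As a concrete starting point, here is a minimal configuration sketch using the R2021a-era API ("rlRepresentationOptions" and "rlDDPGAgentOptions"). The actor and critic networks are assumed to already exist, and all numeric values are illustrative assumptions to tune for your own problem, not recommended settings:

% Representation options: explicit learn rates plus gradient clipping
% to guard against exploding gradients (values are illustrative).
criticOpts = rlRepresentationOptions("LearnRate",1e-3, ...
    "GradientThreshold",1);
actorOpts  = rlRepresentationOptions("LearnRate",1e-4, ...
    "GradientThreshold",1);
% Pass these when constructing your rlQValueRepresentation (critic)
% and rlDeterministicActorRepresentation (actor).

agentOpts = rlDDPGAgentOptions( ...
    "SampleTime",0.05, ...           % assumed environment sample time
    "TargetSmoothFactor",1e-3, ...   % soft target-network update rate
    "DiscountFactor",0.99, ...
    "MiniBatchSize",64, ...
    "ExperienceBufferLength",1e6);

% Exploration noise: keep enough variance and decay it slowly, so the
% agent does not stop exploring before it has learned anything.
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;

% agent = rlDDPGAgent(actor, critic, agentOpts);

If the Episode Q0 curve still stays pinned to the average reward after changes like these, revisiting the reward design is usually the next step.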
Please refer to the MATLAB documentation pages for "DDPG Agents", "rlDDPGAgent", and "rlDDPGAgentOptions" to learn more; a related example can also be found in the documentation.
Hope this helps.

Release

R2021a
