DDPG training curve always remains flat

The training curve of my agent always has a shape that looks like the one in the image: flat, with the Episode Q0 curve tending toward the average reward.
I tried to change as many parameters as I could, but nothing changes. I changed the exploration noise variance and lowered the variance decay rate, and I also set a higher learn rate and varied the parameters of the reward function, but nothing ever changed.
What could be the possible reasons for a learning curve to always have this flat shape?

Answer (1)

Milan Bansal on 25 Jan 2024
Edited: Milan Bansal on 25 Jan 2024
Hi Luca,
I understand that you want to know the possible reasons for the flat DDPG training curve despite varying the parameters and hyperparameters.
Following are some possible reasons for a flat training curve:
  • Reward Function: The reward function may not be providing a meaningful learning signal for the agent to improve its policy.
  • Normalization Issues: Lack of normalization for states, actions, or rewards could lead to training instability.
  • Gradient Problems: There could be vanishing or exploding gradients within the actor or critic networks.
  • Reward Scaling: The rewards may not be scaled properly, leading to insignificant updates to the policy.
  • Learning Rates: The learning rates for the actor and critic might be inappropriate, possibly too low or too high.
  • Target Network Update Rate: The target networks for the actor and critic may not be updating at a suitable rate.
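Because the question specifically mentions tuning the noise variance and its decay rate, one quick check that complements the list above is whether the exploration noise is sized sensibly for the action range. The MATLAB documentation for Ornstein-Uhlenbeck action noise suggests keeping Variance*sqrt(SampleTime) at roughly 1% to 10% of the action range. A minimal sketch of that check follows; the action limits, sample time, and variance below are placeholder values to replace with your own:

% Quick check of exploration-noise scaling (all values are placeholders;
% substitute the limits and sample time of your own environment).
actionMin = -1;  actionMax = 1;          % assumed action limits
Ts = 0.05;                               % assumed agent sample time
actionRange = actionMax - actionMin;

variance = 0.3;                          % current noise variance
effectiveNoise = variance*sqrt(Ts);
fprintf("Effective noise is %.1f%% of the action range\n", ...
    100*effectiveNoise/actionRange);
% Far below 1% of the range means the agent barely explores, which can
% leave the training curve flat; far above 10% can drown out the policy.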
Following are the possible ways to diagnose and resolve the issue:
  • Analyze Reward Function: Ensure the reward function provides a clear gradient for the agent to learn effectively.
  • Hyperparameter Optimization: Experiment with different hyperparameters, including learning rates and the discount factor.
  • Network Architecture Review: Check the actor and critic network architectures to ensure they are suitable for the complexity of the task.
  • Agent Options: Try varying the parameters in "rlDDPGAgent" and "rlDDPGAgentOptions" (see the configuration sketch after this list).
  • Target Update Method: Change the method of updating the target networks.
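As a concrete starting point, here is a minimal configuration sketch using the R2021a-era API ("rlRepresentationOptions" and "rlDDPGAgentOptions"). The actor and critic networks are assumed to already exist, and all numeric values are illustrative assumptions to tune for your own problem, not recommended settings:

% Representation options: explicit learn rates plus gradient clipping
% to guard against exploding gradients (values are illustrative).
criticOpts = rlRepresentationOptions("LearnRate",1e-3, ...
    "GradientThreshold",1);
actorOpts  = rlRepresentationOptions("LearnRate",1e-4, ...
    "GradientThreshold",1);
% Pass these when constructing your rlQValueRepresentation (critic)
% and rlDeterministicActorRepresentation (actor).

agentOpts = rlDDPGAgentOptions( ...
    "SampleTime",0.05, ...           % assumed environment sample time
    "TargetSmoothFactor",1e-3, ...   % soft target-network update rate
    "DiscountFactor",0.99, ...
    "MiniBatchSize",64, ...
    "ExperienceBufferLength",1e6);

% Exploration noise: keep enough variance and decay it slowly, so the
% agent does not stop exploring before it has learned anything.
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;

% agent = rlDDPGAgent(actor, critic, agentOpts);

If the Episode Q0 curve still stays pinned to the average reward after changes like these, revisiting the reward design is usually the next step.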
Please refer to the MATLAB documentation pages for "DDPG Agents", "rlDDPGAgent", and "rlDDPGAgentOptions" to learn more; a related example can also be found in the documentation.
Hope this helps.

Release

R2021a
