Why do the DDPG episode rewards never change during the whole training process?

Question

Guoge Tan 2020-5-25

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/532933-why-is-the-ddpg-episode-rewards-never-change-during-the-whole-training-process

Commented: Shahriar 2022-6-29

I'm training a DDPG agent with the Reinforcement Learning Toolbox in MATLAB R2020a for a path planning problem. But as you can see, the DDPG episode rewards and average rewards never change over 5000 episodes. I used a simple neural network with 20 neurons and three layers; the learning rate is set to 0.01 and the Gradient Threshold is 1. I then tried setting the weights and biases of the fully connected layers and changing my reward function, but the result was the same.

I also saw here that others have had a similar problem. Is there any advice for my problem? Thank you.

1 Comment
Shahriar 2022-6-29

@Guoge Tan were you able to solve this issue? I have a similar situation.

Answer 1

Emmanouil Tzorakoleftherakis 2020-5-26

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/532933-why-is-the-ddpg-episode-rewards-never-change-during-the-whole-training-process#answer_439593

It looks like Q0 and the episode reward are on very different scales. When both are plotted on the same axes, a much larger Q0 can make the reward curve look flat. Try unchecking "Show Episode Q0" to see if the episode reward changes. I would then simplify the critic network to make sure it outputs values on a similar scale to the episode reward.
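
To make the scaling point concrete, here is a minimal sketch in plain Python (not MATLAB, and not using the Reinforcement Learning Toolbox). It assumes that Q0 is the critic's estimate of the discounted return from the initial observation, so a well-scaled Q0 should be of the same order as `discounted_return` of the per-step rewards. The function names and all numbers below are hypothetical and chosen only for illustration; they are not taken from the training run in the question.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted sum of per-step rewards: r0 + gamma*r1 + gamma^2*r2 + ...

    This is the quantity a critic's Q0 is meant to estimate, so it gives
    a rough target scale for the critic output.
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def visible_fraction(series, *others):
    """Fraction of a shared y-axis that `series` occupies.

    The axis is assumed to span the min and max of every plotted series,
    which is what autoscaled axes typically do.
    """
    all_values = list(series)
    for other in others:
        all_values.extend(other)
    axis_span = max(all_values) - min(all_values)
    if axis_span == 0:
        return 0.0
    return (max(series) - min(series)) / axis_span


# Hypothetical data: rewards that do improve, and a badly scaled Q0.
episode_reward = [-120.0, -95.0, -60.0, -30.0, -10.0]
episode_q0 = [4.8e5, 5.1e5, 5.0e5, 4.9e5, 5.2e5]

alone = visible_fraction(episode_reward)
shared = visible_fraction(episode_reward, episode_q0)

print(f"reward alone fills {alone:.2%} of the axis")
print(f"reward next to Q0 fills {shared:.4%} of the axis")
```

In this made-up case the reward improves by 110 units over the run, yet next to a Q0 in the hundreds of thousands it occupies only a tiny fraction of the axis and would look like a flat line. Hiding Q0 restores the full range, which is why unchecking "Show Episode Q0" is a useful first check before changing the critic.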