RL DDPG isn't learning

Question

Emmanuel Swetala 2021-4-24


Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/811890-rl-ddpg-isn-t-learning

Hello,

I am trying to train a model to optimise a set of power generation units. I have six generating units, each with a different operation cost. On top of that, I have a load profile that has to be met by the sum of the power contributed by each unit. Of course, the power deviation is allowed to be within a range of +/- 0.05.

Here is how I have modelled it:

I use the RL agent's six actions as the power of each unit in per unit. I multiply each action by a gain (the base value) to get power in MW, and then use each MW value as the input to the respective operation cost function (quadratic in nature). The observations to the RL agent are the sum of power minus the load profile (its magnitude in per unit and its integral), and the load profile itself. I define the reward as:

r1 = negative square of the deviation
r2 = positive 2 if the deviation is within +/- 0.05
r3 = negative sum of the operation costs
Reward = r1 + r2 + r3
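The reward described above can be sketched as follows. This is only an illustration in Python, not the poster's actual model: the base values, the quadratic cost coefficients `(a, b, c)`, the load value, and `system_base_mw` (used to express the deviation in per unit) are all hypothetical numbers chosen for the example.

```python
from typing import List, Sequence, Tuple

TOLERANCE = 0.05  # allowed power deviation (from the post)
BONUS = 2.0       # value of r2 when the deviation is within tolerance (from the post)


def quadratic_cost(p_mw: float, a: float, b: float, c: float) -> float:
    """Quadratic operation cost a*P^2 + b*P + c (coefficients are hypothetical)."""
    return a * p_mw ** 2 + b * p_mw + c


def compute_reward(
    actions_pu: Sequence[float],
    base_mw: Sequence[float],
    cost_coeffs: Sequence[Tuple[float, float, float]],
    load_mw: float,
    system_base_mw: float,
) -> Tuple[float, Tuple[float, float, float]]:
    """Return the total reward and its components (r1, r2, r3)."""
    # Scale each per-unit action by its gain (base value) to get MW.
    powers_mw: List[float] = [a * g for a, g in zip(actions_pu, base_mw)]
    # Deviation between total generation and load, expressed in per unit.
    # Dividing by system_base_mw is an assumption about how "per unit" is defined.
    deviation_pu = (sum(powers_mw) - load_mw) / system_base_mw
    r1 = -(deviation_pu ** 2)
    r2 = BONUS if abs(deviation_pu) <= TOLERANCE else 0.0
    r3 = -sum(quadratic_cost(p, *k) for p, k in zip(powers_mw, cost_coeffs))
    return r1 + r2 + r3, (r1, r2, r3)


# Hypothetical example: six identical 100 MW units, each dispatched at 0.5 p.u.
actions = [0.5] * 6
bases = [100.0] * 6
coeffs = [(0.01, 2.0, 10.0)] * 6
total, parts = compute_reward(actions, bases, coeffs, load_mw=300.0, system_base_mw=600.0)
print(total, parts)
```

With these made-up coefficients the cost term r3 is several hundred times larger in magnitude than r1 and r2, so it is worth checking how the three terms compare in scale in the real model.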

I used DDPG with learning rates of 0.0001 and 0.001 for the actor and critic respectively, a sample time of 0.4, a simulation time of 24, an experience buffer length of 1e6, and a mini-batch size of 256.
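As a quick sanity check on these settings, the arithmetic below works out how many agent steps one episode contains and how many transitions the full training run produces. It assumes one agent step per sample time and no early termination of episodes, neither of which is stated in the post; the variable names are illustrative only.

```python
import math

SAMPLE_TIME = 0.4          # agent sample time (from the post)
SIM_TIME = 24.0            # simulation time per episode (from the post)
MAX_EPISODES = 15000       # number of training episodes (from the post)
BUFFER_LENGTH = 1_000_000  # experience buffer length (from the post)
MINI_BATCH_SIZE = 256      # mini-batch size (from the post)

# Steps per episode, assuming one agent step every SAMPLE_TIME.
# round() guards against floating-point error in the division before ceil().
steps_per_episode = math.ceil(round(SIM_TIME / SAMPLE_TIME, 9))

# Total transitions collected if every episode runs to completion.
total_transitions = steps_per_episode * MAX_EPISODES

# Whether the buffer would ever overflow and start discarding old experience.
buffer_overflows = total_transitions > BUFFER_LENGTH

print(steps_per_episode, total_transitions, buffer_overflows)
```

Under these assumptions each episode has about 60 steps, so a mini-batch of 256 is larger than a single episode, and the buffer is never filled during the 15000 episodes.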

The training was run for 15000 episodes, but it does not converge to the expected power deviation range of +/- 0.05. I also observed that the plot of Q0 versus episode number keeps increasing indefinitely.
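One way to judge whether the growing Q0 is plausible is to compare it with the largest discounted return the reward can produce. In the reward above, r1 is never positive and r2 is at most 2, so if the operation costs are non-negative (an assumption), the reward per step is at most 2. The sketch below computes the resulting upper bound; the discount factor of 0.99 and the 60 steps per episode are assumptions, since the post does not state them.

```python
def max_discounted_return(r_max: float, gamma: float, n_steps: int) -> float:
    """Upper bound on sum_{k=0}^{n_steps-1} gamma^k * r_k when every r_k <= r_max."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return r_max * (1.0 - gamma ** n_steps) / (1.0 - gamma)


R_MAX = 2.0    # largest possible per-step reward, assuming non-negative costs
GAMMA = 0.99   # assumed discount factor (not given in the post)
N_STEPS = 60   # assumed steps per episode (simulation time 24 / sample time 0.4)

finite_bound = max_discounted_return(R_MAX, GAMMA, N_STEPS)
infinite_bound = R_MAX / (1.0 - GAMMA)  # bound for an infinitely long episode

print(finite_bound, infinite_bound)
```

If the Q0 curve climbs well past these bounds, the critic is overestimating the value rather than tracking a real improvement in the episode reward.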

What might be the problem? Kindly bear in mind that I am a beginner in this area. Thank you.
