Problems in using Reinforcement Learning Agent
Hello everyone,
I would like some help using a DDPG agent with my electric drive model of a brushless motor.
To give some context, the model consists of the classical three-phase equations of the electromechanical model of a permanent magnet synchronous motor, plus some subsystems implementing the Park and Blondel transformations (which map the three-phase quantities to a "comfortable" equivalent two-phase frame in which the electromagnetic torque is easier to express). There is also an ideal inverter model implemented with state machines in Stateflow.
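For readers less familiar with these transformations, here is a minimal Python sketch of the three-phase to two-phase (d, q) mapping. It assumes the amplitude-invariant convention (factor 2/3) and a d axis aligned with the permanent-magnet flux; the Simulink model in the attached files may use a different scaling. The torque helper additionally assumes a surface-mounted PMSM with Ld = Lq, so that torque depends only on i_q.

```python
import math


def abc_to_dq(i_a: float, i_b: float, i_c: float, theta_e: float) -> tuple[float, float]:
    """Amplitude-invariant Clarke + Park transform (assumed convention).

    theta_e is the electrical rotor angle in radians; the d axis is
    assumed to be aligned with the permanent-magnet flux.
    """
    # Clarke: three-phase (a, b, c) -> stationary two-phase (alpha, beta)
    i_alpha = (2.0 / 3.0) * (i_a - 0.5 * i_b - 0.5 * i_c)
    i_beta = (2.0 / 3.0) * (math.sqrt(3.0) / 2.0) * (i_b - i_c)
    # Park: rotate (alpha, beta) into the rotor-fixed (d, q) frame
    i_d = i_alpha * math.cos(theta_e) + i_beta * math.sin(theta_e)
    i_q = -i_alpha * math.sin(theta_e) + i_beta * math.cos(theta_e)
    return i_d, i_q


def em_torque(i_q: float, pole_pairs: int, flux_pm: float) -> float:
    """Electromagnetic torque of a surface-mounted PMSM (Ld == Lq assumed)."""
    return 1.5 * pole_pairs * flux_pm * i_q
```

With balanced sinusoidal currents in phase with the rotor angle, this returns i_d equal to the peak current and i_q equal to zero; shifting the currents by 90 electrical degrees moves everything onto the q axis, which is what makes torque control convenient in this frame.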
The model has been validated: by applying FOC control with a classical cascaded PI architecture, I am able to solve the trajectory-tracking problem on position.
Now I am trying an agent with an Actor-Critic architecture, specifically a DDPG agent.
As in the examples, I used the classical repeated fullyConnectedLayer + reluLayer structure for both the Actor and the Critic.
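As an illustration only, the repeated fullyConnectedLayer + reluLayer structure can be sketched as a plain NumPy forward pass. The layer sizes, the nine observations, the two actions and the 24 V limit used in the check below are hypothetical; the tanh-then-scale output stage is an assumption, showing one common way to keep the actor's output inside the action limits.

```python
import numpy as np


def init_actor(sizes, seed=0):
    """Random weights for a stack of fully connected layers (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return [
        (rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def actor_forward(params, obs, action_max):
    """fullyConnected + ReLU blocks, then tanh and scaling to [-action_max, action_max]."""
    h = np.asarray(obs, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(0.0, W @ h + b)  # fullyConnectedLayer + reluLayer
    W, b = params[-1]
    # tanh bounds the output to (-1, 1); multiplying by action_max rescales it
    return action_max * np.tanh(W @ h + b)
```

Whatever the final layers are, the exploration noise discussed in the answers has to be sized against the range this output actually covers.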
The reward function and the early-stop flag of the simulation are based on the errors between the physical quantities and the reference signals. The observations are all the currents (both three-phase and transformed), the angular position and velocity, and the reference signals (the reference for one of the currents, which must always be zero, and the reference for the angular position).
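A minimal Python sketch of how the observation vector and an error-based early-stop flag of this kind could be assembled. The ordering of the signals and the threshold values are hypothetical placeholders, not taken from the attached model.

```python
import numpy as np


def build_observation(i_abc, i_dq, theta, omega, id_ref, theta_ref):
    """Stack the signals listed above into one observation vector (order assumed)."""
    return np.concatenate([i_abc, i_dq, [theta, omega, id_ref, theta_ref]]).astype(float)


def is_done(theta, theta_ref, i_d, id_ref, max_theta_err=0.5, max_id_err=5.0):
    """Early-stop flag: end the episode when a tracking error leaves its band.

    The thresholds (rad and A) are hypothetical placeholders.
    """
    return abs(theta_ref - theta) > max_theta_err or abs(id_ref - i_d) > max_id_err
```

If the thresholds are too tight, episodes end almost immediately and the agent sees very little useful experience, so they are worth checking alongside the reward scaling.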
The problem is that, although this application is similar to the available MATLAB examples, where the agent learns to control the system in a "small" number of episodes, in my case the reward stays at very low values.
Hoping that someone can help me and suggest modifications, I am sharing the files that define the agent and the Simulink diagram with my electric drive model.
Thank you for your attention.
Pierpaolo.
Answers (2)
Emmanouil Tzorakoleftherakis
2021-2-23
Hello,
I am assuming you have seen this example already? It seems similar. I don't see the script where you set up DDPG, but a lot of things could be going on:
1) The example above has many of the inputs/outputs/quantities of interest in p.u. (per unit). That makes training smoother because everything is already normalized. Not sure if that's the case here.
2) Along the same lines as 1), make sure the various terms in the reward signal are scaled properly. For example, in the reward signal you scale the sum of the id, iq, theta and omega errors. Should that be the case, or should these be scaled separately? How do the id and iq errors compare to the theta error? If they are not scaled properly, the agent won't learn what you want.
3) Assuming the reward setup is correct, one DDPG parameter that is often overlooked but is very important is the noise options. This is also related to what the final layers of the actor look like, but basically, if the noise variance is not set properly, the agent won't be able to explore and will get stuck in the same local minimum. As the link suggests, make sure the noise variance value makes sense for your action range.
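Points 1) and 2) can be illustrated together with a small Python sketch. The base values and weights are hypothetical placeholders, not values from the model in the question or from the MathWorks example; the point is only that each error is converted to per unit with its own base before the weighted terms are summed, rather than scaling the raw sum of errors.

```python
import numpy as np

# Hypothetical base values for the per-unit conversion (not taken from the model).
I_BASE = 10.0        # A
THETA_BASE = np.pi   # rad
OMEGA_BASE = 300.0   # rad/s


def to_pu(value, base):
    """Express a physical quantity in per unit of the given base value."""
    return value / base


def reward(e_id, e_iq, e_theta, e_omega, weights=(0.1, 0.1, 1.0, 0.2)):
    """Negative weighted sum of squared per-unit errors.

    Each error is normalized by its own base before weighting, so that no
    single term dominates just because of its physical units.
    """
    terms = np.array([
        to_pu(e_id, I_BASE),
        to_pu(e_iq, I_BASE),
        to_pu(e_theta, THETA_BASE),
        to_pu(e_omega, OMEGA_BASE),
    ]) ** 2
    return -float(np.dot(weights, terms))
```

Without the per-unit step, a speed error of 30 rad/s would contribute 900 to a raw squared sum while a 0.1 rad position error would contribute 0.01, so the agent would effectively ignore position tracking.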
Hope that helps
paolo dini
2021-2-23
Edited: paolo dini
2021-2-23
2 Comments
Emmanouil Tzorakoleftherakis
2021-2-23
Not sure why training freezes; I need more information on that.
But the bottom line is that even if you are trying to solve the same problem, if you change the environment model you will likely need to retune various parameters such as the rewards (especially if these are not normalized).
As I mentioned above, if your actions are not normalized, you will need to experiment with the noise options (again, even if it's a similar problem, these small details change things a lot).
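To make the noise point concrete, here is a Python sketch of Ornstein-Uhlenbeck exploration noise whose standard deviation is tied to the action range. Keeping sigma*sqrt(Ts) at a small percentage (roughly 1% to 10%) of the action range mirrors the rule of thumb in the MathWorks DDPG noise-options documentation; treat the exact numbers as an assumption. The 5% fraction, the ±24 V action limits, the 1e-4 s sample time and the mean-attraction value are hypothetical.

```python
import numpy as np


class OUNoise:
    """Ornstein-Uhlenbeck exploration noise, discretized with Euler-Maruyama."""

    def __init__(self, size, sample_time, sigma, mean_attraction=0.15, mean=0.0, seed=0):
        self.ts = sample_time
        self.sigma = np.asarray(sigma, dtype=float)
        self.theta = mean_attraction  # hypothetical mean-attraction constant
        self.mu = mean
        self.x = np.zeros(size)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # Mean reversion toward mu plus a Gaussian increment scaled by sqrt(Ts)
        dx = self.theta * (self.mu - self.x) * self.ts
        dx += self.sigma * np.sqrt(self.ts) * self.rng.standard_normal(self.x.shape)
        self.x = self.x + dx
        return self.x.copy()


def sigma_for_range(action_min, action_max, sample_time, fraction=0.05):
    """Pick sigma so that sigma*sqrt(Ts) is `fraction` of the action range."""
    span = np.asarray(action_max, dtype=float) - np.asarray(action_min, dtype=float)
    return fraction * span / np.sqrt(sample_time)
```

With a ±24 V range and Ts = 1e-4 s this gives sigma = 240, i.e. a per-step perturbation of about 2.4 V. The same sigma applied to actions normalized to [-1, 1] would be far too large, which is why the noise settings have to be revisited whenever the action scaling changes.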