Deep reinforcement learning and the TD3 algorithm for PID control

44 views (last 30 days)
I'm relatively new to reinforcement learning. I have a project where I need to use the TD3 algorithm to tune the parameters of a PID controller; in my case it is a continuous controller. I have read some articles that describe the use of RL specifically for PID control, but they do not describe the hyperparameters, diagrams, or other details, and I am having trouble applying the TD3 algorithm to my cartpole system. Perhaps someone can guide me on using TD3 for parameter tuning. For example, my algorithm trains, but at the end the pole does not reach the target position of 180°. I hope someone can guide me, thank you!

Accepted Answer

Emmanouil Tzorakoleftherakis
Have you seen this example?
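In case it helps in the meantime, here is a minimal sketch of how a TD3 agent for PID-gain tuning might be set up with the Reinforcement Learning Toolbox. The model name cartpolePIDModel, the RL Agent block path, the observation/action dimensions, the gain limits, and all numeric option values below are placeholders for illustration, not values taken from that example:

% Observations: for example [error; integral of error; pendulum angular velocity]
obsInfo = rlNumericSpec([3 1]);

% Actions: the PID gains [Kp; Ki; Kd], bounded so exploration stays in a sensible range
actInfo = rlNumericSpec([3 1], ...
    'LowerLimit', [0; 0; 0], ...
    'UpperLimit', [50; 10; 10]);   % placeholder gain ranges

% Environment defined by a Simulink model containing an RL Agent block
env = rlSimulinkEnv('cartpolePIDModel', 'cartpolePIDModel/RL Agent', obsInfo, actInfo);

% TD3 agent with default actor/critic networks and a few basic options
agentOpts = rlTD3AgentOptions( ...
    'SampleTime', 0.02, ...
    'DiscountFactor', 0.99, ...
    'MiniBatchSize', 128, ...
    'ExperienceBufferLength', 1e6);
agent = rlTD3Agent(obsInfo, actInfo, agentOpts);

% Train, stopping once the moving-average reward reaches a chosen threshold
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 1000, ...
    'MaxStepsPerEpisode', 500, ...
    'ScoreAveragingWindowLength', 20, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 800);     % placeholder threshold
trainingStats = train(agent, env, trainOpts);

In this arrangement the agent's action is the vector of PID gains rather than the control force itself; the PID controller inside the Simulink model then computes the force applied to the cart.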

More Answers (1)

Sam Chak 2023-10-16
For the cartpole system, any angle that deviates from exactly ±180° (upright) will, if left uncorrected, result in the pendulum falling under gravity. Therefore, maintaining the pendulum at exactly 180° is the desired behavior. In an ideal scenario, your reward function might resemble the following:
  • If the pendulum is precisely at ±180°, a positive reward is provided.
  • For any deviation from ±180°, a negative reward or no reward is assigned.
However, be aware that training an RL agent to perform this swing-up task with such sparse rewards can be exceedingly challenging: the agent receives positive feedback only when it hits the exact target state, and it may need an impractically long time to discover the right actions, since it is highly improbable that exploration will randomly stumble upon the precise 180° position.
To address this challenge, many papers in the field suggest adding intermediate rewards for approaching the target state. From the perspective of a control engineer, it can be advantageous to build the reward function from three key components, covering transient behavior, steady-state behavior, and error behavior (a sketch combining the three appears after the plot code below):
  1. Swing-up reward (transient): A substantial positive reward is given when the agent effectively swings the pendulum from the bottom position to an angle close to 180°.
  2. Balance reward (steady-state): A modest positive reward is awarded for maintaining the pendulum within a predefined range of ±2% around 180°. This encourages the agent to maintain the pendulum close to the upright position.
  3. Failure penalty (error): A negative penalty is imposed when the pendulum falls or deviates significantly from the upright position, discouraging undesirable behavior.
% Illustrative second-order step response: an analogy for the desired swing-up
% behavior, rising toward 180 deg and settling within a +/-2% band
Gp = tf(10^2, [1 sqrt(2)*10 10^2])

% Output:
%   Gp =
%            100
%     -------------------
%     s^2 + 14.14 s + 100
%
%   Continuous-time transfer function.
step(Gp, 1.2)
ylabel('Pendulum angular position')
title('Response of a pendulum')
% Relabel the amplitude axis in degrees (an amplitude of 1.0 corresponds to 180 deg)
yt = 0:0.2:1.2;
yticks(yt);
yticklabels({'0', '36', '72', '108', '144', '180', '216'})
% +/-2% band around 180 deg, i.e. the balance (steady-state) region
yline(1+0.02, '--', 'color', '#D90e11')
yline(1-0.02, '--', 'color', '#D90e11')
% Approximate boundary between the swing-up (transient) and balance phases
xline(0.6, '-.', 'color', '#f1a45c')
text(0.1, 1.1, 'Transient behavior')
text(0.7, 1.1, 'Steady-state behavior')
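Putting the three components together, a reward function along the following lines could serve as a starting point. This is only a sketch: the signal names (theta in degrees measured from the downward position, thetaDot, cart position x), the weights, the ±2% band, and the track limit are placeholder choices to be adapted to your own model.

function r = swingUpReward(theta, thetaDot, x)
% Shaped reward combining swing-up, balance, and failure terms (placeholder weights)
    thetaErr = abs(180 - abs(theta));      % deviation from upright, in degrees

    % 1) Swing-up (transient): dense term that grows as the pendulum approaches 180 deg
    rSwing = 1 - thetaErr/180;

    % 2) Balance (steady-state): bonus for holding the pendulum within about +/-2% of 180 deg
    rBalance = 0;
    if thetaErr <= 0.02*180 && abs(thetaDot) < 1
        rBalance = 5;
    end

    % 3) Failure (error): penalty if the cart runs off its track (placeholder limit of +/-2 m)
    rFail = 0;
    if abs(x) > 2
        rFail = -10;
    end

    r = rSwing + rBalance + rFail;
end

The dense swing-up term keeps the feedback from being sparse, while the balance bonus and the failure penalty shape the steady-state and error behavior described above.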
