Reinforcement learning to tune a PI controller

21 views (last 30 days)
I’ve been studying the official MathWorks example “Tune PI Controller Using Reinforcement Learning” (link: https://ww2.mathworks.cn/help/reinforcement-learning/ug/tune-pi-controller-using-td3.html?s_tid=srchtitle_site_search_3_TD3) and ran into some questions while working through it.
1. When using reinforcement learning to tune a PI controller, is a fixed set of parameters (Kp, Ki) used for control in the end? (That is, during the simulation Kp and Ki do not change in real time the way they would in a fuzzy-PID or BP-neural-network PID.)
2. Will its control performance be comparable to that of online-tuning algorithms?

Accepted Answer

Sam Chak on 2025-12-15, 10:12
If you scroll down to the "Validate Trained Agent" section, you will observe that the RL agent returns a set of fixed values for the proportional and integral gains.
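For completeness, the fixed gains can be read straight out of the trained actor at the end of that example. The snippet below is a sketch from memory of that readout step, assuming the trained TD3 agent is in the workspace as agent and that the actor's single learnable weight vector stores [Ki Kp] in that order; verify the ordering against the published example code.

% Read the learned PI gains from the trained agent's actor
actor  = getActor(agent);                  % actor of the trained TD3 agent
params = getLearnableParameters(actor);    % cell array of learnable weights
Ki = abs(params{1}(1))                     % integral gain (absolute value keeps it positive)
Kp = abs(params{1}(2))                     % proportional gain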
Comparison to Fuzzy PID Controllers:
In the design of a Fuzzy PID controller, the control gains can change in real time, depending on the architecture of the controller. For example, human designers can intelligently use fuzzy rules to tune the PID parameters from the current error and error rate:

$$K_p(t) = f_p\big(e(t), \dot{e}(t)\big), \qquad K_i(t) = f_i\big(e(t), \dot{e}(t)\big), \qquad K_d(t) = f_d\big(e(t), \dot{e}(t)\big)$$

In the fixed-valued Fuzzy PID control architecture, it appears as follows:

$$u(t) = f_{\mathrm{fuzzy}}\!\left(K_p\, e(t),\; K_i \int_0^t e(\tau)\, d\tau,\; K_d\, \dot{e}(t)\right)$$

where $K_p$, $K_i$, and $K_d$ are fixed values.
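To make the contrast concrete, here is a minimal, hypothetical MATLAB sketch (my own illustration, not taken from any example) in which the gains are recomputed at every sample from the current error. The simple algebraic gain schedule is only a stand-in for a real fuzzy rule base (which would typically also use the error rate), and the first-order plant is arbitrary.

% Hypothetical sketch: gains recomputed every step (stand-in for a fuzzy rule base)
Ts = 0.1; N = 600; r = 1;                 % sample time, number of steps, setpoint
y = 0; e_int = 0;
for k = 1:N
    e = r - y;                            % tracking error
    Kp = 2 + 3*abs(e);                    % stand-in rule: "if |e| is large, raise Kp"
    Ki = 0.1 + 0.4*max(0, 0.2 - abs(e));  % stand-in rule: "if |e| is small, raise Ki"
    e_int = e_int + e*Ts;
    u = Kp*e + Ki*e_int;                  % gains vary in real time
    y = y + Ts*(-0.05*y + 0.05*u);        % arbitrary first-order plant
end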
Comparison to online-tuning algorithms:
Most online tuning algorithms typically adjust the parameters of a controller (such as gains), which subsequently determine the control action under dynamic operating conditions. The gains often change continuously or at preset intervals during operation. The algorithm observes the current error from a setpoint in real time and decides whether to update a parameter (such as increasing or decreasing a gain) to enhance future performance. The updated controller employs these new values to calculate the final control action. However, some optimization algorithms may adjust the control signals more directly, such as thrust and angle in interplanetary transfer missions, when the control law is either unavailable or overly complex.
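As a toy illustration of that idea (again my own sketch, not code from any MathWorks example), the loop below nudges the proportional gain with a simplified MIT-rule-style update while the controller keeps running; the plant model and the adaptation rate gamma are arbitrary assumptions.

% Hypothetical sketch: online gain adjustment while the controller is running
Ts = 0.1; N = 1200; r = 1; gamma = 0.02;  % sample time, steps, setpoint, adaptation rate
y = 0; e_int = 0; Kp = 1; Ki = 0.05;
for k = 1:N
    e = r - y;                            % observe the current error in real time
    e_int = e_int + e*Ts;
    u = Kp*e + Ki*e_int;                  % control action uses the latest gains
    y = y + Ts*(-0.05*y + 0.05*u);        % arbitrary first-order plant
    Kp = Kp + gamma*e^2*Ts;               % simplified MIT-rule-style update: grow Kp while the error persists
end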
In the example where the PI controller for the water tank is tuned by an RL agent, an offline optimization approach is employed because the system operates under static conditions (the size of the water tank does not change over time, and the water level setpoint is typically fixed). The offline algorithm conducts a test (such as a step response) to determine the "best" set of gains in a simulated environment. Once identified, these gains are fixed and used for standard operation until a human operator or a new trigger event initiates another tuning session.
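As a sketch of what such an offline test could look like (my own illustration using Control System Toolbox, with an arbitrary stand-in for the tank dynamics rather than the model from the example), one candidate gain pair can be scored from a simulated step response:

% Score one candidate (Kp, Ki) pair against a simulated step response
G  = tf(0.05, [1 0.02]);                  % arbitrary stand-in for the tank level dynamics
Kp = 3; Ki = 0.2;                         % candidate gains from the tuning session
C  = pid(Kp, Ki);                         % PI controller
T  = feedback(C*G, 1);                    % closed loop from level setpoint to level
S  = stepinfo(T);                         % settling time, overshoot, etc.
fprintf('Settling: %.1f s, Overshoot: %.1f %%\n', S.SettlingTime, S.Overshoot)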
2 Comments
yiwei on 2025-12-15, 11:51
Thank you very much for your answer. This undoubtedly resolved my confusion. I would also like to ask whether reinforcement learning can be used for online tuning. If so, are there learning resources in this area? Thanks again.
Sam Chak on 2025-12-15, 16:10
The example of "Quadruped Robot Locomotion Using DDPG Agent" uses RL for online optimization. Instead of determining the control gains, which are commonly used in conventional strategies to calculate the control action, the RL agent directly generates eight control torque signals for the revolute joints of the robot's four legs.
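For a sense of the difference, in that kind of setup the agent's action specification describes the torque vector itself rather than any controller gains. The snippet below is only indicative; the normalized limits and the name are my assumptions, not values copied from the example.

% The agent's action is the vector of eight joint torques, not a set of gains
actInfo = rlNumericSpec([8 1], 'LowerLimit', -1, 'UpperLimit', 1);   % normalized torque commands
actInfo.Name = "torque";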
