Reinforcement learning to tune a PI controller

21 views (last 30 days)
I’ve been studying the official MathWorks example “Tune PI Controller Using Reinforcement Learning” (link: https://ww2.mathworks.cn/help/reinforcement-learning/ug/tune-pi-controller-using-td3.html?s_tid=srchtitle_site_search_3_TD3) and ran into some questions while working through it.
1. When using reinforcement learning to tune a PI controller, is a fixed set of parameters (Kp, Ki) used for control in the end? (That is, during the simulation Kp and Ki do not change in real time the way they would in a fuzzy-PID or BP-neural-network PID.)
2. Will its control performance be comparable to that of online-tuning algorithms?

Accepted Answer

Sam Chak on 2025-12-15, 10:12
If you scroll down to the "Validate Trained Agent" section, you will observe that the RL agent returns a set of fixed values for the proportional and integral gains.
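For completeness, the fixed gains can be read straight out of the trained actor at the end of that example. The snippet below is a sketch from memory of that readout step, assuming the trained TD3 agent is in the workspace as agent and that the actor's single learnable weight vector stores [Ki Kp] in that order; verify the ordering against the published example code.

% Read the learned PI gains from the trained agent's actor
actor  = getActor(agent);                  % actor of the trained TD3 agent
params = getLearnableParameters(actor);    % cell array of learnable weights
Ki = abs(params{1}(1))                     % integral gain (absolute value keeps it positive)
Kp = abs(params{1}(2))                     % proportional gain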
Comparison to Fuzzy PID Controllers:
In the design of a Fuzzy PID controller, the control gains can change in real time, depending on the architecture of the controller. For example, human designers can intelligently use fuzzy rules to tune the PID parameters from the current error and error rate:

$$K_p(t) = f_p\big(e(t), \dot{e}(t)\big), \qquad K_i(t) = f_i\big(e(t), \dot{e}(t)\big), \qquad K_d(t) = f_d\big(e(t), \dot{e}(t)\big)$$

In the fixed-valued Fuzzy PID control architecture, it appears as follows:

$$u(t) = f_{\mathrm{fuzzy}}\!\left(K_p\, e(t),\; K_i \int_0^t e(\tau)\, d\tau,\; K_d\, \dot{e}(t)\right)$$

where $K_p$, $K_i$, and $K_d$ are fixed values.
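To make the contrast concrete, here is a minimal, hypothetical MATLAB sketch (my own illustration, not taken from any example) in which the gains are recomputed at every sample from the current error. The simple algebraic gain schedule is only a stand-in for a real fuzzy rule base (which would typically also use the error rate), and the first-order plant is arbitrary.

% Hypothetical sketch: gains recomputed every step (stand-in for a fuzzy rule base)
Ts = 0.1; N = 600; r = 1;                 % sample time, number of steps, setpoint
y = 0; e_int = 0;
for k = 1:N
    e = r - y;                            % tracking error
    Kp = 2 + 3*abs(e);                    % stand-in rule: "if |e| is large, raise Kp"
    Ki = 0.1 + 0.4*max(0, 0.2 - abs(e));  % stand-in rule: "if |e| is small, raise Ki"
    e_int = e_int + e*Ts;
    u = Kp*e + Ki*e_int;                  % gains vary in real time
    y = y + Ts*(-0.05*y + 0.05*u);        % arbitrary first-order plant
end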
Comparison to online-tuning algorithms:
Most online tuning algorithms typically adjust the parameters of a controller (such as gains), which subsequently determine the control action under dynamic operating conditions. The gains often change continuously or at preset intervals during operation. The algorithm observes the current error from a setpoint in real time and decides whether to update a parameter (such as increasing or decreasing a gain) to enhance future performance. The updated controller employs these new values to calculate the final control action. However, some optimization algorithms may adjust the control signals more directly, such as thrust and angle in interplanetary transfer missions, when the control law is either unavailable or overly complex.
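As a toy illustration of that idea (again my own sketch, not code from any MathWorks example), the loop below nudges the proportional gain with a simplified MIT-rule-style update while the controller keeps running; the plant model and the adaptation rate gamma are arbitrary assumptions.

% Hypothetical sketch: online gain adjustment while the controller is running
Ts = 0.1; N = 1200; r = 1; gamma = 0.02;  % sample time, steps, setpoint, adaptation rate
y = 0; e_int = 0; Kp = 1; Ki = 0.05;
for k = 1:N
    e = r - y;                            % observe the current error in real time
    e_int = e_int + e*Ts;
    u = Kp*e + Ki*e_int;                  % control action uses the latest gains
    y = y + Ts*(-0.05*y + 0.05*u);        % arbitrary first-order plant
    Kp = Kp + gamma*e^2*Ts;               % simplified MIT-rule-style update: grow Kp while the error persists
end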
In the example where the PI controller for the water tank is tuned by an RL agent, an offline optimization approach is employed because the system operates under static conditions (the size of the water tank does not change over time, and the water level setpoint is typically fixed). The offline algorithm conducts a test (such as a step response) to determine the "best" set of gains in a simulated environment. Once identified, these gains are fixed and used for standard operation until a human operator or a new trigger event initiates another tuning session.
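As a sketch of what such an offline test could look like (my own illustration using Control System Toolbox, with an arbitrary stand-in for the tank dynamics rather than the model from the example), one candidate gain pair can be scored from a simulated step response:

% Score one candidate (Kp, Ki) pair against a simulated step response
G  = tf(0.05, [1 0.02]);                  % arbitrary stand-in for the tank level dynamics
Kp = 3; Ki = 0.2;                         % candidate gains from the tuning session
C  = pid(Kp, Ki);                         % PI controller
T  = feedback(C*G, 1);                    % closed loop from level setpoint to level
S  = stepinfo(T);                         % settling time, overshoot, etc.
fprintf('Settling: %.1f s, Overshoot: %.1f %%\n', S.SettlingTime, S.Overshoot)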
2 Comments
yiwei on 2025-12-15, 11:51
Thank you very much for your answer. This undoubtedly resolved my confusion. I would also like to ask whether reinforcement learning can be used for online tuning. If so, are there learning resources in this area? Thanks again.
Sam Chak on 2025-12-15, 16:10
The example of "Quadruped Robot Locomotion Using DDPG Agent" uses RL for online optimization. Instead of determining the control gains, which are commonly used in conventional strategies to calculate the control action, the RL agent directly generates eight control torque signals for the revolute joints of the robot's four legs.
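For a sense of the difference, in that kind of setup the agent's action specification describes the torque vector itself rather than any controller gains. The snippet below is only indicative; the normalized limits and the name are my assumptions, not values copied from the example.

% The agent's action is the vector of eight joint torques, not a set of gains
actInfo = rlNumericSpec([8 1], 'LowerLimit', -1, 'UpperLimit', 1);   % normalized torque commands
actInfo.Name = "torque";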
