Can I decide the RL agent's actions?

3 views (last 30 days)
Sourabh 2023-9-2
Commented: Sourabh 2023-10-28
I am training a PPO agent, and the issue is that it keeps searching for a better value even after reaching a nearly stable state.
What I mean is: I want my agent to keep applying the last action values as soon as the error reaches ≤ 0.05 (to prevent oscillations and offset near the set point, as shown in the shared image).
My question is: can I do this in MATLAB? I know for sure it can be done in Python. Any help would be really, really helpful. :)
  3 Comments
Sourabh 2023-9-3
Actually, I saw it in an IEEE paper, and when I asked the author, he told me he was using Python.
I don't have any code with me right now, but I feel there surely must be a way to decide the action of my agent.
Sourabh 2023-9-4
Okay, I might get some code after a week or so,
but all I want is to limit the actions of my PPO agent so that it settles after some time, instead of acting as shown in the attached image.


Answers (2)

Sam Chak 2023-9-4
I believe this has something to do with the StopTrainingCriteria and StopTrainingValue options of your rlTrainingOptions object. Is the condition "steady-state error ≤ 0.05" reflected in the training termination condition? If the stopping condition is never satisfied, the agent will typically continue to train until MaxEpisodes is reached.
maxepisodes = 6000;
maxsteps = 150;
trainingOpts = rlTrainingOptions( ...
    'MaxEpisodes', maxepisodes, ...
    'MaxStepsPerEpisode', maxsteps, ...
    'ScoreAveragingWindowLength', 5, ...
    'Verbose', false, ...
    'Plots', 'training-progress', ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 1500);
Also, please note that the rewards obtained by the final agents are not necessarily the greatest achieved during the training episodes. You need to save the agents that meet the "steady-state error ≤ 0.05" condition during training by specifying the SaveAgentCriteria and SaveAgentValue properties in the rlTrainingOptions object.
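For example, a sketch of the saving options (the criterion 'EpisodeReward', the threshold 1200, and the folder name are placeholders — map your "error ≤ 0.05" condition into the episode reward so that a high reward implies the spec was met):

```matlab
% Sketch only: save candidate agents during training so you can later pick
% the one that met the steady-state spec, rather than keeping the final agent.
trainingOpts = rlTrainingOptions( ...
    'MaxEpisodes', 6000, ...
    'MaxStepsPerEpisode', 150, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 1500, ...
    'SaveAgentCriteria', 'EpisodeReward', ...  % save when a single episode...
    'SaveAgentValue', 1200, ...                % ...scores above this value
    'SaveAgentDirectory', 'savedAgents');      % saved as .mat files here
```

Saved agents land in the SaveAgentDirectory folder as .mat files, one per qualifying episode, which you can load and compare afterwards.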
  2 Comments
Sourabh 2023-9-4
Then why are DDPG and TD3 agents working fine?
It has nothing to do with the stop-training criteria. I just want my agent's outputs to settle to the previous value as soon as the error reaches 0.05 within a training episode.



Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis 2023-9-25
It seems like the paper you saw uses some logic to implement the behavior you mention. You could do the same with an if statement in MATLAB.
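One way to sketch that logic is a small helper called from your environment's step function before the action is applied to the plant (the function name, the use of a persistent variable, and the 0.05 tolerance are assumptions about your setup, not the paper's method):

```matlab
function action = latchAction(action, err, tol)
% Pass the agent's action through while the error is large; once |err| <= tol,
% hold (latch) the last freely chosen action to suppress oscillation and
% offset near the set point.
    persistent prevAction               % survives between calls within a run
    if isempty(prevAction)
        prevAction = action;            % first call: nothing to latch yet
    end
    if abs(err) <= tol
        action = prevAction;            % near set point: repeat last action
    else
        prevAction = action;            % far from set point: update the latch
    end
end
```

Called as `action = latchAction(action, setpoint - measurement, 0.05);` at the top of the step function. Note that a persistent variable is not reset between episodes; if you need per-episode resetting, store the latched action as a property of your environment object instead.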
  1 Comment
Sourabh 2023-9-26
Do you mean in my script or in my environment?
Could you give an example?

