Transient value problem of the variable in reward function of reinforcement learning

Yihao Wan 2021-3-22

https://ww2.mathworks.cn/matlabcentral/answers/779882-transient-value-problem-of-the-variable-in-reward-function-of-reinforcement-learning

Commented: Yihao Wan 2021-3-23

Accepted Answer: Emmanouil Tzorakoleftherakis

Hello, I encountered a problem when designing the reward function. In my Simulink environment, I want to incorporate some variables into the reward function. During training of the RL agent, these variables only converge after about 0.06 s, while the agent is trained from 0 s. Putting the RL block in a subsystem with an Enable block does not help.

From my understanding, this transient will influence the value of the reward function, which may result in a poorly trained agent. Does anyone have any suggestions regarding this question?

Thank you very much.

Accepted Answer

Emmanouil Tzorakoleftherakis 2021-3-22

https://ww2.mathworks.cn/matlabcentral/answers/779882-transient-value-problem-of-the-variable-in-reward-function-of-reinforcement-learning#answer_654817

You can put the RL Agent block inside a triggered subsystem and set the trigger so that the agent only starts running, and therefore training, after 0.06 seconds.
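The effect of delaying the agent can be sketched outside Simulink. The snippet below is plain Python, not Simulink or Reinforcement Learning Toolbox code, and only illustrates the gating logic: samples taken before the 0.06 s settle time never reach the agent, so they do not contribute to the return it learns from. The function names, the sample times, and the reward values are made up for illustration; only the 0.06 s threshold comes from the question.

```python
# Hypothetical sketch of the gating idea; not a Simulink API.
SETTLE_TIME = 0.06  # seconds, the convergence time reported in the question


def agent_enabled(t, settle_time=SETTLE_TIME):
    """Return True once the simulation time has reached the settle time."""
    return t >= settle_time


def gated_return(times, rewards, settle_time=SETTLE_TIME):
    """Sum only the rewards observed while the agent is enabled."""
    return sum(r for t, r in zip(times, rewards) if agent_enabled(t, settle_time))


# Made-up samples: large penalties during the start-up transient,
# small penalties once the variables have settled.
times = [0.00, 0.02, 0.04, 0.06, 0.08]
rewards = [-50.0, -20.0, -5.0, -1.0, -1.0]

print("return including transient:", sum(rewards))
print("return seen by gated agent:", gated_return(times, rewards))
```

In this toy data the ungated return is dominated by the transient samples, which is the distortion the question is worried about; gating removes them before the agent ever sees them.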

5 comments (3 earlier comments not shown)

Emmanouil Tzorakoleftherakis 2021-3-23

I believe it should be 40, yes. There is a counter implemented internally that keeps track of how many times the RL Agent block will run.
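The comments that led to the value 40 are not shown in this thread, so the stop time and sample time behind it are unknown. As a hedged illustration only, the sketch below estimates how many times a delayed agent would run if it executes once per sample period between its start time and the end of the simulation. The function name and the example values are hypothetical, and the internal counter of the RL Agent block may count differently, for example at the interval end points.

```python
from fractions import Fraction


def expected_agent_runs(stop_time, start_time, sample_time):
    """Count whole sample periods between start_time and stop_time.

    Times are passed as decimal strings so they are represented exactly
    and the division is not affected by floating-point rounding.
    """
    span = Fraction(stop_time) - Fraction(start_time)
    if span < 0:
        raise ValueError("start_time must not be later than stop_time")
    # Floor division of two Fractions yields an int.
    return span // Fraction(sample_time)


# Hypothetical values, not taken from the thread:
# a 0.2 s simulation, agent enabled at 0.06 s, 0.01 s agent sample time.
print(expected_agent_runs("0.2", "0.06", "0.01"))
```

A number of this kind is what a per-episode step limit would have to match once the agent starts late, because fewer agent steps fit into the same simulation time.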

Yihao Wan 2021-3-23

Thank you very much for your help.

Products: Simulink

Release: R2021a