Is my Reinforcement-Learning Agent learning the wrong policy due to delays in Simulink?

Question

Mirjan Heubaum 2022-8-3


https://ww2.mathworks.cn/matlabcentral/answers/1773265-is-my-reinforcement-learning-agent-learning-the-wrong-policy-due-to-delays-in-simulink

Commented: Matteo D'Ambrosio 2023-5-28

Hi all,

I have some problems with an RL setup that uses Simulink to connect the agent with the environment. Unfortunately, I have to use a unit delay after the action output to break an algebraic loop that includes the agent. I'm also experimenting with a different sample time for the agent than for the environment, e.g. the simulation uses Ts=0.1 while the agent uses Ts=1, because I don't need frequent action updates. As a result, the reward is delayed: if the agent block is executed at T=2, a new action is output at T=2, and the corresponding reward is not given to the agent block until e.g. T=2.1 or T=2.2.

Is the algorithm then assigning the reward from the previous action to the current action and then learning the wrong policy?
From my understanding, increasing the discount factor could stabilize this, but that should lead to low sample efficiency, right?
Does the same problem occur when using MATLAB environments?

Thanks for any comments on this.
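
To make the discount-factor question concrete, here is a minimal Python sketch (not Simulink or Reinforcement Learning Toolbox code). It only shows how much a reward that arrives late still counts in a discounted return: a reward delayed by m agent steps is weighted by gamma^m. The function names `discounted_return` and `delay_rewards` are hypothetical helpers made up for this illustration, and the sketch says nothing about whether the reward is credited to the right action.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_k over a finite reward sequence."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


def delay_rewards(rewards, m):
    """Shift a reward sequence m agent steps into the future (zero-padded)."""
    return [0.0] * m + list(rewards)


if __name__ == "__main__":
    rewards = [1.0]  # a single unit reward, assumed for illustration
    for gamma in (0.5, 0.9, 0.99):
        on_time = discounted_return(rewards, gamma)
        late = discounted_return(delay_rewards(rewards, 1), gamma)
        print(f"gamma={gamma}: on time {on_time:.2f}, one step late {late:.2f}")
```

With a discount factor close to 1, a one-step delay barely changes the return, which is one way to read the "stabilizing" effect mentioned above.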

1 Comment

Matteo D'Ambrosio 2023-5-28

Hello,

In my Simulink environment I use the same unit delay to avoid algebraic loops, and I can assure you that my training runs converge on a complex model. The difference is that I use a continuous environment with a discrete-time agent, but in any case the agent is not learning the "wrong" policy.

The claim that "rewards are delayed" does not sound correct to me: the reward computations follow the agent's sample time, so this should not have anything to do with the environment.
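
As a rough way to reason about the timing, the following Python toy model (not Simulink's actual scheduler) tabulates which agent action is in effect at each environment step when the action is zero-order held and passed through a delay. It assumes the numbers from the question (environment Ts=0.1, agent Ts=1, so 10 environment steps per agent step); `applied_action_indices` and `own_action_fraction` are hypothetical names, and the sample time that the Unit Delay block actually runs at is an assumption you would have to check in your own model.

```python
def applied_action_indices(steps_per_agent, delay_steps, n_agent_steps):
    """For each agent interval, list the index of the action applied at
    every environment step. Index -1 means the delay's initial condition."""
    schedule = []
    for k in range(n_agent_steps):
        interval = []
        for i in range(steps_per_agent):
            n = k * steps_per_agent + i      # global environment step
            src = n - delay_steps            # step whose agent output is seen now
            interval.append(src // steps_per_agent if src >= 0 else -1)
        schedule.append(interval)
    return schedule


def own_action_fraction(steps_per_agent, delay_steps):
    """Fraction of agent interval k during which action k itself is applied."""
    delayed = min(max(delay_steps, 0), steps_per_agent)
    return (steps_per_agent - delayed) / steps_per_agent


if __name__ == "__main__":
    n = 10  # agent Ts / environment Ts = 1 / 0.1 (values from the question)
    print(applied_action_indices(n, 1, 2))   # delay of one environment step
    print(own_action_fraction(n, 1))         # delay at the environment rate
    print(own_action_fraction(n, n))         # delay at the agent rate
```

In this toy model, a delay of one environment step means the new action is applied for 9 of the 10 environment steps in its own interval, so the reward sampled at the next agent step mostly reflects that action. If the delay ran at the agent's rate instead, the whole interval would still use the previous action.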

Answers (0)
