Can observation and reward be the same signal in a RL system?

Question

Jize Liu 2022-4-25

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/1704200-can-observation-and-reward-be-the-same-signal-in-a-rl-system

Commented: Jize Liu 2024-4-6

I am trying to train an RL system. I created a Simulink model with only one action and one observation, and that observation is also the reward. When I tried to train it, I got an error saying the model was "containing algebraic loop". So I wonder whether the way I defined the observation and reward caused this problem.

I defined the reward and the observation as the same signal because they play the same role in this system. I want the agent to receive only this one signal from the environment, so I defined a single observation that serves as both observation and reward, to avoid redundancy.


Answer 1

Poorna 2024-3-31

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/1704200-can-observation-and-reward-be-the-same-signal-in-a-rl-system#answer_1433946

Hi Jize,

I see that you want to use the same signal as both the observation and the reward in your reinforcement learning setup. Note that the observation and the reward do not occur at the same time.

In a reinforcement learning setting, you first make an observation, i.e., the current state of the system, and then pick an action and execute it. Your system then moves to a new state. The reward you get at the end of this transition is a function of your initial state, the action, and the resulting next state. So when you say you want to use the same signal as reward and observation, it means that the reward you get at time step 't' will be the observation at time step 't+1'.
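The timing described above can be sketched in plain Python. This is only an illustration, not Simulink or Reinforcement Learning Toolbox code: the toy dynamics in `environment_step`, the random placeholder policy, and the initial observation of `0.0` are all assumptions made up for the example.

```python
import random


def environment_step(state, action):
    """Hypothetical toy dynamics: shift the state by the action.

    The reward is higher the closer the new state is to zero.
    """
    next_state = state + action
    reward = -abs(next_state)
    return next_state, reward


def run_episode(num_steps=5, seed=0):
    """Run a short episode in which the observation is the previous reward."""
    rng = random.Random(seed)
    state = 3.0
    observation = 0.0  # assumed initial observation; no reward exists yet
    history = []
    for t in range(num_steps):
        # Placeholder policy: a real agent would choose based on `observation`.
        action = rng.choice([-1.0, 1.0])
        state, reward = environment_step(state, action)
        history.append((t, observation, action, reward))
        # The reward produced at step t is what the agent observes at step t+1.
        observation = reward
    return history


if __name__ == "__main__":
    for t, obs, act, rew in run_episode():
        print(f"t={t} observation={obs} action={act} reward={rew}")
```

Each row shows that the observation at a step equals the reward of the step before it, which is the one-step offset the answer refers to.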

The algebraic loop error you're encountering arises from attempting to use the reward at time step (t) directly as the observation at the same time step (t). This is circular: the system is being asked to observe a signal that has not yet been generated.
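One way to picture the circularity is as a dependency graph of signals that must all be computed within the same time step. The sketch below is a simplified model, not how Simulink's solver actually detects algebraic loops; the block names and the `evaluation_order` helper are hypothetical. It uses Python's standard `graphlib` module (Python 3.9+) to look for a valid evaluation order.

```python
from graphlib import CycleError, TopologicalSorter


def evaluation_order(dependencies):
    """Return an order in which the blocks can be evaluated in one time step.

    `dependencies` maps each block to the set of blocks whose same-step
    outputs it needs. Returns None when the dependencies are circular,
    i.e. when there is an algebraic loop.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


# Observation wired straight to the reward of the same time step.
direct_loop = {
    "agent": {"observation"},
    "environment": {"agent"},
    "reward": {"environment"},
    "observation": {"reward"},
}

# With a delay, the observation reads a value stored from the previous step,
# so it needs nothing from the current step.
delayed_loop = {**direct_loop, "observation": set()}

if __name__ == "__main__":
    print(evaluation_order(direct_loop))
    print(evaluation_order(delayed_loop))
```

In the direct wiring no block can go first, so no order exists. Once the observation no longer depends on the current step's reward, the chain can be evaluated from observation through to reward.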

So, you should try adding a "Unit Delay" block where you pass the reward as the observation to the system. By doing this, you are essentially sending the reward of the previous transition as the observation for the current transition.

To learn more about the "Unit Delay" block, refer to the following documentation:

https://www.mathworks.com/help/simulink/slref/unitdelay.html
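As a rough illustration of what a one-sample delay does to the reward signal, here is a minimal Python sketch. It is not the Simulink block itself: the `UnitDelay` class, its method names, and the `delayed_observations` helper are hypothetical, and the initial condition of `0.0` is an assumed value.

```python
class UnitDelay:
    """Hold a value for one time step (a hypothetical stand-in for the block)."""

    def __init__(self, initial_condition=0.0):
        self._state = initial_condition

    def output(self):
        # The output depends only on the stored state, not on the current input.
        return self._state

    def update(self, value):
        # Store the current input so it appears at the output on the next step.
        self._state = value


def delayed_observations(rewards, initial_condition=0.0):
    """Return the observation sequence obtained by delaying `rewards` one step."""
    delay = UnitDelay(initial_condition)
    observations = []
    for reward in rewards:
        observations.append(delay.output())  # read the previous step's reward
        delay.update(reward)                 # then store the current reward
    return observations


if __name__ == "__main__":
    print(delayed_observations([1.0, -2.0, 0.5]))
```

Because the output is read before the current reward is stored, the observation at each step never depends on the reward of that same step, which is what breaks the loop.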

Hope this helps!

1 Comment

Jize Liu 2024-4-6

Thank you for your reply. This should help. I have one point I want to confirm: in one cycle (t), the system starts by receiving an observation and ends with a reward, and in the next cycle (t+1), the new observation, which could be the reward from the last cycle, is input to the system and starts a new cycle. Is that right?
