Dealing with delayed observations/reward in RL

Hi everyone,
I'm currently facing an issue: my agent can't learn to control the water tank system of the example below if I add a unit delay to the observation signal.
So, I just added a delay as the following picture shows:
Then it seems that the agent can no longer learn what action to take.
But I guess this is normal behavior, since nothing in the network architecture allows it to learn time dependencies in the signals. This is why I tried to add long short-term memory (LSTM) layers, but I didn't succeed.
So, in general terms, is adding LSTM layers a good solution to this kind of problem? How can we give the agent a chance to learn time dependencies in the signals?
I'm using a DDPG agent, and to add the LSTM layers I set the option UseRNN to true and kept the default architecture for the actor and critic networks.
initOpts = rlAgentInitializationOptions(UseRNN=true)
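For context, here is roughly how the agent is then created with these options (obsInfo and actInfo come from the water tank example; the SequenceLength value is only an illustrative choice, not what the example uses):
agent = rlDDPGAgent(obsInfo, actInfo, initOpts); % default actor/critic networks now include an LSTM layer
agent.AgentOptions.SequenceLength = 20;          % illustrative: train on sequences of steps so the LSTM gets some time context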
I'm using R2023b, and I suspect that the MATLAB example doesn't work in R2024a.
This would be particularly useful, for example, for penalising the agent for big actions (flow), by adding a penalty proportional to the action taken, or for penalising it for big action variations, by adding a penalty proportional to the change in action (see the sketch below).
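Purely as an illustration (k1, k2, u, uPrev and baseReward are made-up names, not taken from the example), such a shaped reward could look like:
penaltyMagnitude = k1 * abs(u);          % penalise large flow commands
penaltyVariation = k2 * abs(u - uPrev);  % penalise large changes in the commanded flow
reward = baseReward - penaltyMagnitude - penaltyVariation;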
I added the result of my training below:
Strangely enough, we can see that the flow always oscillates a little.
For this test, the reward has been slightly modified as follows:
reward = rewardFromTheMatlabExample + 2 / 20 * abs(error) + 2; % add a small continuous component to improve convergence
trainOpts.StopTrainingCriteria="none"; % remove the stopping criteria
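For completeness, training is then launched as usual (env and agent are the water tank environment and the DDPG agent; the option values below are illustrative, not the exact ones I used):
trainOpts = rlTrainingOptions(MaxEpisodes=5000, MaxStepsPerEpisode=200, Plots="training-progress");
trainOpts.StopTrainingCriteria = "none"; % as above, remove the stopping criteria
trainingStats = train(agent, env, trainOpts);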
Any help would be greatly appreciated!
Regards
1 Comment
Nicolas CRETIN on 27 Aug 2024
Hi,
Sorry, I spoke too soon: after a very long time it worked (22 hours on a powerful computer). Have a look below:


Answers (0)
