Hi everyone,
I'm currently facing an issue: my agent can't learn to control the water tank system from the example below if I add a unit delay to the observation signal.
So, I just added a delay as shown in the picture below:
Then the agent no longer seems able to learn which action to take.
But I guess this is expected behaviour, since nothing in the network architecture allows the agent to learn temporal dependencies in the signals. This is why I tried to add long short-term memory (LSTM) layers, but I didn't succeed.
So, in general terms, is adding LSTM layers a good solution to this kind of problem? How can we give the agent a chance to learn time dependencies in signals?
I'm using a DDPG agent, and to add the LSTM layers I set the option UseRNN to true and kept the default architectures for the actor and critic networks:
initOpts = rlAgentInitializationOptions(UseRNN=true)
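For reference, here is a minimal sketch of how I create the recurrent agent (obsInfo/actInfo come from my environment, and the SequenceLength value is just a placeholder I picked, not something from the original example):

```matlab
% Create a DDPG agent whose default actor/critic include an LSTM layer.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

initOpts = rlAgentInitializationOptions(UseRNN=true);
agentOpts = rlDDPGAgentOptions(SequenceLength=20);  % trajectory length used when training the RNN (placeholder value)
agent = rlDDPGAgent(obsInfo, actInfo, initOpts, agentOpts);
```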
I'm using R2023b, and I suspect that the MATLAB example doesn't work in R2024a.
Learning time dependencies would be particularly useful, for example, for penalising the agent for big actions (flow), by adding a penalty proportional to the action taken, or for penalising the agent for big action variations, by adding a penalty proportional to the change in action.
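As a sketch of what I mean (the weights k1 and k2 are arbitrary placeholders, and prevAction would have to be fed back, e.g. through a unit delay in the model; this is not the actual reward from the example):

```matlab
% Hypothetical reward shaping: penalise both the action magnitude
% and the change in action since the previous step.
k1 = 0.1;  % weight on action magnitude (placeholder)
k2 = 0.5;  % weight on action variation (placeholder)
reward = baseReward - k1*abs(action) - k2*abs(action - prevAction);
```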
I added the result of my training below:
Strangely enough, we can see that the flow always oscillates a little.
For this test, the reward has been slightly modified as follows:
reward = rewardFromTheMatlabExample + 2 / 20 * abs(error) + 2;
trainOpts.StopTrainingCriteria="none";
Any help would be greatly appreciated!
Regards