Using Reinforcement learning for parameter estimation
6 views (last 30 days)
Greetings everyone. I am using a DDPG agent for parameter estimation of a linear second-order system. The action is the estimated parameter, since the control input is a pseudo-random signal. I have tested critics and actors with different numbers of layers and different activation functions, but it doesn't work. For example, when I use tanhLayer as the actor's output activation, the action sharply converges to -1 regardless of the desired parameter; with other activation functions there is no convergence at all. The structure of the critic and actor is as follows:
%% CRITIC
% Observation (state) path
statePath = [
    featureInputLayer(numObs, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(25, 'Name', 'CriticStateFC1')
    reluLayer('Name', 'CriticRelu1')
    fullyConnectedLayer(50, 'Name', 'CriticStateFC2')];
% Action path
actionPath = [
    featureInputLayer(numAct, 'Normalization', 'none', 'Name', 'action')
    fullyConnectedLayer(50, 'Name', 'CriticActionFC1', 'BiasLearnRateFactor', 0)];
% Common path: merges the state and action paths and outputs the Q-value
commonPath = [
    additionLayer(2, 'Name', 'add')
    reluLayer('Name', 'CriticCommonRelu')
    fullyConnectedLayer(1, 'Name', 'CriticOutput')];
%% ACTOR
actorNetwork = [
    featureInputLayer(numObs, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(25, 'Name', 'ActorFC1')
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(50, 'Name', 'ActorFC2')
    reluLayer('Name', 'ActorRelu2')
    fullyConnectedLayer(1, 'Name', 'ActorFC3')
    eluLayer('Name', 'eluLayer')   % output activation (tanhLayer was also tried)
    ];
Can anyone help me with this?
0 comments
Answers (1)
Shantanu Dixit
2024-9-4
Hi Ehsan,
Although pinpointing the exact cause of the issue can be challenging, here are some potential workarounds you can consider experimenting with.
Convergence to -1 with 'tanh' could be due to the following possible reasons:
- Activation Functions: Since ReLU can suffer from the dying-ReLU problem, you might want to experiment with alternatives such as Leaky ReLU or ELU, which keep gradients flowing even when neurons would otherwise become inactive.
- Normalization: Normalizing the inputs (both observations and actions) can be crucial for stabilizing learning, as it keeps the inputs on a similar scale.
- Network Architecture: If you have already experimented with different numbers of layers, also try varying the layer sizes; the capacity of the network may be limiting its ability to approximate the target function.
- Gradient Clipping: Large gradients can cause the network to diverge. Clipping caps the gradient magnitude and helps prevent divergence (see the options sketch further below).
- Weight Initialization: Poor weight initialization can push the network into the saturation region of the tanh function, leading to premature convergence. Consider experimenting with different initialization methods, such as Glorot or He initialization.
- Output Scaling: Try scaling the actor output to match the desired parameter range using a scalingLayer, so the network can map its bounded output onto the target values (a minimal sketch follows this list).
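As a rough illustration of the output-scaling, initialization, and activation points, here is a minimal sketch of an actor network with a tanh output followed by a scalingLayer. The bounds pMin and pMax are hypothetical placeholders for your actual parameter range, and numObs is a placeholder for your observation dimension:
% Hypothetical values; replace with your own problem sizes and bounds.
numObs = 2;                    % number of observations
pMin = 0.5;  pMax = 5;         % assumed range of the estimated parameter

actorNetwork = [
    featureInputLayer(numObs, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(25, 'Name', 'ActorFC1', 'WeightsInitializer', 'he')
    leakyReluLayer('Name', 'ActorLRelu1')                % alternative to plain ReLU
    fullyConnectedLayer(50, 'Name', 'ActorFC2', 'WeightsInitializer', 'he')
    leakyReluLayer('Name', 'ActorLRelu2')
    fullyConnectedLayer(1, 'Name', 'ActorFC3')
    tanhLayer('Name', 'ActorTanh')                       % bounded output in [-1, 1]
    scalingLayer('Name', 'ActorScaling', ...
        'Scale', (pMax - pMin)/2, ...                    % map [-1, 1] onto [pMin, pMax]
        'Bias',  (pMax + pMin)/2)
    ];
With this structure the tanh output is rescaled so that -1 corresponds to pMin and +1 to pMax, rather than the raw [-1, 1] range.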
Refer to the MathWorks documentation for more information on the functions mentioned above.
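For the gradient-clipping suggestion, one possible sketch is to set GradientThreshold in the representation options; the threshold and learning-rate values below are purely illustrative:
% Illustrative only: clip gradients and tune learning rates via rlRepresentationOptions.
criticOpts = rlRepresentationOptions('LearnRate', 1e-3, 'GradientThreshold', 1);
actorOpts  = rlRepresentationOptions('LearnRate', 1e-4, 'GradientThreshold', 1);
% These options are then passed when constructing the critic and actor
% representations, e.g. rlDeterministicActorRepresentation(..., actorOpts).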
0 comments