Using reinforcement learning for parameter estimation

Greetings everyone. I am using a DDPG agent for parameter estimation of a linear second-order system. The action is the estimated parameter, since the control input is a pseudo-random signal. I have tested actor and critic networks with different numbers of layers and different activation functions, but it doesn't work. For example, when I used tanhLayer as the actor's output activation, the estimate sharply converges to -1 no matter what the desired parameter is; with other activation functions there is no convergence at all. The structure of the critic and actor is as follows:
%% CRITIC
% State (observation) path
statePath = [
    featureInputLayer(numObs, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(25, 'Name', 'CriticStateFC1')
    reluLayer('Name', 'CriticRelu1')
    fullyConnectedLayer(50, 'Name', 'CriticStateFC2')];
% Action path
actionPath = [
    featureInputLayer(numAct, 'Normalization', 'none', 'Name', 'action')
    fullyConnectedLayer(50, 'Name', 'CriticActionFC1', 'BiasLearnRateFactor', 0)];
% Common path: sums the two branches and outputs the scalar Q-value
commonPath = [
    additionLayer(2, 'Name', 'add')
    reluLayer('Name', 'CriticCommonRelu')
    fullyConnectedLayer(1, 'Name', 'CriticOutput')];
%% ACTOR
actorNetwork = [
    featureInputLayer(numObs, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(25, 'Name', 'ActorFC1')
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(50, 'Name', 'ActorFC2')
    reluLayer('Name', 'ActorRelu2')
    fullyConnectedLayer(1, 'Name', 'ActorFC3')
    eluLayer('Name', 'eluLayer')];
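For reference, the paths above are only layer arrays; they still have to be joined at the addition layer and wrapped into representations. The following is a minimal sketch of that step, assuming obsInfo and actInfo hold the environment's observation and action specifications (this assembly code is illustrative, not part of the original post):
% Assemble the critic: connect state and action paths to the addition layer
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork, 'CriticStateFC2', 'add/in1');
criticNetwork = connectLayers(criticNetwork, 'CriticActionFC1', 'add/in2');
% Wrap the networks into representations (older-release syntax; newer releases
% use rlQValueFunction and rlContinuousDeterministicActor instead)
critic = rlQValueRepresentation(criticNetwork, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'action'});
actor = rlDeterministicActorRepresentation(actorNetwork, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'eluLayer'});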
Can anyone help me with this?

1 Answer

Shantanu Dixit, 2024-9-4
Hi Ehsan,
Although pinpointing the exact cause of the issue can be challenging, here are some potential workarounds you can consider experimenting with:
Convergence to -1 with 'tanh' could be due to the following possible reasons:
  • Weight Initialization: Poor weight initialization can push the network into the saturation region of the tanh function, leading to premature convergence. Consider experimenting with different initialization methods, such as Glorot or He initialization.
  • Output Scaling: Try scaling the actor output to match the desired parameter range using a scalingLayer, so the network can map its output more effectively to the target values (see the sketch after this list).
More generally, you could also try the following:
  • Activation Functions: Since ReLU can suffer from the dying-ReLU problem, you might want to experiment with alternatives such as Leaky ReLU or ELU, which keep gradients flowing even when units would otherwise be inactive.
  • Normalization: Normalizing the inputs (both observations and actions) can be crucial for stabilizing learning, since it keeps the inputs on a similar scale.
  • Network Architecture: If you have already experimented with different numbers of layers, also try varying the layer sizes; the capacity of the network may be limiting its ability to approximate the target function.
  • Gradient Clipping: Large gradients can cause the network to diverge; capping them with gradient clipping helps prevent this (also shown in the sketch below).
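Here is a minimal sketch of how some of these suggestions could look in code. The parameter range [pMin, pMax], the normalization statistics obsMean and obsStd, and the option values are placeholders, not taken from the original post:
% Actor with bounded output: tanh followed by scalingLayer maps to [pMin, pMax]
pMin = 0;  pMax = 10;          % hypothetical parameter range
obsMean = zeros(numObs, 1);    % hypothetical observation statistics
obsStd  = ones(numObs, 1);
actorNetwork = [
    featureInputLayer(numObs, 'Normalization', 'zscore', ...
        'Mean', obsMean, 'StandardDeviation', obsStd, 'Name', 'observation')
    fullyConnectedLayer(25, 'Name', 'ActorFC1', 'WeightsInitializer', 'glorot')
    leakyReluLayer(0.01, 'Name', 'ActorLeakyRelu1')   % alternative to plain ReLU
    fullyConnectedLayer(50, 'Name', 'ActorFC2', 'WeightsInitializer', 'glorot')
    leakyReluLayer(0.01, 'Name', 'ActorLeakyRelu2')
    fullyConnectedLayer(1, 'Name', 'ActorFC3')
    tanhLayer('Name', 'ActorTanh')
    scalingLayer('Name', 'ActorScaling', ...
        'Scale', (pMax - pMin)/2, 'Bias', (pMax + pMin)/2)];
% Gradient clipping and learning rates via the representation options
actorOptions  = rlRepresentationOptions('LearnRate', 1e-4, 'GradientThreshold', 1);
criticOptions = rlRepresentationOptions('LearnRate', 1e-3, 'GradientThreshold', 1);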
Refer to the MathWorks documentation for more information regarding the functions and options mentioned above.
