What causes actor loss and critic loss to constantly increase: is something wrong in my implementation, or is it a physical constraint?

I am training a Soft Actor-Critic (SAC) agent on a Simscape-based model whose signals include voltages, frequencies, and phase angles.
The overall goal of this project is to desynchronize some of the signals (increase the distance between them) while keeping other signals in synchronization. The reward is formulated based on the distance between these signals.
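Concretely, the reward has roughly this shape (a simplified sketch; desyncDist, syncDist, and the weights stand in for what the model actually computes):
% Simplified sketch of the reward shape; the real distances come from the model.
% desyncDist: distance between signals that should move apart (rewarded)
% syncDist:   distance between signals that should stay together (penalized)
reward = w1*desyncDist - w2*syncDist;   % w1, w2: hand-tuned positive weights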
The problem is that my actor loss and critic loss increase over the entire training period. I've tried tuning several hyperparameters, such as minibatch size, learning rates, network size, and regularization. Also, the rewards over the entire training period stay within a range of about -10 to +10.
Currently, I'm wondering whether there is something wrong in my implementation, or whether my training objective is not physically achievable, which would cause the networks to behave randomly. Are there things I should look at that I'm not aware of, like an RNN-based network, PPO, etc.?
Also, the system I'm working with is a Simulink-based model with a very small sample time (20e-6 s), so my reinforcement learning agent has to deal with large arrays of values to determine the terminal conditions. Is this what makes the problem more complex?
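For scale, the time-step arithmetic works out as follows (values taken from the agent and training options below):
% Time-scale arithmetic (values from the options further down):
modelTs = 20e-6;                % Simulink model sample time
agentTs = 120e-6;               % agent SampleTime = 6 model steps per agent step
stepsPerEpisode = 5000;         % MaxStepsPerEpisode
episodeSimTime = stepsPerEpisode * agentTs   % = 0.6 s of simulated time per episode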
numObs = 9 * 50; % 9 states, each with its past 50 samples, stacked into one observation.
obs_window = 1;
obsInfo = rlNumericSpec([numObs, obs_window]);
obsInfo.Description = '';
numAct = 1;
actInfo = rlNumericSpec([numAct, 1]);
actInfo.LowerLimit = -0.15;
actInfo.UpperLimit = 0.1;
env = rlSimulinkEnv('three_VSG_model', 'three_VSG_model/Attacker subsystem/RL Agent', obsInfo, actInfo);
env.ResetFcn = @(in)localResetFcn(in);
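localResetFcn is defined at the end of the script. A hypothetical minimal version, with a made-up block path and parameter, looks like this:
% Hypothetical sketch only; the real localResetFcn and block paths differ.
function in = localResetFcn(in)
    % Randomize an initial condition at the start of each episode.
    in = setBlockParameter(in, ...
        "three_VSG_model/Attacker subsystem/Initial angle", ... % made-up path
        "Value", num2str(0.1*randn));
end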
cnet = [
    featureInputLayer(numObs,Name="observation")
    fullyConnectedLayer(512, WeightsInitializer='he')
    concatenationLayer(1,2,Name="concat")
    % batchNormalizationLayer('Name','first')
    reluLayer
    % fullyConnectedLayer(512, WeightsInitializer='he')
    % batchNormalizationLayer('Name','second')
    % reluLayer
    fullyConnectedLayer(128, WeightsInitializer='he')
    % batchNormalizationLayer('Name','third')
    reluLayer
    fullyConnectedLayer(64, WeightsInitializer='he')
    % batchNormalizationLayer('Name','fourth')
    reluLayer
    % fullyConnectedLayer(32, WeightsInitializer='he')
    % batchNormalizationLayer('Name','fifth')
    % reluLayer
    fullyConnectedLayer(1,Name="CriticOutput")];
actionPath = [
    featureInputLayer(numAct,Name="action")
    fullyConnectedLayer(128,Name="fc2")];
% Connect the layers.
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = connectLayers(criticNetwork,"fc2","concat/in2");
plot(criticNetwork)
criticdlnet = dlnetwork(criticNetwork,'Initialize',false);
% Initialize twice to obtain two critics with different random weights.
criticdlnet1 = initialize(criticdlnet);
criticdlnet2 = initialize(criticdlnet);
critic1 = rlQValueFunction(criticdlnet1,obsInfo,actInfo, ...
ObservationInputNames="observation", ActionInputNames='action');
critic2 = rlQValueFunction(criticdlnet2,obsInfo,actInfo, ...
ObservationInputNames="observation", ActionInputNames='action');
commonPath = [
    featureInputLayer(numObs,Name="observation")
    fullyConnectedLayer(1024, WeightsInitializer='he')
    % batchNormalizationLayer('Name','cbn1')
    % reluLayer
    fullyConnectedLayer(512, WeightsInitializer='he')
    % batchNormalizationLayer('Name','cbn2')
    reluLayer
    fullyConnectedLayer(128, WeightsInitializer='he')
    % batchNormalizationLayer('Name','cbn3')
    reluLayer(Name="anet_out")
    fullyConnectedLayer(32,Name="meanFC", WeightsInitializer='he')
    reluLayer(Name="CommonRelu")
    ];
% Define path for mean value
meanPath = [
    fullyConnectedLayer(32,Name="meanIn", WeightsInitializer='he')
    reluLayer
    fullyConnectedLayer(16, WeightsInitializer='he')
    reluLayer
    fullyConnectedLayer(prod(actInfo.Dimension),Name="MeanOut")
    ];
% Define path for standard deviation
stdPath = [
    fullyConnectedLayer(32,Name="stdIn", WeightsInitializer='he')
    reluLayer
    fullyConnectedLayer(16, WeightsInitializer='he')
    reluLayer
    fullyConnectedLayer(prod(actInfo.Dimension))
    softplusLayer(Name="StandardDeviationOut")
    ];
% Connect the layers.
actorNetwork = layerGraph(commonPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = connectLayers(actorNetwork,"CommonRelu","meanIn/in");
actorNetwork = connectLayers(actorNetwork,"CommonRelu","stdIn/in");
plot(actorNetwork)
actordlnet = dlnetwork(actorNetwork);
summary(actordlnet)
actor = rlContinuousGaussianActor(actordlnet, obsInfo, actInfo, ...
ObservationInputNames="observation", ...
ActionMeanOutputNames="MeanOut", ...
ActionStandardDeviationOutputNames="StandardDeviationOut");
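A similar sanity check works for the actor (as I understand it, the SAC agent applies its own bounding to the raw Gaussian sample, so this raw output may fall outside the action limits):
% Sanity check: the actor should return a finite action sample.
aSample = getAction(actor, {rand(numObs, obs_window)});
disp(aSample{1})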
agentOpts = rlSACAgentOptions( ...
    SampleTime=120e-6, ... % agent sample time; the model sample time is 20e-6
    TargetSmoothFactor=0.5, ...
    ExperienceBufferLength=5e6, ...
    MiniBatchSize=1024*5, ...
    NumWarmStartSteps=128, ...
    DiscountFactor=0.90);
% Set the entropy tuning options directly on the agent options object
% (the documented pattern; EntropyWeightOptions has no standalone constructor).
agentOpts.EntropyWeightOptions.EntropyWeight = 1;
agentOpts.EntropyWeightOptions.LearnRate = 1e-4;
agentOpts.EntropyWeightOptions.TargetEntropy = 2;
agentOpts.ActorOptimizerOptions.Algorithm = "adam";
agentOpts.ActorOptimizerOptions.LearnRate = 1e-5;
agentOpts.ActorOptimizerOptions.GradientThresholdMethod = 'l2norm';
agentOpts.ActorOptimizerOptions.L2RegularizationFactor=0.0005;
for ct = 1:2
    agentOpts.CriticOptimizerOptions(ct).Algorithm = "adam";
    agentOpts.CriticOptimizerOptions(ct).LearnRate = 1e-4;
    agentOpts.CriticOptimizerOptions(ct).L2RegularizationFactor = 0.0005;
    agentOpts.CriticOptimizerOptions(ct).GradientThresholdMethod = 'l2norm';
end
agent = rlSACAgent(actor,[critic1,critic2],agentOpts);
max_episodes = 500;
save_folder = "savedAgents";   % assumed name; SaveAgentDirectory below needs it defined
logger = rlDataLogger();       % file logger passed to train() below
trainOpts = rlTrainingOptions(...
    MaxEpisodes=max_episodes, ...
    MaxStepsPerEpisode=5000, ...
    ScoreAveragingWindowLength=100, ...
    Plots="training-progress", ...
    StopTrainingCriteria="EpisodeCount", ...
    StopTrainingValue=max_episodes, ...
    UseParallel=false, ...
    SaveAgentDirectory=save_folder, ...
    SaveAgentCriteria='EpisodeCount', ...
    SaveAgentValue=10);
trainResult = train(agent,env,trainOpts, Logger=logger);
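After training, the reward trend can be inspected from the returned statistics alongside the logged losses:
% Plot per-episode and window-averaged reward from the training result.
figure
plot(trainResult.EpisodeReward); hold on
plot(trainResult.AverageReward)
legend("Episode reward", "Average reward")
xlabel("Episode"); ylabel("Reward")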

Answers (0)
