How to get the value of value function in soft actor critic?

Question

ryunosuke tazawa 2021-10-20

https://ww2.mathworks.cn/matlabcentral/answers/1567868-how-to-get-the-value-of-value-function-in-soft-actor-critic

Answered by: Aneela, 2024-11-7

I want to know how to get the value of the value function (the Q-value).

I am using a soft actor-critic (SAC) agent.

Could someone tell me how to do this?

%  Soft-actor-critic
clear all;
close all;
Length = 1;                              
Mass = 1;                                 
Ts = 0.01;                                 
Theta_Initial = -pi;                       
AngularVelocity_Initial = 0;              
SimplePendulum = classPendulum(Length, Mass, Theta_Initial, AngularVelocity_Initial, Ts);
ObservationInfo = rlNumericSpec([2 1]);
ObservationInfo.Name = 'States';
ObservationInfo.Description = 'Theta, AngularVelocity';
ActionInfo = rlNumericSpec([1 1],'LowerLimit',-100,'UpperLimit',-5);
ActionInfo.Name = 'Action';
ActionInfo.Description = 'F';
ResetHandle = @()myResetFunction(SimplePendulum);
StepHandle = @(Action,LoggedSignals) myStepfunction(Action,LoggedSignals,SimplePendulum);
env = rlFunctionEnv(ObservationInfo, ActionInfo, StepHandle, ResetHandle);
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs  = obsInfo.Dimension(1);
numAct  = numel(actInfo);
device = 'gpu';
% CRITIC
statePath1 = [
    featureInputLayer(numObs,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticStateRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')
    ];
actionPath1 = [
    featureInputLayer(numAct,'Normalization','none','Name','action')
    fullyConnectedLayer(300,'Name','CriticActionFC1')
    ];
commonPath1 = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu1')
    fullyConnectedLayer(1,'Name','CriticOutput')
    ];
criticNet = layerGraph(statePath1);
criticNet = addLayers(criticNet,actionPath1);
criticNet = addLayers(criticNet,commonPath1);
criticNet = connectLayers(criticNet,'CriticStateFC2','add/in1');
criticNet = connectLayers(criticNet,'CriticActionFC1','add/in2');
criticOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3,... 
                                        'GradientThreshold',1,'L2RegularizationFactor',2e-4,'UseDevice',device);
critic1 = rlQValueRepresentation(criticNet,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
critic2 = rlQValueRepresentation(criticNet,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
%ACTOR
statePath = [
    featureInputLayer(numObs,'Normalization','none','Name','observation')
    fullyConnectedLayer(400, 'Name','commonFC1')
    reluLayer('Name','CommonRelu')];
meanPath = [
    fullyConnectedLayer(300,'Name','MeanFC1')
    reluLayer('Name','MeanRelu')
    fullyConnectedLayer(numAct,'Name','Mean')
    ];
stdPath = [
    fullyConnectedLayer(300,'Name','StdFC1')
    reluLayer('Name','StdRelu')
    fullyConnectedLayer(numAct,'Name','StdFC2')
    softplusLayer('Name','StandardDeviation')];
concatPath = concatenationLayer(1,2,'Name','GaussianParameters');
actorNetwork = layerGraph(statePath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = addLayers(actorNetwork,concatPath);
actorNetwork = connectLayers(actorNetwork,'CommonRelu','MeanFC1/in');
actorNetwork = connectLayers(actorNetwork,'CommonRelu','StdFC1/in');
actorNetwork = connectLayers(actorNetwork,'Mean','GaussianParameters/in1');
actorNetwork = connectLayers(actorNetwork,'StandardDeviation','GaussianParameters/in2');
actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3,...
                                       'GradientThreshold',1,'L2RegularizationFactor',1e-5,'UseDevice',device);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,actorOptions,...
    'Observation',{'observation'});
agentOptions = rlSACAgentOptions;
agentOptions.SampleTime = Ts;
agentOptions.DiscountFactor = 0.99;
agentOptions.TargetSmoothFactor = 1e-3;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.MiniBatchSize = 32;
agent = rlSACAgent(actor,[critic1 critic2],agentOptions);
getAction(agent,{rand(obsInfo(1).Dimension)});
maxepisodes = 10;
maxsteps = 2;
trainingOptions = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'StopOnError','on',...
    'Verbose',true,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',Inf,...
    'ScoreAveragingWindowLength',10); 
trainingStats = train(agent,env,trainingOptions);
% Play the game with the trained agent
simOptions = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOptions);
% Q-value: here I want to get the value of the value function (Q-value).
% Is this the correct way?
batchobs = rand(2,1,64);
batchact = rand(1,1,64,1);
qvalue = getValue(critic2,{batchobs},{batchact});
%v = getValue(critic2,{rand(2,1)},{rand(1,1)})
%save("kyori30Agent.mat","States")

2 Comments

Martin Forsberg Lie 2021-11-8

Edited: Martin Forsberg Lie 2021-11-8

SAC is implemented with two critics, and you must choose the critic:

critic = getCritic(agent);
value = getValue(critic(1),{obs},action);
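
A slightly fuller sketch of the same idea, in case it helps. The two lines above only work if the variable agent exists in the current workspace, so they need to run in the same script or session as the training code, after train. The stand-in default agent, the example observation, and the names critics, q1, q2 and qMin below are assumptions for illustration only; with a trained agent already in the workspace, the stand-in is skipped.

```matlab
% Sketch only: query both SAC critics for one observation-action pair.
% Assumption: if no agent is in the workspace, a default (untrained) SAC
% agent built from the question's specs stands in for the trained one.
if ~exist('agent','var')
    obsInfo = rlNumericSpec([2 1]);
    actInfo = rlNumericSpec([1 1],'LowerLimit',-100,'UpperLimit',-5);
    agent = rlSACAgent(obsInfo,actInfo);
end

critics = getCritic(agent);      % SAC agents hold two Q-value critics

obs = {[-pi; 0]};                % example observation: theta, angular velocity
act = getAction(agent,obs);      % action chosen by the current policy
if ~iscell(act)                  % some releases may return a numeric array
    act = {act};
end

q1 = getValue(critics(1),obs,act);
q2 = getValue(critics(2),obs,act);
qMin = min(q1,q2);               % the smaller of the two estimates
```

Taking the minimum is optional. It mirrors how SAC combines its two critics when forming targets; either critic can also be inspected on its own.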

ryunosuke tazawa 2021-11-19

"The function or variable 'agent' is not recognized."

critic = getCritic(agent);

value = getValue(critic(1),{obs},action);

I added these, but I got the above error.

Do you know how to fix it?


Answer 1

Aneela 2024-11-7

https://ww2.mathworks.cn/matlabcentral/answers/1567868-how-to-get-the-value-of-value-function-in-soft-actor-critic#answer_1542500

Hi @ryunosuke tazawa

To obtain the Q-value from your trained Soft Actor-Critic (SAC) agent, use the getValue function, which computes the Q-value for a given observation-action pair using the critic network.

The following snippet calculates the Q-value for a batch of observation-action pairs:

% Assuming a mini-batch size of 64
batchobs = rand(obsInfo.Dimension(1), 1, 64);
batchact = rand(actInfo.Dimension(1), 1, 64);
qvalue = getValue(critic2, {batchobs}, {batchact});
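
One caveat, offered as a sketch rather than a definitive fix: critic2 in the script is the object that was created before training. As far as I know, train updates the critics stored inside the agent and leaves that workspace variable unchanged, so it may be safer to extract the critics from the agent with getCritic first. The stand-in default agent and the names critics, N, q1 and q2 below are assumptions for illustration; with a trained agent in the workspace, the stand-in is skipped.

```matlab
% Sketch only: batch Q-value query against the critics stored in the agent.
% Assumption: if no agent is in the workspace, a default (untrained) SAC
% agent built from the question's specs stands in for the trained one.
if ~exist('agent','var')
    obsInfo = rlNumericSpec([2 1]);
    actInfo = rlNumericSpec([1 1],'LowerLimit',-100,'UpperLimit',-5);
    agent = rlSACAgent(obsInfo,actInfo);
end

obsInfo = getObservationInfo(agent);
actInfo = getActionInfo(agent);
critics = getCritic(agent);      % two critics for a SAC agent

N = 64;                          % batch size; the batch is the last dimension
batchobs = rand([obsInfo.Dimension N]);                 % 2-by-1-by-64
% Random actions drawn inside the limits of the action specification
batchact = actInfo.LowerLimit + ...
    (actInfo.UpperLimit - actInfo.LowerLimit).*rand([actInfo.Dimension N]);

q1 = getValue(critics(1),{batchobs},{batchact});
q2 = getValue(critics(2),{batchobs},{batchact});
qvalue = min(q1,q2);             % one value per batch element
```

The result holds one Q-value per observation-action pair in the batch. Replace the random batches with real observations and actions to evaluate states of interest.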

For more information, see the MathWorks documentation pages for Q-value function objects and for the getValue function, respectively:

https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlqvaluefunction.html

https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlvaluefunction.getvalue.html