RL: Continuous action space, but within a desired range (using PPO)
I am trying to train a PPO agent with a continuous action space.
However, I want to ensure that the actor's output always stays within the upper and lower bounds I set. In my environment I am using the following code; my actor and critic networks are shown below.
% Observation info
obsInfo = rlNumericSpec([n_Pd+n_Pg+1, 1]);
% Action info (note the capitalization: the properties are 'LowerLimit'/'UpperLimit')
actInfo = rlNumericSpec([n_Pg, 1], ...
    'LowerLimit', Pgmin, ...
    'UpperLimit', Pgmax);
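As a quick sanity check that the limits are stored on the spec, here is a minimal self-contained sketch; the values for n_Pd, n_Pg, Pgmin, and Pgmax are hypothetical placeholders, not the ones from my problem:
% Hypothetical placeholder values, only to make the snippet runnable
n_Pd = 5; n_Pg = 3;
Pgmin = zeros(n_Pg,1);
Pgmax = 10*ones(n_Pg,1);
obsInfo = rlNumericSpec([n_Pd+n_Pg+1, 1]);
actInfo = rlNumericSpec([n_Pg, 1],'LowerLimit',Pgmin,'UpperLimit',Pgmax);
disp([actInfo.LowerLimit, actInfo.UpperLimit])  % limits stored in the spec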
Actor network
%% Actor Network
% Common input path layers
inPath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(128,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(128,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(numActions,'Name','Action')
    ];
% Path layers for the mean value.
% The fully connected layer must come BEFORE the tanh: tanh bounds its
% output to [-1,1], and the scaling layer then maps that interval onto
% [Pgmin, Pgmax]. (Scaling by UpperLimit alone is only correct when the
% bounds are symmetric, i.e. Pgmin == -Pgmax.)
meanPath = [
    fullyConnectedLayer(numActions,'Name','MeanFC')
    tanhLayer('Name','tanhMean')
    scalingLayer('Name','ActorScaling', ...
        'Scale',(actInfo.UpperLimit-actInfo.LowerLimit)/2, ...
        'Bias',(actInfo.UpperLimit+actInfo.LowerLimit)/2)
    ];
% Path layers for the standard deviation.
% The softplus layer keeps the standard deviation non-negative.
sdevPath = [
    tanhLayer('Name','tanhStdv')
    fullyConnectedLayer(numActions,'Name','StdFC')
    softplusLayer('Name','Splus')
    ];
% Add layers to network object
actorNetwork = layerGraph(inPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,sdevPath);
% Connect layers
actorNetwork = connectLayers(actorNetwork,'Action','MeanFC/in');
actorNetwork = connectLayers(actorNetwork,'Action','tanhStdv/in');
actorNetwork = dlnetwork(actorNetwork);
% figure(2)
% plot(layerGraph(actorNetwork))
% Setting Actor
actorOptions = rlOptimizerOptions('LearnRate',0.1,'GradientThreshold',inf);
actor = rlContinuousGaussianActor(actorNetwork,obsInfo,actInfo, ...
"ActionMeanOutputNames","ActorScaling", ...
"ActionStandardDeviationOutputNames","Splus");
Critic network
%% Critic Network
criticNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','observation')
fullyConnectedLayer(128,'Name','CriticFC1')
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = dlnetwork(criticNetwork);
% Setting Critic
criticOptions = rlOptimizerOptions('LearnRate',0.1,'GradientThreshold',inf);
critic = rlValueFunction(criticNetwork,obsInfo);
The remaining agent and training setup:
%% Create PPO Agent
% Setting PPO Agent Options
agentOptions = rlPPOAgentOptions(...
'SampleTime',Ts,...
'ActorOptimizerOptions',actorOptions,...
'CriticOptimizerOptions',criticOptions,...
'ExperienceHorizon',600,...
'ClipFactor',0.02,...
'EntropyLossWeight',0.01,...
'MiniBatchSize',300, ...
'AdvantageEstimateMethod','gae',...
'GAEFactor',0.95,...
'DiscountFactor',0.99);
% Create Agent
agent = rlPPOAgent(actor,critic,agentOptions);
%% Train Agent
maxepisodes = 10000;
maxsteps = ceil(Nt/Ts);
trainingOptions = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'StopOnError',"on",...
'Plots',"training-progress",...
'StopTrainingCriteria',"AverageReward",...
'StopTrainingValue',-14500,...
'SaveAgentCriteria',"EpisodeReward",...
'SaveAgentValue',-14500);
% doTraining: true - train the agent; false - load a pretrained agent
doTraining = true;
if doTraining
% Train the agent.
trainingStats = train(agent,env,trainingOptions);
save('XXX.mat','agent')
else
% Load the pretrained agent for the example.
load('XXX.mat','agent')
end
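For completeness, a minimal simulation sketch (assuming the same env and maxsteps as above) to check the trained agent's behavior:
% Simulate one episode with the trained agent
simOpts = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOpts);
totalReward = sum(experience.Reward.Data);   % Reward is logged as a timeseries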
THANKS!
Answers (1)
Emmanouil Tzorakoleftherakis
2023-10-27
You can always clip the agent output on the environment side. PPO is stochastic, so the upper and lower limits are not guaranteed to be respected with the current implementation.
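A minimal sketch of that clipping, inside the environment's step function. The function name myStepFcn is hypothetical, and Pgmin/Pgmax are assumed to be visible there (e.g. captured by a nested function handle in an rlFunctionEnv setup):
function [nextObs,reward,isDone,logged] = myStepFcn(action,logged)
    % Saturate the sampled action element-wise before applying it
    action = min(max(action, Pgmin), Pgmax);
    % ... environment dynamics using the clipped action ...
end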