PPOAgentの実装について
I am currently trying to control a custom Simulink environment with a PPO agent.
However, I keep running into the error below and have not been able to get past it.
How should I fix this?
Error: rl.representation.rlStochasticActorRepresentation (line 32)
Number of outputs for a continuous stochastic actor representation must be two times the number of actions.
Error: rlStochasticActorRepresentation (line 139)
Rep = rl.representation.rlStochasticActorRepresentation(...
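The error indicates that, for a continuous action specification, the actor network must produce two outputs per action (a mean and a standard deviation), i.e. 2*numAct values, rather than numAct values followed by a softmax. A minimal sketch of such an actor head, assuming R2020a or later (for softplusLayer) and the obsInfo/actInfo defined in the code below, with illustrative layer names and sizes:
% Sketch only: one way to satisfy the "two times the number of actions"
% requirement for a continuous Gaussian actor. Not verified against the model.
commonPath = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(400,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')];
meanPath = [fullyConnectedLayer(numAct,'Name','ActorMeanFC')   % numAct mean values
    tanhLayer('Name','ActorTanh')
    scalingLayer('Name','ActorMean','Scale',10)];              % scale tanh output toward the [-10,10] action range
stdPath = [fullyConnectedLayer(numAct,'Name','ActorStdFC')     % numAct standard deviations
    softplusLayer('Name','ActorStd')];                         % softplus keeps them nonnegative
lg = layerGraph(commonPath);
lg = addLayers(lg,meanPath);
lg = addLayers(lg,stdPath);
lg = addLayers(lg,concatenationLayer(3,2,'Name','GaussianParameters'));  % concatenate along the channel dimension
lg = connectLayers(lg,'ActorRelu1','ActorMeanFC');
lg = connectLayers(lg,'ActorRelu1','ActorStdFC');
lg = connectLayers(lg,'ActorMean','GaussianParameters/in1');   % means first ...
lg = connectLayers(lg,'ActorStd','GaussianParameters/in2');    % ... then standard deviations
actor = rlStochasticActorRepresentation(lg,obsInfo,actInfo, ...
    'Observation',{'observations'},rlRepresentationOptions('LearnRate',1e-3));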
My code:
clear all
motion_time_constant = 0.01;
mdl = 'fivelinkrl';
open_system(mdl)
Ts = 0.05;
Tf = 20;
mdl = 'fivelinkrl';
open_system(mdl)
agentblk = [mdl '/RL Agent'];
numObs = 15;
obsInfo = rlNumericSpec([numObs 1]);
obsInfo.Name = 'observations';
numAct = 5;
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-10,'UpperLimit',10);
actInfo.Name = 'Action';
% define environment
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
%createPPOAgent
criticLayerSizes = [400 300];
actorLayerSizes = [400 300];
createNetworkWeights;
criticNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(criticLayerSizes(1),'Name','CriticFC1', ... 
                                            'Weights',weights.criticFC1, ...
                                            'Bias',bias.criticFC1)
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(criticLayerSizes(2),'Name','CriticFC2', ...
                                            'Weights',weights.criticFC2, ... 
                                            'Bias',bias.criticFC2)
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(1,'Name','CriticOutput',...
                          'Weights',weights.criticOut,...
                          'Bias',bias.criticOut)];
criticOpts = rlRepresentationOptions('LearnRate',1e-3);
critic = rlValueRepresentation(criticNetwork,env.getObservationInfo, ...
                          'Observation',{'observations'},criticOpts);
actorNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(actorLayerSizes(1),'Name','ActorFC1',...
                                           'Weights',weights.actorFC1,...
                                           'Bias',bias.actorFC1)
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(actorLayerSizes(2),'Name','ActorFC2',...
                                           'Weights',weights.actorFC2,...
                                           'Bias',bias.actorFC2)
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(numAct,'Name','Action',...
                               'Weights',weights.actorOut,...
                               'Bias',bias.actorOut)
    softmaxLayer('Name','actionProbability')
    ];  
actorOptions = rlRepresentationOptions('LearnRate',1e-3);
%%%%  ↓error   %%%%%%%%%%%%%%%%%
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,... 
                         'Observation',{'observations'}, actorOptions);
%%%%  ↑error   %%%%%%%%%%%%%%%%%%
opt = rlPPOAgentOptions('ExperienceHorizon',512,...
                        'ClipFactor',0.2,...
                        'EntropyLossWeight',0.02,...
                        'MiniBatchSize',64,...
                        'NumEpoch',3,...
                        'AdvantageEstimateMethod','gae',...
                        'GAEFactor',0.95,...
                        'SampleTime',0.05,...
                        'DiscountFactor',0.9995);
agent = rlPPOAgent(actor,critic,opt);  
%TrainAgent
maxEpisodes = 4000;
maxSteps = floor(Tf/Ts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxEpisodes,...
    'MaxStepsPerEpisode',maxSteps,...
    'ScoreAveragingWindowLength',250,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopTrainingCriteria','EpisodeCount',...
    'StopTrainingValue',maxEpisodes,...
    'SaveAgentCriteria','EpisodeCount',...
    'SaveAgentValue',maxEpisodes);
trainingStats = train(agent,env,trainOpts);
save('agent.mat', 'agent')
%Result in simulation
simOptions = rlSimulationOptions('MaxSteps',maxSteps);
experience = sim(env,agent,simOptions);
Answer (1)
Toshinobu Shintai, 2020-9-11
The action of a PPO agent must be discrete, so the action specification (actInfo) should be defined, for example, like this:
actInfo = rlFiniteSetSpec({[-1, -1, -1], [1, 1, 1]});
In the case above, numAct becomes 2: numAct is the number of action patterns.
The dimensionality of the controller output is given by the dimension of the vectors above, e.g. [-1, -1, -1]. The PPO agent then selects and outputs either [-1, -1, -1] or [1, 1, 1] at each step, depending on the current observation.
I have attached the corrected code, so please take a look.
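For reference, a minimal sketch of how the actor could be wired up with such a discrete action specification (this is not the attached file; the action patterns, layer sizes, and names below are illustrative assumptions):
actInfo = rlFiniteSetSpec({[-1 -1 -1 -1 -1],[1 1 1 1 1]}); % two patterns, each 1x5 to match the five actions in the question
numActPatterns = numel(actInfo.Elements);                  % number of selectable patterns (= 2 here)
% For a discrete stochastic actor, the network outputs one value per pattern,
% and the softmax turns those values into selection probabilities.
actorNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(400,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(numActPatterns,'Name','ActorFC2')
    softmaxLayer('Name','actionProbability')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'observations'},actorOptions);
Note that with rlFiniteSetSpec the agent selects one whole pattern per step, so the achievable control resolution is limited to the patterns you enumerate.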