Deep reinforcement learning (PPO): How do you fix this error?
Hi everyone,
I am using the PPO algorithm to train an agent in a custom Simulink environment, but I get an error. I think it may be related to obsInfo, but I don't know how to fix it. My code and the error log are below.
Any help would be greatly appreciated.
slx = 'RLcontrolstrategy0312';
open_system(slx);
agentblk = slx +"/agent";
%obsinfo actinfo
%Is that the problem?
obsInfo=rlNumericSpec([49,1], ...
'LowerLimit',0, ...
'UpperLimit',1);
actInfo = rlNumericSpec([6,1], 'LowerLimit',[0 0 0 -1 -1 -1]','UpperLimit',[1 1 1 1 1 1]');
scale = [0.5 0.5 0.5 1 1 1]';
bias = [0.5 0.5 0.5 0 0 0]';
numAct = prod(actInfo.Dimension); % 6 actions; used below when building the actor
env = rlSimulinkEnv(slx,agentblk,obsInfo,actInfo);
Ts = 0.001;
Tf = 4;
rng(0)
%critic
cnet = [
featureInputLayer(9,"Normalization","none","Name","observation1")
fullyConnectedLayer(256,"Name","fc1")
concatenationLayer(1,3,"Name","concat")
tanhLayer("Name","tanh1")
fullyConnectedLayer(256,"Name","fc2")
tanhLayer("Name","tanh2")
fullyConnectedLayer(128,"Name","fc3")
tanhLayer("Name","tanh3")
fullyConnectedLayer(64,"Name","fc4")
tanhLayer("Name","tanh4")
fullyConnectedLayer(32,"Name","fc5")
tanhLayer("Name","tanh5")
fullyConnectedLayer(1,"Name","CriticOutput")];
cnetMCT=[
featureInputLayer(20,"Normalization","none","Name","observation2")
fullyConnectedLayer(256,"Name","fc11")
tanhLayer("Name","tanh13")
fullyConnectedLayer(64,"Name","fc14")
tanhLayer("Name","tanh14")
fullyConnectedLayer(32,"Name","fc15")];
cnetMCR=[
featureInputLayer(20,"Normalization","none","Name","observation3")
fullyConnectedLayer(256,"Name","fc21")
tanhLayer("Name","tanh23")
fullyConnectedLayer(64,"Name","fc24")
tanhLayer("Name","tanh24")
fullyConnectedLayer(32,"Name","fc25")];
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork, cnetMCT);
criticNetwork = connectLayers(criticNetwork,"fc15","concat/in2");
criticNetwork = addLayers(criticNetwork, cnetMCR);
criticNetwork = connectLayers(criticNetwork,"fc25","concat/in3");
criticdlnet = dlnetwork(criticNetwork,'Initialize',false);
criticdlnet1 = initialize(criticdlnet);
%Is that the problem?
critic= rlValueFunction(criticdlnet1,obsInfo, ...
ObservationInputNames=["observation1","observation2","observation3"]);
%actor
anet = [
featureInputLayer(9,"Normalization","none","Name","ain1")
fullyConnectedLayer(256,"Name","fc1")
concatenationLayer(1,3,"Name","concat")
tanhLayer("Name","tanh1")
fullyConnectedLayer(256,"Name","fc2")
tanhLayer("Name","tanh2")
fullyConnectedLayer(128,"Name","fc3")
tanhLayer("Name","tanh3")
fullyConnectedLayer(64,"Name","fc4")
tanhLayer("Name","tanh4")];
anetMCT=[
featureInputLayer(20,"Normalization","none","Name","ain2")
fullyConnectedLayer(256,"Name","fc11")
tanhLayer("Name","tanh13")
fullyConnectedLayer(64,"Name","fc14")
tanhLayer("Name","tanh14")
fullyConnectedLayer(32,"Name","fc15")];
anetMCR=[
featureInputLayer(20,"Normalization","none","Name","ain3")
fullyConnectedLayer(256,"Name","fc21")
tanhLayer("Name","tanh23")
fullyConnectedLayer(64,"Name","fc24")
tanhLayer("Name","tanh24")
fullyConnectedLayer(32,"Name","fc25")];
meanPath = [
fullyConnectedLayer(32,"Name","meanFC")
tanhLayer("Name","tanh5")
fullyConnectedLayer(numAct,"Name","mean")
tanhLayer("Name","tanh6")
scalingLayer(Name="meanPathOut",Scale=scale,Bias=bias)];
stdPath = [
fullyConnectedLayer(32,"Name","stdFC")
tanhLayer("Name","tanh7")
fullyConnectedLayer(numAct,"Name","fc5")
softplusLayer("Name","std")];
actorNetwork = layerGraph(anet);
actorNetwork = addLayers(actorNetwork,anetMCT);
actorNetwork = addLayers(actorNetwork,anetMCR);
actorNetwork = connectLayers(actorNetwork,"fc15","concat/in2");
actorNetwork = connectLayers(actorNetwork,"fc25","concat/in3");
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = connectLayers(actorNetwork,"tanh4","meanFC/in");
actorNetwork = connectLayers(actorNetwork,"tanh4","stdFC/in");
actordlnet = dlnetwork(actorNetwork);
%Is that the problem?
actor = rlContinuousGaussianActor(actordlnet,obsInfo,actInfo, ...
"ActionMeanOutputNames","meanPathOut", ...
"ActionStandardDeviationOutputNames","std", ...
ObservationInputNames= ["ain1","ain2","ain3"]);
%agent
agentOptions=rlPPOAgentOptions("SampleTime",Ts,"DiscountFactor",0.995,"ExperienceHorizon",1024,"MiniBatchSize",512,"ClipFactor",0.2, ...
"EntropyLossWeight",0.01,"NumEpoch",8,"AdvantageEstimateMethod","gae","GAEFactor",0.98, ...
"NormalizedAdvantageMethod","current");
agent=rlPPOAgent(actor,critic,agentOptions);
%training
trainOptions=rlTrainingOptions("StopOnError","on", "MaxEpisodes",2000,"MaxStepsPerEpisode",floor(Tf/Ts), ...
"ScoreAveragingWindowLength",10,"StopTrainingCriteria","AverageReward", ...
"StopTrainingValue",100000,"SaveAgentCriteria","None", ...
"SaveAgentDirectory","D:\car\jianmo\zhangxiang\agent","Verbose",false, ...
"Plots","training-progress");
trainingStats = train(agent,env,trainOptions);
The error log is as follows:
Error using rl.internal.validate.mapFunctionObservationInput
Number of input layers for deep neural network must equal the number of observation specifications.
Error in rlValueFunction (line 92)
modelInputMap = rl.internal.validate.mapFunctionObservationInput(model,observationInfo,nameValueArgs.ObservationInputNames);
Error in ppo (line 187)
critic= rlValueFunction(criticdlnet1,obsInfo, ...
Answer (1)
Ronit
2024-3-27
Hi,
Based on the error log you've provided, the issue is a mismatch between the number of observation inputs expected by your neural network and the number of observation specifications you've defined. The error is thrown by ‘rlValueFunction’ when initializing the critic, indicating that the critic's network does not match the observation information ‘obsInfo’ you've specified.
You have defined ‘obsInfo’ as a single specification object, yet when initializing the critic with ‘rlValueFunction’ you specify three observation input names:
critic= rlValueFunction(criticdlnet1,obsInfo, ...
ObservationInputNames=["observation1","observation2","observation3"]);
This discrepancy between the number of ‘obsInfo’ objects (1) and the number of observation input names (3) is the cause of the error.
To resolve this, make sure the number of ‘obsInfo’ objects matches the number of observation input names specified for your network. If your environment produces three distinct observations, define an ‘obsInfo’ object for each and pass them to ‘rlValueFunction’ as a vector.
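For example, your critic network has three input layers of sizes 9, 20, and 20, which together account for the 49 elements of your original observation (9 + 20 + 20 = 49). Assuming the environment's observation really splits into three signals of those sizes (you would need to confirm this against your Simulink model), a sketch of the fix would look like:

```matlab
% Assumption: the 49-element observation splits into three signals whose
% sizes match the network's three input layers (9 + 20 + 20 = 49).
obsInfo = [ ...
    rlNumericSpec([9 1], 'LowerLimit',0,'UpperLimit',1), ...  % -> "observation1"
    rlNumericSpec([20 1],'LowerLimit',0,'UpperLimit',1), ...  % -> "observation2"
    rlNumericSpec([20 1],'LowerLimit',0,'UpperLimit',1)];     % -> "observation3"

% The number of specs (3) now matches the number of input names (3):
critic = rlValueFunction(criticdlnet1,obsInfo, ...
    ObservationInputNames=["observation1","observation2","observation3"]);
```

Note that the environment must then actually output three separate observation signals to the agent block (and the actor must be constructed with the same three-element ‘obsInfo’ vector), so the Simulink side and ‘rlSimulinkEnv’ call need to be consistent with this split as well.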
For more information on the ‘rlValueFunction’ function, please refer to this documentation - https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlvaluefunction.html#responsive_offcanvas
Hope this helps!