Deep reinforcement learning (PPO): How do you fix this error?
Hi everyone,
I am using the PPO algorithm to train an agent in a custom Simulink environment, but I get an error. I think it may be related to obsInfo, but I don't know how to fix it. My code and the error log are below.
Any help would be greatly appreciated.
slx = 'RLcontrolstrategy0312';
open_system(slx);
agentblk = slx +"/agent";
%obsinfo actinfo
%Is that the problem?
obsInfo=rlNumericSpec([49,1], ...
'LowerLimit',0, ...
'UpperLimit',1);
actInfo = rlNumericSpec([6,1], 'LowerLimit',[0 0 0 -1 -1 -1]','UpperLimit',[1 1 1 1 1 1]');
scale = [0.5 0.5 0.5 1 1 1]';
bias = [0.5 0.5 0.5 0 0 0]';
numAct = prod(actInfo.Dimension); % 6 actions; used below when building the actor
env = rlSimulinkEnv(slx,agentblk,obsInfo,actInfo);
Ts = 0.001;
Tf = 4;
rng(0)
%critic
cnet = [
featureInputLayer(9,"Normalization","none","Name","observation1")
fullyConnectedLayer(256,"Name","fc1")
concatenationLayer(1,3,"Name","concat")
tanhLayer("Name","tanh1")
fullyConnectedLayer(256,"Name","fc2")
tanhLayer("Name","tanh2")
fullyConnectedLayer(128,"Name","fc3")
tanhLayer("Name","tanh3")
fullyConnectedLayer(64,"Name","fc4")
tanhLayer("Name","tanh4")
fullyConnectedLayer(32,"Name","fc5")
tanhLayer("Name","tanh5")
fullyConnectedLayer(1,"Name","CriticOutput")];
cnetMCT=[
featureInputLayer(20,"Normalization","none","Name","observation2")
fullyConnectedLayer(256,"Name","fc11")
tanhLayer("Name","tanh13")
fullyConnectedLayer(64,"Name","fc14")
tanhLayer("Name","tanh14")
fullyConnectedLayer(32,"Name","fc15")];
cnetMCR=[
featureInputLayer(20,"Normalization","none","Name","observation3")
fullyConnectedLayer(256,"Name","fc21")
tanhLayer("Name","tanh23")
fullyConnectedLayer(64,"Name","fc24")
tanhLayer("Name","tanh24")
fullyConnectedLayer(32,"Name","fc25")];
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork, cnetMCT);
criticNetwork = connectLayers(criticNetwork,"fc15","concat/in2");
criticNetwork = addLayers(criticNetwork, cnetMCR);
criticNetwork = connectLayers(criticNetwork,"fc25","concat/in3");
criticdlnet = dlnetwork(criticNetwork,'Initialize',false);
criticdlnet1 = initialize(criticdlnet);
%Is that the problem?
critic= rlValueFunction(criticdlnet1,obsInfo, ...
ObservationInputNames=["observation1","observation2","observation3"]);
%actor
anet = [
featureInputLayer(9,"Normalization","none","Name","ain1")
fullyConnectedLayer(256,"Name","fc1")
concatenationLayer(1,3,"Name","concat")
tanhLayer("Name","tanh1")
fullyConnectedLayer(256,"Name","fc2")
tanhLayer("Name","tanh2")
fullyConnectedLayer(128,"Name","fc3")
tanhLayer("Name","tanh3")
fullyConnectedLayer(64,"Name","fc4")
tanhLayer("Name","tanh4")];
anetMCT=[
featureInputLayer(20,"Normalization","none","Name","ain2")
fullyConnectedLayer(256,"Name","fc11")
tanhLayer("Name","tanh13")
fullyConnectedLayer(64,"Name","fc14")
tanhLayer("Name","tanh14")
fullyConnectedLayer(32,"Name","fc15")];
anetMCR=[
featureInputLayer(20,"Normalization","none","Name","ain3")
fullyConnectedLayer(256,"Name","fc21")
tanhLayer("Name","tanh23")
fullyConnectedLayer(64,"Name","fc24")
tanhLayer("Name","tanh24")
fullyConnectedLayer(32,"Name","fc25")];
meanPath = [
fullyConnectedLayer(32,"Name","meanFC")
tanhLayer("Name","tanh5")
fullyConnectedLayer(numAct,"Name","mean")
tanhLayer("Name","tanh6")
scalingLayer(Name="meanPathOut",Scale=scale,Bias=bias)];
stdPath = [
fullyConnectedLayer(32,"Name","stdFC")
tanhLayer("Name","tanh7")
fullyConnectedLayer(numAct,"Name","fc5")
softplusLayer("Name","std")];
actorNetwork = layerGraph(anet);
actorNetwork = addLayers(actorNetwork,anetMCT);
actorNetwork = addLayers(actorNetwork,anetMCR);
actorNetwork = connectLayers(actorNetwork,"fc15","concat/in2");
actorNetwork = connectLayers(actorNetwork,"fc25","concat/in3");
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = connectLayers(actorNetwork,"tanh4","meanFC/in");
actorNetwork = connectLayers(actorNetwork,"tanh4","stdFC/in");
actordlnet = dlnetwork(actorNetwork);
%Is that the problem?
actor = rlContinuousGaussianActor(actordlnet,obsInfo,actInfo, ...
"ActionMeanOutputNames","meanPathOut", ...
"ActionStandardDeviationOutputNames","std", ...
ObservationInputNames= ["ain1","ain2","ain3"]);
%agent
agentOptions=rlPPOAgentOptions("SampleTime",Ts,"DiscountFactor",0.995,"ExperienceHorizon",1024,"MiniBatchSize",512,"ClipFactor",0.2, ...
"EntropyLossWeight",0.01,"NumEpoch",8,"AdvantageEstimateMethod","gae","GAEFactor",0.98, ...
"NormalizedAdvantageMethod","current");
agent=rlPPOAgent(actor,critic,agentOptions);
%training
trainOptions=rlTrainingOptions("StopOnError","on", "MaxEpisodes",2000,"MaxStepsPerEpisode",floor(Tf/Ts), ...
"ScoreAveragingWindowLength",10,"StopTrainingCriteria","AverageReward", ...
"StopTrainingValue",100000,"SaveAgentCriteria","None", ...
"SaveAgentDirectory","D:\car\jianmo\zhangxiang\agent","Verbose",false, ...
"Plots","training-progress");
trainingStats = train(agent,env,trainOptions);
The error log is as follows:
Error using rl.internal.validate.mapFunctionObservationInput
Number of input layers for deep neural network must equal the number of observation specifications.
Error in rlValueFunction (line 92)
modelInputMap = rl.internal.validate.mapFunctionObservationInput(model,observationInfo,nameValueArgs.ObservationInputNames);
Error in ppo (line 187)
critic= rlValueFunction(criticdlnet1,obsInfo, ...
Answer (1)
Ronit
2024-3-27
Hi,
Based on the error log you've provided, the issue is a mismatch between the number of observation inputs expected by your neural network and the number of observation specifications you've defined. The error is thrown by ‘rlValueFunction’ when initializing the critic, indicating that the critic's network does not match the observation information ‘obsInfo’ you've specified.
You have defined ‘obsInfo’ as a single specification object, yet when initializing the critic with ‘rlValueFunction’ you specify three observation input names:
critic= rlValueFunction(criticdlnet1,obsInfo, ...
ObservationInputNames=["observation1","observation2","observation3"]);
This discrepancy between the number of ‘obsInfo’ objects (1) and the number of observation input names (3) is the cause of the error.
To resolve this, make sure the number of ‘obsInfo’ objects matches the number of observation input names specified for your network. If your environment produces three distinct observations, define an ‘obsInfo’ object for each and pass them to ‘rlValueFunction’ as a vector.
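For example, your critic network has three input layers of sizes 9, 20, and 20, which together account for the 49 elements of your original observation (9 + 20 + 20 = 49). Assuming the environment's observation really splits into three signals of those sizes (you would need to confirm this against your Simulink model), a sketch of the fix would look like:

```matlab
% Assumption: the 49-element observation splits into three signals whose
% sizes match the network's three input layers (9 + 20 + 20 = 49).
obsInfo = [ ...
    rlNumericSpec([9 1], 'LowerLimit',0,'UpperLimit',1), ...  % -> "observation1"
    rlNumericSpec([20 1],'LowerLimit',0,'UpperLimit',1), ...  % -> "observation2"
    rlNumericSpec([20 1],'LowerLimit',0,'UpperLimit',1)];     % -> "observation3"

% The number of specs (3) now matches the number of input names (3):
critic = rlValueFunction(criticdlnet1,obsInfo, ...
    ObservationInputNames=["observation1","observation2","observation3"]);
```

Note that the environment must then actually output three separate observation signals to the agent block (and the actor must be constructed with the same three-element ‘obsInfo’ vector), so the Simulink side and ‘rlSimulinkEnv’ call need to be consistent with this split as well.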
For more information on the ‘rlValueFunction’ function, please refer to this documentation - https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlvaluefunction.html#responsive_offcanvas
Hope this helps!