I am using reinforcement learning for control in MATLAB. The last layer of the actor network is a tanhLayer, so the output should be in the range -1 to 1, but the output values are not

3 views (last 30 days)
guiyang on 5 Jun 2024
Commented: guiyang on 13 Jun 2024
actorNet = [
featureInputLayer(numObs, Name="StateInLyr")
fullyConnectedLayer(64)
reluLayer
fullyConnectedLayer(32)
reluLayer
fullyConnectedLayer(numAct)
tanhLayer(Name="ActionOutLyr")
];
The attached image shows the actor's output. It has 6 dimensions in total, and none of them stays within that range.
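For reference, a minimal standalone check of the same layer stack (numObs = 10 and numAct = 6, as in my model); the raw network output should always fall inside [-1, 1]:
numObs = 10;
numAct = 6;
net = dlnetwork([
    featureInputLayer(numObs, Name="StateInLyr")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(numAct)
    tanhLayer(Name="ActionOutLyr")
    ]);
obs = dlarray(randn(numObs, 1), "CB");   % one random observation
act = predict(net, obs);
disp(extractdata(act))                   % every entry should be in [-1, 1]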

Answers (1)

Krishna on 6 Jun 2024
Hi Guiyang,
If the output of a tanh layer in your network is not within the expected range of -1 to 1, consider the following points:
  1. Minor deviations from the expected range might be due to floating-point precision limits. These are typically negligible.
  2. Check if there's any scaling or modification applied after the tanh output that might alter its range.
  3. Ensure that the tanh layer is indeed the final layer in your network, with no additional operations post-tanh.
  4. Verify that the method used for logging or visualizing the outputs is accurate and does not introduce errors or rescale the tanh output (see the sketch after this list).
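For example, one way to separate the network itself from anything applied after it is to query the actor directly and compare the result with the signal logged from the RL Agent block. A minimal sketch, assuming the actor object is named actor and numObs is the observation size:
obs = {randn(numObs, 1)};          % one random observation
act = getAction(actor, obs);       % returns a cell array holding the action vector
act = act{1};
fprintf("min = %.4f, max = %.4f\n", min(act), max(act))   % both should lie in [-1, 1]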
You can also follow these troubleshooting steps:
  1. Test the tanh function with known inputs to confirm that it behaves correctly (see the sketch below this list).
  2. Double-check the network architecture for unintended layers or operations after the tanh.
These steps should help identify and resolve the issue with the tanh layer output.
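A minimal sketch of both checks, assuming the actor dlnetwork is stored in a variable such as actordlNet:
disp(tanh([-10 -1 0 1 10]))   % step 1: tanh of known inputs stays strictly inside (-1, 1)
summary(actordlNet)           % step 2: confirm that tanhLayer is the last layer
plot(actordlNet)              %         and that nothing follows it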
Also, please follow this documentation to learn how to ask questions effectively and get quick answers.
Hope this helps.
  1 Comment
guiyang on 13 Jun 2024
I still cannot find the cause of the error. I have posted my code below; could you help me check it?
clc
clear
%%
%Parameter settings
dataType = 'double';
%%
%Model parameters
Ts=1e-5;
T=0.001;
T1=0;
w=2*pi*50;
Un=1770;
Rn=0.145;
Ln=5.4e-3;
Cd=9e-3;
Udc=3600;
Rd=25;
Kesogi=1;
Kisogi=1;
Kppll=0.7;
Kipll=25;
Kpv=0.5;
Kiv=5;
Kpi=2;
Kii=50;
fp=1:0.01:85;
%%
%Create the environment interface
mdl = "deepl_rectifier_model1";
open_system(mdl)
numObs = 10;
obsInfo = rlNumericSpec( ...
[numObs 1], ...
DataType=dataType);
obsInfo.Name = "observations";
obsInfo.Description = "Error and reference signal";
% Create the action specification
numAct = 6;
actInfo = rlNumericSpec([numAct 1], "DataType", dataType);
actInfo.Name = "vqdRef";
agentblk = "deepl_rectifier_model1/RL Agent";
env = rlSimulinkEnv(mdl, agentblk, obsInfo, actInfo);
actInfo=getActionInfo(env);
env.ResetFcn = @resetReCT;
%%
%Build the agent
% State input path
statePath = [
featureInputLayer(numObs, Name="StateInLyr")
fullyConnectedLayer(64, Name="fc1")
];
% Action input path
actionPath = [
featureInputLayer(numAct, Name="ActionInLyr")
fullyConnectedLayer(64, Name="fc2")
];
% Common output path
commonPath = [additionLayer(2, Name="add")
reluLayer
fullyConnectedLayer(32)
reluLayer
fullyConnectedLayer(16)
fullyConnectedLayer(1, Name="QValueOutLyr")
];
% Add the layers to a layer graph object
criticNet = layerGraph();
criticNet = addLayers(criticNet, statePath);
criticNet = addLayers(criticNet, actionPath);
criticNet = addLayers(criticNet, commonPath);
% Connect the layers
criticNet = connectLayers(criticNet, "fc1", "add/in1");
criticNet = connectLayers(criticNet, "fc2", "add/in2");
%Build the critic dlnetwork (not yet initialized)
criticDLNet = dlnetwork(criticNet, Initialize=false);
%Fix the random seed
rng(0)
%Build the critics
critic1 = rlQValueFunction(initialize(criticDLNet),obsInfo, actInfo);
critic2 = rlQValueFunction(initialize(criticDLNet),obsInfo, actInfo);
%Build the actor network
actorNet = [
featureInputLayer(numObs, Name="StateInLyr")
fullyConnectedLayer(64)
reluLayer
fullyConnectedLayer(32)
reluLayer
fullyConnectedLayer(numAct)
sigmoidLayer(Name="ActionOutLyr")
];
%Create the actor dlnetwork
actordlNet = dlnetwork(actorNet);
% summary(actordlNet)
% plot(actordlNet)
%Construct the actor
actor = rlContinuousDeterministicActor(actordlNet,obsInfo,actInfo);
%Set the agent options
Ts_agent = 0.001;
agentOpts = rlTD3AgentOptions( ...
SampleTime=Ts_agent, ...
DiscountFactor=0.995, ...
ExperienceBufferLength=2e6, ...
MiniBatchSize=256, ...
NumStepsToLookAhead=1, ...
TargetSmoothFactor=0.005, ...
TargetUpdateFrequency=10);
for idx = 1:2
agentOpts.CriticOptimizerOptions(idx).LearnRate = 1e-4;
agentOpts.CriticOptimizerOptions(idx).GradientThreshold = 1;
agentOpts.CriticOptimizerOptions(idx).L2RegularizationFactor = 1e-3;
end
% Actor optimizer options
agentOpts.ActorOptimizerOptions.LearnRate = 1e-3;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;
agentOpts.ActorOptimizerOptions.L2RegularizationFactor = 1e-3;
%Exploration noise settings
%Set the noise variance and decay rate
agentOpts.ExplorationModel.Variance = 0.05;
agentOpts.ExplorationModel.VarianceDecayRate = 2e-4;
agentOpts.ExplorationModel.VarianceMin = 0.001;
%Gaussian action noise model for smoothing the target policy updates
agentOpts.TargetPolicySmoothModel.Variance = 0.1;
agentOpts.TargetPolicySmoothModel.VarianceDecayRate = 1e-4;
% Create the agent with the specified actor, critics, and options
agent = rlTD3Agent(actor, [critic1,critic2], agentOpts);
%%
%Train the agent
T2 = 2;
maxepisodes = 1000;
maxsteps = ceil(T2/Ts_agent);
trainOpts = rlTrainingOptions(...
MaxEpisodes=maxepisodes, ...
MaxStepsPerEpisode=maxsteps, ...
StopTrainingCriteria="AverageReward",...
StopTrainingValue=-190,...
ScoreAveragingWindowLength=100);
doTraining = true;
if doTraining
trainResult = train(agent, env, trainOpts);
else
load("rlPMSMAgent.mat","agent")
end
%%
%Simulate the agent
sim(mdl);


Release

R2024a