How to set initial estimate for mean in PPO actor critic network

Question

Jason Butler 2024-5-8


Answered: Aneela 2024-5-22

I am using a PPO actor-critic network. I created the actor following this example:

https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlcontinuousgaussianactor.html

How can I set an initial guess for the mean? Currently, the actor always starts with an initial mean of zero at time zero.


Answer 1

Aneela 2024-5-22


Hi Jason Butler,

To set an initial guess for the mean in a PPO actor network, modify the initial weights or biases of the layers that compute the mean.

The simplest option is to set the guess through the “Bias” of the “fullyConnectedLayer” in the mean path.
However, the mean also depends on the output of the “tanhLayer” and on the “scalingLayer”, so a given bias value does not map directly onto a specific mean.

Assuming a desired initial mean of 5, here is a workaround that divides the target by the action upper limit to compensate for the scaling layer:

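desiredInitialMean = 5; % Adjust this value as needed
numActions = prod(actInfo.Dimension); % 3 actions in your case
% Pre-divide by the upper limit because scalingLayer multiplies by it later.
% Element-wise ./ with implicit expansion yields a numActions-by-1 vector
% whether UpperLimit is a scalar or a column vector.
biasForDesiredMean = (desiredInitialMean ./ actInfo.UpperLimit) .* ones(numActions, 1);
% Define meanPath with the bias of the fully connected layer set explicitly.
% Name=Value syntax is used throughout, since Name=Value arguments
% cannot precede 'Name',Value pairs in the same call.
meanPath = [
    tanhLayer(Name="tanhMean")
    fullyConnectedLayer(numActions, ...
        Bias=biasForDesiredMean, ...
        Name="fcMean")
    scalingLayer(Name="scale", ...
        Scale=actInfo.UpperLimit)
    ];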
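
As a rough sanity check of the arithmetic, the mean path above can be mimicked with plain matrices. This is only a sketch under stated assumptions: `upperLimit`, `W`, and `h` are hypothetical stand-ins rather than values read from the actual network, and the relation mean = Scale .* (W*tanh(h) + b) is assumed from the layer order shown above, with the scaling layer's own bias left at zero.

```matlab
% Hypothetical stand-ins for the quantities in the mean path
upperLimit = 10;            % assumed value of actInfo.UpperLimit
desiredInitialMean = 5;
numActions = 3;

% Bias as in the workaround: target divided by the scaling factor
b = (desiredInitialMean ./ upperLimit) .* ones(numActions, 1);

rng(0);                                    % reproducible made-up values
W = 0.01 * randn(numActions, numActions);  % made-up initial weights
h = randn(numActions, 1);                  % made-up input to the tanh layer

% Assumed mean path: tanh -> fully connected -> scaling
biasPart    = upperLimit .* b;              % contribution of the bias alone
weightPart  = upperLimit .* (W * tanh(h));  % contribution of the weights
initialMean = biasPart + weightPart;

disp(biasPart.');     % bias contribution per action
disp(initialMean.');  % bias contribution shifted by the weight term
```

The bias contribution equals the desired mean, but the actual initial mean matches it exactly only when the weight term is zero, for example when the tanh output is zero.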

For more information on “Bias” in the “fullyConnectedLayer”, refer to the following MathWorks documentation: https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.fullyconnectedlayer.html?s_tid=doc_ta#:~:text=Layer%20biases%2C%20specified,single%20%7C%20double
