beta distribution in PPO

Sourabh 2024-2-2

https://ww2.mathworks.cn/matlabcentral/answers/2077451-beta-distribution-in-ppo

Commented: Kautuk Raj 2024-2-15

I want to confine the actions of my PPO agent to a bounded range, and I am wondering whether I can use a Beta distribution for the policy to restrict the action space.

Here is the script for the networks I am using:

----------

commonPath = [
    featureInputLayer(prod(obsInfo.Dimension),Name="comPathIn")
    fullyConnectedLayer(120)
    tanhLayer
    fullyConnectedLayer(1,Name="comPathOut")
    ];

% Define mean value path
meanPath = [
    fullyConnectedLayer(64,Name="meanPathIn")
    tanhLayer
    fullyConnectedLayer(64,Name="fc_2")
    tanhLayer
    fullyConnectedLayer(prod(actInfo.Dimension))
    leakyReluLayer(0.1,Name="meanPathOut")
    ];

% Define standard deviation path
sdevPath = [
    fullyConnectedLayer(64,Name="stdPathIn")
    tanhLayer
    fullyConnectedLayer(64)
    tanhLayer
    fullyConnectedLayer(prod(actInfo.Dimension))
    softmaxLayer(Name="stdPathOut")
    ];

% Add layers to layerGraph object
actorNet = layerGraph(commonPath);
actorNet = addLayers(actorNet,meanPath);
actorNet = addLayers(actorNet,sdevPath);

% Connect paths
actorNet = connectLayers(actorNet,"comPathOut","meanPathIn/in");
actorNet = connectLayers(actorNet,"comPathOut","stdPathIn/in");

actorNetwork = dlnetwork(actorNet);

1 Comment

Kautuk Raj 2024-2-15

To use a Beta distribution for the action outputs in PPO, I think you would need to change the network architecture so that it outputs the two shape parameters of the Beta distribution (alpha and beta) instead of a mean and a standard deviation. Both parameters must be positive, so the output layers would typically use an activation function that guarantees positivity, such as softplus. A Beta sample always lies in [0, 1], so it can then be rescaled linearly to the action bounds.
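A minimal sketch of that idea, under several assumptions: `numObs`, `numAct`, `actLow`, and `actHigh` are hypothetical stand-ins for the values in `obsInfo` and `actInfo`; `softplusLayer` comes from Reinforcement Learning Toolbox; `betarnd` needs Statistics and Machine Learning Toolbox; and adding 1 to each head is an optional choice that keeps the density unimodal. Note that `rlContinuousGaussianActor` expects mean and standard deviation outputs, so a network like this probably cannot be passed to the built-in PPO agent as is and would more likely be used in a custom training loop.

```matlab
% Hypothetical sizes and bounds (assumptions). In the question these would
% come from obsInfo.Dimension, actInfo.Dimension, actInfo.LowerLimit and
% actInfo.UpperLimit.
numObs  = 4;
numAct  = 1;
actLow  = -2;
actHigh = 2;

% Shared trunk
commonPath = [
    featureInputLayer(numObs,Name="comPathIn")
    fullyConnectedLayer(120)
    tanhLayer(Name="comPathOut")
    ];

% Alpha head: softplus keeps the output strictly positive
alphaPath = [
    fullyConnectedLayer(64,Name="alphaPathIn")
    tanhLayer
    fullyConnectedLayer(numAct)
    softplusLayer(Name="alphaPathOut")
    ];

% Beta head: same structure, separate weights
betaPath = [
    fullyConnectedLayer(64,Name="betaPathIn")
    tanhLayer
    fullyConnectedLayer(numAct)
    softplusLayer(Name="betaPathOut")
    ];

% Assemble the two-headed network
lg = layerGraph(commonPath);
lg = addLayers(lg,alphaPath);
lg = addLayers(lg,betaPath);
lg = connectLayers(lg,"comPathOut","alphaPathIn/in");
lg = connectLayers(lg,"comPathOut","betaPathIn/in");
betaNet = dlnetwork(lg);

% Forward pass for one random observation (channel x batch layout)
obs = dlarray(rand(numObs,1,"single"),"CB");
[alphaOut,betaOut] = predict(betaNet,obs, ...
    Outputs=["alphaPathOut","betaPathOut"]);

% Shift by 1 so both shape parameters exceed 1 (optional, see lead-in)
a = double(extractdata(alphaOut)) + 1;
b = double(extractdata(betaOut)) + 1;

% Sample in [0,1], then rescale linearly to [actLow, actHigh]
u = betarnd(a,b);
action = actLow + (actHigh - actLow).*u;

% Log-probability of the rescaled action, which a PPO loss would need
logProb = (a-1).*log(u) + (b-1).*log(1-u) - betaln(a,b) ...
    - log(actHigh - actLow);
```

Because the sample `u` never leaves [0, 1], the rescaled `action` stays inside the bounds without any clipping, which is the main appeal over a Gaussian policy for a bounded action space.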

Release: R2023a
