To find the policy function of your post-learning controller, you can use the trained reinforcement learning agent to evaluate actions for given states.
- generatePolicyFunction: This function generates a standalone policy evaluation function from a trained reinforcement learning agent. It is useful if you want to deploy the policy outside of the reinforcement learning environment or integrate it into a larger system (a short sketch is shown after this list).
- getAction: This method returns the action the agent selects for a given state. It is more straightforward for evaluating the policy in a simulation or analysis context.
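As a minimal sketch of the first option (assuming the trained agent is stored in a variable named agent; the names of the generated files and the expected observation shape can differ between toolbox releases):
% Generate a standalone policy evaluation function from the trained agent.
% By default this writes evaluatePolicy.m plus a MAT-file holding the policy data.
generatePolicyFunction(agent);
% The generated function can then be called directly with an observation,
% e.g. an angle of -pi rad and an angular velocity of 0 rad/s:
torque = evaluatePolicy([-pi; 0]);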
For your purpose of evaluating the policy function (torque) for specific states (angle and angular speed), using getAction is more appropriate. It allows you to directly query the agent for actions based on the states you specify.
For a better understanding of these functions, you can refer to the following documentation:
- https://www.mathworks.com/help/reinforcement-learning/ref/rl.policy.rlmaxqpolicy.generatepolicyfunction.html
- https://www.mathworks.com/help/reinforcement-learning/ref/rl.policy.rlmaxqpolicy.getaction.html
If you are using the "Soft Actor-Critic" (SAC) algorithm, the agent consists of both actor and critic networks. You can save the trained agent using the save function in MATLAB, which will include these networks. This saves the entire agent, including its policy (actor network) and value function (critic network).
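If you want to inspect those networks individually after training, a minimal sketch is given below (assuming the trained agent is in a variable named agent; note that a SAC agent can hold more than one critic, so getCritic may return several representations):
% Extract the actor (policy) and critic (value function) from the trained agent
actor = getActor(agent);
critic = getCritic(agent);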
A revised version of your code is given below:
% Load the trained agent (only needed if "agent" is not already in the workspace)
% load('trainedAgent.mat', 'agent');

% Define the state space (angle and angular velocity)
N = 5; % Number of divisions per dimension
Angle = linspace(-3.14, -4.71, N);   % Angle values [rad]
Velocity = linspace(0, -20, N);      % Angular velocity values [rad/s]
[AngleGrid, VelocityGrid] = meshgrid(Angle, Velocity);
State = [AngleGrid(:), VelocityGrid(:)]; % All combinations of states

% Preallocate the policy function output
F = zeros(size(State, 1), 1); % Policy function (torque predicted by the trained agent)

% Evaluate the policy for each state
for i = 1:size(State, 1)
    obs = {State(i, :)'};            % Observation as a column vector in a cell array
    action = getAction(agent, obs);  % Query the trained agent's policy
    F(i) = action{1};                % In recent releases getAction returns the action in a cell array
end

% Save the trained agent (the actor and critic networks are included)
save('trainedAgent.mat', 'agent');
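To visualize the resulting policy surface, a minimal sketch reusing the variables defined above:
% Reshape the evaluated torques back onto the angle/velocity grid and plot them
TorqueGrid = reshape(F, size(AngleGrid));
surf(AngleGrid, VelocityGrid, TorqueGrid)
xlabel('Angle (rad)')
ylabel('Angular velocity (rad/s)')
zlabel('Torque (policy output)')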
For a better understanding of the Soft Actor-Critic (SAC) algorithm, refer to the Reinforcement Learning Toolbox documentation on SAC agents.
Hope that helps!