How to modify actions in experiences during a reinforcement learning training
35 views (last 30 days)
Hi experts
I am working on a reinforcement learning project in which the formulated problem has a huge discrete action set. Instead of using deep Q-learning with discrete actions, I turned to DDPG with a continuous action space. What I want to do is: each time I get an action from the actor network, I discretize it to the closest valid discrete action. What I then want to store in the experience is not the original continuous action, but that closest discrete action. By default, DDPG training in MATLAB seems to store the original action generated by the actor network plus noise. Is there any way to modify the stored action in the experience before it is pushed into the replay memory buffer? Thanks!
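For illustration, the "snap to the closest valid action" step described above could look like the following plain-MATLAB sketch. The function name `snapToValidAction` and the row-per-action layout of `validActions` are my own assumptions, not part of any toolbox:

```matlab
function aDiscrete = snapToValidAction(aContinuous, validActions)
% snapToValidAction  Map a continuous actor output to the nearest valid
% discrete action (hypothetical helper, names are illustrative).
%   aContinuous  - 1-by-D row vector produced by the actor network
%   validActions - N-by-D matrix, one valid discrete action per row
    d = vecnorm(validActions - aContinuous, 2, 2);  % Euclidean distance to each valid action
    [~, idx] = min(d);                              % index of the closest valid action
    aDiscrete = validActions(idx, :);
end
```

With a 1-D action set such as `validActions = [-1; 0; 1]`, a continuous output of `0.7` would snap to `1`.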
Answers (1)
Emmanouil Tzorakoleftherakis
2022-7-29
If you are working in Simulink, you can use the "Last Action" input port of the RL Agent block to indicate the action that was actually applied to the environment.
If your environment is in MATLAB, you can either move it to Simulink (e.g., wrapping it in a MATLAB Function block) and follow the approach above, or you can write your own custom training loop.
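A custom training loop along those lines might look roughly like the sketch below. It assumes the Reinforcement Learning Toolbox `rlReplayMemory` buffer (introduced around R2022a) and `getAction`; the helper `snapToValidAction`, the variables `validActions`, `noiseScale`, and `maxSteps`, and the exact environment interface are my own assumptions, so treat this as a sketch of the idea rather than verified working code:

```matlab
% Sketch of a custom training loop that stores the DISCRETIZED action
% in the experience buffer instead of the raw continuous one.
% Assumptions: agent, env, obsInfo, actInfo already created;
% validActions is an N-by-D matrix of valid discrete actions.
buffer = rlReplayMemory(obsInfo, actInfo, 1e6);   % replay memory (assumed API)
obs = reset(env);
for k = 1:maxSteps
    % Continuous action from the actor, plus exploration noise
    a = getAction(agent, {obs});
    aCont = a{1} + noiseScale*randn(size(a{1}));
    % Snap to the nearest valid discrete action (hypothetical helper)
    aDisc = snapToValidAction(aCont, validActions);
    [nextObs, reward, isDone] = step(env, aDisc);
    % Store the discrete action, not the raw actor output
    exp.Observation     = {obs};
    exp.Action          = {aDisc};
    exp.Reward          = reward;
    exp.NextObservation = {nextObs};
    exp.IsDone          = isDone;
    append(buffer, exp);
    obs = nextObs;
    if isDone
        obs = reset(env);
    end
end
```

The key point is simply that, because you own the loop, you decide which action ends up in the experience before it is appended to the buffer.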