Reinforcement learning agent for mixed action space.

Context:
I have an environment with the following actions:
  1. Discrete set of actions: [A; B] (each can take a logical value, 1 or 0)
  2. Continuous set of actions: [x1; x2; x3; x4; x5; x6]
x1, x3 and x5 depend on A; x2, x4 and x6 depend on B. The observations are continuous.
obsInfo   = rlNumericSpec([4,1],"LowerLimit",[0;0;2000;24],"UpperLimit",[1;1;7500;27]);  % continuous observations
actInfo_1 = rlFiniteSetSpec({[1;0],[0;1],[1;1]});                                        % discrete actions [A;B]
actInfo_2 = rlNumericSpec([6,1],"LowerLimit",[108.8;108.8;136.6;136.6;26;26],...
    "UpperLimit",[181.3;181.3;227;227;34;34]);                                           % continuous actions [x1..x6]
Question:
  1. How to define a Simulink environment?
  2. Which agent is suited for the above given environment?
  3. Is it possible to have multiple action paths for an agent?
  4. Does the dependency of [A, B] need to be defined in the environment or somewhere else (actor or critic net)?
I am new to deep learning. Detailed answers with easily understood explanations (for a novice) are appreciated.

Answers (2)

Aiswarya, 2023-9-8
Hi,
The answers to your queries are as follows:
1. How to define a Simulink environment?
To define a Simulink environment, you can use the ‘rlSimulinkEnv’ function provided by the Reinforcement Learning Toolbox in MATLAB: https://www.mathworks.com/help/reinforcement-learning/ref/rlsimulinkenv.html
This function lets you specify the Simulink model, the path to the RL Agent block, and the observation and action specifications.
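For example, here is a minimal sketch, assuming a Simulink model named "myModel" that contains an RL Agent block named "RL Agent" (both names are placeholders; use the names from your own model):
% Create observation/action specifications and wrap the Simulink model
obsInfo = rlNumericSpec([4,1],"LowerLimit",[0;0;2000;24],"UpperLimit",[1;1;7500;27]);
actInfo = rlNumericSpec([6,1], ...
    "LowerLimit",[108.8;108.8;136.6;136.6;26;26], ...
    "UpperLimit",[181.3;181.3;227;227;34;34]);
mdl      = "myModel";                                   % placeholder model name
agentBlk = mdl + "/RL Agent";                           % path to the RL Agent block
env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);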
2. Which agent is suited for the above given environment?
You can start with a simple algorithm and try progressively more complicated ones if it does not perform as desired. The built-in RL agents do not support a mixed action space.
As a workaround you can try multi-agent training, i.e. use one agent for the continuous actions and another agent for the discrete actions. You can use a combination of a DQN (Deep Q-Network) agent for the discrete action space and a DDPG (Deep Deterministic Policy Gradient) agent for the continuous action space, as these are the simplest compatible agents. For more information, refer to the documentation example that trains multiple agents.
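A minimal sketch of this two-agent setup, assuming a Simulink model "myModel" with two RL Agent blocks named "Discrete Agent" and "Continuous Agent" (all three names are placeholders), using default agents with automatically generated networks:
mdl  = "myModel";
blks = [mdl + "/Discrete Agent", mdl + "/Continuous Agent"];   % one RL Agent block per agent

% Both agents observe the same signals in this sketch
obsInfo = rlNumericSpec([4,1],"LowerLimit",[0;0;2000;24],"UpperLimit",[1;1;7500;27]);

actInfo_1 = rlFiniteSetSpec({[1;0],[0;1],[1;1]});              % discrete actions [A;B]
actInfo_2 = rlNumericSpec([6,1], ...
    "LowerLimit",[108.8;108.8;136.6;136.6;26;26], ...
    "UpperLimit",[181.3;181.3;227;227;34;34]);                 % continuous actions [x1..x6]

env = rlSimulinkEnv(mdl, blks, {obsInfo, obsInfo}, {actInfo_1, actInfo_2});

dqnAgent  = rlDQNAgent(obsInfo, actInfo_1);    % discrete-action agent (default networks)
ddpgAgent = rlDDPGAgent(obsInfo, actInfo_2);   % continuous-action agent (default networks)

trainOpts  = rlTrainingOptions("MaxEpisodes",500,"MaxStepsPerEpisode",200);
trainStats = train([dqnAgent, ddpgAgent], env, trainOpts);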
3. Is it possible to have multiple action paths for an agent?
Yes, an agent can have multiple action signals. In your case, however, it is better to use different agents for the different action spaces, as shown in the example above.
4. Does the dependency of [A, B] need to be defined in the environment or somewhere else (actor or critic net)?
The dependency of [A, B] could be defined in the environment itself. The environment can take the actions [A; B] as inputs and generate the corresponding dependent actions [x1, x3, x5] and [x2, x4, x6] internally.
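For example, here is a minimal sketch of how the environment could enforce that dependency, e.g. inside a MATLAB Function block in the Simulink model (the function name and the masking rule are assumptions based on the description above):
function u = applyActionMask(discreteAct, contAct)
% discreteAct = [A; B], contAct = [x1; x2; x3; x4; x5; x6]
A = discreteAct(1);
B = discreteAct(2);
u = contAct;
u([1 3 5]) = A * contAct([1 3 5]);   % x1, x3, x5 are only applied when A = 1
u([2 4 6]) = B * contAct([2 4 6]);   % x2, x4, x6 are only applied when B = 1
end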

Emmanouil Tzorakoleftherakis
Reinforcement Learning Toolbox does not support agents with both continuous and discrete actions. Can you share some more details on the application?
As an alternative you could consider training two agents, one for the continuous action and another for the discrete. Another option would be to set up an off-policy agent for continuous actions (e.g. DDPG) and quantize the discrete actions on the environment side. This approach is a little tricky because you would also need to make sure to use, e.g., the 'last action' signal to update the experience buffer accordingly.
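A minimal sketch of that quantization step on the environment side (the function name and the 0.5 threshold are assumptions; the idea is that the agent outputs two extra continuous signals in [0, 1] which the environment rounds to the logical actions):
function [A, B] = quantizeToDiscrete(contSwitch)
% contSwitch(1:2) are extra continuous agent outputs in [0, 1].
% The thresholded values are also what should be written back as the
% "last action" so the experience buffer stores the action actually applied.
A = double(contSwitch(1) >= 0.5);
B = double(contSwitch(2) >= 0.5);
end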
