Different Action spaces in different steps

Question

Danial Kazemikia 2024-7-3

Direct link to this question: https://ww2.mathworks.cn/matlabcentral/answers/2134001-different-action-spaces-in-different-steps

Answered: Shantanu Dixit 2024-7-12

In MATLAB RL, is it possible for the agent to have one type of action space in the first step and a different action space after that? For example, in a grid world, the action in the first step of each episode would be choosing where to start on the grid, and in the following steps, choosing where to go from that point.


Answer 1

Shantanu Dixit 2024-7-12

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/2134001-different-action-spaces-in-different-steps#answer_1484818

Hi Danial,

As I understand it, you want the agent to use one action space for the first step of an episode and then possibly a different action space, driven by the policy, for the remaining steps.

In the grid-world example, the first-step action space can be defined with ‘rlFiniteSetSpec’, with each element mapped to a position on the grid. Once the first step is taken, the action space can be switched to a set of movement actions, and the policy chooses from that set for the rest of the episode.

The code below illustrates this with a custom environment class and a sample policy function used for simulation.

% Save this class in its own file, CustomGridWorld.m
classdef CustomGridWorld < rl.env.MATLABEnvironment
    properties
        GridSize = [5, 5];                       % Size of the grid
        CurrentState = [1, 1];                   % Current position in the grid
        TerminalState = [5, 5];                  % Goal position in the grid
        Obstacles = [3, 3; 3, 4; 3, 5; 4, 3];    % Positions of obstacles in the grid
        IsFirstStep = true;                      % Flag to indicate the first step
    end
    
    methods
        function this = CustomGridWorld()
            % Define the observation and action spaces
            ObservationInfo = rlNumericSpec([2 1]);
            ObservationInfo.Name = 'Grid State';
            ActionInfo = rlFiniteSetSpec(1:25); % Initial action space (choosing start position)
            ActionInfo.Name = 'Grid Action';
            this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);
        end
        
        function [Observation, Reward, IsDone, LoggedSignals] = step(this, Action)
            LoggedSignals = []; % Initialize LoggedSignals
            
            if this.IsFirstStep
                % Decode the action to a starting position
                disp('Taking the first step!')
                [row, col] = ind2sub(this.GridSize, Action);
                this.CurrentState = [row, col];
                this.IsFirstStep = false;
                % Change action space to standard grid actions (up, down, left, right)
                this.ActionInfo = rlFiniteSetSpec([1, 2, 3, 4]);
                this.ActionInfo.Name = 'Grid Action';
            else
                % Standard grid movement logic
                nextState = this.CurrentState;
                switch Action
                    case 1 % Up
                        nextState = this.CurrentState + [-1, 0];
                    case 2 % Down
                        nextState = this.CurrentState + [1, 0];
                    case 3 % Left
                        nextState = this.CurrentState + [0, -1];
                    case 4 % Right
                        nextState = this.CurrentState + [0, 1];
                end
                
                % Check if next state is within bounds and not an obstacle
                if all(nextState > 0) && all(nextState <= this.GridSize) && ...
                   ~ismember(nextState, this.Obstacles, 'rows')
                    this.CurrentState = nextState;
                else
                    disp('Move blocked (out of bounds or obstacle), take another action!')
                end
            end
            
            % Set Observation, Reward, IsDone, and LoggedSignals
            Observation = this.CurrentState';
            if isequal(this.CurrentState, this.TerminalState)
                Reward = 10; % Reward for reaching the terminal state
                IsDone = true;
            else
                Reward = -1; % Small negative reward for each step
                IsDone = false;
            end
        end
        
        function InitialObservation = reset(this)
            % Reset the environment to the initial state
            this.CurrentState = [1, 1];
            this.IsFirstStep = true;
            % Reset action space to initial action space
            this.ActionInfo = rlFiniteSetSpec(1:25);
            InitialObservation = this.CurrentState';
        end
        function actionInfo = getActionInfo(this)
            % Method to get the current action space information
            actionInfo = this.ActionInfo;
        end
    end
end
% Define a simple policy function (in releases before R2024a, place local functions at the end of the script)
function action = simplePolicy(state, terminalState, actions)
    if state(2) < terminalState(2)
        action = actions(4); % Move Right
    elseif state(1) < terminalState(1)
        action = actions(2); % Move Down
    else
        action = actions(randi(length(actions))); % Random action if already at the terminal state
    end
end
% Create an instance of the CustomGridWorld environment
env = CustomGridWorld();
% Reset the environment to the initial state
initialObservation = env.reset()
numEpisodes = 2;  % user defined
numSteps = 8;     % user defined
for episode = 1:numEpisodes
    % Reset the environment at the start of each episode
    initialObservation = env.reset();
    disp('Starting a new episode')
    disp(['Episode ', num2str(episode), ' started']);
    disp(['Initial State: ', mat2str(initialObservation)]);
    
    for step = 1:numSteps
        % Get the current action space
        actionInfo = env.getActionInfo();
        actions = actionInfo.Elements;
        if env.IsFirstStep
            disp('Taking a random action for the first step');
            action = actions(randi(length(actions)));
            
        else
            % Get the current state
            currentState = env.CurrentState;
            
            % Select an action using the policy
            action = simplePolicy(currentState, env.TerminalState, actions);
        
        end
        
        % Take the action and get the next observation, reward, and done flag
        [observation, reward, isDone, loggedSignals] = env.step(action);
        
        % Display the results of the step
        disp(['Step ', num2str(step)]);
        disp(['Action: ', num2str(action)]);
        disp(['State: ', mat2str(observation)]);
        disp(['Reward: ', num2str(reward)]);
        disp(['IsDone: ', num2str(isDone)]);
        
        % If the episode is done, break the loop
        if isDone
            disp('Episode finished.');
            disp('-----------------------');
            break;
        end
    end
end
% Display the final observation if the last episode did not reach the goal
if ~isDone
    disp(['Final Observation after ', num2str(numSteps), ' steps:']);
    disp(observation);
end

First Step: The action space consists of choosing a starting position on the grid. The initial action space is set to 25 possible actions, and each action index is converted to a (row, column) position on the 5x5 grid with ‘ind2sub’.
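As a rough illustration of the first-step decoding, the sketch below maps a few action indices to grid positions with ‘ind2sub’. The variable names and the sample indices are illustrative and not part of the class above. Note that MATLAB linear indices run down the columns first.

```matlab
% Illustrative sketch: decode a first-step action index into a grid position.
gridSize = [5, 5];             % same grid size as in the class above
startActions = [1, 5, 6, 25];  % sample action indices (chosen for illustration)

for k = 1:numel(startActions)
    % ind2sub uses column-major order: indices 1..5 fill column 1, 6..10 column 2, ...
    [row, col] = ind2sub(gridSize, startActions(k));
    fprintf('Action %d -> position [%d, %d]\n', startActions(k), row, col);
end

% sub2ind is the inverse mapping, e.g. to encode a desired start position.
goalIndex = sub2ind(gridSize, 5, 5);  % linear index of cell [5, 5]
```

With this ordering, action 6 lands on row 1, column 2 rather than row 2, column 1, which matters if the start positions are meant to be read row by row.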

Subsequent Steps: The action space consists of choosing a direction (up, down, left, right) to move from the current position.
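The movement step can also be written with a lookup table of row/column offsets in place of the switch statement. This is only a sketch of the same logic under the grid and obstacle layout used above; ‘moves’, ‘candidate’, and the sample state are names chosen for illustration.

```matlab
% Illustrative sketch: apply a direction action with a lookup table of offsets.
gridSize  = [5, 5];
obstacles = [3, 3; 3, 4; 3, 5; 4, 3];
moves     = [-1,  0;   % 1 = up
              1,  0;   % 2 = down
              0, -1;   % 3 = left
              0,  1];  % 4 = right

state  = [2, 3];       % sample current position (row, column)
action = 2;            % try to move down, towards the obstacle at [3, 3]

candidate = state + moves(action, :);
inBounds  = all(candidate >= 1) && all(candidate <= gridSize);
isBlocked = ismember(candidate, obstacles, 'rows');

if inBounds && ~isBlocked
    state = candidate;  % accept the move
end

% The candidate [3, 3] is an obstacle, so the state is left unchanged.
disp(state);
```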

The ‘CustomGridWorld’ class is designed to handle different action spaces based on whether it is the first step or a subsequent step.

The ‘IsFirstStep’ property is used to check if the current step is the first step of the episode.

The ‘getActionInfo’ method returns the agent’s currently available choices through the ‘Elements’ property of ‘rlFiniteSetSpec’. The action space changes once the agent has taken the first step: ‘this.ActionInfo’ is set to a new set of actions [1, 2, 3, 4] (up, down, left, right).
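The switching idea can be seen in isolation with two ‘rlFiniteSetSpec’ objects and a flag, without the environment class. This sketch assumes Reinforcement Learning Toolbox is installed; ‘startSpec’, ‘moveSpec’, ‘activeSpec’, and ‘isFirstStep’ are illustrative names, and the random choice stands in for whatever policy is used.

```matlab
% Illustrative sketch: pick the valid action set from a first-step flag.
startSpec = rlFiniteSetSpec(1:25);       % first step: choose a start cell
moveSpec  = rlFiniteSetSpec([1 2 3 4]);  % later steps: up, down, left, right

isFirstStep = true;
for stepIdx = 1:3
    if isFirstStep
        activeSpec = startSpec;
    else
        activeSpec = moveSpec;
    end
    actions = activeSpec.Elements;             % valid choices at this step
    action  = actions(randi(numel(actions)));  % random choice, for illustration only
    fprintf('Step %d: %d valid actions, chose %d\n', stepIdx, numel(actions), action);
    isFirstStep = false;                       % only the first step uses startSpec
end
```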

I hope this gives an idea of how to change the action space in a grid-based setting.

To learn about custom class implementation and other functions refer to the following MathWorks documentation:

Thanks.
