In Grid World RL, is it possible to update ObstacleStates and use updateStateTranstionForObstacles in the step function?

I'm trying to train a reinforcement learning agent in a grid world, but the idea is that at every step the robot checks its surroundings to find obstacles. So in the first episode the grid world has only the start point and the goal point and no obstacles, and the state transition matrix gets updated over the course of each episode. For that I have created a class that constructs a GridWorld environment, and in its step function I call ObstacleStates and updateStateTranstionForObstacles, which belong to the GridWorld class. The state transition matrix does seem to be updated, but the RL agent doesn't appear to take it into account when selecting actions, so I don't know whether I'm only updating a "copy" of the GridWorld and my idea is not possible.
classdef Copy_of_rlMDPEnv_CoppeliaSim < rl.env.MATLABEnvironment
    % rlMDPEnv: Create a MATLAB based reinforcement learning environment for an
    % MDP (Markov Decision Process) by supplying the MDP model.
    %
    % ENV = rlMDPEnv(MDP) creates a reinforcement learning environment with
    % the specified MDP model. See createGridWorld and createMDP on how to
    % create MDP models.

    properties
        % inherited from rlMDPEnv
        Model rl.env.GridWorld
        ResetFcn
        % needed for CoppeliaSim
        ...
    end

    %% Public Methods
    methods
        function obj = Copy_of_rlMDPEnv_CoppeliaSim(MDP)
            % Copy_of_rlMDPEnv_CoppeliaSim(MDP) constructs a GridWorld environment
            % for reinforcement learning. MDP should be an rl.env.GridWorld.
            narginchk(1,1)
            if ~(isa(MDP, 'rl.env.GridWorld') && isscalar(MDP))
                error(message('dInput'))
            end
            % get observation and action information from the MDP
            ActionInfo = rlFiniteSetSpec(1:numel(MDP.Actions));
            ActionInfo.Name = 'MDP Actions';
            ObservationInfo = rlFiniteSetSpec(1:numel(MDP.States));
            ObservationInfo.Name = 'MDP Observations';
            obj = obj@rl.env.MATLABEnvironment(ObservationInfo,ActionInfo);
            obj.Model = MDP;
        end
    end

    %% Implement Abstract Methods
    methods
        % define the step function on the grid world
        function [Observation,Reward,isTerminal,Info] = step(this,Action)
            % inherited from rlMDPEnv
            Info = [];
            Action = idx2action(this.Model,Action);
            [Observation,Reward,isTerminal] = move(this.Model,Action);
            Observation = state2idx(this.Model,Observation);
            % update the obstacle set detected at this step and rebuild the
            % state transition matrix around it
            obstacles = updateObstacles(this); % states
            this.Model.ObstacleStates = obstacles;
            this.Model.updateStateTranstionForObstacles();
        end
        ...
    end
end
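For context, this is roughly how an environment like this would be created and trained against. It's a simplified sketch, not the exact script from my project: the grid size, state names, and training options are placeholders, and on older releases the critic is built with rlQValueRepresentation instead of rlQValueFunction.

GW = createGridWorld(5,5);                          % GridWorld from Reinforcement Learning Toolbox
GW.CurrentState   = "[1,1]";                        % start
GW.TerminalStates = "[5,5]";                        % goal; no obstacles in the first episode
env = Copy_of_rlMDPEnv_CoppeliaSim(GW);             % custom environment defined above

obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
qTable  = rlTable(obsInfo,actInfo);
critic  = rlQValueFunction(qTable,obsInfo,actInfo); % rlQValueRepresentation on older releases
agent   = rlQAgent(critic);

trainOpts  = rlTrainingOptions(MaxEpisodes=200, MaxStepsPerEpisode=50);
trainStats = train(agent, env, trainOpts);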

Answers (1)

Shubham, 2024-6-17
Hi Andrea,
Your approach of dynamically updating the grid world with obstacles and expecting the reinforcement learning (RL) agent to adapt is sound in principle. However, there are a couple of nuances in how the environment dynamics are represented and how the agent perceives changes that may explain the behavior you're observing.
Points to consider:
  1. When the environment changes (like adding obstacles), it's crucial that these changes are adequately represented in the state information that the agent receives. The agent makes decisions based on the state it perceives. If the state representation doesn't change (or doesn't change in a way that's meaningful to the agent), the agent won't "realize" that the environment has changed.
  2. When you update the StateTransition matrix within your environment, ensure that this update is effectively communicated to the RL agent. The agent uses this matrix (either directly or indirectly, depending on the algorithm) to learn the dynamics of the environment. If the agent continues to use an outdated version of this matrix, it won't adapt its policy to the new dynamics.
  3. You mentioned the possibility of updating a "copy" of the GridWorld. This is a crucial point. In object-oriented programming, and especially in MATLAB with its handle-versus-value class distinction, you need to be sure you're updating the same instance of the object that the RL agent interacts with. If updateStateTranstionForObstacles modifies a copy, or if obj.Model stored a copy of the GridWorld you passed in, the instance the agent actually uses remains unchanged (a quick check for this is sketched right after this list).
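To rule out the "copy" issue, a minimal check like the following can help. This is only a sketch using the class names from your question; whether rl.env.GridWorld is a handle or a value class determines which branch you land in.

GW  = createGridWorld(5,5);                % small test grid (placeholder size)
env = Copy_of_rlMDPEnv_CoppeliaSim(GW);
if isa(GW,'handle')
    % For handle objects, == compares object identity, so a true result here
    % means any property change made through env.Model is also visible via GW.
    disp(env.Model == GW)
else
    % Value semantics: obj.Model = MDP stored a copy, so changes made to GW
    % outside the environment never reach the instance the agent steps through.
    % The environment itself is a handle (rl.env.MATLABEnvironment), though, so
    % updates done inside step() on this.Model still persist across steps.
    disp('rl.env.GridWorld behaves as a value class')
end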
Debugging Steps
  1. After updating the obstacles and the StateTransition matrix, verify that what the agent subsequently experiences (the observations and transitions returned by step) actually changes in a way that reflects these updates (see the logging sketch after this list).
  2. Ensure that the this.Model you're updating within the step function is the same instance that the RL agent uses to decide on actions. MATLAB doesn't expose memory addresses, but you can check handle identity (for example with ==) or step through the code in the debugger to trace the execution flow.
  3. Remember that even if the environment updates correctly, the agent might need time (episodes) to learn the new dynamics. Ensure that the agent is continuously learning from new experiences and not just exploiting learned knowledge from previous, now outdated, environment dynamics.
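As a concrete example of step 1, the update block inside your step function could be instrumented roughly like this. It's only a sketch, and it relies on the transition matrix being exposed as the T property of the GridWorld created by createGridWorld:

Told = this.Model.T;                        % snapshot of the transition matrix before the update
obstacles = updateObstacles(this);          % states detected around the robot
this.Model.ObstacleStates = obstacles;
this.Model.updateStateTranstionForObstacles();
nChanged = nnz(this.Model.T ~= Told);       % how many transition entries actually moved
fprintf('step: %d obstacle states, %d transition entries changed\n', ...
    numel(obstacles), nChanged);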
Additional Suggestions
  1. Implement logging inside your environment to confirm that obstacles are added as expected and that the StateTransition matrix is updated accordingly.
  2. If possible, create a simple visualization of the grid world that updates with each episode (a rough sketch follows this list). This can help you quickly identify if and when obstacles are added and how the agent's behavior changes in response.
  3. Start with a very simple scenario where the change in environment dynamics is minimal (e.g., adding a single obstacle) and easy to learn. This can help isolate whether the issue is with the dynamics update mechanism or with the agent's learning process.
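For the visualization idea, something as simple as painting the current obstacle mask each episode is usually enough. A rough sketch follows; it assumes the state names have the "[row,col]" format produced by createGridWorld and that env is your environment object:

GW    = env.Model;
names = string(GW.ObstacleStates);          % e.g. ["[3,3]" "[3,4]" ...]
mask  = zeros(GW.GridSize);
for s = reshape(names,1,[])
    rc = sscanf(s,'[%d,%d]');               % parse "[r,c]" back into row/column indices
    mask(rc(1),rc(2)) = 1;
end
imagesc(mask); axis equal tight
title(sprintf('Obstacles this episode: %d cells', nnz(mask)))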
