In Grid World RL, is it possible to update ObstacleStates and use updateStateTranstionForObstacles in the step function?

I'm trying to train a reinforcement learning agent in a grid world, but the idea is that at every step the robot checks its surroundings to find obstacles. So in the first episode the grid world has only the start point and the goal point and no obstacles, and the state transition matrix gets updated over the course of each episode. For that I have created a class that constructs a GridWorld environment, and in its step function I call ObstacleStates and updateStateTranstionForObstacles, which belong to the GridWorld class. The state transition matrix does seem to be updated, but the RL agent doesn't appear to take it into account when selecting actions, so I don't know whether I'm only updating a "copy" of the GridWorld and my idea is not possible.
classdef Copy_of_rlMDPEnv_CoppeliaSim < rl.env.MATLABEnvironment
    % rlMDPEnv: Create a MATLAB based reinforcement learning environment for an
    % MDP (Markov Decision Process) by supplying the MDP model.
    %
    % ENV = rlMDPEnv(MDP) creates a reinforcement learning environment with
    % the specified MDP model. See createGridWorld and createMDP on how to
    % create MDP models.

    properties
        % inherited from rlMDPEnv
        Model rl.env.GridWorld
        ResetFcn
        % needed for CoppeliaSim
        ...
    end

    %% Public Methods
    methods
        function obj = Copy_of_rlMDPEnv_CoppeliaSim(MDP)
            % Copy_of_rlMDPEnv_CoppeliaSim(MDP) constructs a GridWorld environment
            % for reinforcement learning. MDP should be an rl.env.GridWorld.
            narginchk(1,1)
            if ~(isa(MDP, 'rl.env.GridWorld') && isscalar(MDP))
                error(message('dInput'))
            end
            % get observation and action information from the MDP
            ActionInfo = rlFiniteSetSpec(1:numel(MDP.Actions));
            ActionInfo.Name = 'MDP Actions';
            ObservationInfo = rlFiniteSetSpec(1:numel(MDP.States));
            ObservationInfo.Name = 'MDP Observations';
            obj = obj@rl.env.MATLABEnvironment(ObservationInfo,ActionInfo);
            obj.Model = MDP;
        end
    end

    %% Implement Abstract Methods
    methods
        % define the step function on the grid world
        function [Observation,Reward,isTerminal,Info] = step(this,Action)
            % inherited from rlMDPEnv
            Info = [];
            Action = idx2action(this.Model,Action);
            [Observation,Reward,isTerminal] = move(this.Model,Action);
            Observation = state2idx(this.Model,Observation);
            % update the obstacle set detected at this step and rebuild the
            % state transition matrix around it
            obstacles = updateObstacles(this); % states
            this.Model.ObstacleStates = obstacles;
            this.Model.updateStateTranstionForObstacles();
        end
        ...
    end
end
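For context, this is roughly how an environment like this would be created and trained against. It's a simplified sketch, not the exact script from my project: the grid size, state names, and training options are placeholders, and on older releases the critic is built with rlQValueRepresentation instead of rlQValueFunction.

GW = createGridWorld(5,5);                          % GridWorld from Reinforcement Learning Toolbox
GW.CurrentState   = "[1,1]";                        % start
GW.TerminalStates = "[5,5]";                        % goal; no obstacles in the first episode
env = Copy_of_rlMDPEnv_CoppeliaSim(GW);             % custom environment defined above

obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
qTable  = rlTable(obsInfo,actInfo);
critic  = rlQValueFunction(qTable,obsInfo,actInfo); % rlQValueRepresentation on older releases
agent   = rlQAgent(critic);

trainOpts  = rlTrainingOptions(MaxEpisodes=200, MaxStepsPerEpisode=50);
trainStats = train(agent, env, trainOpts);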

Answers (1)

Shubham, 2024-6-17
Hi Andrea,
Your approach of dynamically updating the grid world with obstacles and expecting the reinforcement learning (RL) agent to adapt is sound in principle. However, there are a couple of nuances in how the environment dynamics are represented and how the agent perceives changes that may explain the behavior you're observing.
Points to consider:
  1. When the environment changes (like adding obstacles), it's crucial that these changes are adequately represented in the state information that the agent receives. The agent makes decisions based on the state it perceives. If the state representation doesn't change (or doesn't change in a way that's meaningful to the agent), the agent won't "realize" that the environment has changed.
  2. When you update the StateTransition matrix within your environment, ensure that this update is effectively communicated to the RL agent. The agent uses this matrix (either directly or indirectly, depending on the algorithm) to learn the dynamics of the environment. If the agent continues to use an outdated version of this matrix, it won't adapt its policy to the new dynamics.
  3. You mentioned the possibility of updating a "copy" of the GridWorld. This is a crucial point. In object-oriented programming, and especially in MATLAB with its handle-versus-value class distinction, you need to be sure you're updating the same instance of the object that the RL agent interacts with. If updateStateTranstionForObstacles modifies a copy, or if obj.Model stored a copy of the GridWorld you passed in, the instance the agent actually uses remains unchanged (a quick check for this is sketched right after this list).
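To rule out the "copy" issue, a minimal check like the following can help. This is only a sketch using the class names from your question; whether rl.env.GridWorld is a handle or a value class determines which branch you land in.

GW  = createGridWorld(5,5);                % small test grid (placeholder size)
env = Copy_of_rlMDPEnv_CoppeliaSim(GW);
if isa(GW,'handle')
    % For handle objects, == compares object identity, so a true result here
    % means any property change made through env.Model is also visible via GW.
    disp(env.Model == GW)
else
    % Value semantics: obj.Model = MDP stored a copy, so changes made to GW
    % outside the environment never reach the instance the agent steps through.
    % The environment itself is a handle (rl.env.MATLABEnvironment), though, so
    % updates done inside step() on this.Model still persist across steps.
    disp('rl.env.GridWorld behaves as a value class')
end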
Debugging Steps
  1. After updating the obstacles and the StateTransition matrix, verify that what the agent subsequently experiences (the observations and transitions returned by step) actually changes in a way that reflects these updates (see the logging sketch after this list).
  2. Ensure that the this.Model you're updating within the step function is the same instance that the RL agent uses to decide on actions. MATLAB doesn't expose memory addresses, but you can check handle identity (for example with ==) or step through the code in the debugger to trace the execution flow.
  3. Remember that even if the environment updates correctly, the agent might need time (episodes) to learn the new dynamics. Ensure that the agent is continuously learning from new experiences and not just exploiting learned knowledge from previous, now outdated, environment dynamics.
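As a concrete example of step 1, the update block inside your step function could be instrumented roughly like this. It's only a sketch, and it relies on the transition matrix being exposed as the T property of the GridWorld created by createGridWorld:

Told = this.Model.T;                        % snapshot of the transition matrix before the update
obstacles = updateObstacles(this);          % states detected around the robot
this.Model.ObstacleStates = obstacles;
this.Model.updateStateTranstionForObstacles();
nChanged = nnz(this.Model.T ~= Told);       % how many transition entries actually moved
fprintf('step: %d obstacle states, %d transition entries changed\n', ...
    numel(obstacles), nChanged);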
Additional Suggestions
  1. Implement logging inside your environment to confirm that obstacles are added as expected and that the StateTransition matrix is updated accordingly.
  2. If possible, create a simple visualization of the grid world that updates with each episode (a rough sketch follows this list). This can help you quickly identify if and when obstacles are added and how the agent's behavior changes in response.
  3. Start with a very simple scenario where the change in environment dynamics is minimal (e.g., adding a single obstacle) and easy to learn. This can help isolate whether the issue is with the dynamics update mechanism or with the agent's learning process.
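For the visualization idea, something as simple as painting the current obstacle mask each episode is usually enough. A rough sketch follows; it assumes the state names have the "[row,col]" format produced by createGridWorld and that env is your environment object:

GW    = env.Model;
names = string(GW.ObstacleStates);          % e.g. ["[3,3]" "[3,4]" ...]
mask  = zeros(GW.GridSize);
for s = reshape(names,1,[])
    rc = sscanf(s,'[%d,%d]');               % parse "[r,c]" back into row/column indices
    mask(rc(1),rc(2)) = 1;
end
imagesc(mask); axis equal tight
title(sprintf('Obstacles this episode: %d cells', nnz(mask)))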
