[Reinforcement Learning] Deny an action already taken by the agent

Hi all! I have a problem with the step function in my environment. I would like to prevent the agent from choosing an action it has already taken, so that the algorithm does not keep repeating the same mistake. In this case I am considering the removal of nodes from a graph with 8 nodes, so there are 8 possible actions. I am attaching the code. Thanks for your help :)
classdef FinderEnvi_T < rl.env.MATLABEnvironment
%MYENVIRONMENT: Template for defining custom environment in MATLAB.
%% Properties (set properties' attributes accordingly)
properties
maxdisconnectN=3;
mingcc=3;
N=8;
end
properties
% Initialize system state
State = zeros(8,2);
% Adjacency matrix of the current graph
A = [];
end
properties(Access = protected)
% Initialize internal flag to indicate episode termination
IsDone = false
end
%% Necessary Methods
methods
% Constructor method creates an instance of the environment
% Change class name and constructor name accordingly
function this = FinderEnvi_T()
% Initialize Observation settings
N=8;
ObservationInfo = rlNumericSpec([N 2]);
ObservationInfo.Name = 'Network state';
ObservationInfo.Description = 'Adj Matrix State';
% Initialize Action settings
ActionInfo = rlFiniteSetSpec([1:1:N]);
ActionInfo.Name = 'Node removal Action';
% The following line implements built-in functions of RL env
this = this@rl.env.MATLABEnvironment(ObservationInfo,ActionInfo);
end
% Reset environment to initial state and output initial observation
function InitialObservation = reset(this)
%Random graph generation
A = round(rand(this.N));
A = triu(A) + triu(A,1)';
A = A - diag(diag(A));
this.A=A;
%Node degree
[deg,indeg,outdeg]=degrees(A);
%Local clustering coefficient
[C1,C2,C] = clustCoeff(A);
InitialObservation = [deg' C];
this.State = InitialObservation;
% (optional) use notifyEnvUpdated to signal that the
% environment has been updated (e.g. to update visualization)
notifyEnvUpdated(this);
end
% Apply system dynamics and simulates the environment with the
% given action for one step.
function [Observation,Reward,IsDone,LoggedSignals] = step(this,Action)
LoggedSignals = [];
% Check if the given action is valid
% Here I need help
%Node Disconnection
A=this.A;
this.State(Action,:)=zeros(1,size(this.State,2));
A(Action,:)=zeros(1,length(A));
A(:,Action)=zeros(length(A),1);
this.A=A;
%Giant connecting component
[gcc] = largestcomponent(A);
%Node degree
[deg,indeg,outdeg]=degrees(A);
%Local clustering coefficient
[C1,C2,C] = clustCoeff(A);
% Update system states
Observation=[deg' C];
this.State = Observation;
% Check terminal condition
disconnectedN= sum(Observation(:,1)==0); % count how many nodes in the degree vector have degree 0
IsDone =disconnectedN >= this.maxdisconnectN || gcc<=this.mingcc;
this.IsDone = IsDone;
% Get reward
Reward = 1/length(this.A) * gcc/length(this.A);
% (optional) use notifyEnvUpdated to signal that the
% environment has been updated (e.g. to update visualization)
notifyEnvUpdated(this);
end
% (optional) Visualization method
function plot(this)
% Initiate the visualization
% Update the visualization
envUpdatedCallback(this)
end
function envUpdatedCallback(this)
end
end
end

Answers (1)

Aditya on 26 Feb 2024
It seems you want to prevent the reinforcement learning (RL) agent from taking an action that has already been taken in the current episode, which in your case is removing a node that has already been removed from the graph. To achieve this, you will need to modify your environment to keep track of the actions taken and to provide a signal or a penalty to the agent when it attempts to take an invalid action.
Here's a modified version of the step function that includes a check for whether the node has already been removed. If the node has already been removed, it provides a negative reward and sets the IsDone flag to true to end the episode.
function [Observation, Reward, IsDone, LoggedSignals] = step(this, Action)
LoggedSignals = [];
% Check if the given action is valid
if this.State(Action,1) == 0
% The node has already been removed, give a large negative reward and end the episode
Reward = -1;
IsDone = true;
this.IsDone = IsDone;
Observation = this.State;
else
% Node Disconnection
A = this.A;
this.State(Action,:) = zeros(1, size(this.State, 2));
A(Action,:) = zeros(1, length(A));
A(:,Action) = zeros(length(A), 1);
this.A = A;
% Giant connected component
[gcc] = largestcomponent(A);
% Node degree
[deg, indeg, outdeg] = degrees(A);
% Local clustering coefficient
[C1, C2, C] = clustCoeff(A);
% Update system states
Observation = [deg' C];
this.State = Observation;
% Check terminal condition
disconnectedN = sum(Observation(:,1) == 0); % Count how many nodes have degree 0
IsDone = disconnectedN >= this.maxdisconnectN || gcc <= this.mingcc;
this.IsDone = IsDone;
% Get reward
Reward = 1 / length(this.A) * gcc / length(this.A);
end
% (optional) use notifyEnvUpdated to signal that the
% environment has been updated (e.g. to update visualization)
notifyEnvUpdated(this);
end
In this modification, before proceeding with removing a node, the step function checks if the node has already been removed by looking at the State property. If the degree of the node (first column in State) is zero, it means the node has already been removed, and the function provides a negative reward and ends the episode.
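As a quick sanity check (a minimal sketch, assuming the helper functions degrees, clustCoeff, and largestcomponent are on the MATLAB path), you can construct the environment and let the toolbox exercise reset and step against the observation and action specifications:
env = FinderEnvi_T();
% validateEnvironment resets the environment and simulates a step to verify the specs
validateEnvironment(env);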
Please note that ending the episode might not be the best strategy for training an RL agent, as it could lead to the agent learning to avoid the penalty by not taking any action at all. A better approach might be to provide a negative reward but allow the episode to continue, or to implement a masking mechanism that only presents the agent with valid actions at each step.
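For example, a minimal sketch of the invalid-action branch that penalizes without terminating could look like the following (the penalty value of -1 is only an example and should be tuned for your problem; the else branch stays exactly as in the code above):
if this.State(Action,1) == 0
% Node already removed: penalize the agent but keep the episode running
Reward = -1; % example penalty value, tune as needed
Observation = this.State;
IsDone = this.IsDone; % keep the current termination status
else
% ... same node-removal, observation, termination, and reward code as above ...
end
If your release does not offer built-in action masking for rlFiniteSetSpec actions, this penalty-only variant is usually the simpler option.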
