How to input action in reinforcement learning template environment?

I have modified the template environment to fit my scenario. My action now consists of two vectors. The action configuration is as follows.
function this = EdgeEnvironment()
% Initialize Observation settings
ObservationInfo(1) = rlNumericSpec([1 10]);
ObservationInfo(1).Name = 'schedule';
ObservationInfo(1).Description = 'schedule';
ObservationInfo(2) = rlNumericSpec([1 20]);
ObservationInfo(2).Name = 'ppath';
ObservationInfo(2).Description = 'ppath';
ObservationInfo(3) = rlNumericSpec([1 1]);
ObservationInfo(3).Name = 'completionTime';
ObservationInfo(3).Description = 'completionTime';
ObservationInfo(4) = rlNumericSpec([1 1]);
ObservationInfo(4).Name = 'computeDuring';
ObservationInfo(4).Description = 'computeDuring';
% Initialize Action settings
ActionInfo(1) = rlNumericSpec([1 10]);
ActionInfo(1).Name = 'schedule';
ActionInfo(2) = rlNumericSpec([1 20]);
ActionInfo(2).Name = 'ppath';
% The following line implements built-in functions of RL env
this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);
end
The step function is defined as follows.
function [Observation,Reward,IsDone,LoggedSignals] = step(this, Action)
LoggedSignals = [];
% distance
node_distance = zeros(this.device_count, this.device_count);
distance = getDistance(this, node_distance);
% parameter list
parameter_list = getstruct(this, distance);
% the parameter list of device
device_list = get_device_list(this);
% Extract action
[schedule_act, ppath_act]=get_act(Action);
% schedule_act = Action{1,1};
% ppath_act = Action{1,2};
% Unpack state vector
last_schedule = schedule_act;
last_ppath = ppath_act;
last_completionTime = this.State{1,3};
last_computeDuring = this.State{1,4};
% Update system states
[schedule, stay_node_list, completionTime] = ComScheduling(last_completionTime,...
last_schedule, last_ppath, device_list, parameter_list);
[ppath, stay_node_list, completionTime, computeDuring] = PathPlanning(last_completionTime,...
last_ppath, schedule, stay_node_list, device_list, parameter_list);
prob = 1 / (1 + exp((completionTime - last_completionTime)/parameter_list.omega));
dice = rand(1);
if dice <= prob
last_ppath = ppath;
last_schedule = schedule;
last_stay_node_list = stay_node_list;
last_completionTime = completionTime;
last_computeDuring = computeDuring;
completionTime_iter(end + 1) = completionTime;
else
completionTime_iter(end + 1) = last_computeDuring;
end
ppath = last_ppath;
schedule = last_schedule;
stay_node_list = last_stay_node_list;
completionTime = last_completionTime;
computeDuring = last_computeDuring;
Observation = {schedule, ppath, completionTime, computeDuring};
this.State = Observation;
% Check terminal condition
completionTime = Observation{3}; % brace indexing returns the value, not a 1x1 cell
computeDuring = Observation{4};
IsDone = completionTime < this.completionTime_threshold || computeDuring < this.computeDuring_threshold;
this.IsDone = IsDone;
% Get reward
Reward = -completionTime;
end
We calculate the action value with the following function.
function [schedule_act, ppath_act] = get_act(action)
schedule_act = action{1,1};
ppath_act = action{1,2};
end
When I run the validateEnvironment function, I get the following error.
I want to know how to fix it.

Accepted Answer

Emmanouil Tzorakoleftherakis
The easiest thing you can do is add a breakpoint and display what the "action" variable is. It's obviously not a cell array, so you cannot access it with braces {} in the "get_act" function. That's why you are getting the error.
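As a rough illustration of the point above, here is a defensive variant of get_act that checks what it received before indexing. This is only a sketch and not something posted in the thread. The 10/20 split is taken from the ActionInfo sizes in the question; the idea that a non-cell action would be a flat 30-element numeric vector (schedule first, then ppath) is an assumption that the breakpoint inspection would need to confirm.

```matlab
function [schedule_act, ppath_act] = get_act(action)
% Hypothetical defensive version of get_act (sketch, not the toolbox's method).
if iscell(action)
    % Cell input: one element per action channel, as the question expected
    schedule_act = action{1};
    ppath_act = action{2};
elseif isnumeric(action) && numel(action) == 30
    % ASSUMPTION: a flat numeric action stores the 10 schedule entries
    % first, followed by the 20 ppath entries
    action = reshape(action, 1, []);
    schedule_act = action(1:10);
    ppath_act = action(11:30);
else
    % Anything else: report the class and size instead of failing on {}
    error('get_act:unexpectedAction', ...
        'Unexpected action of class %s and size %s.', ...
        class(action), mat2str(size(action)));
end
end
```

If the error branch fires, its message reports the same information the breakpoint would show, namely the class and size of the incoming action.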
8 Comments
Yang Chen 2023-3-9
It is about the size of my discrete action space. For example, my action space looks like {[1, 2, 3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]}, which contains every ordering of the numbers 1-3. When we increase the count of numbers to 20, the size of the action space exceeds the system limit.
Emmanouil Tzorakoleftherakis
Thanks for clarifying. This is the curse of dimensionality, not much you can do about that other than using a continuous action space unfortunately.
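To make the scale concrete, the short script below counts the orderings for 3 and for 20 items, then shows one possible way to read an ordering out of a continuous action. The sort-based decoding is a swapped-in illustration and was not proposed in the thread; the variable names and the example scores are hypothetical.

```matlab
% Number of discrete actions when each action is one ordering of n items
n_small = 3;
n_large = 20;
fprintf('n = %d: %d orderings\n', n_small, factorial(n_small));
fprintf('n = %d: about %.3g orderings\n', n_large, factorial(n_large));

% ASSUMPTION: the agent outputs one real-valued score per item (1-by-n),
% and the ordering is recovered by ranking those scores.
scores = [0.7 0.1 0.4];          % hypothetical continuous action for n = 3
[~, ordering] = sort(scores);    % indices of the scores in ascending order
disp(ordering);                  % a permutation of 1..n
```

With this encoding the action specification stays at n continuous values instead of enumerating n! discrete actions, at the cost of the agent learning scores rather than orderings directly.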
