How to input action in reinforcement learning template environment?

Question

Yang Chen 2023-3-7

Direct link to this question: https://ww2.mathworks.cn/matlabcentral/answers/1924530-how-to-input-action-in-reinforcement-learning-template-environment

Commented: Emmanouil Tzorakoleftherakis on 2023-3-9

I have modified the template environment to fit my scenario. My action consists of two vectors. The observation and action specifications are configured as follows.

```matlab
function this = EdgeEnvironment()
    % Initialize observation settings (four channels)
    ObservationInfo(1) = rlNumericSpec([1 10]);
    ObservationInfo(1).Name = 'schedule';
    ObservationInfo(1).Description = 'schedule';
    ObservationInfo(2) = rlNumericSpec([1 20]);
    ObservationInfo(2).Name = 'ppath';
    ObservationInfo(2).Description = 'ppath';
    ObservationInfo(3) = rlNumericSpec([1 1]);
    ObservationInfo(3).Name = 'completionTime';
    ObservationInfo(3).Description = 'completionTime';
    ObservationInfo(4) = rlNumericSpec([1 1]);
    ObservationInfo(4).Name = 'computeDuring';
    ObservationInfo(4).Description = 'computeDuring';

    % Initialize action settings (two channels)
    ActionInfo(1) = rlNumericSpec([1 10]);
    ActionInfo(1).Name = 'schedule';
    ActionInfo(2) = rlNumericSpec([1 20]);
    ActionInfo(2).Name = 'ppath';

    % The following line implements the built-in functions of the RL environment
    this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);
end
```

The step function is implemented as follows.

```matlab
function [Observation,Reward,IsDone,LoggedSignals] = step(this, Action)
    LoggedSignals = [];

    % Distance between devices
    node_distance = zeros(this.device_count, this.device_count);
    distance = getDistance(this, node_distance);

    % Parameter list
    parameter_list = getstruct(this, distance);

    % Parameter list of the devices
    device_list = get_device_list(this);

    % Extract action
    [schedule_act, ppath_act] = get_act(Action);
    % schedule_act = Action{1,1};
    % ppath_act = Action{1,2};

    % Unpack state: schedule and path come from the action,
    % completion time and compute time from the stored state
    last_schedule = schedule_act;
    last_ppath = ppath_act;
    last_completionTime = this.State{1,3};
    last_computeDuring = this.State{1,4};

    % Update system states
    [schedule, stay_node_list, completionTime] = ComScheduling(last_completionTime, ...
        last_schedule, last_ppath, device_list, parameter_list);
    [ppath, ~, completionTime, computeDuring] = PathPlanning(last_completionTime, ...
        last_ppath, schedule, stay_node_list, device_list, parameter_list);

    % Accept the candidate solution with a probability that is higher
    % when it reduces the completion time
    prob = 1 / (1 + exp((completionTime - last_completionTime)/parameter_list.omega));
    if rand(1) <= prob
        last_ppath = ppath;
        last_schedule = schedule;
        last_completionTime = completionTime;
        last_computeDuring = computeDuring;
    end
    % To track completion times across steps, store them in an environment
    % property; a local variable such as completionTime_iter is discarded
    % when step returns.

    ppath = last_ppath;
    schedule = last_schedule;
    completionTime = last_completionTime;
    computeDuring = last_computeDuring;

    % One cell per observation channel, in the same order as ObservationInfo
    Observation = {schedule, ppath, completionTime, computeDuring};
    this.State = Observation;

    % Check terminal condition. Use the numeric values directly:
    % Observation(3) with parentheses would return a 1x1 cell, not a number.
    IsDone = completionTime < this.completionTime_threshold || ...
        computeDuring < this.computeDuring_threshold;
    this.IsDone = IsDone;

    % Get reward
    Reward = -completionTime;
end
```

We extract the action values with the following function.

```matlab
function [schedule_act, ppath_act] = get_act(action)
    schedule_act = action{1,1};
    ppath_act = action{1,2};
end
```

When I run the validateEnvironment function, I get an error. How can I fix it?


Accepted Answer

Emmanouil Tzorakoleftherakis 2023-3-7

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/1924530-how-to-input-action-in-reinforcement-learning-template-environment#answer_1187560

The easiest thing you can do is add a breakpoint and display the "action" variable. It is evidently not a cell array, so you cannot index into it with braces {} in the "get_act" function. That is why you are getting the error.
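A minimal sketch of a more defensive get_act, assuming the action arrives either as a 1x2 cell array {schedule, ppath} or as a single numeric vector with 30 elements (for example, if the two action channels were merged into one rlNumericSpec([1 30]) channel; that redesign is a hypothetical option, not something confirmed in this thread). Instead of failing on brace indexing, it reports the class and size of whatever it received, which is the same information a breakpoint would reveal.

```matlab
function [schedule_act, ppath_act] = get_act(action)
% GET_ACT  Split an action into its schedule (1x10) and ppath (1x20) parts.
% Assumption: the action is either a 1x2 cell array {schedule, ppath}
% or one numeric vector with 30 elements (schedule first, then ppath).
    if iscell(action) && numel(action) == 2
        schedule_act = reshape(action{1}, 1, []);
        ppath_act = reshape(action{2}, 1, []);
    elseif isnumeric(action) && numel(action) == 30
        action = reshape(action, 1, []);
        schedule_act = action(1:10);
        ppath_act = action(11:30);
    else
        % Report what was actually received instead of a generic indexing error
        error('get_act:unexpectedAction', ...
            'Unexpected action of class %s with size %s.', ...
            class(action), mat2str(size(action)));
    end
end
```

For example, both get_act({1:10, 11:30}) and get_act(1:30) return schedule_act = 1:10 and ppath_act = 11:30.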

8 comments (6 older comments hidden)

Yang Chen 2023-3-9

It is about the size of my discrete action space. For example, my action space looks like {[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]}, which contains every ordering of 1 to 3. When I increase the number of items to 20, the data size exceeds the system limit.
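To put numbers on this: a discrete action space that lists every ordering of 1..n has n! elements, which grows faster than any exponential. The sketch below uses only base MATLAB functions. perms(1:n) is one way to generate the list for small n (num2cell(perms(1:n), 2) turns it into a cell array like the one in the comment), and the loop estimates the memory needed to store all orderings as a double matrix at 8 bytes per entry, ignoring any overhead the toolbox itself adds.

```matlab
% Growth of a discrete action space that enumerates every ordering of 1..n
orders3 = perms(1:3);    % one ordering per row
fprintf('n = 3: %d orderings\n', size(orders3, 1));

for n = [3 10 20]
    numActions = factorial(n);     % number of orderings of 1..n
    bytes = numActions * n * 8;    % storing them all as doubles
    fprintf('n = %2d: %.4g actions, %.4g GB\n', n, numActions, bytes / 1e9);
end
```

For n = 20 this is about 2.43e18 actions, far beyond what can be enumerated or held in memory.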

Emmanouil Tzorakoleftherakis 2023-3-9

Thanks for clarifying. This is the curse of dimensionality; unfortunately, there is not much you can do about it other than switching to a continuous action space.
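One common way to use a continuous action space for an ordering problem, offered here as a sketch rather than something confirmed in this thread, is random-key decoding: define the action as a continuous vector with one score per item (for instance rlNumericSpec([1 20])), and sort the scores to obtain the order. The action size then grows linearly with n instead of as n!. The helper name action_to_order is hypothetical.

```matlab
function order = action_to_order(action)
% ACTION_TO_ORDER  Hypothetical helper: decode a continuous action vector
% (one real-valued score per item) into an ordering of 1..n.
% Items are placed in ascending order of their scores.
    [~, order] = sort(reshape(action, 1, []), 'ascend');
end
```

For example, action_to_order([0.7 -0.2 0.1]) returns [2 3 1]. Any real-valued vector decodes to a valid ordering, so the agent cannot produce an infeasible action.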

