RL Agent does not learn
Hello,
I'm getting started with reinforcement learning using the RL Toolbox. After building a custom environment in Simulink, I have problems training the PG agent. The task is to control a system with diffuse irradiance, direct irradiance, and temperature as the observations, and a mass flow rate as the action, which can be either 0 or 30. Simulink computes a cost signal that also serves as the reward, and the objective is to minimize the cost over one episode of 24 time steps.
The code is the following:
% Observation and action specifications
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'Observation';
actInfo = rlFiniteSetSpec([0 30]);   % discrete actions: 0 or 30
actInfo.Name = 'Action';
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);

% Deep neural network approximator for the actor
net = [ imageInputLayer([3 1 1],'Normalization','none','Name','state')
        fullyConnectedLayer(32,'Name','fc1')
        reluLayer('Name','relu1')
        fullyConnectedLayer(32,'Name','fc2')
        reluLayer('Name','relu2')
        fullyConnectedLayer(32,'Name','fc3')
        reluLayer('Name','relu3')
        fullyConnectedLayer(2,'Name','fc4')   % one output per discrete action
        softmaxLayer('Name','actionProb') ];

% Create actor
actorOpts = rlRepresentationOptions('LearnRate',0.01,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(net,obsInfo,actInfo, ...
    'Observation','state',actorOpts);

% Create agent
opt = rlPGAgentOptions('DiscountFactor',0.0001);
agent = rlPGAgent(actor,opt);
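For completeness, I start training with something like the following (the option values here are illustrative, not the exact ones from every run):

```matlab
% Training setup (values illustrative)
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',2000, ...
    'MaxStepsPerEpisode',24, ...
    'ScoreAveragingWindowLength',50, ...
    'Verbose',false, ...
    'Plots','training-progress');
trainingStats = train(agent,env,trainOpts);
```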
When I train the agent for 2000 episodes with different configurations of the neural network, it does not converge to a policy at all. At some point the agent finds configurations that yield a better reward, but afterwards it does not improve further and does not follow the improved policy.
It would be great if you could help me solve this. Do you think this happens due to an insufficient reward signal, or does the structure of my neural network not fit my observation and action signals? I have also tried using tanhLayer and different numbers of nodes, without success.
Thank you very much for your help!
Best regards
Janika

1 Comment
shadi abpeikar
2021-2-16
Hi Janika,
I'm just wondering whether you found a solution? I have the same problem, and I would appreciate any hints on how you solved it.
Thanks.
Accepted Answer
More Answers (1)
rbih rbih
2021-12-16
Hello Janika,
I'm also new to RL in MATLAB, but the activation function may play an important role in how the agent learns. Have you tried using a tanh activation instead of relu?
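Something like this, keeping the rest of your setup unchanged (just a sketch; the layer names and the number of hidden layers are arbitrary):

```matlab
% Same actor network as yours, with tanh activations instead of relu
net = [ imageInputLayer([3 1 1],'Normalization','none','Name','state')
        fullyConnectedLayer(32,'Name','fc1')
        tanhLayer('Name','tanh1')
        fullyConnectedLayer(32,'Name','fc2')
        tanhLayer('Name','tanh2')
        fullyConnectedLayer(2,'Name','fc3')   % one output per discrete action
        softmaxLayer('Name','actionProb') ];
```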
let me know.