Reinforcement learning actor is empty

Hello everyone,
I am using the Reinforcement Learning Toolbox. I created my environment and used rlDQNAgent, but when I try to get the actor, the command getActor(agent) returns actor = [].
I don't know what the problem is. Also, after training I wanted to evaluate the agent by calling getAction for different observations, but it always returns the same action for every observation; this was not the case when I used the same command with the same observations before training.
Any suggestions?
Here is my code:
ActionInfo = getActionInfo(env);
ObservationInfo = getObservationInfo(env);

% Critic network: maps an observation to one Q-value per discrete action
dnn = [
    featureInputLayer(ObservationInfo.Dimension(2),'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(24,'Name','CriticStateFC2')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(length(ActionInfo.Elements),'Name','output')];

criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
critic = rlQValueRepresentation(dnn,ObservationInfo,ActionInfo, ...
    'Observation',{'state'},criticOptions);

% DQN agent options
agentOpts = rlDQNAgentOptions( ...
    'UseDoubleDQN',false, ...
    'TargetSmoothFactor',1, ...
    'TargetUpdateFrequency',4, ...
    'ExperienceBufferLength',100000, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',256);
agent = rlDQNAgent(critic,agentOpts);

% Training options
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',500, ...
    'MaxStepsPerEpisode',30, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',30);
trainingStats = train(agent,env,trainOpts);

Answers (1)

Zuber Khan, 2024-5-8
Hi,
Based on my understanding, you are getting an empty result from "getActor(agent)" because you are using a DQN agent. A DQN agent is a value-based reinforcement learning agent that trains a critic to estimate the expected discounted cumulative long-term reward when following the optimal policy.
Kindly note that value-based agents use only critics to select their actions and rely on an indirect policy representation: they use an approximator to represent a value function (value as a function of the observation) or a Q-value function (value as a function of observation and action), and therefore have no actor to return.
You can refer to the MathWorks documentation on DQN agents, and on creating policies and value functions, to understand this in more detail.
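Since the agent has no actor, you can inspect what it has learned through its critic instead. Below is a minimal sketch, assuming "agent" is your trained rlDQNAgent and that the observation is a row vector matching the featureInputLayer size in your code:
critic = getCritic(agent);                       % DQN stores a Q-value critic, not an actor
sampleObs = {rand(ObservationInfo.Dimension)};   % placeholder observation with the environment's dimensions
qValues = getValue(critic,sampleObs);            % one Q-value estimate per discrete action
[~,idx] = max(qValues);
greedyAction = ActionInfo.Elements(idx);         % action the critic currently considers best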
As far as the second issue is concerned, getting the same action for all observations suggests that the agent has not trained properly in the given environment: it may not have learned a sufficiently diverse policy. This could be due to not training long enough, the complexity of the environment, or the chosen architecture and hyperparameters not being optimal.
I would suggest looking closely at the training progress plot and metrics to make sure the agent is actually improving over time. If performance plateaus early or does not improve, consider adjusting the network architecture, agent options, training options, or other hyperparameters. Also, when you evaluate the agent with "getAction", make sure the observations you provide are significantly different and cover the state space well; subtle differences in observations might not lead to different actions, especially if the Q-values are close.
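For example, to check how the trained agent responds to different states, you could evaluate it over a few representative observations and also look at the corresponding Q-values (the observations below are placeholders; replace them with states actually sampled from your environment):
obsSet = {rand(ObservationInfo.Dimension), ...
          rand(ObservationInfo.Dimension), ...
          rand(ObservationInfo.Dimension)};      % placeholder observations
for k = 1:numel(obsSet)
    act = getAction(agent,obsSet(k));            % action selected for this observation
    q   = getValue(getCritic(agent),obsSet(k));  % Q-values; near-identical values explain identical actions
    disp(act)                                    % depending on the release, act may be wrapped in a cell array
end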
Further, you could try a different type of RL agent that might be better suited to your environment.
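For instance, if you specifically need getActor to return something, you would have to use a policy-based agent (such as a PG agent), which trains an explicit actor. Here is a rough sketch using the same representation-based API as your code; the layer names and sizes are only illustrative:
actorNet = [
    featureInputLayer(ObservationInfo.Dimension(2),'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(length(ActionInfo.Elements),'Name','ActorFC2')
    softmaxLayer('Name','actionProb')];
actor = rlStochasticActorRepresentation(actorNet,ObservationInfo,ActionInfo, ...
    'Observation',{'state'});
pgAgent = rlPGAgent(actor);                      % getActor(pgAgent) now returns a non-empty actor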
Since you have not explicitly provided the environment, it is not possible to debug the given code, so I have given a generic response.
I hope this will help you in resolving your issue.
Regards,
Zuber
