Parallel reinforcement learning in separate runs leads to a strange learning curve

I'm training a DDPG reinforcement learning agent on an HPC cluster node with the Parallel Computing Toolbox, but for only 400 episodes at a time, because of errors I ran into earlier when training for many more episodes. After each run I save the agent, including the experience buffer, and then repeat the training in a loop. I start the training with
agent.AgentOptions.ResetExperienceBufferBeforeTraining = false;
agent.AgentOptions.SaveExperienceBufferWithAgent = true;
trainingStats = train(agent,env,trainOpts);
and save the agent with
agent.AgentOptions.SaveExperienceBufferWithAgent = true;
save(filename, 'agent', '-v7.3');
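For context, the overall resume loop looks roughly like this (the number of runs, file names, and loop bound are placeholders, not my exact script; env and trainOpts are set up beforehand):
numRuns = 10;                                      % placeholder: total number of 400-episode runs
for run = 1:numRuns
    if run > 1
        load(PRE_TRAINED_MODEL_FILE,'agent');      % resume from the previous checkpoint
    end
    agent.AgentOptions.ResetExperienceBufferBeforeTraining = false;
    agent.AgentOptions.SaveExperienceBufferWithAgent = true;
    trainingStats = train(agent,env,trainOpts);    % 400 episodes per run (set in trainOpts)
    PRE_TRAINED_MODEL_FILE = sprintf('agent_run%03d.mat',run);
    save(PRE_TRAINED_MODEL_FILE,'agent','-v7.3');
end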
I can see the experience buffer growing from run to run, since
agent.ExperienceBuffer.Length
keeps increasing. When loading the agent for the next run, I use
load(PRE_TRAINED_MODEL_FILE,'agent');
agent.AgentOptions.NoiseOptions.Variance = [1200;400;2;1000].*exp(pastepisodes*log(1-agentOpts.NoiseOptions.VarianceDecayRate));
to get the noise variance decay I would expect if the training had been a single continuous run. The learning rate is 5e-03 for the critic and 1e-03 for the actor.
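In other words, I'm scaling the initial variance by the decay compounded over the episodes already trained. The exp/log expression is algebraically identical to this form (d and n are stand-ins for my decay rate and past-episode count):
d  = agentOpts.NoiseOptions.VarianceDecayRate;   % per-step decay rate from my agent options
n  = pastepisodes;                               % episodes completed in earlier runs
v0 = [1200;400;2;1000];                          % initial noise variance
agent.AgentOptions.NoiseOptions.Variance = v0.*(1-d).^n;   % exp(n*log(1-d)) == (1-d)^n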
The result is a learning curve I wouldn't expect. It looks as if either the noise variance is reset on each run, or the experience buffer from the previous runs is not being used. The reward should reach approximately 1500.
Does anybody have an idea why the curve looks like this? Do you have any advice on how to adjust the hyperparameters?

Answers (0)
