Parallel reinforcement learning on HPC with warning "Received duplicate id = x from worker"
When I train a reinforcement learning agent on an HPC cluster using Parallel Computing Toolbox, I get the warning "Received duplicate id = 22 from worker" (or another id) after, e.g., 180 training episodes. Training then appears to stop, with no further error or warning. I start the .m script with these commands:
module load matlab/R2021a
matlab -nodisplay < rl_training.m
When I instead set
trainOpts.UseParallel = false;
I often get the warning "Error reading character from command line". Does anyone know why these messages occur, and is there a way to continue the training?
5 comments
Image Analyst
2021-12-2
If you have a maintenance contract in place, I'd call them on the phone. Of course, you can use email like @Raymond Norris said. I never use email or a support page: when I encounter a problem I need an immediate solution, so I call them.
Walter Roberson
2021-12-5
I never call them, myself -- I open support cases, where I can describe the problem and include code and results to show clearly what is expected and what is received instead. 85% of the time the response is going to be "You are right, that's not good, the developers have been notified and it might get fixed some day".
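The launch command in the question feeds the script to MATLAB over stdin (`matlab -nodisplay < rl_training.m`). A minimal sketch of a non-interactive alternative is shown here, using the `-batch` startup option (available since R2019a), which runs the script without reading it from stdin and returns a nonzero exit status if the script errors. Whether this affects either warning is untested and only an assumption; the `SCRIPT` and `LOG` names are illustrative and not from the original post.

```shell
#!/bin/bash
# Sketch only: SCRIPT and LOG are illustrative names, not from the original post.
SCRIPT="rl_training"      # script name without the .m extension
LOG="${SCRIPT}.log"       # hypothetical log file for MATLAB's output

# Load the MATLAB module if the cluster provides the `module` command.
command -v module >/dev/null 2>&1 && module load matlab/R2021a

# Only attempt the launch where MATLAB is actually on PATH.
if command -v matlab >/dev/null 2>&1; then
    # -batch runs the script non-interactively (no stdin redirection).
    matlab -batch "$SCRIPT" > "$LOG" 2>&1
    echo "MATLAB exit status: $?"
else
    echo "matlab not found on PATH; check that the module was loaded"
fi
```

Capturing the output in a log file and printing the exit status makes it easier to tell whether MATLAB stopped with an error or the job was ended from outside, which is useful detail to include in a support case.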
Answers (0)