Episode Q0 increases exponentially
Can anyone explain why the Episode Q0 value in RL training increases exponentially after the reward has converged to a suboptimal policy?
Answers (1)
Emmanouil Tzorakoleftherakis
2021-2-16
Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
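As a rough illustration of the normalization idea, here is a minimal Python sketch (not Reinforcement Learning Toolbox code). It assumes observations arrive as plain lists of floats and that action bounds are known. The names `RunningNormalizer`, `scale_action`, and `scale_reward` are hypothetical helpers made up for this example.

```python
import math


class RunningNormalizer:
    """Tracks a per-dimension running mean and variance (Welford's algorithm)."""

    def __init__(self, size, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations from the mean
        self.eps = eps          # avoids division by zero for constant signals

    def update(self, x):
        """Fold one observation vector into the running statistics."""
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])

    def normalize(self, x):
        """Return x shifted and scaled to roughly zero mean and unit variance."""
        if self.count < 2:
            return list(x)  # not enough data yet; pass the input through
        out = []
        for i, xi in enumerate(x):
            std = math.sqrt(self.m2[i] / self.count)  # population std
            out.append((xi - self.mean[i]) / (std + self.eps))
        return out


def scale_action(a, low, high):
    """Map an action from [low, high] to [-1, 1]."""
    return 2.0 * (a - low) / (high - low) - 1.0


def scale_reward(r, scale):
    """Divide the reward by a fixed, user-chosen scale (an assumption here)."""
    return r / scale


# Example: two observation channels with very different magnitudes.
norm = RunningNormalizer(size=2)
for obs in ([0.0, 100.0], [2.0, 300.0], [4.0, 500.0]):
    norm.update(obs)

normalized = norm.normalize([4.0, 500.0])  # both channels now on the same scale
```

Keeping the signals the critic sees on a comparable scale, in the order of one, is one way to make it harder for its value estimates to grow without bound. The fixed reward scale is a design choice that has to be picked per environment.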
Hope this helps