Advantage normalization for PPO Agent
显示 更早的评论
When dealing with PPO Agents, it is possibile to set a "NormalizedAdvantageMethod" to normalize the advantage function values for each mini-batch of experiences. The default value is "none".
While I can intuitively grasp that such a normalization operation may be beneficial in terms of reducing variance, I could not find any reference online which describes with sufficient details when and why this procedure should be useful. My questions are:
1) Under which circumstances does the normalization of advantage function values turn out to be practically beneficial?
2) If I decide to normalize the advantage function values, are there situations where the "moving" option (which uses a restrict number of samples) can be more beneficial then the "current" option (which uses all of the current samples available)? Intuitively I would say that the "current" option should always perform better
采纳的回答
更多回答(0 个)
类别
在 帮助中心 和 File Exchange 中查找有关 Deep Learning Toolbox 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!