Feeds
Question
Why does the RL agent perform the same actions repeatedly even though this does not yield an optimal policy or a better episode Q0? Can anyone explain?
4 years ago | 0 answers | 0
Question
Episode Q0 increases exponentially
Can anyone explain why the episode Q0 in RL increases exponentially after the reward converges to a suboptimal policy?
4 years ago | 1 answer | 0