How to fix critic overestimation and how to explore a specific action range
Hello,

I'm using a DDPG agent to tune a robot controller. All of my rewards are negative, my critic learning rate is 0.01, my actor learning rate is 0.0001 with the Adam optimizer, and my gradient thresholds are 1. I have two questions:

1. When my action range is [0.00001 0.2], my Q0 prediction is negative too (although with a large bias relative to the actual value), but when my action range is [0.00001 0.5], my critic overestimates heavily, predicting large positive values. Why does this happen with the bigger action range?

2. I define my action range as [0.00001 0.5], but I know my best action sits somewhere around [0.1 0.2] most of the time. How should I set up my actor to explore this range more? Is this related to the noise options? How should I configure the Ornstein-Uhlenbeck noise options to explore this area?
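For reference, this is roughly what I understand the noise configuration to look like (a sketch only, assuming `rlDDPGAgentOptions` from the Reinforcement Learning Toolbox; the property names below come from the `OrnsteinUhlenbeckActionNoise` options object, and the values are placeholders, not tuned settings):

```matlab
% Sketch: shaping OU exploration noise for a DDPG agent.
% Assumes Reinforcement Learning Toolbox; property names may differ by release
% (newer releases use StandardDeviation instead of Variance).
agentOpts = rlDDPGAgentOptions;

% The OU noise is added to the deterministic actor output, so its scale
% controls how far around the current policy the agent explores.
% The region of interest [0.1 0.2] is about 0.1 wide, so a standard
% deviation well below that keeps exploration local once the actor
% output is near that region.
agentOpts.NoiseOptions.Mean                   = 0;       % zero-mean noise around actor output
agentOpts.NoiseOptions.MeanAttractionConstant = 0.15;    % pull of the OU process back toward Mean
agentOpts.NoiseOptions.Variance               = 0.05^2;  % moderate initial exploration (example value)
agentOpts.NoiseOptions.VarianceDecayRate      = 1e-5;    % shrink exploration as training progresses
agentOpts.NoiseOptions.VarianceMin            = 0.005^2; % floor so exploration never fully stops
```

As I understand it, the OU noise only explores around whatever the actor currently outputs, so it can't by itself bias exploration toward [0.1 0.2]; is narrowing the declared action range (or rescaling the actor's output layer toward that interval) the right way to do that?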