Number of look-ahead steps in DDPG agent options
How does the parameter "NumStepsToLookAhead" in rlDDPGAgentOptions from the Reinforcement Learning Toolbox in MATLAB R2019b work?
- Is the look-ahead applied to the target networks (that is, a change in the critic objective from {r + gamma*Qt - Q} to {r + sum(gamma**i*Qt) - Q})?
- Or is the look-ahead applied to the reward sampling itself (that is, replacing the reward "r" of each sample with "r + gamma*r_t + gamma**2*r_t+1 + ...")?
Any help is highly appreciated.
Answers (1)
Anh Tran
2020-3-1
I am not sure what reward sampling means. "NumStepsToLookAhead" in rlDDPGAgentOptions changes the critic's target values in step 5 of the DDPG training algorithm.
Assuming g is the discount factor, the critic target is as follows:
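The target formula is not shown in the text above. Assuming it is the standard n-step return, y = R_t + g*R_t+1 + ... + g^(n-1)*R_t+n-1 + g^n * Q'(S_t+n, A_t+n), a minimal Python sketch of the arithmetic could look like the following. The function name and arguments are hypothetical and are not part of the toolbox; bootstrap_q stands for the target critic's value at the state reached after n steps.

```python
def n_step_critic_target(rewards, bootstrap_q, g):
    """Assumed n-step target: sum_{i<n} g**i * rewards[i] + g**n * bootstrap_q.

    rewards     -- the n observed rewards R_t, ..., R_t+n-1
    bootstrap_q -- target-critic value at the state reached after n steps
    g           -- discount factor
    """
    n = len(rewards)
    discounted_rewards = sum((g ** i) * r for i, r in enumerate(rewards))
    return discounted_rewards + (g ** n) * bootstrap_q


# n = 1 recovers the usual one-step target r + g * Q'
one_step = n_step_critic_target([1.0], bootstrap_q=10.0, g=0.5)

# n = 3 uses three observed rewards before bootstrapping
three_step = n_step_critic_target([1.0, 2.0, 4.0], bootstrap_q=8.0, g=0.5)

print(one_step, three_step)
```

Under this assumption, a look-ahead of 1 is the ordinary DDPG target, and a larger n replaces part of the bootstrapped estimate with observed rewards.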
4 comments
Dingshan Sun
2022-9-1
Could you give a hint on how R_t, R_t+1, R_t+2, ..., R_t+n-1 can be obtained in an online off-policy algorithm, especially for DRL methods that use experience replay?
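One common approach, sketched below in Python, is to keep a short sliding window of the most recent transitions and write a collapsed n-step transition to the replay memory once n consecutive steps are available. This is an assumption about how such a buffer can be built, not a description of what the Reinforcement Learning Toolbox does internally. The class NStepReplayBuffer and all of its fields are hypothetical names.

```python
from collections import deque
import random


class NStepReplayBuffer:
    """Hypothetical replay buffer that stores n-step transitions."""

    def __init__(self, n, g, capacity=10000):
        self.n = n                             # look-ahead length
        self.g = g                             # discount factor
        self.window = deque()                  # most recent raw transitions
        self.memory = deque(maxlen=capacity)   # collapsed n-step transitions

    def _flush_oldest(self):
        # Collapse the window into one transition that starts at its oldest step.
        s0, a0 = self.window[0][0], self.window[0][1]
        ret = sum((self.g ** i) * t[2] for i, t in enumerate(self.window))
        _, _, _, s_last, done_last = self.window[-1]
        steps = len(self.window)  # may be < n near the end of an episode
        self.memory.append((s0, a0, ret, s_last, done_last, steps))
        self.window.popleft()

    def add(self, s, a, r, s_next, done):
        self.window.append((s, a, r, s_next, done))
        if len(self.window) == self.n:
            self._flush_oldest()
        if done:
            # Episode ended: store the remaining, shorter tails as well.
            while self.window:
                self._flush_oldest()

    def sample(self, batch_size):
        return random.sample(list(self.memory), batch_size)


# Small demonstration with n = 3 and g = 0.5 on a four-step episode.
buf = NStepReplayBuffer(n=3, g=0.5)
buf.add(0, "a", 1.0, 1, False)
buf.add(1, "a", 2.0, 2, False)
buf.add(2, "a", 4.0, 3, False)
buf.add(3, "a", 8.0, 4, True)
stored = list(buf.memory)
```

Each stored tuple holds the start state and action, the discounted sum of observed rewards, the state reached at the end of the window, the terminal flag, and the number of steps used. The critic target for a sampled tuple would then be the stored return plus g**steps times the target critic's value at the final state, with the bootstrap term dropped when the terminal flag is set. Because the rewards come from the behavior policy that generated the data, this is the usual uncorrected n-step return.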