James Sorokhaibam
Feeds
Question
High fluctuation in Q0 value for TD3 agent while training.
I am training a TD3 RL agent for a pick-and-place robot. The reward function is reward = exp(-E/d), where E is the total energy co...
9 months ago | 1 answer | 0