TD3 agent fails to explore again after hitting the max action and gets stuck at the max action value. Additionally, the Q0 value explodes to a large value.

The range of a single action is 0.01 to 5. During training with TD3, learning is initially consistent. However, once the agent applies the maximum action value, it gets stuck: it fails to explore lower values and stops improving (or even deteriorates). I am not sure what the reason could be. The Q0 value explodes at this point.
  2 Comments
surya venu 2024-6-17
Hi,
The behavior you're describing with your TD3 agent points to a few common issues in continuous action spaces.
Enhance Exploration:
  • Adjust the exploration noise scale so the agent samples actions across the entire range, not just at the max value (see the sketch just below).
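As a minimal sketch, here is how the Gaussian exploration noise could be widened for a TD3 agent in the Reinforcement Learning Toolbox; the standard-deviation numbers are illustrative assumptions, not tuned values:

  % Widen TD3 exploration noise so actions across [0.01, 5] keep being sampled.
  % The numeric values are illustrative, not tuned settings.
  agentOpts = rlTD3AgentOptions;
  agentOpts.ExplorationModel.StandardDeviation = 1.0;           % large enough to escape the max action
  agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5; % decay slowly so exploration persists
  agentOpts.ExplorationModel.StandardDeviationMin = 0.05;       % keep a floor of exploration

A slow decay rate combined with a nonzero minimum keeps some exploration even late in training, which is what prevents the policy from locking onto the action bound.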
Refine Reward Function:
  • Ensure the reward function doesn't bias the agent towards always picking the maximum action value by providing incentives for exploring different actions.
  • Note that TD3 itself does not clip rewards. If you clip or rescale rewards in your environment, make sure the clipping range still exposes the negative consequences of holding the maximum action; clipping that hides those penalties can leave the agent stuck at the bound (a shaping sketch follows this list).
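One illustrative way to shape the reward so that saturating at the action bound is not free; shapeReward, rewardScale, and kSat are hypothetical names and values I am assuming for the sketch, not part of any toolbox API:

  % Hypothetical shaping helper: scale the raw reward and add a small penalty
  % when the action saturates at its upper bound, so holding a = 5 is not free.
  function r = shapeReward(rawReward, action, actionMax)
      rewardScale = 0.1;   % keep reward magnitudes modest so Q targets stay bounded
      kSat = 0.01;         % small saturation penalty (illustrative)
      r = rewardScale * rawReward;
      if action >= actionMax
          r = r - kSat;    % discourage permanently pinning the max action
      end
  end

Whether a saturation penalty is appropriate depends on your task; the point is only that the shaped reward should still distinguish staying at the bound from exploring below it.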
Address Q-Value Explosion:
  • Implement gradient clipping to prevent large updates that can lead to value explosion.
  • Keep the target networks updating slowly (small smoothing factor and/or low update frequency) to maintain training stability; a combined options sketch follows this list.
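A sketch of both settings through rlTD3AgentOptions and rlOptimizerOptions; the learning rates and thresholds are illustrative assumptions:

  % Clip gradients and keep target-network updates slow (illustrative values).
  criticOpts = rlOptimizerOptions(LearnRate = 1e-3, GradientThreshold = 1);
  actorOpts  = rlOptimizerOptions(LearnRate = 1e-4, GradientThreshold = 1);

  agentOpts = rlTD3AgentOptions;
  agentOpts.CriticOptimizerOptions = criticOpts;
  agentOpts.ActorOptimizerOptions  = actorOpts;
  agentOpts.TargetSmoothFactor     = 5e-3;  % slow (Polyak) target updates
  agentOpts.TargetUpdateFrequency  = 1;     % apply the soft update every learning step

Setting GradientThreshold to a finite value clips the gradient norm before each update, which bounds how fast the critic's Q-estimates can move and usually tames Q0 blow-ups.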
Regularization and Normalization:
  • Consider using batch/layer normalization in the networks and weight (L2) regularization to stabilize the learning process; a short sketch follows.
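A minimal sketch, assuming a critic built from Deep Learning Toolbox layers; the layer sizes, observation dimension, and regularization factor are assumptions for illustration:

  % L2 weight decay on the critic parameters (illustrative factor)
  criticOpts = rlOptimizerOptions(LearnRate = 1e-3, L2RegularizationFactor = 1e-4);

  % Fragment of a critic observation path with layer normalization
  obsPath = [
      featureInputLayer(4, Name = "obs")  % assumed 4-dimensional observation
      fullyConnectedLayer(128)
      layerNormalizationLayer             % stabilizes activations between layers
      reluLayer];

Layer normalization tends to be safer than batch normalization in off-policy RL, because minibatches drawn from the replay buffer are not i.i.d. and running batch statistics can drift between training and rollout.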
Hope it helps.
Bay Jay 2024-6-19
@surya venu I have increased the noise standard deviation for more exploration during early learning, so the agent doesn't lock onto the max action. Monitoring the performance now. Thanks for the suggestions. Appreciated.


Answers (0)
