TD3 agent fails to explore again after hitting the max action and gets stuck at the max action value. Additionally, the Q0 value explodes to a large value.

The range of a single action is 0.01 to 5. During training with TD3, learning is initially consistent. However, once the agent applies the maximum action value, it gets stuck: it fails to explore lower values and stops improving (or even deteriorates). I am not sure what the reason could be. The Q0 value explodes at this point.
  2 Comments
surya venu 2024-6-17
Hi,
The behavior you're describing with your TD3 agent points to a few common issues in continuous action spaces.
Enhance Exploration:
  • Adjust the exploration noise scale so the agent samples actions across the entire range, not just at the max value (see the sketch just below).
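As a minimal sketch, here is how the Gaussian exploration noise could be widened for a TD3 agent in the Reinforcement Learning Toolbox; the standard-deviation numbers are illustrative assumptions, not tuned values:

  % Widen TD3 exploration noise so actions across [0.01, 5] keep being sampled.
  % The numeric values are illustrative, not tuned settings.
  agentOpts = rlTD3AgentOptions;
  agentOpts.ExplorationModel.StandardDeviation = 1.0;           % large enough to escape the max action
  agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5; % decay slowly so exploration persists
  agentOpts.ExplorationModel.StandardDeviationMin = 0.05;       % keep a floor of exploration

A slow decay rate combined with a nonzero minimum keeps some exploration even late in training, which is what prevents the policy from locking onto the action bound.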
Refine Reward Function:
  • Ensure the reward function doesn't bias the agent towards always picking the maximum action value by providing incentives for exploring different actions.
  • Note that TD3 itself does not clip rewards. If you clip or rescale rewards in your environment, make sure the clipping range still exposes the negative consequences of holding the maximum action; clipping that hides those penalties can leave the agent stuck at the bound (a shaping sketch follows this list).
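One illustrative way to shape the reward so that saturating at the action bound is not free; shapeReward, rewardScale, and kSat are hypothetical names and values I am assuming for the sketch, not part of any toolbox API:

  % Hypothetical shaping helper: scale the raw reward and add a small penalty
  % when the action saturates at its upper bound, so holding a = 5 is not free.
  function r = shapeReward(rawReward, action, actionMax)
      rewardScale = 0.1;   % keep reward magnitudes modest so Q targets stay bounded
      kSat = 0.01;         % small saturation penalty (illustrative)
      r = rewardScale * rawReward;
      if action >= actionMax
          r = r - kSat;    % discourage permanently pinning the max action
      end
  end

Whether a saturation penalty is appropriate depends on your task; the point is only that the shaped reward should still distinguish staying at the bound from exploring below it.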
Address Q-Value Explosion:
  • Implement gradient clipping to prevent large updates that can lead to value explosion.
  • Keep the target networks updating slowly (small smoothing factor and/or low update frequency) to maintain training stability; a combined options sketch follows this list.
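A sketch of both settings through rlTD3AgentOptions and rlOptimizerOptions; the learning rates and thresholds are illustrative assumptions:

  % Clip gradients and keep target-network updates slow (illustrative values).
  criticOpts = rlOptimizerOptions(LearnRate = 1e-3, GradientThreshold = 1);
  actorOpts  = rlOptimizerOptions(LearnRate = 1e-4, GradientThreshold = 1);

  agentOpts = rlTD3AgentOptions;
  agentOpts.CriticOptimizerOptions = criticOpts;
  agentOpts.ActorOptimizerOptions  = actorOpts;
  agentOpts.TargetSmoothFactor     = 5e-3;  % slow (Polyak) target updates
  agentOpts.TargetUpdateFrequency  = 1;     % apply the soft update every learning step

Setting GradientThreshold to a finite value clips the gradient norm before each update, which bounds how fast the critic's Q-estimates can move and usually tames Q0 blow-ups.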
Regularization and Normalization:
  • Consider using batch/layer normalization in the networks and weight (L2) regularization to stabilize the learning process; a short sketch follows.
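A minimal sketch, assuming a critic built from Deep Learning Toolbox layers; the layer sizes, observation dimension, and regularization factor are assumptions for illustration:

  % L2 weight decay on the critic parameters (illustrative factor)
  criticOpts = rlOptimizerOptions(LearnRate = 1e-3, L2RegularizationFactor = 1e-4);

  % Fragment of a critic observation path with layer normalization
  obsPath = [
      featureInputLayer(4, Name = "obs")  % assumed 4-dimensional observation
      fullyConnectedLayer(128)
      layerNormalizationLayer             % stabilizes activations between layers
      reluLayer];

Layer normalization tends to be safer than batch normalization in off-policy RL, because minibatches drawn from the replay buffer are not i.i.d. and running batch statistics can drift between training and rollout.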
Hope it helps.
Bay Jay 2024-6-19
@surya venu I have increased the noise standard deviation for more exploration during early learning, so the agent doesn't lock onto the max action. Monitoring the performance now. Thanks for the suggestions. Appreciated.


Answers (0)
