Getting negative reward in two agents while the other 2 are getting trained
I am able to train Agent5 and Agent8, but I am getting a constant negative reward for Agent6 and Agent7. I am trying to control a quadrotor.

Answers (1)
TARUN
2025-6-11
Based on the training plot and code you shared, Agent6 and Agent7 are consistently receiving negative rewards, while Agent5 and Agent8 are learning effectively.
This typically points to either an environment setup issue or improper agent configuration.
You can follow the steps below to fix this issue:
1. Incorrect Block Path for Agent7: There seems to be an extra space in the block path:
'rl_backstep_Multi/ Agent7'
It should be:
'rl_backstep_Multi/Agent7'
A malformed block path will prevent proper environment-agent linking.
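For example, a minimal sketch of building the block paths without stray spaces (the agent block names are taken from your description and may need adjusting):
mdl = 'rl_backstep_Multi';
% One path per agent block; note there is no space after the '/'
agentBlks = {[mdl '/Agent5'], [mdl '/Agent6'], [mdl '/Agent7'], [mdl '/Agent8']};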
2. Shared Observation/Action Specifications: All agents are using the same ainfo and oinfo objects. This results in shared handles, which may cause unexpected behavior. Instead, define separate specs for each agent like:
oinfo1 = rlNumericSpec([2 1]); oinfo1.Name = 'obs1';  % observation spec for the first agent
ainfo1 = rlNumericSpec([1 1]); ainfo1.Name = 'act1';  % action spec for the first agent
% Repeat for each agent (oinfo2/ainfo2 ... oinfo4/ainfo4), adjusting dimensions to each block's ports
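Once each agent has its own specs, they can be passed to rlSimulinkEnv as cell arrays, one entry per agent block and in the same order as the block paths (a sketch, reusing mdl and agentBlks from the example above):
% Cell arrays with one entry per agent block
obsInfos = {oinfo1, oinfo2, oinfo3, oinfo4};
actInfos = {ainfo1, ainfo2, ainfo3, ainfo4};
env = rlSimulinkEnv(mdl, agentBlks, obsInfos, actInfos);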
3. Reward Signal Issue: Agent6 and Agent7 might be receiving either invalid or overly penalizing rewards. Check the reward block logic inside Simulink and make sure it produces finite and meaningful values throughout training. You can log or scope the reward signals to debug this, for example:
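One way to do that is to mark the reward signals for signal logging in the model and check that they stay finite over an episode. A sketch, where the logged signal names 'reward6' and 'reward7' are hypothetical placeholders for whatever you name them in your model:
% Run one simulation of the model and inspect the logged reward signals
out = sim(mdl);                                % returns a Simulink.SimulationOutput
r6  = out.logsout.get('reward6').Values.Data;  % logged reward time series for Agent6
r7  = out.logsout.get('reward7').Values.Data;  % logged reward time series for Agent7
fprintf('Agent6 reward: min %.3g, max %.3g, all finite: %d\n', min(r6), max(r6), all(isfinite(r6)));
fprintf('Agent7 reward: min %.3g, max %.3g, all finite: %d\n', min(r7), max(r7), all(isfinite(r7)));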
4. Hyperparameter Tuning: PPO agents can sometimes diverge depending on the learning rate or entropy weight. You could slightly reduce ActorOptimizerOptions.LearnRate and EntropyLossWeight for Agent6 and Agent7 to stabilize their learning, for example:
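A sketch of how those options could be lowered when creating the two agents (the values shown are only starting points to experiment with, and Ts stands for your agents' sample time):
% Slightly lower entropy weight and learning rates for Agent6/Agent7
agentOpts = rlPPOAgentOptions('SampleTime', Ts, 'EntropyLossWeight', 0.005);  % default EntropyLossWeight is 0.01
agentOpts.ActorOptimizerOptions.LearnRate  = 1e-4;   % e.g. reduced from 1e-3
agentOpts.CriticOptimizerOptions.LearnRate = 5e-4;
% agent6 = rlPPOAgent(actor6, critic6, agentOpts);   % actor6/critic6 are your own networks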
Feel free to go through the MathWorks documentation on Reinforcement Learning Toolbox to understand agent-environment integration and reward design.