Getting negative reward in two agents while the other 2 are getting trained
I am able to train Agent5 and Agent8, but I am getting a constant negative reward for Agent6 and Agent7. I am trying to control a quadrotor.

Answers (1)
TARUN
2025-6-11
Based on the training plot and code you shared, Agent6 and Agent7 are consistently receiving negative rewards, while Agent5 and Agent8 are learning effectively.
This typically points to either an environment setup issue or improper agent configuration.
You can follow the steps below to fix this issue:
1. Incorrect Block Path for Agent7: There seems to be an extra space in the block path:
'rl_backstep_Multi/ Agent7'
It should be:
'rl_backstep_Multi/Agent7'
A malformed block path will prevent proper environment-agent linking.
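For example, a minimal sketch of building the block paths without stray spaces (the agent block names are taken from your description and may need adjusting):
mdl = 'rl_backstep_Multi';
% One path per agent block; note there is no space after the '/'
agentBlks = {[mdl '/Agent5'], [mdl '/Agent6'], [mdl '/Agent7'], [mdl '/Agent8']};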
2. Shared Observation/Action Specifications: All agents are using the same ainfo and oinfo objects. This results in shared handles, which may cause unexpected behavior. Instead, define separate specs for each agent like:
oinfo1 = rlNumericSpec([2 1]); oinfo1.Name = 'obs1';  % observation spec for the first agent
ainfo1 = rlNumericSpec([1 1]); ainfo1.Name = 'act1';  % action spec for the first agent
% Repeat for each agent (oinfo2/ainfo2 ... oinfo4/ainfo4), adjusting dimensions to each block's ports
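Once each agent has its own specs, they can be passed to rlSimulinkEnv as cell arrays, one entry per agent block and in the same order as the block paths (a sketch, reusing mdl and agentBlks from the example above):
% Cell arrays with one entry per agent block
obsInfos = {oinfo1, oinfo2, oinfo3, oinfo4};
actInfos = {ainfo1, ainfo2, ainfo3, ainfo4};
env = rlSimulinkEnv(mdl, agentBlks, obsInfos, actInfos);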
3. Reward Signal Issue: Agent6 and Agent7 might be receiving either invalid or overly penalizing rewards. Check the reward block logic inside Simulink and make sure it produces finite and meaningful values throughout training. You can log or scope the reward signals to debug this, for example:
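One way to do that is to mark the reward signals for signal logging in the model and check that they stay finite over an episode. A sketch, where the logged signal names 'reward6' and 'reward7' are hypothetical placeholders for whatever you name them in your model:
% Run one simulation of the model and inspect the logged reward signals
out = sim(mdl);                                % returns a Simulink.SimulationOutput
r6  = out.logsout.get('reward6').Values.Data;  % logged reward time series for Agent6
r7  = out.logsout.get('reward7').Values.Data;  % logged reward time series for Agent7
fprintf('Agent6 reward: min %.3g, max %.3g, all finite: %d\n', min(r6), max(r6), all(isfinite(r6)));
fprintf('Agent7 reward: min %.3g, max %.3g, all finite: %d\n', min(r7), max(r7), all(isfinite(r7)));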
4. Hyperparameter Tuning: PPO agents can sometimes diverge depending on the learning rate or entropy weight. You could slightly reduce ActorOptimizerOptions.LearnRate and EntropyLossWeight for Agent6 and Agent7 to stabilize their learning, for example:
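A sketch of how those options could be lowered when creating the two agents (the values shown are only starting points to experiment with, and Ts stands for your agents' sample time):
% Slightly lower entropy weight and learning rates for Agent6/Agent7
agentOpts = rlPPOAgentOptions('SampleTime', Ts, 'EntropyLossWeight', 0.005);  % default EntropyLossWeight is 0.01
agentOpts.ActorOptimizerOptions.LearnRate  = 1e-4;   % e.g. reduced from 1e-3
agentOpts.CriticOptimizerOptions.LearnRate = 5e-4;
% agent6 = rlPPOAgent(actor6, critic6, agentOpts);   % actor6/critic6 are your own networks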
Feel free to go through the MathWorks documentation on Reinforcement Learning Toolbox to understand agent-environment integration and reward design.