- Reward Function: The reward must reflect the desired cooperative behaviour. Consider incorporating shared rewards or team-based objectives so that each agent's individual goal aligns with the overall system performance.
- Training Stability: A stable ‘Average Reward Curve’ indicates consistent learning, while high variance may indicate instability or conflicting actions among agents. Ideally, the variance decreases as training progresses.
- Hyperparameter Tuning: Tuning hyperparameters such as the learning rate and the discount factor can often significantly improve the performance of the RL agents.
- Train RL Agents: https://www.mathworks.com/help/reinforcement-learning/ug/train-reinforcement-learning-agents.html
- TD3 Agent Options: https://www.mathworks.com/help/reinforcement-learning/ref/rl.option.rltd3agentoptions.html
- Training and Simulation: https://www.mathworks.com/help/reinforcement-learning/training-and-simulation.html
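
The reward-shaping and stability tips above can be sketched in a few lines. This is only a minimal, toolbox-agnostic illustration in Python, not code from the Reinforcement Learning Toolbox: the function names `blended_rewards` and `rolling_variance`, the `team_weight` parameter, and the window size are all assumptions made for the example. The first function mixes each agent's own reward with the team average, and the second computes the variance of the episode reward over a sliding window, which should shrink as training settles.

```python
from statistics import mean, pvariance


def blended_rewards(individual_rewards, team_weight=0.5):
    """Mix each agent's own reward with the team-average reward.

    team_weight = 0 keeps rewards purely individual;
    team_weight = 1 gives every agent the same shared reward.
    (Hypothetical helper, for illustration only.)
    """
    if not 0.0 <= team_weight <= 1.0:
        raise ValueError("team_weight must be in [0, 1]")
    team_reward = mean(individual_rewards)
    return [(1.0 - team_weight) * r + team_weight * team_reward
            for r in individual_rewards]


def rolling_variance(episode_rewards, window=20):
    """Population variance of the episode reward over a sliding window.

    Returns one value per full window, so the result has
    len(episode_rewards) - window + 1 entries (empty if too short).
    (Hypothetical helper, for illustration only.)
    """
    return [pvariance(episode_rewards[i - window:i])
            for i in range(window, len(episode_rewards) + 1)]


if __name__ == "__main__":
    # Two agents: one earned 1.0, the other 3.0 in this step.
    print(blended_rewards([1.0, 3.0], team_weight=0.5))

    # Episode rewards logged during training (made-up numbers).
    logged = [0, 8, 1, 9, 4, 6, 5, 5, 5, 5]
    print(rolling_variance(logged, window=4))
```

If the rolling variance stays high or grows late in training, that is a hint to revisit the reward definition or the hyperparameters before training for longer.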