Is there a way to make one agent wait for other agents to reach their episode termination criteria in multi-agent reinforcement learning?

Question

MAZBAHUR KHAN 2023-8-27

https://ww2.mathworks.cn/matlabcentral/answers/2013717-is-there-a-way-for-making-one-agent-wait-for-other-agents-to-reach-their-episode-termination-criteri

Commented: MAZBAHUR KHAN 2023-9-11

Hello. I am working on a project on decentralised multi-agent training for path planning with multiple mobile robots. I have set the episode termination criterion (the isdone signal in Simulink) for each robot individually, so that the episode is terminated when the robot collides with an obstacle or reaches its goal position. However, I noticed that when one robot meets its termination criterion and the others have not, the episode terminates and a new one begins, with all robots assigned new initial positions. This makes training inefficient, because each robot's reward collection is interrupted by another robot's episode termination.

Is there a way to make a robot that has met its own termination criterion wait until the other robots meet theirs, before a new episode starts for all robots? If so, how? Any help would be highly appreciated. Thanks in advance.

Answer 1

Yatharth 2023-9-6

Hi Mazbahur,

I understand that you want a single episode to run until every robot has either collided with an obstacle or reached its destination (the terminating conditions). Currently, however, the episode is terminated for all robots as soon as any one robot meets its terminating condition.

In a decentralized multi-agent training scenario, where each robot has its own episode termination criteria, it is indeed important to synchronize the termination of episodes to avoid the inefficiency you described. One way to address this issue is by introducing a synchronization mechanism among the robots. Here's a possible approach:

1. Maintain a shared variable or flag that keeps track of whether any robot has reached its episode termination criteria. This flag can be initially set to `False`.

2. Whenever a robot reaches its termination criteria, it checks the shared flag. If it is still `False`, the robot sets it to `True` and waits for a signal from other robots.

3. Meanwhile, other robots continue their execution until they reach their own termination criteria. Once a robot reaches its termination criteria, it checks the shared flag. If it is `True`, the robot proceeds to the next step. Otherwise, it waits until the shared flag becomes `True`.

4. Once all robots have reached their termination criteria and are waiting, a synchronization signal is sent to all robots to indicate that they can start a new episode together. This signal can be broadcast to all robots simultaneously.

5. Upon receiving the synchronization signal, each robot resets its environment and starts a new episode from the assigned initial positions.

By implementing this synchronization mechanism, you can ensure that all robots wait for each other before starting a new episode. This way, the training of each robot will not be interrupted by the termination of other robots' episodes.
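
The steps above can be sketched in a few lines. The following is only an illustration, written in Python for readability rather than as Simulink code, and it makes one simplifying assumption that differs slightly from the description: instead of a single shared flag, it keeps one latched flag per robot and ends the episode only when all of them are set. The names `EpisodeSync`, `step`, and `latched` are hypothetical. In Simulink, equivalent logic could presumably live in a MATLAB Function block that keeps the flags in persistent state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EpisodeSync:
    """Latch each robot's done signal; end the episode only when all are latched."""

    num_robots: int
    latched: List[bool] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Called at the start of every episode: no robot has finished yet.
        self.latched = [False] * self.num_robots

    def step(self, done_now: List[bool]) -> bool:
        """Fold this time step's per-robot done signals into the latches.

        Returns True only when every robot has terminated at some point
        during the current episode.
        """
        if len(done_now) != self.num_robots:
            raise ValueError("expected one done signal per robot")
        self.latched = [old or new for old, new in zip(self.latched, done_now)]
        return all(self.latched)


sync = EpisodeSync(num_robots=3)
signals = [
    [False, True, False],   # robot 1 finishes first
    [False, False, False],  # its raw signal may drop, but the latch holds
    [True, False, False],   # robot 0 finishes
    [False, False, True],   # robot 2 finishes, so the episode can end
]
episode_done = [sync.step(s) for s in signals]
print(episode_done)  # [False, False, False, True]
```

The value returned by `step` plays the role of the synchronization signal in step 4: it would be the one signal allowed to end the episode for every robot, while each robot's individual termination signal only sets its own latch.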

You can create custom flags using a MATLAB Function block in Simulink. For a basic example of how to use the block, see the MathWorks documentation page "Implement MATLAB Functions in Simulink with MATLAB Function Blocks".

I hope this helps.

MAZBAHUR KHAN 2023-9-11

@Yatharth thank you very much. This is really helpful.

Just a little confusion. I have built the model in Simulink (Mobile Robotics Simulation Toolbox) using constant values for the robots' linear and angular velocities. I give a robot a negative reward when it collides with an obstacle. A collision is defined as a LIDAR range reading equal to the robot radius. So every time the LIDAR senses a reading less than the robot radius, the robot receives a negative reward and the episode ends.

So, after implementing the shared flag technique, when a robot collides with an obstacle and checks the flag, its velocity values would still be the same. So won't it keep moving forward through the obstacle? And even if it stops there, won't it keep collecting negative rewards at the following time steps (until all the robots have checked the flag and a new episode starts), since it would still be in a state of collision with the obstacle?
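
One possible way to handle both concerns, offered here only as a sketch and not as something proposed in the thread, is to gate a robot's outputs with its own latched termination flag: once the flag is set, the velocity commands and the reward are forced to zero until the next episode. The function name `gate_robot_outputs` and the numeric values are hypothetical, and Python is used purely for illustration.

```python
from typing import Tuple


def gate_robot_outputs(
    latched: bool,
    linear_velocity: float,
    angular_velocity: float,
    reward: float,
) -> Tuple[float, float, float]:
    """Freeze a robot that has already terminated in this episode.

    While `latched` is True the robot gets zero velocity commands and zero
    reward, so it neither drives through the obstacle nor keeps accumulating
    the collision penalty while it waits for the other robots.
    """
    if latched:
        return 0.0, 0.0, 0.0
    return linear_velocity, angular_velocity, reward


# Time step at which the collision happens: the latch is not set yet,
# so the penalty is delivered exactly once.
at_collision = gate_robot_outputs(False, 0.5, 0.1, -10.0)
# Later time steps while waiting: the robot is parked and earns nothing.
while_waiting = gate_robot_outputs(True, 0.5, 0.1, -10.0)
print(at_collision, while_waiting)
```

For the penalty to be delivered once, `latched` should be the flag value from the previous time step, so that the step on which the collision is first detected still passes its reward through. Note that the waiting robot's agent may still receive zero-reward observations during this period, which could influence learning.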
