Reinforcement Learning Zero Reward

Question

I'm training multiple reinforcement learning agents using a Simulink model with a custom function (to simulate a card game).

I can compile and run the model in Simulink with no problems, and attaching a scope to the reward and isdone signals shows that they are set correctly (the reward is non-zero, and the isdone signal terminates the simulation at the correct time).

However, when I try to train the model, each agent shows zero reward.

Similar questions suggest that the cause may be the isdone flag being set incorrectly. However, I am confident that this is not the case: each step prints text to the Command Window (as intended), which suggests that the model is simulating correctly during training.

To recreate, run the 'CreateAgents' and 'CreateEnvironment' programs, open the 'WhistLearningVaribles.mat' file (containing the variables needed for the simulation and the training options), run 'myResetFunction', and train using the command:

stats = train([Player1, Player2, Player3, Player4],env,trainOpts);

The other functions must be present in the file structure, since the model references them during simulation.

Any advice would be much appreciated. Thanks!

Answer 1

Ari Biswas 2021-5-13

In your Simulink model workspace, you have several agent objects saved with the same variable names as those referenced in the RL Agent blocks. This causes a conflict when the agents are resolved during training. Remove these agent objects from the model workspace if you don't intend to use them. You can access the model workspace by pressing Ctrl+H from your Simulink model. Once you remove them, you will be able to train correctly.
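If you prefer to do the cleanup from the command line rather than through Ctrl+H, something like the following may work. It is only a sketch: the function name clearAgentsFromModelWorkspace and the model name used below are hypothetical, and it assumes the stale variables carry the same names as the agents in your train call (Player1 to Player4). It relies on the Simulink.ModelWorkspace methods hasVariable and evalin, so please check them against your release.

```matlab
function removed = clearAgentsFromModelWorkspace(mdl, agentNames)
% Remove stale agent variables from a Simulink model workspace.
%   mdl        - model name (hypothetical example: 'WhistModel')
%   agentNames - cell array of variable names, e.g. {'Player1','Player2'}
%   removed    - names that were actually found and cleared

load_system(mdl);                           % load the model without opening it
mdlWks = get_param(mdl, 'ModelWorkspace');  % Simulink.ModelWorkspace object

removed = {};
for k = 1:numel(agentNames)
    name = agentNames{k};
    if hasVariable(mdlWks, name)
        % Clear only this variable; other model workspace data is untouched.
        evalin(mdlWks, ['clear ' name]);
        removed{end+1} = name; %#ok<AGROW>
    end
end
end
```

For example, with 'WhistModel' standing in for your actual model name:

```matlab
removed = clearAgentsFromModelWorkspace('WhistModel', ...
    {'Player1', 'Player2', 'Player3', 'Player4'});
disp(removed)   % lists the variables that were cleared
```

The change only affects the model in memory, so you would probably also need to save the model for the removal to persist. After that, the RL Agent blocks should resolve Player1 to Player4 from the base workspace, which is where 'CreateAgents' presumably puts them.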
