Reinforcement Learning Toolbox - When does algorithm train?

Question

0 个投票

I am currently using the RL-Toolbox with a DQN-Agent built into a long-running process-simulation.

The maximum stepcount is currently 8000 steps per episode.

Unfortunately the documentation seems a little ambiguous to me, so here my question:

Doese the train-function of the RL-Toolbox train the agent at the end of an episode or during the episode when the step count exeeds the minibatch-size (like in the baseline algorithms)?

Thank you in advance.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Follow Question

Answer 1

Emmanouil Tzorakoleftherakis 2019-9-25

0 个投票

The implementation is based on the algorithm listed here.

Weights are being updated at each time step.

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

Hans-Joachim Steinort 2019-9-26

"For each training time step" - that was the line I was looking for (yet looking into the source code lead me to the same conclusion).

After double-checking the baseline-algorithms I found that they do it the same way.

Thank you for your time!

请先登录，再进行评论。

Reinforcement Learning Toolbox - When does algorithm train?

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

更多回答（0 个）

类别

产品

版本

标签

Community Treasure Hunt

Reinforcement Learning Toolbox - When does algorithm train?

0 个评论 显示 -2更早的评论 隐藏 -2更早的评论

采纳的回答

1 个评论 显示 -1更早的评论 隐藏 -1更早的评论

更多回答（0 个）

类别

产品

版本

标签

另请参阅

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

1 个评论
显示 -1更早的评论隐藏 -1更早的评论