- For non-recurrent neural networks, these samples are selected randomly because the network treats each input independently.
- LSTMs require a sequence of experiences to effectively learn temporal features.
- When using PPO with LSTM, the agent emphasizes on managing sequences of experiences to leverage the LSTM's ability to learn from temporally dependent data.
- A trajectory is a sequence of states, actions, and rewards that an agent experiences in the environment from the start of an episode until a terminal state.
- The agent learns from trajectories of experiences.
- The “MiniBatchSize” value specifies the length of these trajectories.
- If “MiniBatchSize” is set to 50, the LSTM network will be trained on trajectories of experiences where each trajectory is 50 steps long.