Hi Bryan,
It is very hard to pinpoint the exact reason for sudden drops in episodic reward (the sum of rewards over all steps in an episode) without knowing more about the environment or reward function. RL training is stochastic, so the agent may be entering states in certain episodes that cause early termination or large penalties, either of which can sharply reduce the cumulative reward. One suggestion is to run a short training session, save the agent, and then simulate it while logging the per-step rewards to verify that the reward function is being evaluated as you expect.
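As a rough sketch of that diagnostic idea (in Python, since I don't know your setup; the toy dynamics below are entirely hypothetical stand-ins for your environment), you can log the per-step reward trace of each episode and flag episodes that terminated early or contained a large penalty:

```python
import random

def run_episode(env_step, env_reset, max_steps=200):
    """Roll out one episode and return the per-step reward trace."""
    state = env_reset()
    rewards = []
    for _ in range(max_steps):
        state, reward, done = env_step(state)
        rewards.append(reward)
        if done:
            break
    return rewards

# Hypothetical stand-in for the real environment: a random walk that
# occasionally hits a terminal state carrying a large penalty.
def toy_reset():
    return 0.0

def toy_step(state):
    state += random.gauss(0, 1)
    if abs(state) > 3.0:             # early termination with a big penalty
        return state, -50.0, True
    return state, 1.0, False         # small positive reward per step

random.seed(0)
traces = [run_episode(toy_step, toy_reset) for _ in range(20)]
returns = [sum(t) for t in traces]

# Flag suspicious episodes: short length or an unusually low step reward.
for i, (t, g) in enumerate(zip(traces, returns)):
    if len(t) < 200 or min(t) < -10:
        print(f"episode {i}: length={len(t)}, return={g:.1f}, "
              f"min step reward={min(t):.1f}")
```

If the flagged episodes line up with the drops in your training plot, the cause is the environment reaching those terminal or penalty states, not a bug in the algorithm.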
Fluctuations can also be a sign that the agent is still exploring the environment. Balancing exploration against exploitation is essential: for DDPG agents, exploration comes from Ornstein-Uhlenbeck action noise, so you may need to adjust the noise parameters (for example, the standard deviation and its decay rate) in the agent options.
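To make that concrete, here is a minimal Python sketch of Ornstein-Uhlenbeck action noise with a decaying standard deviation. The parameter names mirror the usual ones, but the decay schedule is an illustrative assumption, not any toolbox's exact implementation:

```python
import math
import random

class OUNoise:
    """Ornstein-Uhlenbeck action noise, the exploration model DDPG
    typically uses. The decay schedule is a hypothetical illustration."""
    def __init__(self, mean=0.0, std=0.3, theta=0.15, dt=0.01,
                 std_decay=1e-3, std_min=0.05):
        self.mean, self.std, self.theta, self.dt = mean, std, theta, dt
        self.std_decay, self.std_min = std_decay, std_min
        self.x = mean

    def sample(self):
        # OU update: mean-reverting drift plus a Gaussian perturbation.
        self.x += self.theta * (self.mean - self.x) * self.dt \
                  + self.std * math.sqrt(self.dt) * random.gauss(0, 1)
        # Decay exploration over time, but keep a floor so the agent
        # never stops exploring entirely.
        self.std = max(self.std_min, self.std * (1 - self.std_decay))
        return self.x

random.seed(0)
noise = OUNoise()
samples = [noise.sample() for _ in range(5000)]
print(f"std after decay: {noise.std:.3f}")   # shrinks to the std_min floor
```

A larger initial standard deviation means more exploration (and noisier episodic rewards) early in training; a faster decay rate settles the agent into exploitation sooner, which should smooth the reward curve if exploration is the cause of the fluctuations.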
Refer to the following documentation, which provides further details about DDPG agents and their training algorithm: https://www.mathworks.com/help/reinforcement-learning/ug/ddpg-agents.html