I am using reinforcement learning for voltage control under varying renewable generation. However, after training, the agent keeps giving the same constant control action regardless of the input. Where could the problem be? Is something wrong in the environment, for example in the reward function or the reset function?
My understanding is that training finds the action with the highest average reward over all observations, so the trained agent outputs a constant action. My objective, however, is for the trained agent to provide the optimal action for each observation. How should I fix this?
My code is very simple and consists of the following parts:
Observation = [Power Injections, Uncertainties, Voltages] for each bus; Action = [Control Injections] on selected buses.
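A minimal sketch of this layout, written in Python/NumPy for illustration (the function names and the 3-bus numbers are hypothetical): each bus is one row, the three quantities are columns, matching this.State(:,1) for injections and this.State(:,3) for voltages, and the observation passed to the agent is the flattened matrix.

```python
import numpy as np

def build_state(p_inj, uncertainty, voltage):
    """One row per bus; columns = [power injection, uncertainty, voltage].
    Column 0 here corresponds to this.State(:,1), column 2 to this.State(:,3)."""
    return np.column_stack([p_inj, uncertainty, voltage])

def observation(state):
    """Flatten column by column into the 1-D vector the agent receives."""
    return state.flatten(order="F")

# Hypothetical 3-bus example values (p.u.)
state = build_state(np.array([1.0, -0.5, 0.8]),
                    np.array([0.1, 0.0, -0.2]),
                    np.array([1.02, 0.98, 1.01]))
obs = observation(state)  # injections, then uncertainties, then voltages
```

Flattening with order="F" goes column by column, which is the same order MATLAB's this.State(:) uses.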
Reset function: add random uncertainties (within a given range) to the power injections, then run a power flow to get the voltages.
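The agent can presumably only learn observation-dependent actions if it actually sees varied observations, so the reset should draw new uncertainties every episode. A minimal sketch, assuming a uniform draw in [-u_max, u_max] and replacing the real power flow with a hypothetical linear sensitivity model V = V_BASE + S·P (all names and numbers are illustrative):

```python
import numpy as np

# Hypothetical 3-bus data; a linear sensitivity model stands in for the real power flow.
P_BASE = np.array([1.0, -0.5, 0.8])        # base power injections (p.u.)
V_BASE = np.array([1.0, 1.0, 1.0])         # voltages at zero injection (p.u.)
S = np.array([[0.05, 0.01, 0.01],
              [0.01, 0.04, 0.01],
              [0.01, 0.01, 0.06]])         # assumed dV/dP sensitivities

def run_power_flow(p_inj):
    """Placeholder for the real power flow: V = V_BASE + S @ p_inj."""
    return V_BASE + S @ p_inj

def reset(rng, u_max=0.3):
    """Draw fresh uncertainties, perturb the injections, recompute the voltages."""
    uncertainty = rng.uniform(-u_max, u_max, size=P_BASE.shape)
    p_inj = P_BASE + uncertainty
    voltage = run_power_flow(p_inj)
    return np.column_stack([p_inj, uncertainty, voltage])

rng = np.random.default_rng(0)
s1 = reset(rng)  # first episode
s2 = reset(rng)  # next episode: different uncertainties, different voltages
```

If the random generator were re-seeded with the same value inside every reset, each episode would start from the identical state, which is worth ruling out.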
Step function: take this.State and the action, apply the action to change the power injections, run a power flow to get the new voltages, and update the system states.
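A minimal sketch of such a step, assuming a hypothetical linear model in place of the power flow and buses 0 and 2 as the controlled buses: the action is added to the injections at the selected buses only, the voltages are recomputed, and the uncertainty column is carried over unchanged.

```python
import numpy as np

# Hypothetical linear stand-in for the power flow: V = V_BASE + S @ p
V_BASE = np.ones(3)
S = np.array([[0.05, 0.01, 0.01],
              [0.01, 0.04, 0.01],
              [0.01, 0.01, 0.06]])
CTR_BUS = np.array([0, 2])  # hypothetical controlled buses

def run_power_flow(p_inj):
    return V_BASE + S @ p_inj

def step(state, action):
    """Apply the control injections at CTR_BUS and return the next state matrix."""
    p_inj = state[:, 0].copy()        # copy so the previous state is not modified
    p_inj[CTR_BUS] += action          # change injections only at the controlled buses
    voltage = run_power_flow(p_inj)   # recompute voltages after the control
    return np.column_stack([p_inj, state[:, 1], voltage])

p0 = np.array([1.0, -0.5, 0.8])
state = np.column_stack([p0, [0.1, 0.0, -0.2], run_power_flow(p0)])
next_state = step(state, np.array([0.2, -0.1]))
```

Here the action is added on top of the current injections, so control accumulates over steps within an episode; if the action is meant as a setpoint relative to this.PowerInjections, it would be applied to the uncontrolled injections instead.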
Reward function: reward the voltage improvement and penalize the control effort needed. In simplified form:
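P_inj = this.State(:,1);                                            % current power injections (column 1 of the state)
Ctr = sum(P_inj(this.Ctr_bus)-this.PowerInjections(this.Ctr_bus));  % control effort: signed sum of injection changes at the controlled buses
Vol = this.State(:,3)-this.GoalVoltage;                             % signed voltage deviation from the target at each bus
Reward = sum(Vol)-Ctr;                                              % voltage term minus control-effort term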
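One way to check whether the reward itself causes the constant action: as written, sum(Vol) - Ctr is linear in both the voltage deviation and the signed control. If voltages respond roughly linearly to injections, the slope of the reward with respect to the action then does not depend on the observation, so the best action is the same for every observation, typically at an action bound. The sketch below uses a hypothetical linear 3-bus model instead of a power flow, assumed action bounds of [-1, 1], and an assumed weight of 1e-3. It compares that form with one common alternative that penalizes squared deviation and squared control effort, by searching a grid of candidate actions for two different observations.

```python
import numpy as np

# Hypothetical 3-bus linear model (stand-in for the real power flow)
S = np.array([[0.05, 0.01, 0.01],
              [0.01, 0.04, 0.01],
              [0.01, 0.01, 0.06]])
V_BASE = np.ones(3)
V_GOAL = 1.0
CTR_BUS = 0                            # single controlled bus, for a 1-D search
ACTIONS = np.linspace(-1.0, 1.0, 21)   # candidate control injections within assumed bounds

def voltages(p_obs, a):
    p = p_obs.copy()
    p[CTR_BUS] += a
    return V_BASE + S @ p

def reward_linear(p_obs, a):
    """Same form as the posted reward: summed signed deviation minus signed control."""
    vol = voltages(p_obs, a) - V_GOAL
    return np.sum(vol) - a

def reward_quadratic(p_obs, a, weight=1e-3):
    """Alternative: penalize squared deviation and squared control effort."""
    vol = voltages(p_obs, a) - V_GOAL
    return -np.sum(vol ** 2) - weight * a ** 2

def best_action(reward, p_obs):
    """Grid search for the action with the highest reward at this observation."""
    return ACTIONS[np.argmax([reward(p_obs, a) for a in ACTIONS])]

high = np.array([1.0, -0.5, 0.8])  # injections that raise most voltages above target
low = -high                        # injections that pull most voltages below target
for name, r in [("linear", reward_linear), ("quadratic", reward_quadratic)]:
    print(name, best_action(r, high), best_action(r, low))
```

With the linear form, both observations give the same best action (the lower bound, -1); with the squared penalties, the best action is negative when voltages are above target and positive when they are below. An absolute-value penalty is another common choice, and the weight would need tuning for the real system.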
Thank you for your time and help. I can provide more details if needed.