Enforce action space constraints within the environment

26 views (last 30 days)
Hi,
My agent is training!!! But it's pretty much 0 reward every episode right now. I think it might be due to this:
contActor does not enforce constraints set by the action specification; therefore, when using this actor, you must enforce action space constraints within the environment.
How can I do this?
Also, is there a way to view the logged signals as the agent is training?
Thanks!
  1 Comment
John Doe
John Doe 2021-2-24
There's something odd going on. It's not 0 reward, but it's not growing. I do have that first-action method I mentioned in the other question implemented (so for 4 of the continuous actions, it only chooses the first action), and for 1 action it's used every time step. I guess I need to check the logged signals to really determine what's going on. I'm too excited to make it work on the first or second try lol


Accepted Answer

Emmanouil Tzorakoleftherakis
If the environment is in Simulink, you can set up scopes and observe what's happening during training. If the environment is in MATLAB, you need to do some extra work and plot things yourself.
For your constraints question, which agent are you using? Some agents are stochastic, and some, like DDPG, add noise for exploration on top of the action output. To be certain, you can use a Saturation block in Simulink, or an if statement (or min/max) to clip the action as needed in MATLAB.
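A minimal MATLAB sketch of that clipping; the limits and action values below are placeholders, and in a custom environment this would sit at the top of your step function, before the dynamics use the action:

% Saturate the action inside the environment, analogous to a Saturation
% block in Simulink. Limits would normally come from your rlNumericSpec.
actionLow  = [-1; -1];          % placeholder lower limits
actionHigh = [ 1;  1];          % placeholder upper limits
action     = [ 1.7; -0.4];      % example raw output from the actor

% Elementwise clip so the dynamics only ever see in-range values
action = min(max(action, actionLow), actionHigh);   % -> [1; -0.4]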
  28 Comments
John Doe
John Doe 2021-3-2
Edited: John Doe 2021-3-2
How can I do the scaling of the inputs to the network? That seems like the best way forward.
The environment is already constraining the actions, but training is extremely sample-inefficient and basically bounces between the upper and lower limits of the actions for hundreds of episodes.
Emmanouil Tzorakoleftherakis
Multiply the observations inside the 'step' function by a number that makes sense.
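A rough sketch of what that might look like, with made-up signal names and magnitudes; the idea is just to bring every entry of the observation vector to roughly the same order of magnitude before it reaches the network:

% Hypothetical raw signals with very different magnitudes
position = 8.2;      % expected range roughly [-10, 10]
velocity = -2.1;     % expected range roughly [-5, 5]
force    = 640;      % expected range roughly [-1000, 1000]

% Divide each measurement by its expected magnitude so the network sees O(1) inputs
rawObs    = [position; velocity; force];
obsScale  = [1/10; 1/5; 1/1000];
scaledObs = rawObs .* obsScale;   % return this from the step function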


More Answers (1)

John Doe
John Doe 2021-3-17
Edited: John Doe 2021-3-17
Hi,
I feel like I'm really close to getting this, but I haven't gotten a successful run yet. For thousands of episodes, the agent keeps choosing actions way outside the limits. I've tried adding the min/max clipping to force them within range in the environment. Do you have any tips on how I can make it converge to stay within the limits? I even tried shaping the rewards to encourage staying close to the limits.
I'm wondering whether this is a known issue, and whether making the agent pick actions within the spec limits for continuous action spaces is on the roadmap?
  5 Comments
John Doe
John Doe 2021-3-18
Here's an example training run. I gave it a negative reward for going outside the bounds of the action. This demonstrates how far outside the range the actor is picking. The same thing occurs over more episodes (5000), although I don't have a screenshot of that. Surely there must be something I'm doing wrong? How can I make this converge?
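For reference, a reward-shaping sketch along those lines; the limits, action values, and penalty weight below are made up for illustration:

% Penalize how far the raw (unclipped) action falls outside its limits
actionLow  = [-1; -1];     % placeholder limits
actionHigh = [ 1;  1];
action     = [ 1.7; -0.4]; % example raw actor output
taskReward = 0.3;          % whatever the task reward would otherwise be

% Distance each action element falls outside its limits (0 when in range)
overshoot = max(0, action - actionHigh) + max(0, actionLow - action);

% Subtract a weighted penalty from the task reward
penaltyGain = 10;          % placeholder weight
reward = taskReward - penaltyGain * sum(overshoot);   % 0.3 - 10*0.7 = -6.7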
John Doe
John Doe 2021-3-25
I had a bug where I was using normalized values instead of the real values! After fixing that, and changing the action to discrete, I was able to solve the environment! Thanks for all your help and this wonderful toolbox!
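For anyone hitting the same normalization bug, the usual mapping from a normalized action in [-1, 1] back to physical units looks roughly like this (the limits and values are placeholders):

% Map a normalized action in [-1, 1] back to the real-valued range the
% environment actually expects.
actionLow  = 0;        % placeholder physical lower limit
actionHigh = 100;      % placeholder physical upper limit
normAction = 0.25;     % example output in [-1, 1]

realAction = actionLow + (normAction + 1)/2 * (actionHigh - actionLow);
% realAction is 62.5 here; use this value in the environment dynamics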


Categories

Find more on Applications in Help Center and File Exchange

Release

R2020b
