Reinforcement Learning on Simscape
I am having an issue with RL in Simscape. I added a Unit Delay block to break an algebraic loop, but the result is that the value I want to change stays constant, equal to the initial condition of the Unit Delay block. Do you know what might be causing this?
I will add a screenshot of the training.
Accepted Answer
One option is to look at introducing the delay on the observation, not the action. Please take a look at this page for more details
14 Comments
@Emmanouil Tzorakoleftherakis I have another question that I am not able to answer: my actions are constant within each episode.
Hi, In general, if you have a question unrelated to the original one, it's a good idea to start a separate thread for visibility. Not sure which agent you are using, but make sure your exploration options make sense. Also, let the agent run for a few episodes first as sometimes the behavior you are describing is common in the initial episodes.
@Emmanouil Tzorakoleftherakis My apologies. I am using a PPO agent with a learning rate of 0.001 for both the actor and the critic. I trained for 50 episodes, but the action is still constant within an episode (it does change from one episode to the next, though). I am very new to RL and am working mostly by trial and error. Thank you for your previous responses.
Np. Which release are you using? How long are your episodes? What is the agent sample size? What is your reward? Also, I would let training continue for a few hundred episodes and check again if the issue persists.
@Emmanouil Tzorakoleftherakis Thank you for your response.
I am using a PPO agent with the hyperparameters shown in the screenshot attached to this message. I am using R2023a. The reward is also attached. Thank you in advance; I hope this clarifies the situation. I am applying RL control to a model I built myself for the control of district heating networks.
PS: I will create a separate question later and link it to this one, so that the question is more visible.
Thanks. It seems your agent sample time is the same as the episode duration. Is that expected? How often do you expect your agent to take actions? Regardless, that explains what you are seeing. The agent will basically take only one action per episode, so 50 actions in total for 50 episodes. This is really not sufficient training time.
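A quick arithmetic check of the point above, written as a minimal Python sketch rather than toolbox code. The helper name `actions_per_episode` is hypothetical, the 86400 s value is the one mentioned later in this thread, and the sketch assumes the agent acts once per sample time until the episode ends.

```python
def actions_per_episode(episode_duration_s, sample_time_s):
    """Toy estimate of how many actions the agent takes in one episode."""
    return episode_duration_s // sample_time_s

# Sample time equal to the episode duration: a single action per episode.
per_episode = actions_per_episode(86400, 86400)
total_actions = per_episode * 50  # 50 training episodes, as in the thread
print(per_episode, total_actions)
```

With an hourly sample time (3600 s) the same helper gives 24 actions per day-long episode.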
@Emmanouil Tzorakoleftherakis Yes I did not see that I had the same sample time and experience horizon. Thank you very much.
I was actually referring to the isdone signal in the reward function. It is set to true at t=86400 which is the same as the agent sample time.
@Emmanouil Tzorakoleftherakis Oh sorry, I see! What would be a good setting in that case?
Depends on your problem and how frequently your agent needs to take actions (that's determined by the agent sample time)
@Emmanouil Tzorakoleftherakis So, for example, if I want my agent to change the mass flow every hour for one day, I should set the sample time to 3600 (the number of seconds in an hour) and the isdone condition to 86400 (the number of seconds in a day)?
Correct. Alternatively, you could use the MaxStepsPerEpisode training option and leave the IsDone flag false all the time. The IsDone flag can be used for cases where you want to terminate an episode early (e.g. when some constraint is being violated).
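The two ways of ending an episode described above can be compared with a small Python sketch. This is a toy loop under stated assumptions, not how the toolbox schedules the agent: `count_actions`, `max_steps` and `is_done` are hypothetical names, and the agent is assumed to act once per sample time starting at t = 0.

```python
def count_actions(sample_time_s, max_steps, is_done=lambda t: False):
    """Count agent actions in one episode of a toy simulation loop."""
    t = 0
    actions = 0
    # The episode ends when the step cap is hit or the done flag goes true.
    while actions < max_steps and not is_done(t):
        actions += 1          # agent takes an action at time t
        t += sample_time_s    # environment advances by one sample time
    return actions

# Option A: hourly actions, done flag set once a full day has elapsed.
via_is_done = count_actions(3600, max_steps=1000, is_done=lambda t: t >= 86400)

# Option B: done flag always false, episode capped at 24 steps.
via_max_steps = count_actions(3600, max_steps=24)

# Early termination: a (hypothetical) constraint trips at midday.
early_stop = count_actions(3600, max_steps=24, is_done=lambda t: t >= 43200)

print(via_is_done, via_max_steps, early_stop)
```

Options A and B yield the same number of actions per episode; keeping the done flag for constraint violations leaves it free to cut an episode short, as in the last case.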
@Emmanouil Tzorakoleftherakis Perfect. Thank you very much, sir!
More Answers (0)
2024-6-28
2024-7-2