How to pretrain a stochastic actor network for PPO training?
I want to create a stochastic actor network that, given an observation array of 28 normalized values, outputs an action array of 10 values between 0 and 1. I specified lower and upper limits as follows to ensure that the actor's output stays between 0 and 1:
ActionInfo = rlNumericSpec([numActions 1],'LowerLimit',[0;0;0;0;0;0;0;0;0;0],'UpperLimit',[1;1;1;1;1;1;1;1;1;1]);
My stochastic network looks as follows:
I have created a normalized training data set (input dimension 28, target dimension 10). How do I use this data set to pretrain the above network?
Clarification: I want to train the network before starting the PPO agent training.
Accepted Answer
Anh Tran
2021-5-13
Hi Jan,
You can pretrain a stochastic actor with Deep Learning Toolbox's trainNetwork, with some additional work. Emmanouil gave some good pointers initially, but I want to add these steps:
You need a custom loss layer, since the stochastic actor network outputs means and standard deviations while your target is the action. You can try a maximum log-likelihood loss. You can follow the documentation instructions for creating a custom loss layer (you don't have to implement the backward pass, as automatic differentiation will take care of it).
% We want to maximize the objective log f(x), where f(x) is the probability density function of Normal(mu, sigma)
% Loss = -Objective = -log(f(x)) = 1/2*log(2*pi) + log(sigma) + 1/2*((x-mu)/sigma)^2;
Keep in mind that you must protect against log(0); adding eps to sigma is sufficient. Here x is your action target.
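For reference, the loss in the comment above follows directly from the normal density. This is only the standard derivation written out, using the same symbols (x for the action target, mu and sigma for the network outputs):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
-\log f(x) = \tfrac{1}{2}\log(2\pi) + \log\sigma + \tfrac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}
```

The first term is a constant, so it does not affect the gradients; it only shifts the reported loss value.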
4 Comments
Anh Tran
2021-5-17
As the error message says, the value to differentiate must be a scalar. Thus, you need to compute the mean of the loss over each batch. Also, I am not sure why you need a for-loop to compute the loss. You can vectorize the computation as follows (since sigma, T, and mu have the same size):
% vectorize loss computation
loss = 0.5*log(2*pi) + log(sigma + eps) + 0.5*((T-mu)./(sigma+eps)).^2;
% mean of the loss over each batch
loss = sum(loss,'all');
loss = loss/batchSize;
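As a rough, self-contained sketch, the vectorized loss above could be wrapped in a helper function and tried on dummy data before wiring it into a custom layer. The function name gaussianNLL, the numActions-by-batchSize array layout, and the example values are assumptions for illustration only; they are not part of the original answer or of any toolbox API.

```matlab
% Dummy data (hypothetical values): 10 actions, batch of 4
numActions = 10;
batchSize  = 4;
mu    = 0.5 * ones(numActions, batchSize);   % predicted means
sigma = 0.1 * ones(numActions, batchSize);   % predicted standard deviations
T     = 0.5 * ones(numActions, batchSize);   % action targets

loss = gaussianNLL(mu, sigma, T);            % scalar loss for the whole batch

function loss = gaussianNLL(mu, sigma, T)
% Negative log-likelihood of targets T under Normal(mu, sigma), elementwise.
% Assumes mu, sigma and T are numActions-by-batchSize arrays of equal size.
    sigma = sigma + eps;                     % protect against log(0) and division by zero
    perElement = 0.5*log(2*pi) + log(sigma) + 0.5*((T - mu)./sigma).^2;
    loss = mean(sum(perElement, 1));         % sum over actions, then average over the batch
end
```

Summing over the action dimension and averaging over the batch gives the same value as sum(loss,'all')/batchSize, and it returns the scalar that automatic differentiation requires.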
More Answers (1)
Emmanouil Tzorakoleftherakis
2021-5-13
Hello,
Since you already have a dataset, you will have to use Deep Learning Toolbox to get your initial policy. Take a look at the examples below to get an idea: