SAC agent actor network setup and action generation
Hi, I'm trying to develop a SAC agent for a continuous control task with two actions. The agent's explored actions look like this:
The first action fluctuates between its maximum and minimum, while the second action seems to explore its action space well. Note that the ranges and magnitudes of the two actions differ significantly, but I have normalized them at the critic input.
When I developed deterministic agents for this task, I used a tanhLayer and a scalingLayer to normalize and scale the action. The SAC documentation here suggests that the output tanhLayer and scalingLayer are added automatically, even though they do not show up in the actor network structure.
The documentation also states: 'Do not add a tanhLayer or scalingLayer in the mean output path. The SAC agent internally transforms the unbounded Gaussian distribution to the bounded distribution to compute the probability density function and entropy properly'
However, the behaviour where the output always sits at -1 or 1, as a tanhLayer would produce (in the case of the first action), isn't very logical to me. Do I have to add the tanhLayer and scalingLayer manually for this to work correctly? Is there any reason why the first action only fluctuates between -1 and 1 without exploring the actions in between?
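For reference, below is a simplified sketch of the kind of actor network I mean (layer sizes, names, and specs are only illustrative, and it assumes a release that provides rlContinuousGaussianActor). The mean path ends in a plain fullyConnectedLayer, and the bounding to the action range is left to the agent:
% Illustrative observation/action specs (not my real environment)
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1],"LowerLimit",[-1;-10],"UpperLimit",[1;10]);
% Common body
commonPath = [
    featureInputLayer(prod(obsInfo.Dimension),"Name","obs")
    fullyConnectedLayer(64,"Name","fc_common")
    reluLayer("Name","common")];
% Mean path: no tanhLayer/scalingLayer; the agent squashes and scales internally
meanPath = [
    fullyConnectedLayer(32,"Name","fc_mean")
    reluLayer("Name","relu_mean")
    fullyConnectedLayer(prod(actInfo.Dimension),"Name","mean")];
% Standard-deviation path: must output positive values, e.g. via softplusLayer
stdPath = [
    fullyConnectedLayer(32,"Name","fc_std1")
    reluLayer("Name","relu_std")
    fullyConnectedLayer(prod(actInfo.Dimension),"Name","fc_std2")
    softplusLayer("Name","std")];
net = layerGraph(commonPath);
net = addLayers(net,meanPath);
net = addLayers(net,stdPath);
net = connectLayers(net,"common","fc_mean");
net = connectLayers(net,"common","fc_std1");
net = dlnetwork(net);
actor = rlContinuousGaussianActor(net,obsInfo,actInfo, ...
    "ObservationInputNames","obs", ...
    "ActionMeanOutputNames","mean", ...
    "ActionStandardDeviationOutputNames","std");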
1 Comment
Takeshi Takahashi
2021-4-9
Adding a tanh layer and a scaling layer to the mean path is unnecessary since the SAC agent applies tanh and scaling internally based on the action spec.
The first action's range is much larger than the second action's, which might be causing the exploration issue; the standard deviation the network produces for the first action is probably too large.
I suggest the following:
- Use a small EntropyWeightOptions.EntropyWeight in rlSACAgentOptions, such as 0.01. This weight is learned automatically during training, but it can take a long time to come down if the initial EntropyWeight is too large.
- You can add a tanh layer and a scaling layer to the standard-deviation path to bound the action's uncertainty directly.
- If none of the above works, it is better to normalize the actions in the environment: give all actions the same range in the action spec and scale them appropriately inside the environment. Because SAC relies on entropy for exploration, having similar action ranges works better.
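For example (the values are only illustrative, and the Scale/Bias numbers assume you want the standard deviation roughly in (0, 1)):
numActions = 2;                                        % two actions, as in the question
agentOpts = rlSACAgentOptions;
agentOpts.EntropyWeightOptions.EntropyWeight = 0.01;   % small initial entropy weight
agentOpts.EntropyWeightOptions.LearnRate = 3e-4;       % keep > 0 so the weight is still tuned
% Optional: end the standard-deviation path with tanh + scaling so the
% std stays bounded; Scale/Bias map tanh's (-1,1) output to about (0.05, 1.05).
stdOutputLayers = [
    fullyConnectedLayer(numActions,"Name","fc_std")
    tanhLayer("Name","tanh_std")
    scalingLayer("Name","std","Scale",0.5,"Bias",0.55)];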
Answers (1)
Sampson Nwachukwu
2023-1-10
Hi,
I am facing a similar challenge.
I have an action space specified as:
numActions = 1;
actionInfo = rlNumericSpec([numActions 1], ...
    "LowerLimit",-0.1,"UpperLimit",0.1);   % single action bounded to [-0.1, 0.1]
Setting TargetEntropy to -3 or -5 gives a better training curve, although I do not achieve an optimal result. However, when I set it to -1 or let the program choose automatically, I end up with a very bad training curve and a poor result. I have tried different temperature coefficients, but I am still getting the same result.
Your assistance will be appreciated. Thank you.
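For reference, I am setting it roughly like this (a minimal sketch; the value is one of those from my experiments above):
agentOpts = rlSACAgentOptions;
agentOpts.EntropyWeightOptions.TargetEntropy = -3;   % -3 or -5 trains better for me than -1 or the automatic choice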
1 Comment
Sampson Nwachukwu
2023-1-10
In addition to the question above, is there a way to have the temperature coefficient of SAC set automatically in MATLAB?
Thank you.