How does L2 regularization in a custom training loop work?
Hi,
I implemented a custom training loop to train a sequence-to-sequence regression model. I also implemented L2 regularization as described in the documentation here: https://de.mathworks.com/help/deeplearning/ug/specify-training-options-in-custom-training-loop.html#mw_50581933-e0ce-4670-9456-af23b2b6f337
Now I'm wondering how this works. Other documentation, such as this one from Google, seems to describe it differently: Google describes L2 regularization as adding the sum of the squared weights to the loss, whereas in MATLAB it looks like I add the weights to the gradients. Isn't that something different? Is one way better than the other?
Cheers
Accepted Answer
Richard
2024-9-26
Short story: these are the same.
Long story: the link you gave explains the underlying mathematics of L2 regularization and its effect on the weights. However, it doesn't really explain the mechanics of how adding a regularization term to the loss affects the minimization algorithm; it stops at saying you "minimize(loss + lambda*complexity)".
In this case the minimization is done by taking steps based on the gradient of the total loss with respect to each weight. For each weight parameter w, the required gradient d(loss + lambda*complexity)/dw equals dL/dw + d(lambda*complexity)/dw, i.e. the gradient is the sum of the non-regularized gradient and a regularization term. When the complexity term is the sum of squared weights, d(lambda*w^2)/dw = 2*lambda*w, so the regularization term is just a multiple of the weight itself (the constant factor of 2 is conventionally absorbed into lambda). This sum is exactly what the MATLAB example computes when it adds the weights to the gradients.
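As a minimal sketch of this equivalence, here is a toy linear model (the names W, X, T, lambda and the two local functions are hypothetical placeholders, not from the documentation example; requires Deep Learning Toolbox) that computes the gradient both ways and confirms they agree:
% Toy weight matrix, inputs, and targets (hypothetical placeholders)
W = dlarray(randn(3));
X = dlarray(randn(3,5));
T = dlarray(randn(3,5));
lambda = 0.01;    % regularization coefficient

% Approach 1: add the squared-weight penalty to the loss, then differentiate.
grad1 = dlfeval(@penalizedGradient, W, X, T, lambda);

% Approach 2: differentiate the plain loss, then add a multiple of the weights.
grad2 = dlfeval(@plainGradient, W, X, T) + 2*lambda*W;

% The two gradients agree up to floating-point error.
max(abs(extractdata(grad1 - grad2)), [], "all")

function grad = penalizedGradient(W, X, T, lambda)
    loss = sum((W*X - T).^2, "all") + lambda*sum(W.^2, "all");
    grad = dlgradient(loss, W);
end

function grad = plainGradient(W, X, T)
    loss = sum((W*X - T).^2, "all");
    grad = dlgradient(loss, W);
end
Note that the documentation example absorbs the factor of 2 into its l2Regularization coefficient, which is why it adds l2Regularization*w rather than 2*l2Regularization*w to each gradient.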