Issue with StepNorm Going to Zero on RTX 4060 Ti During MLP Training

4 views (last 30 days)
I am using the "trainnet" function to train a relatively shallow MLP network. To speed up experiments, I train on two different GPUs: an RTX 2070 Super and an RTX 4060 Ti.
The RTX 2070 Super produces the expected output for all iterations, but on the RTX 4060 Ti training terminates early because the reported StepNorm approaches zero. I suspect this is related to the memory-bus difference (256-bit on the RTX 2070 Super versus 128-bit on the RTX 4060 Ti), which might affect numerical precision during parallel computation.
When I checked the SingleDoubleRatio reported by gpuDevice, I found:
  • RTX 2070 Super: 32
  • RTX 4060 Ti: 64
According to the MATLAB documentation, SingleDoubleRatio is the ratio of single-precision to double-precision performance, so a lower value (32) indicates relatively stronger double-precision throughput, while a higher value (64) indicates weaker. I attempted to manually enforce precision control on the GPU but was unsuccessful.
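For context, this is roughly what I tried: a minimal sketch that queries each card's precision profile and then loosens the stopping behavior of the L-BFGS solver (the solver that reports StepNorm). The specific tolerance and iteration values here are illustrative assumptions, not recommendations:

```matlab
% Diagnostic: confirm which GPU is active and what precision ratio it reports.
d = gpuDevice;
fprintf("%s  SingleDoubleRatio = %d\n", d.Name, d.SingleDoubleRatio);

% Workaround sketch (assumption, not a confirmed fix): with the "lbfgs"
% solver, training stops once the step norm falls below StepTolerance.
% Tightening the tolerance lets training continue through very small steps.
options = trainingOptions("lbfgs", ...
    MaxIterations = 1000, ...       % illustrative value
    StepTolerance = 1e-10, ...      % smaller than the default; illustrative
    Verbose = true);
```

This did not resolve the difference between the two cards for me, which is why I suspect a precision issue rather than a genuinely converged solution.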
How can I resolve this issue and ensure stable training on the RTX 4060 Ti? Any insights would be greatly appreciated.
Thank you!

Answers (0)
