Issue with StepNorm Going to Zero on RTX 4060 Ti During MLP Training

4 views (last 30 days)
I am using the "trainnet" function to train a relatively shallow MLP network. To speed up experiments, I train on two different GPUs: an RTX 2070 Super and an RTX 4060 Ti.
The RTX 2070 Super produces the expected output for all iterations, but on the RTX 4060 Ti training terminates early because the reported StepNorm approaches zero. I suspect this is related to the memory-bus difference (256-bit on the RTX 2070 Super versus 128-bit on the RTX 4060 Ti), which might affect numerical precision during parallel computation.
When I checked the SingleDoubleRatio reported by gpuDevice, I found:
  • RTX 2070 Super: 32
  • RTX 4060 Ti: 64
According to the MATLAB documentation, SingleDoubleRatio is the ratio of single-precision to double-precision performance, so a lower value (32) indicates relatively stronger double-precision throughput, while a higher value (64) indicates weaker. I attempted to manually enforce precision control on the GPU but was unsuccessful.
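For context, this is roughly what I tried: a minimal sketch that queries each card's precision profile and then loosens the stopping behavior of the L-BFGS solver (the solver that reports StepNorm). The specific tolerance and iteration values here are illustrative assumptions, not recommendations:

```matlab
% Diagnostic: confirm which GPU is active and what precision ratio it reports.
d = gpuDevice;
fprintf("%s  SingleDoubleRatio = %d\n", d.Name, d.SingleDoubleRatio);

% Workaround sketch (assumption, not a confirmed fix): with the "lbfgs"
% solver, training stops once the step norm falls below StepTolerance.
% Tightening the tolerance lets training continue through very small steps.
options = trainingOptions("lbfgs", ...
    MaxIterations = 1000, ...       % illustrative value
    StepTolerance = 1e-10, ...      % smaller than the default; illustrative
    Verbose = true);
```

This did not resolve the difference between the two cards for me, which is why I suspect a precision issue rather than a genuinely converged solution.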
How can I resolve this issue and ensure stable training on the RTX 4060 Ti? Any insights would be greatly appreciated.
Thank you!

Answers (0)
