Deep Learning: higher training loss using GPU. Why?

7 views (last 30 days)
Hi,
I am training a ResNet-50 network for object detection on about 3,000 images. I have tried it in two ways, on the CPU and on the GPU.
1 - CPU: Intel Xeon Processor E5-2687W v3 (10 cores); training took 70 hours; training and validation losses at epoch 40 were 0.0532 and 0.0004.
2 - GPU: NVIDIA GeForce RTX 3070 Ti 8 GB; training took 6 hours; training and validation losses at epoch 40 were 0.0764 and 0.0013.
As you can see, training on the GPU takes much less time, but the training loss is higher, and the GPU-trained model also performs worse on unseen data.
Why is this? How can I get the same accuracy on the GPU?
Thanks
2 Comments
Walter Roberson on 2022-9-25
On the GPU, is it being trained in single precision or in double precision?
EK_47 on 2022-9-25
I do not know, but I think it is single precision. I read somewhere that it is not possible to change the default, which is single precision, for deep learning in MATLAB?
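One way to check is to inspect the storage class of a pretrained network's learnable parameters. The sketch below assumes the Deep Learning Toolbox Model for ResNet-50 support package is installed; in that network, layer 2 is the first convolution layer.

% Minimal check of the storage class of a pretrained network's weights.
% Assumes the resnet50 support package is installed (hypothetical setup).
net = resnet50;
disp(class(net.Layers(2).Weights))   % typically prints 'single'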


Answers (1)

Piyush Dubey on 2023-9-6
Hi @EK_47,
I understand that you are training a ResNet-50 model for object detection on both CPU and GPU, and that training on the GPU is faster but results in higher training and validation losses than training on the CPU.
I would like to clarify that the training and validation sets are drawn randomly from your data, and the shuffling during training is also random, so the reported losses vary slightly from run to run. The difference you see between the CPU and GPU runs may therefore not be significant. A better comparison is to average the training and validation losses over several training sessions, each with its own random split; averaged this way, the CPU and GPU results should come out roughly equal.
To make the comparison fair, fix the random seed so that the training and validation splits, and the shuffling order, are identical across runs. With the seed fixed, the results are reproducible and the CPU and GPU losses can be compared directly; a minimal sketch follows.
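As an illustration, here is a minimal sketch of fixing the seed before splitting and training. The variable names (imds for a labeled imageDatastore, lgraph for the ResNet-50 layer graph) are placeholders, and trainNetwork is used as a generic example; the same seeding idea applies to the detector training functions.

% Sketch only: fix the random number generator so the data split and the
% shuffling during training are repeatable across runs. 'imds' and 'lgraph'
% are hypothetical placeholders for your datastore and network.
rng(0,'twister');                                   % fixed seed
[imdsTrain,imdsVal] = splitEachLabel(imds,0.8,'randomized');

opts = trainingOptions('sgdm', ...
    'MaxEpochs',40, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsVal, ...
    'ExecutionEnvironment','gpu', ...               % or 'cpu' to compare
    'Verbose',false);

net = trainNetwork(imdsTrain,lgraph,opts);

Note that even with a fixed seed, some GPU operations are not bit-wise deterministic, so small run-to-run differences can remain on the GPU.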
Additionally, you can use cross-validation to compare the networks trained on CPU and GPU. Cross-validation splits the dataset into several folds and trains and evaluates on different combinations of those folds, which gives a more reliable estimate of performance than a single split; see the sketch below.
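For example, a rough k-fold sketch using cvpartition (which requires Statistics and Machine Learning Toolbox) might look like the following. It assumes a labeled imageDatastore imds, a layer graph lgraph, and trainingOptions opts are already defined, and it scores each fold with classification accuracy; an object detector would instead be scored with a detection metric such as evaluateDetectionPrecision.

% Sketch only: k-fold cross-validation to put CPU and GPU training on an
% equal footing. 'imds', 'lgraph' and 'opts' are hypothetical placeholders.
k   = 5;
cv  = cvpartition(imds.Labels,'KFold',k);       % stratified folds
acc = zeros(k,1);

for i = 1:k
    imdsTrain = subset(imds, find(training(cv,i)));
    imdsVal   = subset(imds, find(test(cv,i)));

    net    = trainNetwork(imdsTrain, lgraph, opts);
    pred   = classify(net, imdsVal);            % held-out fold predictions
    acc(i) = mean(pred == imdsVal.Labels);
end

meanAcc = mean(acc)    % average over folds; repeat on CPU and on GPU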
For more information on cross-validation techniques, refer to the MathWorks documentation (for example, the cvpartition reference page).
By fixing the random seed and using cross-validation, you can obtain more reliable and comparable results when training the ResNet-50 model on both CPU and GPU.
Hope this helps.
