ResNet is said to require less training time, but a residual network I created with the MATLAB resnetLayers function takes longer to train. Why?

Question

debojit sharma 2022-7-15

https://ww2.mathworks.cn/matlabcentral/answers/1760645-it-is-being-said-that-resnet-model-requires-less-training-time-but-when-i-used-resnetlayer-function

Answered: Hari 2023-9-15

ResNet is said to require less training time because it eliminates the vanishing gradient problem. However, when I used the resnetLayers function in MATLAB to create a residual network and trained it, training took longer than with a CNN-LSTM model. Why is that?

Answer 1

Hari 2023-9-15

https://ww2.mathworks.cn/matlabcentral/answers/1760645-it-is-being-said-that-resnet-model-requires-less-training-time-but-when-i-used-resnetlayer-function#answer_1311346

Hi Debojit,

I understand that you have observed the “ResNet” model taking longer to train than the “CNN-LSTM” model, contrary to the expectation that “ResNet” should train faster because it addresses the vanishing gradient problem.

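The “ResNet” model is known for its ability to mitigate the vanishing gradient problem, which can occur in deep neural networks during training. That benefit makes very deep networks trainable and can help them converge; it does not reduce the amount of computation performed in each training iteration.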

However, the actual training time of a model can be influenced by various factors, including the specific architecture, dataset, hyperparameters, and implementation details. It's important to note that the “ResNet” architecture itself does not guarantee faster training time in all scenarios compared to other models like “CNN-LSTM”.
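One way to see this is to split wall-clock training time into the number of iterations needed and the cost of each iteration. The snippet below is only a rough sketch in Python: the function name and every number in it are hypothetical values chosen for illustration, not measurements of any real network. It shows how a network with a higher per-iteration cost can take longer overall even if it needs fewer epochs.

```python
def total_training_time(epochs, samples, batch_size, seconds_per_iteration):
    """Rough wall-clock estimate: iterations needed times cost per iteration."""
    # Assumes the mini-batch size divides the number of samples evenly.
    iterations_per_epoch = samples // batch_size
    return epochs * iterations_per_epoch * seconds_per_iteration

# Hypothetical numbers, for illustration only (not measured).
deep_resnet = total_training_time(epochs=20, samples=6400, batch_size=64,
                                  seconds_per_iteration=0.50)
small_cnn_lstm = total_training_time(epochs=30, samples=6400, batch_size=64,
                                     seconds_per_iteration=0.08)

print(f"deep ResNet:    {deep_resnet / 60:.1f} min")
print(f"small CNN-LSTM: {small_cnn_lstm / 60:.1f} min")
```

With these assumed values the deeper network finishes in fewer epochs but still takes several times longer, because each iteration is more expensive. Timing a few iterations of each of your own networks gives you the real per-iteration figures.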

Here are a few reasons why you might observe longer training time with the “ResNet” model compared to the CNN-LSTM model in your specific case:

Model complexity: “ResNet” models can have a larger number of parameters compared to CNN-LSTM models, especially if you use deeper “ResNet” variants like ResNet-50 or ResNet-101. This increased complexity may require more computational resources and training time.
Dataset characteristics: The characteristics of your dataset, such as size, complexity, and class imbalance, can affect training time. If your dataset is particularly large or contains complex patterns, it may require more time to train regardless of the model architecture.
Hyperparameters: The choice of hyperparameters, such as learning rate, batch size, and regularization techniques, can impact training time. Suboptimal hyperparameter settings may result in slower convergence or require more iterations to achieve good performance.
Implementation details: The efficiency of the implementation, including the software framework and hardware used, can affect training time. Different frameworks or hardware configurations may have varying levels of optimization, which can influence the overall training speed.
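The model complexity point can be checked with simple arithmetic. The sketch below is in Python, with hypothetical helper names and layer sizes picked only as an example. It counts the learnable parameters of one 2-D convolution layer and one LSTM layer using the standard formulas, plus the multiply-accumulate operations a convolution needs per image. A residual network stacks many such convolution layers, so both totals grow quickly with depth.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Learnable weights plus biases of a standard 2-D convolution layer."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

def conv2d_macs(kernel_h, kernel_w, in_channels, out_channels, out_h, out_w):
    """Multiply-accumulate operations for one forward pass over one image."""
    return kernel_h * kernel_w * in_channels * out_channels * out_h * out_w

def lstm_params(input_size, hidden_units):
    """Input weights, recurrent weights, and biases for the four LSTM gates."""
    return 4 * hidden_units * (input_size + hidden_units + 1)

# Hypothetical layer sizes, for illustration only.
conv = conv2d_params(3, 3, 256, 256)        # one 3x3 convolution, 256 -> 256 channels
macs = conv2d_macs(3, 3, 256, 256, 14, 14)  # same layer on a 14x14 output feature map
lstm = lstm_params(64, 128)                 # LSTM with 64 inputs and 128 hidden units

print(f"conv parameters: {conv:,}")
print(f"conv MACs/image: {macs:,}")
print(f"LSTM parameters: {lstm:,}")
```

Parameter count is not the whole cost: convolution weights are reused at every spatial position, so the work per image is far larger than the parameter count suggests. In MATLAB, analyzeNetwork reports the learnables for each layer of your actual networks, which is a more reliable comparison than these example sizes.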

Refer to the documentation of “Sequence Classification Using CNN-LSTM Network” for more information.

Sequence Classification Using CNN-LSTM Network - MATLAB & Simulink (mathworks.com)

Refer to the documentation of “resnetLayers” for more information.

Create 2-D residual network - MATLAB resnetLayers (mathworks.com)

I hope this helps.

Thanks,

Hari.