visionTransformer
Syntax
Description
[net,classNames] = visionTransformer returns a base-sized ViT neural network (86.8 million parameters) with a patch size of 16. The network is fine-tuned using the ImageNet 2012 data set at a resolution of 384-by-384.
This feature requires a Deep Learning Toolbox™ license and the Computer Vision Toolbox™ Model for Vision Transformer Network support package. You can download this support package from the Add-On Explorer. For more information, see Get and Manage Add-Ons.
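As a minimal sketch of the default syntax, the following loads the pretrained network and classifies one image. It assumes that the support package is installed, that the returned network is a dlnetwork object that accepts a formatted dlarray, and that peppers.png (a sample image shipped with MATLAB) is on the path. The resize target of 384-by-384 comes from the resolution stated above.

```matlab
% Load the default pretrained ViT network and its class names.
[net,classNames] = visionTransformer;

% Read a sample image and resize it to the 384-by-384 resolution
% that the description above states for the default model.
I = imread("peppers.png");
I = imresize(I,[384 384]);

% Wrap the image as a formatted dlarray: spatial, spatial, channel, batch.
% Assumption: the network is a dlnetwork that takes "SSCB" image input.
X = dlarray(single(I),"SSCB");

% Compute the class scores and pick the highest-scoring class.
scores = predict(net,X);
[~,idx] = max(extractdata(scores));
label = classNames(idx)
```

The label is taken as the class with the largest score, so the sketch does not depend on whether the final layer of the network applies softmax.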
[net,classNames] = visionTransformer(modelName) returns the ViT neural network with the specified model name.
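A hedged sketch of the model-name syntax follows. The string "tiny-16-imagenet-384" is an assumption about one accepted value of modelName. This page does not list the valid names, so check the Input Arguments for your release before relying on it.

```matlab
% Assumption: "tiny-16-imagenet-384" is a valid modelName value.
modelName = "tiny-16-imagenet-384";
[net,classNames] = visionTransformer(modelName);

% Inspect the result: number of classes and a summary of the network.
numClasses = numel(classNames)
summary(net)
```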
[net,classNames] = visionTransformer(___,Name=Value) specifies additional options using one or more name-value arguments.
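The name-value syntax might look like the sketch below. The argument names DropoutProbability and AttentionDropoutProbability are assumptions, inferred from the dropout reference cited on this page. The available names are not listed here, so confirm them in the Input Arguments section before use.

```matlab
% Assumption: DropoutProbability and AttentionDropoutProbability are
% supported name-value arguments. The value 0.1 is illustrative only.
[net,classNames] = visionTransformer( ...
    DropoutProbability=0.1, ...
    AttentionDropoutProbability=0.1);
```

Dropout affects training only, so options like these matter mainly when you fine-tune the returned network, for example with trainnet.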
Examples
Input Arguments
Output Arguments
References
[1] Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." Preprint, submitted June 3, 2021. https://doi.org/10.48550/arXiv.2010.11929.
[2] Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15, no. 1 (January 1, 2014): 1929–58.
[3] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." Communications of the ACM 60, no. 6 (May 24, 2017): 84–90. https://doi.org/10.1145/3065386.
Extended Capabilities
Version History
Introduced in R2023b
See Also
patchEmbeddingLayer | trainnet (Deep Learning Toolbox) | trainingOptions (Deep Learning Toolbox) | dlnetwork (Deep Learning Toolbox)
Topics
- Train Vision Transformer Network for Image Classification
- Deep Learning in MATLAB (Deep Learning Toolbox)
- List of Deep Learning Layers (Deep Learning Toolbox)
- Deep Learning Tips and Tricks (Deep Learning Toolbox)
- Data Sets for Deep Learning (Deep Learning Toolbox)