Computer Vision Toolbox Model for Vision Transformer Network
Implementation of several variants of the vision transformer (ViT) model.
909 downloads
Updated 11 Sep 2024
The Vision Transformer (ViT) model is a pretrained transformer model for image classification. It is also used as a backbone for other computer vision tasks such as object detection. The support package consists of three variants of the ViT model:
- Base-16 model
- Small-16 model
- Tiny-16 model
Here, “base”, “small”, and “tiny” indicate the model architecture and size, and 16 is the patch size hyperparameter. Each variant has been pretrained on the ImageNet data set with an input resolution of 384 and is stored as a .MAT file.
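To make the patch size and input resolution concrete, the short Python sketch below works out how many patches a square 384-pixel input yields at patch size 16. It is only illustrative arithmetic and is not part of the support package; the helper name vit_patch_grid is hypothetical, and the sketch assumes non-overlapping square patches, as in the standard ViT design.

```python
from typing import Tuple


def vit_patch_grid(image_size: int, patch_size: int) -> Tuple[int, int]:
    """Return (patches_per_side, total_patches) for a square input image.

    Hypothetical helper for illustration; assumes non-overlapping square patches.
    """
    if image_size <= 0 or patch_size <= 0:
        raise ValueError("image_size and patch_size must be positive")
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Values taken from the description: input resolution 384, patch size 16.
per_side, total = vit_patch_grid(384, 16)
print(per_side, total)  # 24 576
```

Under these assumptions, each image is split into a 24-by-24 grid, so the transformer processes 576 patch tokens per image (plus any extra tokens, such as a class token, that a particular implementation adds).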
MATLAB Release Compatibility
Created with R2023b
Compatible with R2023b to R2024b
Platform Compatibility
Windows macOS (Apple silicon) macOS (Intel) Linux