Classify Images and Videos
Computer Vision Toolbox™ provides end-to-end workflows for classifying images and videos using deep learning and traditional computer vision techniques. For image category classification, you can use pretrained deep learning models, such as vision transformer (ViT) and CLIP models, or apply the bag-of-visual-words approach to categorize images based on their visual content. These workflows support applications such as scene recognition, content filtering, and automated tagging. Start by labeling scene-level categories using the Image Labeler and Video Labeler apps, and then train or fine-tune models using your labeled data.
For video classification and activity recognition, the toolbox enables you to use deep learning models to classify sequences of frames into action categories such as walking, swimming, or sitting. These capabilities are essential for tasks such as human-computer interaction and surveillance. The toolbox supports training, evaluation, and deployment of models that interpret temporal patterns in video data to recognize complex activities and gestures.
Categories
- Image Category Classification
Classify images using bag-of-features, CNNs, vision transformers, and vision-language models
- Video Classification
Classify videos and perform activity recognition using deep learning





