Choose an Object Detector

The Computer Vision Toolbox™ provides object detectors for finding and classifying objects in an image or video. Train a detector using an object detector training function, then use the trained machine learning or deep learning detector to quickly and accurately predict the location of an object in an image.

When choosing a detector, consider whether you need these features:

Application and Performance

  • Single vs. multiple classes — Detecting multiple classes requires a variety of classifiers applied at multiple locations and scales on the image or video.

  • Runtime performance — Detectors vary in the time they take to detect objects in an image. A detector trained for a single class, or trained to detect objects that are similar in pose and shape, runs faster than a deep learning detector trained on multiple objects. More generally, deep learning is slower because it requires more computation than machine learning or feature-based detection approaches.

  • Machine learning — Machine learning uses two types of techniques: supervised learning, which trains a model on known input and output data so that it can predict future outputs, and unsupervised learning, which finds hidden patterns or intrinsic structures in input data. For more details, see Machine Learning in MATLAB (Statistics and Machine Learning Toolbox).

  • Deep learning — Implements deep neural networks with algorithms, pretrained models, and apps. You can use convolutional neural networks to perform classification and regression on images. For more details, see Getting Started with Object Detection Using Deep Learning.


  • C/C++ code generation — SSD, YOLO, ACF, and system object-based detectors support MATLAB® Coder™ C and C++ code generation for a variety of hardware platforms, from desktop systems to embedded hardware. For more details, see MATLAB Coder. The R-CNN-based detectors do not support code generation.

  • GPU code generation — Deep learning-based detectors support generation of optimized CUDA® code with GPU Coder™ for embedded vision and autonomous systems. For more details, see GPU Coder.
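To make the runtime performance consideration above concrete, the following is a minimal sketch of one way to measure detection time for a candidate detector on your own hardware. It assumes the pretrained peopleDetectorACF detector and the visionteam1.jpg sample image that ship with the toolbox. The variable names are illustrative, and the measured time depends on your machine and image size.

```matlab
% Load a pretrained ACF people detector and a sample image.
detector = peopleDetectorACF;
I = imread('visionteam1.jpg');

% Time the detect call. timeit runs the function several times and
% returns a typical run time in seconds.
tDetect = timeit(@() detect(detector, I));
fprintf('Detection time: %.3f s per image\n', tDetect);
```

Repeating the same measurement with another detector and the same image gives a like-for-like comparison before you commit to training.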

Use the table to view and compare the object detector functions.

Detector | Multiple Classes Support | Deep Learning Support | Code Generation Support | GPU Support | Example | Description
fasterRCNNObjectDetector | Yes | Yes | No | Yes | Object Detection Using Faster R-CNN Deep Learning

  • Requires GPU for optimal performance.

  • Use this detector when you need more precise object localization accuracy.

  • Best performance of the R-CNN family, but slower than YOLO v2 and SSD.

Faster R-CNN is a two-stage network. The second stage refines detection proposals produced by the first stage, which helps improve localization at the cost of runtime performance.

For more details, see Comparison of R-CNN Object Detectors.

fastRCNNObjectDetector | Yes | Yes | No | Yes | Train Fast R-CNN Stop Sign Detector

  • Consider starting with the fasterRCNNObjectDetector.

  • Requires GPU for optimal performance.

  • Use this detector if you have your own method for producing object regions.

  • Faster than R-CNN, but slower than Faster R-CNN.

For more details, see Comparison of R-CNN Object Detectors.

rcnnObjectDetector | Yes | Yes | No | Yes | Train Object Detector Using R-CNN Deep Learning

  • Consider starting with the fasterRCNNObjectDetector.

  • Requires GPU for optimal performance.

  • Slowest of the R-CNN-based detectors.

This algorithm combines rectangular region proposals with convolutional neural network features. It is a two-stage detection algorithm. The first stage identifies a subset of regions in an image that might contain an object. The second stage classifies the object in each region.

For more details, see Comparison of R-CNN Object Detectors.

yolov2ObjectDetector | Yes | Yes | Yes | Yes | Object Detection Using YOLO v2 Deep Learning

  • Consider using SSD or YOLO v3 for better performance across various sizes.

  • Requires GPU for optimal performance.

  • Use this detector when better runtime performance is desired and you have objects that do not drastically vary in size or are small in the image.

  • Better runtime performance compared to Faster R-CNN.

YOLO v2 uses a single stage network to perform object detection.

ssdObjectDetector | Yes | Yes | Yes | Yes | Object Detection Using SSD Deep Learning

  • Requires GPU for optimal performance.

  • Use this detector when you need to detect objects of various sizes and better runtime performance is desired.

  • Better runtime performance than Faster R-CNN and YOLO v2.

Single shot detector (SSD) uses a single stage detection network to detect objects using multi-scale features.

acfObjectDetector | No | No | Yes | No | Train ACF-based Stop Sign Detector

  • A rigid object detector that is suited for single class object detection.

  • Consider using a deep learning object detector if you need to detect multiple object classes or have objects that belong to the same class but are in different configurations or poses.

  • Use this detector when the object you want to detect has similar pose and shape, and when runtime performance is critical.

  • Better runtime performance than deep-learning-based detectors on CPU.

ACF works well for a single class that can be easily classified regardless of pose. For example, it would work well to detect a person, who can be recognized in multiple poses, such as sitting, standing, or riding a horse.

ACF would not work well for detecting vehicles from various viewpoints, such as front, side, and rear.
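As an illustration of the single-class ACF workflow described above, here is a minimal sketch that trains a stop sign detector with trainACFObjectDetector. It assumes the stopSignsAndCars.mat ground truth table and the stopSignTest.jpg image that ship with the toolbox. The NegativeSamplesFactor value is only an example setting, not a recommendation.

```matlab
% Load the sample ground truth table and keep the first two columns:
% the image file names and the stop sign bounding boxes.
data = load('stopSignsAndCars.mat');
stopSigns = data.stopSignsAndCars(:, 1:2);

% Prepend the full path to the image files.
stopSigns.imageFilename = fullfile(toolboxdir('vision'), 'visiondata', ...
    stopSigns.imageFilename);

% Train an ACF detector for the single stop sign class.
acfDetector = trainACFObjectDetector(stopSigns, 'NegativeSamplesFactor', 2);

% Run the trained detector on a test image.
img = imread('stopSignTest.jpg');
[bboxes, scores] = detect(acfDetector, img);

% Annotate each detection with its confidence score.
for i = 1:numel(scores)
    label = sprintf('Confidence = %.1f', scores(i));
    img = insertObjectAnnotation(img, 'rectangle', bboxes(i, :), label);
end
figure
imshow(img)
```

Because ACF training runs on the CPU and handles one class, a sketch like this is a quick way to check whether a rigid-object detector is adequate before moving to a deep learning detector.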

peopleDetectorACF | Pretrained | No | Yes | No | Tracking Pedestrians from a Moving Car

Use this pretrained detector to detect upright positioned people.

vision.PeopleDetector | Pretrained | No | Yes | No | Depth Estimation From Stereo Video | Use this pretrained cascade object detector to detect upright positioned people.
vision.CascadeObjectDetector | No | No | Yes | No | Detect Faces in an Image Using the Frontal Face Classification Model

  • Viola-Jones object detector suitable for rigid object detection. Uses Haar, HOG, or LBP features.

  • If training a new detector, consider starting with ACF for better performance.

  • Use this detector when a pretrained detector is available for an object class you're interested in detecting, and there is little variation in the object's pose or shape.
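The pretrained-detector case above can be sketched as follows. This minimal example assumes the default frontal face classification model of vision.CascadeObjectDetector and the visionteam.jpg sample image from the toolbox. The variable names are illustrative.

```matlab
% Create a cascade detector. With no arguments, it uses the
% pretrained frontal face classification model.
faceDetector = vision.CascadeObjectDetector;

% Detect faces. Each row of bboxes is [x y width height].
I = imread('visionteam.jpg');
bboxes = faceDetector(I);

% Draw the detections, if any were found.
if ~isempty(bboxes)
    I = insertObjectAnnotation(I, 'rectangle', bboxes, 'Face');
end
figure
imshow(I)
title('Detected faces')
```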

Mask R-CNN | Yes | Yes | No | Yes | Getting Started with Mask R-CNN for Instance Segmentation

Use this detector when you need to segment individual objects.

YOLO v3 | Yes | Yes | Yes | Yes | Object Detection Using YOLO v3 Deep Learning

YOLO v3 is a single stage network that uses multi-scale features to better handle detection of objects of various sizes.

vehicleDetectorACF (Automated Driving Toolbox) | Pretrained | No | Yes | No | Track Multiple Vehicles Using a Camera (Automated Driving Toolbox) | Pretrained ACF detector
vehicleDetectorFasterRCNN (Automated Driving Toolbox) | Pretrained | Yes | No | Yes | Train a Deep Learning Vehicle Detector (Automated Driving Toolbox) | Pretrained Faster R-CNN detector
vehicleDetectorYOLOv2 (Automated Driving Toolbox) | Pretrained | Yes | Yes | Yes | Detect Vehicles Using Monocular Camera and YOLO v2 (Automated Driving Toolbox) | Pretrained YOLO v2 detector
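The pretrained vehicle detectors listed above all follow the same load-then-detect pattern. The following minimal sketch uses vehicleDetectorACF and assumes that Automated Driving Toolbox is installed and that the highway.png sample image is on the path. The variable names are illustrative.

```matlab
% Load the pretrained ACF vehicle detector
% (requires Automated Driving Toolbox).
detector = vehicleDetectorACF;

% Detect vehicles. Each row of bboxes is [x y width height],
% and scores holds the matching detection confidences.
I = imread('highway.png');
[bboxes, scores] = detect(detector, I);

% Annotate the detections with their confidence scores, if any.
if ~isempty(bboxes)
    I = insertObjectAnnotation(I, 'rectangle', bboxes, scores);
end
figure
imshow(I)
title('Detected vehicles and detection scores')
```

Swapping in vehicleDetectorFasterRCNN or vehicleDetectorYOLOv2 should leave the detect call unchanged, which makes it straightforward to compare accuracy and speed on your own images. The deep learning variants benefit from a GPU, as noted in the table.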
