Get Started with Object Detection Using Deep Learning

Object detection using deep learning provides a fast and accurate means to predict the location of an object in an image. Deep learning is a powerful machine learning technique in which the object detector automatically learns image features required for detection tasks. Computer Vision Toolbox™ offers several techniques for object detection using deep learning, such as you only look once (YOLO) v2, YOLO v3, YOLO v4, YOLOX, RTMDet, and single shot detection (SSD).

Object detection enables you to localize and categorize objects within image data.

Applications that use object detection include:

Scene understanding
Multi-object tracking
Visual inspection
Self-driving vehicles
Surveillance

Computer Vision Toolbox and its support packages enable you to configure a pretrained object detection network or design a custom one, perform inference using a pretrained or trained network, and perform transfer learning on a custom data set.

To get started with using a pretrained network to detect objects in an image, see the Detect Objects Using Pretrained Object Detection Network section.
To get started with training an untrained or pretrained object detection network for transfer learning, see the Train Object Detection Network and Perform Transfer Learning section.

You can also design a custom network layer-by-layer using the Deep Network Designer (Deep Learning Toolbox) app. For an example using the YOLO v2 object detection network, see Perform Transfer Learning Using Pretrained YOLO v2 Detector.

Detect Objects Using Pretrained Object Detection Network

Computer Vision Toolbox provides pretrained object detection models that you can use to perform out-of-the-box inference or transfer learning on a custom data set.

Configure Pretrained Model

To use a pretrained object detection model, first download and install its support package by using the Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.

This table lists the names of the object detector objects, the corresponding available pretrained models, and the names of the corresponding add-on support packages to download.

Object Detection Model	Available Pretrained Models	Name of Support Package
`yolov2ObjectDetector`	`darknet19-coco` `tiny-yolov2-coco`	Computer Vision Toolbox Model for YOLO v2 Object Detection
`yolov3ObjectDetector`	`darknet53-coco` `tiny-yolov3-coco`	Computer Vision Toolbox Model for YOLO v3 Object Detection
`yolov4ObjectDetector`	`csp-darknet53-coco` `tiny-yolov4-coco`	Computer Vision Toolbox Model for YOLO v4 Object Detection
`yoloxObjectDetector`	`nano-coco` `tiny-coco` `small-coco` `medium-coco` `large-coco`	Automated Visual Inspection Library for Computer Vision Toolbox
`rtmdetObjectDetector`	`tiny-network-coco` `small-network-coco` `medium-network-coco` `large-network-coco`	Computer Vision Toolbox Model for RTMDet Object Detection

Perform Inference Using Pretrained Model

Perform inference and detect objects in a test image using a pretrained detector model. For help selecting a pretrained object detection network for your application, see Choose an Object Detector. To return bounding boxes, confidence scores, and corresponding class labels, pass the pretrained detector object to the corresponding detect object function.

For example, to use the pretrained YOLO v4 tiny-yolov4-coco network listed in the Configure Pretrained Model section, load the model by creating a yolov4ObjectDetector object.

detector = yolov4ObjectDetector("tiny-yolov4-coco");

Detect objects in a test image, I, by using the detect object function of the yolov4ObjectDetector object.

I = imread("carsonroad.png");
[bboxes,scores,labels] = detect(detector,I);

Display the results overlaid on the input image by using the insertObjectAnnotation function.

detectedImg = insertObjectAnnotation(I,"Rectangle",bboxes,labels);
figure
imshow(detectedImg)

The annotated image shows the objects, such as cars, that the pretrained Tiny YOLO v4 COCO network detects in the test image.
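As a variation on the example above, this sketch keeps only the more confident detections and prints each confidence score next to its class label. It assumes the YOLO v4 support package is installed. The threshold value of 0.6 is an arbitrary illustration, not a recommended setting.

```matlab
% Load the pretrained detector and the test image used above.
detector = yolov4ObjectDetector("tiny-yolov4-coco");
I = imread("carsonroad.png");

% Keep only detections with a confidence score of at least 0.6
% (an example value chosen for illustration).
[bboxes,scores,labels] = detect(detector,I,Threshold=0.6);

if isempty(bboxes)
    disp("No detections above the threshold.")
else
    % Build one "label: score" annotation per bounding box.
    annotations = string(labels) + ": " + compose("%.2f",scores);
    detectedImg = insertObjectAnnotation(I,"rectangle",bboxes,cellstr(annotations));
    figure
    imshow(detectedImg)
end
```

Raising the threshold generally trades recall for precision, so a suitable value depends on the application.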

To perform inference on a test image using a trained object detection network, use the same process but pass the trained detector to the detect function as the detector argument.

MathWorks GitHub Pretrained Networks

The MathWorks® GitHub repository provides implementations of the latest pretrained object detection deep learning networks to download and use to perform out-of-the-box inference. The pretrained object detection networks have already been trained on standard data sets, such as the COCO and Pascal VOC data sets. You can use these pretrained models directly to detect different objects in a test image.

For a list of all the latest MathWorks pretrained object detectors, see MATLAB Deep Learning (GitHub).

Train Object Detection Network and Perform Transfer Learning

To modify a network to detect additional classes, or to customize other network parameters, you can perform transfer learning. This section shows how to prepare your training data, configure the object detection network, and train the network to perform transfer learning.

Create Training Data

Use a labeling app to interactively label ground truth data in a video, image sequence, image collection, or custom data source. You can label object detection ground truth using rectangle labels, which define the position and size of the object in the image.

You can interactively label ground truth data in images using the Image Labeler app.
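Once labeling is complete, the exported labels need to be converted into a form that the training functions accept. The following is a minimal sketch of that step. It assumes `gTruth` is a groundTruth object exported from a labeling app for an image collection and that it contains rectangle labels. The helper name `groundTruthToDatastore` is hypothetical.

```matlab
function ds = groundTruthToDatastore(gTruth)
% Convert exported ground truth into a datastore for detector training.
% gTruth: groundTruth object exported from a labeling app (assumption).

    % Split the ground truth into an image datastore and a box label datastore.
    [imds,blds] = objectDetectorTrainingData(gTruth);

    % Combine them so that each read returns {image, boxes, labels}.
    ds = combine(imds,blds);
end
```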

To learn more about labeling images for object detection, see these topics:

Augment and Preprocess Data

Use data augmentation to train the object detector on a limited data set. Altering the images in minor ways, such as translating, cropping, or otherwise transforming them, creates additional distinct training samples and yields a more robust detector. Use datastores to conveniently read and augment collections of data. Use imageDatastore and boxLabelDatastore to create datastores for images and labeled bounding box data, respectively.
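The datastore workflow described above can be sketched as follows. The sketch assumes that the vehicleTrainingData.mat sample data and its images are available in the Computer Vision Toolbox installation, and that Image Processing Toolbox is installed for randomAffine2d and affineOutputView. The helper `augmentSample` and the specific flip and scale settings are illustrative choices, not a prescribed recipe.

```matlab
% Load a labeled sample data set (assumed to ship with the toolbox).
data = load("vehicleTrainingData.mat");
trainingData = data.vehicleTrainingData;
dataDir = fullfile(toolboxdir("vision"),"visiondata");
trainingData.imageFilename = fullfile(dataDir,trainingData.imageFilename);

% One datastore for the images, one for the boxes and labels.
imds = imageDatastore(trainingData.imageFilename);
blds = boxLabelDatastore(trainingData(:,2:end));

% Combine them so that each read returns {image, boxes, labels}.
ds = combine(imds,blds);

% Apply a random augmentation every time a sample is read.
augmentedDs = transform(ds,@augmentSample);

% Preview one augmented sample with its bounding boxes.
sample = read(augmentedDs);
previewImg = insertShape(sample{1},"rectangle",sample{2});
figure
imshow(previewImg)

function sample = augmentSample(sample)
    % Randomly flip and slightly scale the image, then warp the boxes
    % with the same transformation (hypothetical helper).
    I = sample{1};
    tform = randomAffine2d(XReflection=true,Scale=[1 1.1]);
    rout = affineOutputView(size(I),tform,BoundsStyle="centerOutput");
    warpedI = imwarp(I,tform,OutputView=rout);

    % Drop boxes that overlap the output view by less than 25%.
    [boxes,kept] = bboxwarp(sample{2},tform,rout,OverlapThreshold=0.25);

    % Use the augmented sample only if at least one box remains;
    % otherwise return the original sample unchanged.
    if ~isempty(kept)
        sample = {warpedI,boxes,sample{3}(kept)};
    end
end
```

Because the transform runs at read time, each training epoch sees a different random variation of the same source images.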

To learn more about augmenting and pre-processing data for training, see these topics:

For more information about augmenting training data using datastores, see Datastores for Deep Learning (Deep Learning Toolbox) and Perform Additional Image Processing Operations Using Built-In Datastores (Deep Learning Toolbox).

Train Object Detector

To train the object detection network, use a training function that corresponds to your object detection model. For example, use the trainYOLOv4ObjectDetector function if you are using the yolov4ObjectDetector object to configure the detector.

Specify the network training options using the trainingOptions (Deep Learning Toolbox) function. To tune training option values, you can use the Experiment Manager (Deep Learning Toolbox) app. For more information on using Experiment Manager for hyperparameter tuning, see Train Object Detectors in Experiment Manager.
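As a rough illustration of how these pieces fit together, the following sketch configures a pretrained Tiny YOLO v4 network for new classes and trains it. It assumes `ds` is a combined datastore that returns {image, boxes, labels} and `classes` is a string array of class names. The helper names, the input size, the number of anchor boxes, and all training option values are placeholders chosen for illustration and would need tuning for a real data set.

```matlab
function trainedDetector = trainTinyYOLOv4Sketch(ds,classes)
% ds:      combined datastore returning {image, boxes, labels} (assumption)
% classes: string array of class names to detect (assumption)

    % Network input size; an example value, must suit the chosen network.
    inputSize = [224 224 3];

    % Resize images and boxes so anchor boxes are estimated at input scale.
    resizedDs = transform(ds,@(sample)resizeSample(sample,inputSize));

    % Estimate six anchor boxes and sort them from largest to smallest area.
    anchors = estimateAnchorBoxes(resizedDs,6);
    [~,idx] = sort(anchors(:,1).*anchors(:,2),"descend");
    anchors = anchors(idx,:);

    % Tiny YOLO v4 has two detection heads, so split the anchors in two.
    anchorBoxes = {anchors(1:3,:); anchors(4:6,:)};

    % Configure the pretrained network for the new classes.
    detector = yolov4ObjectDetector("tiny-yolov4-coco",classes,anchorBoxes, ...
        InputSize=inputSize);

    % Example training options; the values are illustrative only.
    options = trainingOptions("adam", ...
        InitialLearnRate=0.001, ...
        MiniBatchSize=4, ...
        MaxEpochs=10, ...
        Shuffle="every-epoch", ...
        VerboseFrequency=20);

    trainedDetector = trainYOLOv4ObjectDetector(resizedDs,detector,options);
end

function sample = resizeSample(sample,inputSize)
    % Resize the image to the network input size and scale its boxes to match.
    scale = inputSize(1:2)./size(sample{1},[1 2]);
    sample{1} = imresize(sample{1},inputSize(1:2));
    sample{2} = bboxresize(sample{2},scale);
end
```

For a single-class data set, a call might look like `trainedDetector = trainTinyYOLOv4Sketch(ds,"vehicle")`.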

To learn more about training, inference, and evaluating your results, see these examples:

Evaluate and Fine-tune Object Detector Performance

To evaluate the detection results of a trained detector against the ground truth with a comprehensive set of metrics, use the evaluateObjectDetection function. The function returns the object detection metrics as an objectDetectionMetrics object. Use these objectDetectionMetrics object functions to evaluate metrics across all, or a selection of, classes and overlap thresholds.

`objectDetectionMetrics` Object Function	Usage
`averagePrecision`	Compute average precision (AP) for all or selected classes and overlap (intersection-over-union) thresholds in your data set
`precisionRecall`	Compute precision, recall, and confidence scores for all classes in the data set, or for specified classes and overlap thresholds
`confusionMatrix`	Compute the confusion matrix and normalized confusion matrix at specified confidence score threshold or overlap threshold values
`summarize`	Compute the summary of the object detection metrics over the entire data set, or over each class
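A minimal evaluation sketch using the functions in the table might look like the following. It assumes a MATLAB release that provides these object functions, a trained detector, and test datastores whose images and ground truth are in the same order. The helper name `evaluateDetectorSketch` and the low detection threshold of 0.01 are illustrative assumptions.

```matlab
function [metrics,ap] = evaluateDetectorSketch(detector,testImds,testBlds)
% detector: trained detector object, such as a yolov4ObjectDetector
% testImds: imageDatastore of test images
% testBlds: boxLabelDatastore with ground truth for the same images

    % Use a low threshold so the metrics cover a wide range of
    % confidence scores (example value).
    results = detect(detector,testImds,Threshold=0.01);

    % Compare the detections against the ground truth.
    metrics = evaluateObjectDetection(results,testBlds);

    % Average precision for all classes and overlap thresholds.
    ap = averagePrecision(metrics);

    % Summaries over the whole data set and per class.
    [datasetSummary,classSummary] = summarize(metrics);
    disp(datasetSummary)
    disp(classSummary)
end
```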

For an example that shows how to use object detection metrics to evaluate and fine-tune an object detector, see the Multiclass Object Detection Using YOLO v2 Deep Learning example. This image shows a sample precision-recall (PR) plot, and the recall and precision plots as a function of confidence score, for selected classes in a data set.

This image shows the precision-recall plots for selected classes, at a single overlap threshold, which you can use to determine the optimal detection threshold.
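The overlap threshold mentioned above is expressed as an intersection-over-union (IoU) value. As a small worked example, this sketch computes the IoU of two hypothetical boxes in [x y width height] format using only base MATLAB functions, then compares it to an example threshold of 0.5. The box coordinates and the threshold are made up for illustration. Computer Vision Toolbox also provides the bboxOverlapRatio function for this kind of calculation.

```matlab
% Two hypothetical boxes in [x y width height] format.
boxA = [10 10 40 40];
boxB = [30 30 40 40];

% Area of the overlapping region (20-by-20 pixels here).
intersectionArea = rectint(boxA,boxB);

% Union = sum of both areas minus the shared area.
unionArea = prod(boxA(3:4)) + prod(boxB(3:4)) - intersectionArea;

% IoU = 400/2800, approximately 0.143.
iou = intersectionArea/unionArea;

% A detection counts as overlapping enough only if IoU meets the threshold.
overlapThreshold = 0.5;
isMatch = iou >= overlapThreshold;

fprintf("IoU = %.3f, match at threshold %.1f: %d\n",iou,overlapThreshold,isMatch);
% Prints: IoU = 0.143, match at threshold 0.5: 0
```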