Object Detection

Perform classification, object detection, and transfer learning using convolutional neural networks (CNNs, or ConvNets), and create customized detectors

Object detection is a computer vision technique for locating instances of objects in images or videos. Object detection algorithms typically leverage machine learning or deep learning to produce meaningful results. When looking at images or video, humans can recognize and locate objects of interest in a matter of moments. The goal of object detection is to replicate this intelligence using a computer. The best approach for object detection depends on your application and the problem you are trying to solve.

Deep learning-based approaches to object detection use convolutional neural networks (CNNs or ConvNets), such as YOLO and single-shot detector (SSD) networks. You can train a custom object detector, or use a pretrained object detector by leveraging transfer learning, an approach that enables you to start with a pretrained network and then fine-tune it for your application. Because deep learning techniques require a large number of labeled training images, use of a GPU is recommended to decrease the time needed to train a model. Convolutional neural networks require Deep Learning Toolbox™. Training and prediction are supported on a CUDA®-capable GPU, which requires Parallel Computing Toolbox™. For more information, see Computer Vision Toolbox Preferences and Parallel Computing Support in MathWorks Products (Parallel Computing Toolbox).
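
As a minimal sketch of running a pretrained deep learning detector, the following assumes that the support package for YOLO v4 object detection is installed and that the `visionteam.jpg` sample image is on the MATLAB path. The pretrained network name `"csp-darknet53-coco"` and the sample image are illustrative choices, not requirements.

```matlab
% Load a YOLO v4 detector pretrained on the COCO data set.
% Assumption: the YOLO v4 support package is installed.
detector = yolov4ObjectDetector("csp-darknet53-coco");

% Read a sample image (assumed to be on the MATLAB path).
I = imread("visionteam.jpg");

% Run the detector. Each row of bboxes is [x y width height].
[bboxes,scores,labels] = detect(detector,I);

% Overlay the detections; skip annotation if nothing was found.
if ~isempty(bboxes)
    I = insertObjectAnnotation(I,"rectangle",bboxes,string(labels));
end
imshow(I)
```

To fine-tune the same network on your own labeled data instead, the training functions listed under Train Deep Learning Based Object Detectors are the place to start.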

Machine learning techniques for object detection include aggregate channel features (ACF), support vector machine (SVM) classification using histogram of oriented gradients (HOG) features, and the Viola-Jones algorithm for human face or upper-body detection. You can choose to start with a pretrained object detector or create a custom object detector to suit your application.
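
For the machine learning route, a minimal sketch of Viola-Jones face detection might look like the following. It assumes the `visionteam.jpg` sample image is on the MATLAB path and relies on the default classification model of `vision.CascadeObjectDetector`; the `"Face"` label is an illustrative choice.

```matlab
% Create a Viola-Jones cascade detector with its default model.
faceDetector = vision.CascadeObjectDetector;

% Read a sample image (assumed to be on the MATLAB path).
I = imread("visionteam.jpg");

% Calling the detector returns one [x y width height] row per detection.
bboxes = faceDetector(I);

% Draw the detections; skip annotation if nothing was found.
if ~isempty(bboxes)
    I = insertObjectAnnotation(I,"rectangle",bboxes,"Face");
end
imshow(I)
```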

Labeled boats, neural network, and person detector

Apps

Image Labeler	Label images for computer vision applications
Video Labeler	Label video for computer vision applications

Functions

Detect Objects

Deep Learning Detectors

`rtmdetObjectDetector`	Detect objects using RTMDet object detector (Since R2024b)
`ssdObjectDetector`	Detect objects using SSD deep learning detector (Since R2020a)
`yolov2ObjectDetector`	Detect objects using YOLO v2 object detector
`yolov3ObjectDetector`	Detect objects using YOLO v3 object detector (Since R2021a)
`yolov4ObjectDetector`	Detect objects using YOLO v4 object detector (Since R2022a)
`yoloxObjectDetector`	Detect objects using YOLOX object detector (Since R2023b)
`peopleDetector`	Detect people using pretrained deep learning object detector (Since R2024b)

Feature-based Detectors

`readAprilTag`	Detect and estimate pose for AprilTag in image (Since R2020b)
`readArucoMarker`	Detect and estimate pose for ArUco marker in image (Since R2024a)
`generateArucoMarker`	Generate ArUco marker images (Since R2024a)
`readBarcode`	Detect and decode 1-D or 2-D barcode in image (Since R2020a)
`acfObjectDetector`	Detect objects using aggregate channel features
`peopleDetectorACF`	Detect people using aggregate channel features
`vision.CascadeObjectDetector`	Detect objects using the Viola-Jones algorithm
`vision.ForegroundDetector`	Foreground detection using Gaussian mixture models
`vision.BlobAnalysis`	Properties of connected regions

Detect Objects Using Point Features

`detectBRISKFeatures`	Detect BRISK features
`detectFASTFeatures`	Detect corners using FAST algorithm
`detectHarrisFeatures`	Detect corners using Harris–Stephens algorithm
`detectKAZEFeatures`	Detect KAZE features
`detectMinEigenFeatures`	Detect corners using minimum eigenvalue algorithm
`detectMSERFeatures`	Detect MSER features
`detectORBFeatures`	Detect ORB keypoints
`detectSIFTFeatures`	Detect scale invariant feature transform (SIFT) features (Since R2021b)
`detectSURFFeatures`	Detect SURF features
`extractFeatures`	Extract interest point descriptors
`matchFeatures`	Find matching features
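
The point-feature functions above are typically chained as detect, extract, then match. The following is a minimal sketch under the assumption that the `cameraman.tif` sample image is available. The 20-degree rotation is an arbitrary way to produce a second view, and `showMatchedFeatures` is used only for display.

```matlab
% Read a grayscale image and create a rotated copy to match against.
I1 = imread("cameraman.tif");
I2 = imrotate(I1,20);

% Detect SURF interest points in both images.
points1 = detectSURFFeatures(I1);
points2 = detectSURFFeatures(I2);

% Extract descriptors at the detected points.
[features1,validPoints1] = extractFeatures(I1,points1);
[features2,validPoints2] = extractFeatures(I2,points2);

% Match descriptors. Each row of indexPairs holds an index into
% features1 and the index of its match in features2.
indexPairs = matchFeatures(features1,features2);
matched1 = validPoints1(indexPairs(:,1));
matched2 = validPoints2(indexPairs(:,2));

% Show the putative matches side by side.
showMatchedFeatures(I1,I2,matched1,matched2,"montage");
```

The other `detect*Features` functions in the list can be substituted for `detectSURFFeatures` in the same pattern.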

Select Detected Objects

`selectStrongestBbox`	Select strongest bounding boxes from overlapping clusters using nonmaximal suppression (NMS)
`selectStrongestBboxMulticlass`	Select strongest multiclass bounding boxes from overlapping clusters using nonmaximal suppression (NMS)
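
To illustrate nonmaximal suppression, the following sketch passes three hand-made boxes to `selectStrongestBbox`. The box coordinates, scores, and the explicit overlap threshold of 0.5 are hypothetical values chosen for illustration.

```matlab
% Three candidate boxes in [x y width height] format with confidence scores.
% The first two overlap heavily; the third is far from both.
bboxes = [10 10 50 50; 12 12 50 50; 200 200 40 40];
scores = [0.9; 0.8; 0.7];

% Keep the highest-scoring box in each cluster of overlapping boxes.
[selectedBboxes,selectedScores] = selectStrongestBbox(bboxes,scores, ...
    "OverlapThreshold",0.5);

disp(selectedBboxes)
disp(selectedScores)
```

Because the first two boxes overlap well above the threshold, the lower-scoring one is suppressed and two boxes remain.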

Train Custom Object Detectors

Load Training Data

`boxLabelDatastore`	Datastore for bounding box label data
`groundTruth`	Ground truth label data
`imageDatastore`	Datastore for image data
`objectDetectorTrainingData`	Create training data for an object detector
`combine`	Combine data from multiple datastores

Train Feature-Based Object Detectors

`trainACFObjectDetector`	Train ACF object detector
`trainCascadeObjectDetector`	Train cascade object detector model
`trainImageCategoryClassifier`	Train an image category classifier

Train Deep Learning Based Object Detectors

`trainSSDObjectDetector`	Train SSD deep learning object detector (Since R2020a)
`trainYOLOv2ObjectDetector`	Train YOLO v2 object detector
`trainYOLOv3ObjectDetector`	Train YOLO v3 object detector (Since R2024a)
`trainYOLOv4ObjectDetector`	Train YOLO v4 object detector (Since R2022a)
`trainYOLOXObjectDetector`	Train YOLOX object detector (Since R2023b)

Augment and Preprocess Training Data for Deep Learning

`balanceBoxLabels`	Balance bounding box labels for object detection (Since R2020a)
`bboxcrop`	Crop bounding boxes
`bboxerase`	Remove bounding boxes (Since R2021a)
`bboxresize`	Resize bounding boxes
`bboxwarp`	Apply geometric transformation to bounding boxes
`bbox2points`	Convert rectangle to corner points list
`imwarp`	Apply geometric transformation to image
`imcrop`	Crop image
`imresize`	Resize image
`randomAffine2d`	Create randomized 2-D affine transformation
`centerCropWindow2d`	Create rectangular center cropping window
`randomWindow2d`	Randomly select rectangular region in image (Since R2021a)
`integralImage`	Calculate 2-D integral image
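
When you augment training data, apply the same geometric transformation to the image and to its bounding boxes. The following is a minimal sketch that assumes the `peppers.png` sample image. The box coordinates and the reflection and scale settings are hypothetical, and `affineOutputView` is used here to define the output view shared by both warps.

```matlab
% Sample image and one hypothetical box in [x y width height] format.
I = imread("peppers.png");
boxes = [200 150 120 100];

% Random horizontal reflection with mild random scaling.
tform = randomAffine2d("XReflection",true,"Scale",[1 1.1]);

% Output view that keeps the warped image the same size as the input.
rout = affineOutputView(size(I),tform);

% Warp the image and the boxes with the same transformation and view.
augI = imwarp(I,tform,"OutputView",rout);
augBoxes = bboxwarp(boxes,tform,rout);
```

Boxes that fall outside the output view after warping can be dropped, so check that `augBoxes` is not empty before using the augmented sample for training.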

Design Object Detection Deep Neural Networks

R-CNN (Regions With Convolutional Neural Networks)

`roiAlignLayer`	Non-quantized ROI pooling layer for Mask R-CNN (Since R2020b)
`roiMaxPooling2dLayer`	Neural network layer used to output fixed-size feature maps for rectangular ROIs
`roialign`	Non-quantized ROI pooling of `dlarray` data (Since R2021b)

YOLO v2 (You Only Look Once version 2)

`yolov2TransformLayer`	Create transform layer for YOLO v2 object detection network
`spaceToDepthLayer`	Space to depth layer (Since R2020b)

Focal Loss

`focalCrossEntropy`	Compute focal cross-entropy loss (Since R2020b)
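
As a rough illustration of why focal loss helps with class imbalance, the following base MATLAB sketch evaluates the per-observation focal loss term for one easy and one hard example. The probabilities and the `gamma` and `alpha` values are assumptions chosen for illustration. This is not the implementation of `focalCrossEntropy`, which may reduce or normalize the loss differently.

```matlab
% Predicted probability assigned to the true class for two observations:
% an easy (well-classified) one and a hard (misclassified) one.
p = [0.9 0.1];

gamma = 2;     % focusing parameter (assumed value)
alpha = 0.25;  % balancing parameter (assumed value)

% The (1 - p).^gamma factor down-weights well-classified observations.
focalLoss = -alpha .* (1 - p).^gamma .* log(p);

% Plain cross-entropy for comparison.
crossEntropy = -log(p);

disp([focalLoss; crossEntropy])
```

The easy example contributes far less to the focal loss than to the plain cross-entropy, so training focuses on the hard examples.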

SSD (Single Shot Detector)

`ssdMergeLayer`	Create SSD merge layer for object detection (Since R2020a)

Anchor Boxes

`estimateAnchorBoxes`	Estimate anchor boxes for deep learning object detectors

Visualize Detection Results

`cuboid2img`	Project cuboids from 3-D world coordinates to 2-D image coordinates (Since R2022b)
`insertObjectAnnotation`	Annotate truecolor or grayscale image or video
`insertObjectMask`	Insert masks in image or video stream (Since R2020b)
`insertShape`	Insert shapes in image or video
`showShape`	Display shapes on image, video, or point cloud (Since R2020b)

Evaluate Predicted Results

`evaluateObjectDetection`	Evaluate object detection data set against ground truth (Since R2023b)
`objectDetectionMetrics`	Object detection quality metrics (Since R2023b)
`mAPObjectDetectionMetric`	Mean average precision (mAP) metric for object detection (Since R2024a)
`bboxOverlapRatio`	Compute bounding box overlap ratio
`bboxPrecisionRecall`	Compute bounding box precision and recall against ground truth
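
Overlap ratio is the building block for most detection metrics. The following minimal sketch compares two hypothetical boxes with `bboxOverlapRatio`, using both the default intersection-over-union ratio and the `"Min"` ratio type.

```matlab
% Two hypothetical boxes in [x y width height] format.
boxA = [10 10 50 50];
boxB = [12 12 50 50];

% Intersection over union (the default ratio type).
iou = bboxOverlapRatio(boxA,boxB);

% Intersection over the area of the smaller box.
minRatio = bboxOverlapRatio(boxA,boxB,"Min");

% A box overlaps itself completely.
selfOverlap = bboxOverlapRatio(boxA,boxA);
```

A detection is commonly counted as correct when its overlap with a ground truth box exceeds a chosen threshold.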

Blocks

Deep Learning Object Detector	Detect objects using trained deep learning object detector (Since R2021b)

Topics

Get Started

Get Started with Object Detection Using Deep Learning
Perform object detection using deep learning neural networks such as YOLOX, YOLO v4, and SSD.
Choose an Object Detector
Compare object detection deep learning models, such as YOLOX, YOLO v4, RTMDet, and SSD.
Local Feature Detection and Extraction
Learn the benefits and applications of local feature detection and extraction.
Get Started with Cascade Object Detector
Train a custom classifier.
Point Feature Types
Choose functions that return and accept points objects for several types of features.
Getting Started with OCR
Detect and recognize text in multiple languages, and train OCR models to recognize custom text.
Image Classification with Bag of Visual Words
Use the Computer Vision Toolbox™ functions for image category classification by creating a bag of visual words.

Training Data for Object Detection and Instance Segmentation

Get Started with the Image Labeler
Interactively label rectangular ROIs for object detection, pixels for semantic segmentation, polygons for instance segmentation, and scenes for image classification.
Get Started with the Video Labeler
Interactively label rectangular ROIs for object detection, pixels for semantic segmentation, polygons for instance segmentation, and scenes for image classification in a video or image sequence.
Datastores for Deep Learning (Deep Learning Toolbox)
Learn how to use datastores in deep learning applications.
Training Data for Object Detection and Semantic Segmentation
Create training data for object detection or semantic segmentation using the Image Labeler or Video Labeler.
Get Started with Image Preprocessing and Augmentation for Deep Learning
Preprocess data for deep learning applications with deterministic operations such as resizing, or augment training data with randomized operations such as random cropping.

Get Started With Deep Learning

Deep Learning in MATLAB (Deep Learning Toolbox)
Discover deep learning capabilities in MATLAB® using convolutional neural networks for classification and regression, including pretrained networks and transfer learning, and training on GPUs, CPUs, clusters, and clouds.
Pretrained Deep Neural Networks (Deep Learning Toolbox)
Learn how to download and use pretrained convolutional neural networks for classification, transfer learning and feature extraction.

Featured Examples

Detect Small Objects Using Tiled Training of YOLOX Network

Detect small objects in full-resolution images using tiled training of a you only look once version X (YOLOX) deep learning network.

Since R2024b

Object Detection in Large Satellite Imagery Using Deep Learning

Perform object detection on large satellite imagery using deep learning.

Object Detection Using YOLO v4 Deep Learning

Detect objects in images using a you only look once version 4 (YOLO v4) deep learning network.

Perform 6-DoF Pose Estimation for Bin Picking Using Deep Learning

Perform six degrees-of-freedom (6-DoF) pose estimation by estimating the 3-D position and orientation of machine parts in a bin using RGB-D images and a deep learning network.