trainCascadeObjectDetector

Train cascade object detector model

Description

trainCascadeObjectDetector(outputXMLFilename,positiveInstances,negativeImages) writes a trained cascade detector XML file named outputXMLFilename. The file name must include an XML extension. For a more detailed explanation of how this function works, see Get Started with Cascade Object Detector.

trainCascadeObjectDetector(outputXMLFilename,'resume') resumes an interrupted training session. The outputXMLFilename input must match the output file name from the interrupted session. All arguments saved from the earlier session are reused automatically.

trainCascadeObjectDetector(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of arguments from previous syntaxes. For example, ObjectTrainingSize=[100,100] sets the height and width of objects during training.

Examples

Load the positive samples data from a MAT file. The file contains the ground truth, specified as a table of bounding boxes for several object categories. The ground truth was labeled and exported from the Image Labeler app.

load("stopSignsAndCars.mat");

Prepend the full path to the stop sign image file names.

stopSigns = fullfile(toolboxdir("vision"),"visiondata",stopSignsAndCars{:,1});

Create datastores to load the ground truth data for stop signs.

imds = imageDatastore(stopSigns);
blds = boxLabelDatastore(stopSignsAndCars(:,2));

Combine the image and box label datastores.

positiveInstances = combine(imds,blds);

Add the image folder path to the MATLAB path.

imDir = fullfile(matlabroot,"toolbox","vision","visiondata","stopSignImages");
addpath(imDir);

Specify a folder for negative images.

negativeFolder = fullfile(matlabroot,"toolbox","vision","visiondata","nonStopSigns");

Create an imageDatastore object containing negative images.

negativeImages = imageDatastore(negativeFolder);

Train a cascade object detector called "stopSignDetector.xml" using HOG features. NOTE: The command can take a few minutes to run.

trainCascadeObjectDetector("stopSignDetector.xml",positiveInstances,negativeImages,FalseAlarmRate=0.01,NumCascadeStages=3);
Automatically setting ObjectTrainingSize to [35, 32]
Using at most 42 of 42 positive samples per stage
Using at most 84 negative samples per stage

--cascadeParams--
Training stage 1 of 3
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 1: 0 seconds

Training stage 2 of 3
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 2: 1 seconds

Training stage 3 of 3
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 3: 1 seconds

Training complete

Use the newly trained classifier to detect a stop sign in an image.

detector = vision.CascadeObjectDetector("stopSignDetector.xml");

Read the test image.

img = imread("stopSignTest.jpg");

Detect a stop sign in the test image.

bbox = step(detector,img);

Insert bounding box rectangles and return the marked image.

detectedImg = insertObjectAnnotation(img,"rectangle",bbox,"stop sign");

Display the detected stop sign.

figure
imshow(detectedImg)

Remove the image folder from the path.

rmpath(imDir);

Input Arguments

positiveInstances

Positive samples, specified as a datastore or a table with two or more columns.

  • If you use a datastore, your data must be set up so that calling the datastore with the read and readall functions returns a cell array or table with at least two columns. The columns contain the following data:

    • Images: Cell vector of grayscale or RGB images.

    • boxes: M-by-4 matrices of bounding boxes of the form [x, y, width, height], where [x,y] represent the top-left coordinates of the bounding box.

    • labels (optional): Cell array that contains an M-element categorical vector containing object class names. All categorical data returned by the datastore must contain the same categories.

      When you provide this data, the function uses the class label to fill the ClassificationModel property of the trained detector, specified as a vision.CascadeObjectDetector object. Otherwise, the class labels are not required for training because the cascade object detector is a single class detector.

  • If you use a table, the table must have two or more columns. The first column must contain image file names with paths. The images must be grayscale or truecolor (RGB), in any format supported by imread. Each remaining column must be a cell vector of M-by-4 matrices and represents a single object class, such as vehicle, flower, or stop sign. Each row of a matrix is a bounding box of type double in the format [x,y,width,height], which specifies the upper-left corner location and the size of the box in the corresponding image. To create a ground truth table, you can use the Image Labeler app or Video Labeler app. To create a table of training data from the generated ground truth, use the objectDetectorTrainingData function.
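The table form described above can be sketched with base MATLAB alone. This is a minimal illustration, not output from the Image Labeler app: the file names, the column names imageFilename and stopSign, and the box coordinates are all made up for the example.

```matlab
% Hypothetical image file names with paths; replace with your own files.
imageFilename = {'images/stop1.jpg'; 'images/stop2.jpg'};

% One cell per image. Each cell holds an M-by-4 matrix of
% [x, y, width, height] boxes for a single object class.
% The coordinates below are invented for illustration.
stopSign = {[10 20 35 32]; ...
            [50 60 40 38; 120 80 36 34]};

% First column: file names. Second column: boxes for the class.
positiveInstances = table(imageFilename, stopSign);

% Sanity check: every box matrix must have exactly four columns.
hasFourColumns = all(cellfun(@(b) size(b,2) == 4, positiveInstances.stopSign));
```

A check like hasFourColumns is cheap to run before a long training session, because a malformed box matrix is easier to find here than in a training error.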

negativeImages

Negative images, specified as an ImageDatastore object, a path to a folder containing images, or a cell array of image file names. Because the images are used to generate negative samples, they must not contain any objects of interest. Instead, they should contain backgrounds associated with the object.

outputXMLFilename

Trained cascade detector file name, specified as a character vector or a string scalar with an XML extension. For example, 'stopSignDetector.xml'.

Data Types: char | string

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: FeatureType='Haar' specifies Haar for the type of features to use.

ObjectTrainingSize

Training object size, specified as either a two-element [height, width] vector or as 'Auto'. Before training, the function resizes the positive and negative samples to ObjectTrainingSize in pixels. If you select 'Auto', the function determines the size automatically based on the median width-to-height ratio of the positive instances. For optimal detection accuracy, specify an object training size close to the expected size of the object in the image. However, for faster training and detection, set the object training size to be smaller than the expected size of the object in the image.

Data Types: char | single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
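To make the median width-to-height ratio concrete, the sketch below computes it for a few made-up boxes and derives a [height, width] vector from an arbitrarily chosen height. It only illustrates the quantity involved. It is not the function's internal 'Auto' algorithm, which this page does not specify, and the box values and the height of 32 are assumptions.

```matlab
% Made-up positive boxes in [x, y, width, height] format.
boxes = [ 10 20 35 32; ...
          50 60 40 38; ...
         120 80 36 34];

% Width-to-height ratio of each box, then the median over all boxes.
aspectRatios = boxes(:,3) ./ boxes(:,4);
medianRatio  = median(aspectRatios);

% Pick a height (an arbitrary choice here) and derive a matching width.
trainingHeight = 32;
trainingSize   = [trainingHeight, round(trainingHeight * medianRatio)];
```

The resulting trainingSize could then be passed as ObjectTrainingSize. Note that the order is [height, width], while the boxes store width before height.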

NegativeSamplesFactor

Negative samples factor, specified as a real-valued scalar. The number of negative samples to use at each stage is equal to

NegativeSamplesFactor × [the number of positive samples used at each stage].

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
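As a worked instance of the product above, the stop sign example reports at most 42 positive and 84 negative samples per stage. That is consistent with a factor of 2, which is inferred from that output rather than stated on this page.

```matlab
% Values taken from the stop sign example output on this page.
numPositivePerStage = 42;

% Assumed factor, inferred from the 84 negatives reported in that output.
negativeSamplesFactor = 2;

% Upper bound on the negative samples used at each stage.
numNegativePerStage = negativeSamplesFactor * numPositivePerStage;
```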

NumCascadeStages

Number of cascade stages to train, specified as a positive integer. Increasing the number of stages may result in a more accurate detector but also increases training time. More stages can require more training images, because at each stage, some number of positive and negative samples are eliminated. This value depends on the values of FalseAlarmRate and TruePositiveRate. More stages can also enable you to increase the FalseAlarmRate. See the Get Started with Cascade Object Detector tutorial for more details.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

FalseAlarmRate

Acceptable false alarm rate at each stage, specified as a value in the range (0, 1]. The false alarm rate is the fraction of negative training samples incorrectly classified as positive samples.

The overall false alarm rate is calculated using the FalseAlarmRate per stage and the number of cascade stages, NumCascadeStages:

FalseAlarmRate^NumCascadeStages

Lower values for FalseAlarmRate increase complexity of each stage. Increased complexity can achieve fewer false detections but can result in longer training and detection times. Higher values for FalseAlarmRate can require a greater number of cascade stages to achieve reasonable detection accuracy.

Data Types: single | double
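The overall false alarm rate is the per-stage rate raised to the power of the number of stages. The sketch below evaluates it for the settings used in the stop sign example (0.01 per stage, 3 stages). It then inverts the relation to estimate how many stages a looser per-stage rate would need to reach the same overall target. The per-stage rate of 0.5 and the target of 1e-6 in the second part are illustrative assumptions.

```matlab
% Settings from the stop sign example on this page.
falseAlarmRate   = 0.01;
numCascadeStages = 3;

% Overall rate: per-stage rate raised to the number of stages.
overallFalseAlarmRate = falseAlarmRate ^ numCascadeStages;

% Inverse question with illustrative values: how many stages does a
% per-stage rate of 0.5 need to reach an overall rate of 1e-6?
perStageRate      = 0.5;
targetOverallRate = 1e-6;
stagesNeeded = ceil(log(targetOverallRate) / log(perStageRate));
```

The two parts show the trade-off described above: a strict per-stage rate reaches a low overall rate in a few complex stages, while a loose per-stage rate needs many simpler stages.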

TruePositiveRate

Minimum true positive rate required at each stage, specified as a value in the range (0, 1]. The true positive rate is the fraction of correctly classified positive training samples.

The overall true positive rate is calculated using the TruePositiveRate per stage and the number of cascade stages, NumCascadeStages:

TruePositiveRate^NumCascadeStages

Higher values for TruePositiveRate increase complexity of each stage. Increased complexity can achieve a greater number of correct detections but can result in longer training and detection times.

Data Types: single | double
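Because the per-stage rate is compounded over all stages, even a rate close to 1 lowers the overall true positive rate noticeably as stages are added. The sketch below shows this for a few stage counts. The per-stage rate of 0.995 and the stage counts are illustrative assumptions, not values taken from this page.

```matlab
% Illustrative per-stage true positive rate.
truePositiveRate = 0.995;

% Stage counts to compare.
numStages = [3 10 20];

% Overall rate for each stage count (element-wise power).
overallTruePositiveRate = truePositiveRate .^ numStages;

% Print one line per stage count.
for k = 1:numel(numStages)
    fprintf('%2d stages: overall true positive rate %.4f\n', ...
        numStages(k), overallTruePositiveRate(k));
end
```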

FeatureType

Feature type, specified as one of the following:

'Haar'[1] — Haar-like features
'LBP'[2] — Local binary patterns
'HOG'[3] — Histogram of oriented gradients

The function allocates a large amount of memory, especially when you use Haar features. To avoid running out of memory, use this function on a 64-bit operating system with a sufficient amount of RAM.

Data Types: char

Tips

  • Training a good detector requires thousands of training samples. Processing time for a large amount of data varies, but it is likely to take hours or even days. During training, the function displays the time it took to train each stage in the MATLAB® command window.

  • The OpenCV HOG parameters used in this function are:

    • NumBins = 9

    • CellSize = [8 8]

    • BlockSize = [4 4]

    • BlockOverlap = [2 2]

    • UseSignedOrientation = false

References

[1] Viola, P., and M. Jones. “Rapid Object Detection using a Boosted Cascade of Simple Features.” Proceedings of the 2001 IEEE Computer Society Conference. CVPR 2001, 1:I-511–I-518. Kauai, HI, USA: IEEE Comput. Soc, 2001.

[2] Ojala, T., M. Pietikainen, and T. Maenpaa. “Multiresolution Gray-scale and Rotation Invariant Texture Classification With Local Binary Patterns.” In IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 7: 971–87, 2002. DOI.org (Crossref), https://doi.org/10.1109/TPAMI.2002.1017623.

[3] Dalal, N., and B. Triggs. “Histograms of Oriented Gradients for Human Detection.” In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), 1:886–93. San Diego, CA, USA: IEEE, 2005. DOI.org (Crossref), https://doi.org/10.1109/CVPR.2005.177.

[4] Lienhart, R., A. Kuranov, and V. Pisarevsky. “Empirical Analysis of Detection Cascades of Boosted Classifiers for Rapid Object Detection.” In DAGM 2003, Lecture Notes in Computer Science, 2781:297–304. Springer, 2003. DOI.org (Crossref), https://doi.org/10.1007/978-3-540-45243-0_39.

Version History

Introduced in R2013a