splitAnomalyData

Split data into training, validation and testing sets for anomaly detection

Since R2023a

Syntax

[dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels)

[dsTrain,trainLabels,dsVal,valLabels,dsTest,testLabels] = splitAnomalyData(ds,gtLabels,anomalyLabels)

[___] = splitAnomalyData(___,Name=Value)

Description

[dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels) automatically splits the input image datastore, ds, into three datastores for training, validation, and testing of anomaly detection networks. anomalyLabels indicates which class labels of the images in ds belong to the anomaly (positive) class. By default, the function allocates 70% of the input data to training, 10% to validation, and 20% to testing, and the training datastore does not include anomaly images.

Note

This functionality requires the Automated Visual Inspection Library for Computer Vision Toolbox™. You can install the Automated Visual Inspection Library for Computer Vision Toolbox from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.


[dsTrain,trainLabels,dsVal,valLabels,dsTest,testLabels] = splitAnomalyData(ds,gtLabels,anomalyLabels) automatically splits the input datastore, ds, into training, validation, and testing datastores for anomaly detection networks, and returns the labels of the images in each datastore. gtLabels specifies the ground-truth label of each corresponding image in the datastore, and anomalyLabels determines which labels in gtLabels belong to the anomaly class.
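As an illustration of this syntax, the sketch below reuses the digit images from the example on this page but supplies the ground truth as a separate numeric vector, so anomalyLabels is numeric as well. Deriving gtLabels from the folder names with double(string(ds.Labels)) is just a convenient way to build a numeric vector for this data set, and the check at the end assumes each returned label vector has one entry per file in the matching datastore.

```matlab
% Digit images shipped with Computer Vision Toolbox (same data as the example).
dataDir = fullfile(toolboxdir("vision"),"visiondata","digits","synthetic");
ds = imageDatastore(dataDir,IncludeSubfolders=true, ...
    LabelSource="foldernames");

% Numeric ground truth: folder names "0".."9" converted to the numbers 0..9.
gtLabels = double(string(ds.Labels));

% anomalyLabels has the same data type as gtLabels. Digit 8 is normal.
anomalyLabels = [0 1 2 3 4 5 6 7 9];

[dsTrain,trainLabels,dsVal,valLabels,dsTest,testLabels] = ...
    splitAnomalyData(ds,gtLabels,anomalyLabels,Verbose=false);

% Compare the number of files and labels in each split.
fprintf("Train: %d files, %d labels\n",numel(dsTrain.Files),numel(trainLabels));
fprintf("Val:   %d files, %d labels\n",numel(dsVal.Files),numel(valLabels));
fprintf("Test:  %d files, %d labels\n",numel(dsTest.Files),numel(testLabels));
```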

[___] = splitAnomalyData(___,Name=Value) uses name-value arguments to perform custom proportioning of the images. You can specify either the proportion of images in each datastore or the proportion of anomaly and normal images.

If you specify name-value arguments for both proportioning strategies, then splitAnomalyData uses the arguments that specify the proportion of images in each datastore. In this case, the function ignores the arguments that specify the proportion of anomaly and normal images.
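The following sketch, which assumes the digit data set from the example on this page, passes both kinds of ratio in one call. Under the rule above, the function should use DataAllocationRatio and ignore NormalLabelsRatio and AnomalyLabelsRatio. The ratio values themselves are arbitrary choices for illustration.

```matlab
% Digit images shipped with Computer Vision Toolbox (same data as the example).
dataDir = fullfile(toolboxdir("vision"),"visiondata","digits","synthetic");
ds = imageDatastore(dataDir,IncludeSubfolders=true, ...
    LabelSource="foldernames");
anomalyLabels = ["0","1","2","3","4","5","6","7","9"];

% Both strategies are specified. DataAllocationRatio takes precedence,
% so the two per-class ratio arguments are expected to be ignored.
[dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels, ...
    DataAllocationRatio=[0.6 0.2 0.2], ...
    NormalLabelsRatio=[0.8 0.1 0.1], ...
    AnomalyLabelsRatio=[0 0.5 0.5], ...
    Verbose=false);
```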

Examples


Split Datastore of Labeled Anomaly Images


Load a data set that consists of images of digits from 0 to 9.

dataDir = fullfile(toolboxdir("vision"),"visiondata","digits","synthetic");
ds = imageDatastore(dataDir,IncludeSubfolders=true, ...
    LabelSource="foldernames");

Specify the digits that count as anomalous. For instance, consider images of the digit 8 to be normal, and all other digits to be anomalous.

anomalyLabels = ["0","1","2","3","4","5","6","7","9"];

Split the data set into training, validation, and testing datastores.

[dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels);

Splitting anomaly dataset
-------------------------
* Finalizing... Done.
* Number of files and proportions per class in all the datasets:

               Input                Train               Validation                 Test        
         _________________    _________________    ____________________    ____________________

         NumFiles    Ratio    NumFiles    Ratio    NumFiles     Ratio      NumFiles     Ratio  
         ________    _____    ________    _____    ________    ________    ________    ________
                                                                                               
    0      101        0.1         0         0         34        0.10863       67        0.10686
    1      101        0.1         0         0         34        0.10863       67        0.10686
    2      101        0.1         0         0         34        0.10863       67        0.10686
    3      101        0.1         0         0         34        0.10863       67        0.10686
    4      101        0.1         0         0         33        0.10543       68        0.10845
    5      101        0.1         0         0         34        0.10863       67        0.10686
    6      101        0.1         0         0         33        0.10543       68        0.10845
    7      101        0.1         0         0         34        0.10863       67        0.10686
    8      101        0.1        70         1         10       0.031949       21       0.033493
    9      101        0.1         0         0         33        0.10543       68        0.10845
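To check a split programmatically instead of reading the table, you can count the files per label in each output datastore with the imageDatastore method countEachLabel. This sketch assumes, as the table above suggests, that the returned datastores keep the Labels of the input image datastore. In the table, the train, validation, and test columns hold 70, 313, and 627 files, which together account for all 1010 input images.

```matlab
% Repeat the split from the example without printing the statistics table.
dataDir = fullfile(toolboxdir("vision"),"visiondata","digits","synthetic");
ds = imageDatastore(dataDir,IncludeSubfolders=true, ...
    LabelSource="foldernames");
anomalyLabels = ["0","1","2","3","4","5","6","7","9"];
[dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels,Verbose=false);

% Each call returns a table with a Label column and a Count column.
trainCounts = countEachLabel(dsTrain)
valCounts = countEachLabel(dsVal)
testCounts = countEachLabel(dsTest)

% With the default options, every input file lands in exactly one split.
total = sum(trainCounts.Count) + sum(valCounts.Count) + sum(testCounts.Count)
```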

Input Arguments


`ds` — Data set
datastore

Data set of normal and anomaly images, specified as a datastore. If you do not specify the gtLabels argument, then ds must be an image datastore containing labeled images.

`gtLabels` — Ground truth labels
numeric vector | logical vector | categorical vector

Ground truth labels for each image, specified as a numeric vector, logical vector, or categorical vector. The splitAnomalyData function converts the labels into a logical vector according to the set of anomaly labels in anomalyLabels.

`anomalyLabels` — Anomaly labels
numeric vector | logical vector | categorical vector | string vector

Anomaly labels, specified as a vector of the same data type as gtLabels. When gtLabels is categorical, anomalyLabels can also be a string vector whose values correspond to categories in gtLabels.

The splitAnomalyData function converts all ground truth labels in gtLabels that belong to the set of anomaly labels to a logical true, indicating an anomaly (positive detection). The function converts all other ground truth labels to a logical false, indicating normality (negative detection).
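The conversion described above amounts to a set-membership test. The sketch below reproduces it with the built-in ismember function on a small, hypothetical numeric label vector; it is an illustration of the documented rule, not the internal implementation of splitAnomalyData.

```matlab
% Hypothetical ground truth for six images and the set of anomaly labels.
gtLabels = [0 3 8 8 5 8];
anomalyLabels = [0 3 5];

% true where the ground-truth label is in the anomaly set (anomaly),
% false otherwise (normal).
isAnomaly = ismember(gtLabels,anomalyLabels)
% isAnomaly = 1x6 logical array: 1 1 0 0 1 0
```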

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: [dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels,DataAllocationRatio=[0.8 0.1 0.1]) allocates 80% of the total input data for training, 10% for validation, and 10% for testing.

Specify Proportion of Images in Each Datastore


`DataAllocationRatio` — Proportion of input data to include in each datastore
`[0.7 0.1 0.2]` (default) | 1-by-3 numeric vector

Proportion of input data to include in the training, validation, and testing datastores, respectively, specified as a 1-by-3 numeric vector. The function ignores class labels when splitting the data. The elements of the vector must sum to a number between 0 and 1. If the elements sum to a number less than 1, then the splitAnomalyData function does not allocate the remaining data to a training, validation, or testing datastore.
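For example, ratios that sum to 0.8 leave roughly 20% of the input unallocated. The sketch below, assuming the digit data set from the example on this page, counts the files that end up in none of the three datastores. The exact count can differ from 20% of the input because, by default, anomaly images are routed away from the training datastore, so the code only reports the number rather than predicting it.

```matlab
% Digit images shipped with Computer Vision Toolbox (same data as the example).
dataDir = fullfile(toolboxdir("vision"),"visiondata","digits","synthetic");
ds = imageDatastore(dataDir,IncludeSubfolders=true, ...
    LabelSource="foldernames");
anomalyLabels = ["0","1","2","3","4","5","6","7","9"];

% 0.5 + 0.1 + 0.2 = 0.8, so part of the data is not allocated.
[dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels, ...
    DataAllocationRatio=[0.5 0.1 0.2],Verbose=false);

numUsed = numel(dsTrain.Files) + numel(dsVal.Files) + numel(dsTest.Files);
numUnused = numel(ds.Files) - numUsed
```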

`KeepAnomalyLabelsInTrainingDatastore` — Include anomaly images in training datastore
`false` or `0` (default) | `true` or `1`

Include anomaly images in the training datastore, specified as a numeric or logical false (0) or true (1).
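A minimal sketch of turning this option on, again assuming the digit data set and anomaly labels from the example on this page. The last line checks whether any training file carries an anomaly label, which is expected once KeepAnomalyLabelsInTrainingDatastore is true.

```matlab
% Digit images shipped with Computer Vision Toolbox (same data as the example).
dataDir = fullfile(toolboxdir("vision"),"visiondata","digits","synthetic");
ds = imageDatastore(dataDir,IncludeSubfolders=true, ...
    LabelSource="foldernames");
anomalyLabels = ["0","1","2","3","4","5","6","7","9"];

% Allow anomaly images in the training datastore.
[dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels, ...
    KeepAnomalyLabelsInTrainingDatastore=true,Verbose=false);

% true if at least one training image has an anomaly label.
hasAnomalyInTrain = any(ismember(string(dsTrain.Labels),anomalyLabels))
```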

`KeepUnusedAnomalyLabels` — Include all anomaly images in validation and testing datastores
`true` or `1` (default) | `false` or `0`

Include all anomaly images in the validation and testing datastores, specified as a numeric or logical true (1) or false (0). This argument applies only when the value of KeepAnomalyLabelsInTrainingDatastore is false.

By default, KeepUnusedAnomalyLabels is true and the splitAnomalyData function distributes all anomaly images to the validation and testing datastores. Therefore, the validation datastore dsVal and testing datastore dsTest have a higher proportion of anomaly to normal images than the input datastore ds.

When KeepUnusedAnomalyLabels is false, the splitAnomalyData function omits some anomaly images from dsVal and dsTest. Therefore, dsVal and dsTest have the same proportion of anomaly to normal images as ds. To create stratified partitions, specify KeepUnusedAnomalyLabels as false.
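The sketch below requests a stratified split of the digit data set from the example on this page and compares the fraction of anomaly images in the input with the fraction in the testing datastore. In the input, 909 of the 1010 images are anomalies, a fraction of 0.9; the test fraction is expected to be close to that value, though not necessarily identical because file counts are whole numbers. The helper anomalyFraction is a hypothetical convenience function defined here.

```matlab
% Digit images shipped with Computer Vision Toolbox (same data as the example).
dataDir = fullfile(toolboxdir("vision"),"visiondata","digits","synthetic");
ds = imageDatastore(dataDir,IncludeSubfolders=true, ...
    LabelSource="foldernames");
anomalyLabels = ["0","1","2","3","4","5","6","7","9"];

% Stratified split: keep the input anomaly-to-normal proportion.
[dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels, ...
    KeepUnusedAnomalyLabels=false,Verbose=false);

% Hypothetical helper: fraction of files in a datastore with anomaly labels.
anomalyFraction = @(imds) mean(ismember(string(imds.Labels),anomalyLabels));

inputFraction = anomalyFraction(ds)    % 909/1010 = 0.9
testFraction = anomalyFraction(dsTest)
```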

Specify Proportion of Normal and Anomaly Images


`NormalLabelsRatio` — Ratio of files with normal class labels
`[]` (default) | 1-by-3 numeric vector

Ratio of files with normal class labels to include in the training, validation, and testing datastores, specified as a 1-by-3 numeric vector. The elements of the vector must sum to a number between 0 and 1. If the elements sum to a number less than 1, then the splitAnomalyData function does not allocate the remaining normal data to a training, validation, or testing datastore.

Example: NormalLabelsRatio=[0.8 0.1 0.1] assigns 80% of the normal data for training, 10% for validation, and 10% for testing.

`AnomalyLabelsRatio` — Ratio of files with anomaly class labels
`[]` (default) | 1-by-3 numeric vector

Ratio of files with anomaly class labels to include in the training, validation, and testing datastores, specified as a 1-by-3 numeric vector. The elements of the vector must sum to a number between 0 and 1. If the elements sum to a number less than 1, then the splitAnomalyData function does not allocate the remaining anomaly data to a training, validation, or testing datastore.

Example: AnomalyLabelsRatio=[0.1 0.1 0.5] assigns 10% of the anomaly data for training, 10% for validation, and 50% for testing, and omits the remaining 30% of the anomaly data.
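A sketch that sets the normal and anomaly proportions separately, assuming the digit data set from the example on this page. Here 80% of the normal images go to training, and the anomaly images are split evenly between validation and testing with none in training; the specific ratios are illustrative choices.

```matlab
% Digit images shipped with Computer Vision Toolbox (same data as the example).
dataDir = fullfile(toolboxdir("vision"),"visiondata","digits","synthetic");
ds = imageDatastore(dataDir,IncludeSubfolders=true, ...
    LabelSource="foldernames");
anomalyLabels = ["0","1","2","3","4","5","6","7","9"];

% Per-class proportions: [train val test] for each class.
[dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels, ...
    NormalLabelsRatio=[0.8 0.1 0.1], ...
    AnomalyLabelsRatio=[0 0.5 0.5], ...
    Verbose=false);

% Inspect how many files of each label landed in each split.
countEachLabel(dsTrain)
countEachLabel(dsTest)
```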

Other Partitioning Options


`Verbose` — Display statistics
`true` or `1` (default) | `false` or `0`

Display statistics of the partitioned data for each class, specified as a numeric or logical true (1) or false (0).
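For scripts that split data repeatedly, you may want to suppress the per-class statistics table printed in the example. A minimal sketch, assuming the digit data set and anomaly labels from that example:

```matlab
% Digit images shipped with Computer Vision Toolbox (same data as the example).
dataDir = fullfile(toolboxdir("vision"),"visiondata","digits","synthetic");
ds = imageDatastore(dataDir,IncludeSubfolders=true, ...
    LabelSource="foldernames");
anomalyLabels = ["0","1","2","3","4","5","6","7","9"];

% Same default split as the example, without the statistics display.
[dsTrain,dsVal,dsTest] = splitAnomalyData(ds,anomalyLabels,Verbose=false);

% The outputs are datastores of the same type as ds.
class(dsTrain)
```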

Output Arguments


`dsTrain` — Training datastore
datastore

Training datastore, returned as a datastore of the same type as ds.

`dsVal` — Validation datastore
datastore

Validation datastore, returned as a datastore of the same type as ds.

`dsTest` — Testing datastore
datastore

Testing datastore, returned as a datastore of the same type as ds.

`trainLabels` — Labels of data in training datastore
numeric vector | logical vector | categorical vector

Labels of data in the training datastore, returned as a numeric vector, logical vector, or categorical vector.

`valLabels` — Labels of data in validation datastore
numeric vector | logical vector | categorical vector

Labels of data in the validation datastore, returned as a numeric vector, logical vector, or categorical vector.

`testLabels` — Labels of data in testing datastore
numeric vector | logical vector | categorical vector

Labels of data in the testing datastore, returned as a numeric vector, logical vector, or categorical vector.

Version History

Introduced in R2023a
