augmentedImageDatastore

Transform batches to augment image data

Description

An augmented image datastore transforms batches of training, validation, test, and prediction data, with optional preprocessing such as resizing, rotation, and reflection. Resize images to make them compatible with the input size of your deep learning network. Augment training image data with randomized preprocessing operations to help prevent the network from overfitting and memorizing the exact details of the training images.

To train a network using augmented images, supply the augmentedImageDatastore to the trainnet function. For more information, see Preprocess Images for Deep Learning.

  • When you use an augmented image datastore as a source of training images, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set. The actual number of training images at each epoch does not change. The transformed images are not stored in memory.

  • An imageInputLayer normalizes images using the mean of the augmented images, not the mean of the original data set. This mean is calculated once for the first augmented epoch. All other epochs use the same mean, so that the average image does not change during training.

  • Use an augmented image datastore for efficient preprocessing of images for deep learning, including image resizing. Do not use the ReadFcn option of ImageDatastore objects. ImageDatastore allows batch reading of JPG or PNG image files using prefetching. If you set the ReadFcn option to a custom function, then ImageDatastore does not prefetch and is usually significantly slower.

By default, an augmentedImageDatastore only resizes images to fit the output size. You can configure options for additional image transformations using an imageDataAugmenter.
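
As a minimal sketch of the difference between the default behavior and a configured augmenter, the following code builds both kinds of datastore. It assumes Deep Learning Toolbox is installed. The array X is synthetic random data and the augmentation ranges are arbitrary values chosen for illustration.

```matlab
% Synthetic stand-in for real images: 20 grayscale 32-by-32 images (illustrative only).
X = rand(32,32,1,20,'single');

% Default: the datastore only resizes images to the output size.
auimdsResizeOnly = augmentedImageDatastore([28 28],X);

% With an imageDataAugmenter: random reflection and rotation in addition to resizing.
augmenter = imageDataAugmenter('RandXReflection',true,'RandRotation',[-10 10]);
auimdsAugmented = augmentedImageDatastore([28 28],X,'DataAugmentation',augmenter);

% Read one mini-batch; each row of the table holds one 28-by-28 image.
batch = read(auimdsAugmented);
disp(size(batch.input{1}))
```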

Creation

Description

auimds = augmentedImageDatastore(outputSize,imds) creates an augmented image datastore for classification problems using images from image datastore imds, and sets the OutputSize property.

auimds = augmentedImageDatastore(outputSize,X,Y) creates an augmented image datastore for classification and regression problems. The array X contains the predictor variables and the array Y contains the categorical labels or numeric responses.

auimds = augmentedImageDatastore(outputSize,X) creates an augmented image datastore for predicting responses of image data in array X.

auimds = augmentedImageDatastore(outputSize,tbl) creates an augmented image datastore for classification and regression problems. The table, tbl, contains predictors and responses.

auimds = augmentedImageDatastore(outputSize,tbl,responseNames) creates an augmented image datastore for classification and regression problems. The table, tbl, contains predictors and responses. The responseNames argument specifies the response variables in tbl.

auimds = augmentedImageDatastore(___,Name,Value) creates an augmented image datastore, using name-value pairs to set the ColorPreprocessing, DataAugmentation, OutputSizeMode, and DispatchInBackground properties. You can specify multiple name-value pairs. Enclose each property name in quotes.

For example, augmentedImageDatastore([28,28],myTable,'OutputSizeMode','centercrop') creates an augmented image datastore that crops images from the center.
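
The array-based syntaxes can be sketched as follows, assuming Deep Learning Toolbox is installed. X and Y are synthetic placeholders invented for this sketch, and the class names are arbitrary.

```matlab
% Synthetic data: 12 RGB images of size 32-by-32 with made-up labels (illustrative only).
X = rand(32,32,3,12,'single');
Y = categorical(repmat(["cat";"dog";"bird"],4,1));

% Classification: predictors and categorical labels.
auimdsTrain = augmentedImageDatastore([28 28],X,Y);

% Prediction: predictors only.
auimdsPredict = augmentedImageDatastore([28 28],X);

% Each read returns a table. With responses, the table is expected to contain
% the variables input and response; without responses, only input.
trainBatch = read(auimdsTrain);
predictBatch = read(auimdsPredict);
disp(trainBatch.Properties.VariableNames)
```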

Input Arguments

imds: Image datastore, specified as an ImageDatastore object.

X: Images, specified as a 4-D numeric array. The first three dimensions are the height, width, and channels, and the last dimension indexes the individual images.

Data Types: single | double | uint8 | int8 | uint16 | int16 | uint32 | int32

Y: Responses for classification or regression, specified as one of the following:

  • For a classification problem, Y is a categorical vector containing the image labels.

  • For a regression problem, Y can be one of the following:

    • n-by-r numeric matrix. n is the number of observations and r is the number of responses.

    • h-by-w-by-c-by-n numeric array. h-by-w-by-c is the size of a single response and n is the number of observations.

Responses must not contain NaN values.

Data Types: categorical | double
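
For the regression case, a minimal sketch with an n-by-r response matrix might look like this. It assumes Deep Learning Toolbox is installed. The data is random and the choice of two responses per image is arbitrary.

```matlab
% Synthetic data: n = 20 images, r = 2 numeric responses per image (illustrative only).
n = 20;
X = rand(32,32,1,n,'single');
Y = rand(n,2);                        % n-by-r numeric matrix, no NaN values

% Regression datastore: each observation pairs one image with one row of Y.
auimds = augmentedImageDatastore([28 28],X,Y);

% Guard against NaN responses before training, since they are not allowed.
hasNaNResponses = any(isnan(Y),'all');
disp(auimds.NumObservations)
```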

tbl: Input data, specified as a table. tbl must contain the predictors in the first column as either absolute or relative image paths or images. The type and location of the responses depend on the problem:

  • For a classification problem, the response must be a categorical variable containing labels for the images. If the name of the response variable is not specified in the call to augmentedImageDatastore, the responses must be in the second column. If the responses are in a different column of tbl, then you must specify the response variable name using the responseNames argument.

  • For a regression problem, the responses must be numerical values in the column or columns after the first column. The responses can be either in multiple columns as scalars or in a single column as numeric vectors or cell arrays containing numeric 3-D arrays. When you do not specify the name of the response variable or variables, augmentedImageDatastore accepts the remaining columns of tbl as the response variables. You can specify the response variable names using the responseNames argument.

Responses must not contain NaN values. If the predictor data contains NaN values, they are propagated through training. However, in most cases the training then fails to converge.

Data Types: table

responseNames: Names of the response variables in the input table, specified as one of the following:

  • For classification or regression tasks with a single response, responseNames must be a character vector or string scalar containing the name of the response variable in the input table.

  • For regression tasks with multiple responses, responseNames must be a string array or cell array of character vectors containing the names of the response variables in the input table.

Data Types: char | cell | string
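
The table syntax with a named response can be sketched as follows, assuming Deep Learning Toolbox is installed and that image paths are stored as a cell array of character vectors. The temporary folder, the file names, and the variable names imagePath and label are hypothetical and exist only for this illustration.

```matlab
% Write a few synthetic grayscale images to a temporary folder (illustrative data only).
folder = tempname;
mkdir(folder);
numImages = 6;
files = cell(numImages,1);
for k = 1:numImages
    files{k} = fullfile(folder,sprintf('img%d.png',k));
    imwrite(rand(40,40),files{k});
end

% First column: absolute image paths. Second column: categorical labels.
labels = categorical(["a";"b";"a";"b";"a";"b"]);
tbl = table(files,labels,'VariableNames',{'imagePath','label'});

% Name the response variable explicitly. Because the labels are in the second
% column, omitting responseNames should give the same result.
auimds = augmentedImageDatastore([28 28],tbl,'label');
batch = read(auimds);

% Remove the temporary files.
rmdir(folder,'s');
```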

Properties

ColorPreprocessing: Preprocessing color operations performed on input grayscale or RGB images, specified as 'none', 'gray2rgb', or 'rgb2gray'. When the image datastore contains a mixture of grayscale and RGB images, use ColorPreprocessing to ensure that all output images have the number of channels required by imageInputLayer.

No color preprocessing operation is performed when an input image already has the required number of color channels. For example, if you specify the value 'gray2rgb' and an input image already has three channels, then no color preprocessing occurs.

Note

The augmentedImageDatastore object converts RGB images to grayscale by using the rgb2gray function. If an image has three channels that do not correspond to red, green, and blue channels (such as an image in the L*a*b* color space), then using ColorPreprocessing can give poor results.

No color preprocessing operation is performed when the input images do not have 1 or 3 channels, such as for multispectral or hyperspectral images. In this case, all input images must have the same number of channels.

Data Types: char | string

DataAugmentation: Preprocessing applied to input images, specified as an imageDataAugmenter object or 'none'. When DataAugmentation is 'none', no preprocessing is applied to input images.

DispatchInBackground: Dispatch observations in the background during training, prediction, or classification, specified as false or true. To use background dispatching, you must have Parallel Computing Toolbox™.

Augmented image datastores perform background dispatching only when used with the trainnet function or with inference functions such as predict and minibatchpredict. Background dispatching does not occur when you call the read function of the datastore directly.

MiniBatchSize: Number of observations that are returned in each batch. You can change the value of MiniBatchSize only after you create the datastore.

Training and prediction functions that specify a mini-batch size, such as trainingOptions, minibatchpredict, and testnet, do not set the MiniBatchSize property. For best performance, use the same mini-batch size for your datastore as for your training and prediction functions.
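
A minimal sketch of keeping the two mini-batch sizes in step, assuming Deep Learning Toolbox is installed. The data is synthetic and the value 64 is an arbitrary choice.

```matlab
% Synthetic data: 300 grayscale images (illustrative only).
X = rand(32,32,1,300,'single');
auimds = augmentedImageDatastore([28 28],X);

% MiniBatchSize can be changed only after the datastore exists.
auimds.MiniBatchSize = 64;

% Each call to read now returns a table with 64 rows.
batch = read(auimds);
disp(height(batch))

% Reuse the same value in the training options so both sides agree.
opts = trainingOptions('sgdm','MiniBatchSize',auimds.MiniBatchSize);
```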

NumObservations: Total number of observations in the augmented image datastore. The number of observations is the length of one training epoch.

This property is read-only.

OutputSize: Size of output images, specified as a vector of two positive integers. The first element specifies the number of rows in the output images, and the second element specifies the number of columns.

Note

If you create an augmentedImageDatastore by specifying the image output size as a three-element vector, then the datastore ignores the third element. Instead, the datastore uses the value of ColorPreprocessing to determine the dimensionality of output images. For example, if you specify OutputSize as [28 28 1] but set ColorPreprocessing as 'gray2rgb', then the output images have size 28-by-28-by-3.
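
The behavior described in this note can be checked with a short sketch, assuming Deep Learning Toolbox is installed. The input array is synthetic grayscale data.

```matlab
% Synthetic grayscale images: one channel, 10 observations (illustrative only).
Xgray = rand(32,32,1,10,'single');

% Request a 28-by-28-by-1 output, but ask for grayscale-to-RGB conversion.
auimds = augmentedImageDatastore([28 28 1],Xgray,'ColorPreprocessing','gray2rgb');

% The third element of the requested size is ignored, so the images have three channels.
batch = read(auimds);
disp(size(batch.input{1}))
```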

OutputSizeMode: Method used to resize output images, specified as one of the following.

  • 'resize' — Scale the image using bilinear interpolation to fit the output size.

    Note

    augmentedImageDatastore uses the bilinear interpolation method of imresize with antialiasing. Bilinear interpolation enables fast image processing while avoiding distortions such as those caused by nearest-neighbor interpolation. In contrast, by default imresize uses bicubic interpolation with antialiasing to produce a high-quality resized image at the cost of longer processing time.

  • 'centercrop' — Take a crop from the center of the training image. The crop has the same size as the output size.

  • 'randcrop' — Take a random crop from the training image. The random crop has the same size as the output size.

Data Types: char | string
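
The three modes can be compared with a short sketch, assuming Deep Learning Toolbox is installed. The 64-by-48 input images are synthetic. All three modes return images of the requested size, but 'resize' rescales the whole image, whereas the crop modes keep the original pixel scale and discard the borders.

```matlab
% Synthetic non-square images: 64 rows by 48 columns (illustrative only).
X = rand(64,48,1,4,'single');
outputSize = [32 32];

resized  = augmentedImageDatastore(outputSize,X,'OutputSizeMode','resize');
centered = augmentedImageDatastore(outputSize,X,'OutputSizeMode','centercrop');
randomly = augmentedImageDatastore(outputSize,X,'OutputSizeMode','randcrop');

% Read one batch from each datastore and inspect the first image.
r = read(resized);
c = read(centered);
k = read(randomly);
disp([size(r.input{1}); size(c.input{1}); size(k.input{1})])
```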

Object Functions

  • combine: Combine data from multiple datastores
  • hasdata: Determine if data is available to read
  • numpartitions: Number of datastore partitions
  • partition: Partition a datastore
  • partitionByIndex: Partition augmentedImageDatastore according to indices
  • preview: Preview subset of data in datastore
  • read: Read data from augmentedImageDatastore
  • readall: Read all data in datastore
  • readByIndex: Read data specified by index from augmentedImageDatastore
  • reset: Reset datastore to initial state
  • shuffle: Shuffle data in augmentedImageDatastore
  • subset: Create subset of datastore or FileSet
  • transform: Transform datastore
  • isPartitionable: Determine whether datastore is partitionable
  • isShuffleable: Determine whether datastore is shuffleable
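
A few of these functions can be combined into a manual pass over the data, sketched below under the assumption that Deep Learning Toolbox is installed. The data is synthetic and the mini-batch size of 16 is arbitrary.

```matlab
% Synthetic data: 50 grayscale images (illustrative only).
X = rand(32,32,1,50,'single');
auimds = augmentedImageDatastore([28 28],X);
auimds.MiniBatchSize = 16;

% One pass over the datastore: read mini-batches until no data remains.
numRead = 0;
while hasdata(auimds)
    batch = read(auimds);
    numRead = numRead + height(batch);   % the last batch can be smaller
end

% Return to the start, then read specific observations by index.
reset(auimds);
firstThree = readByIndex(auimds,1:3);

% shuffle returns a new datastore with the observations in random order.
shuffled = shuffle(auimds);
```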

Examples

Train a convolutional neural network using augmented image data. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.

Load the sample data, which consists of synthetic images of handwritten digits. XTrain is a 28-by-28-by-1-by-5000 array, where:

  • 28 is the height and width of the images.

  • 1 is the number of channels.

  • 5000 is the number of synthetic images of handwritten digits.

labelsTrain is a categorical vector containing the labels for each observation.

load DigitsDataTrain

Set aside 1000 of the images for network validation.

idx = randperm(size(XTrain,4),1000);
XValidation = XTrain(:,:,:,idx);
XTrain(:,:,:,idx) = [];
TValidation = labelsTrain(idx);
labelsTrain(idx) = [];

Create an imageDataAugmenter object that specifies preprocessing options for image augmentation, such as resizing, rotation, translation, and reflection. Randomly translate the images up to three pixels horizontally and vertically, and rotate the images with an angle up to 20 degrees.

imageAugmenter = imageDataAugmenter( ...
    'RandRotation',[-20,20], ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3])

imageAugmenter = 
  imageDataAugmenter with properties:

           FillValue: 0
     RandXReflection: 0
     RandYReflection: 0
        RandRotation: [-20 20]
           RandScale: [1 1]
          RandXScale: [1 1]
          RandYScale: [1 1]
          RandXShear: [0 0]
          RandYShear: [0 0]
    RandXTranslation: [-3 3]
    RandYTranslation: [-3 3]

Create an augmentedImageDatastore object to use for network training and specify the image output size. During training, the datastore performs image augmentation and resizes the images. The datastore augments the images without saving any images to memory. trainnet updates the network parameters and then discards the augmented images.

imageSize = [28 28 1];
augimds = augmentedImageDatastore(imageSize,XTrain,labelsTrain,'DataAugmentation',imageAugmenter);

Specify the convolutional neural network architecture.

layers = [
    imageInputLayer(imageSize)
    
    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer   
    
    maxPooling2dLayer(2,'Stride',2)
    
    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer   
    
    maxPooling2dLayer(2,'Stride',2)
    
    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer   
    
    fullyConnectedLayer(10)
    softmaxLayer];

Specify the training options. Choosing among the options requires empirical analysis. To explore different training option configurations by running experiments, you can use the Experiment Manager app.

opts = trainingOptions('sgdm', ...
    'MaxEpochs',15, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'Metrics','accuracy', ...
    'Verbose',false, ...
    'ValidationData',{XValidation,TValidation});

Train the neural network using the trainnet function. For classification, use cross-entropy loss. By default, the trainnet function uses a GPU if one is available. Training on a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Otherwise, the trainnet function uses the CPU. To specify the execution environment, use the ExecutionEnvironment training option.

net = trainnet(augimds,layers,"crossentropy",opts);

Tips

  • You can visualize many transformed images in the same figure by using the imtile function. For example, this code displays one mini-batch of transformed images from an augmented image datastore called auimds.

    minibatch = read(auimds);
    imshow(imtile(minibatch.input))

  • By default, resizing is the only image preprocessing operation performed on images. Enable additional preprocessing operations by using the DataAugmentation name-value pair argument with an imageDataAugmenter object. Each time images are read from the augmented image datastore, a different random combination of preprocessing operations is applied to each image.
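
The random behavior described in the last tip can be observed by reading the same observation twice, as in this sketch. It assumes Deep Learning Toolbox is installed. The data is synthetic and the augmentation ranges are arbitrary. Because the transformations are drawn at random, the two results are expected to differ, although an identical draw is not strictly impossible.

```matlab
% Synthetic data and an augmenter with random rotation and translation (illustrative only).
X = rand(32,32,1,8,'single');
augmenter = imageDataAugmenter('RandRotation',[-20 20],'RandXTranslation',[-3 3]);
auimds = augmentedImageDatastore([28 28],X,'DataAugmentation',augmenter);

% Read observation 1 twice. Each read draws new random transformation parameters.
a = readByIndex(auimds,1);
b = readByIndex(auimds,1);

% The source image is the same, but the augmented outputs usually are not.
sameOutput = isequal(a.input{1},b.input{1});
disp(sameOutput)
```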

Version History

Introduced in R2018a