
balancePixelLabels

Balance pixel labels by oversampling block locations in large images

Since R2020a

Description


blockLocations = balancePixelLabels(blockedImages,blockSize,numObservations) creates a list of block locations in the large labeled images, blockedImages, that results in a class-balanced data set by oversampling image regions that contain less-common labels. numObservations is the required number of block locations, and blockSize specifies the block size.

A balanced data set can produce better results when used for training workflows such as semantic segmentation in deep learning.

blockLocations = balancePixelLabels(blockedImages,blockSize,numObservations,Name,Value) specifies additional aspects of the selected blocks using name-value arguments.

Examples


Specify the location of a labeled image data set.

dataDir = fullfile(toolboxdir("vision"),"visiondata");
labelDir = fullfile(dataDir,"buildingPixelLabels");
fileSet = matlab.io.datastore.FileSet(labelDir,FileExtensions=".png");

Create an array of labeled images from the data set.

blockedImages = blockedImage(fileSet);

Set the block size of the images. Assume the finest resolution level.

blockSize = [20 15];

Create a blockedImageDatastore from the image array.

blabelds = blockedImageDatastore(blockedImages,BlockSize=blockSize);

Count pixel label occurrences of each class. The classes in the pixel label images are not balanced.

pixelLabelID = [1 2 3 4];
classNames = ["sky" "grass" "building" "sidewalk"];
labelCounts = countEachLabel(blabelds, ...
    Classes=classNames,PixelLabelIDs=pixelLabelID);
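
To see the imbalance numerically, you can compute the relative frequency of each class from the labelCounts table, for example:

classFrequency = labelCounts.PixelCount/sum(labelCounts.PixelCount)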

Specify the number of block locations to sample from the data set.

numObservations = 2000;

Select the block locations from the labeled images to achieve class balancing.

locationSet = balancePixelLabels(blockedImages,blockSize,numObservations, ...
    Classes=classNames,PixelLabelIDs=pixelLabelID);

Create a blockedImageDatastore using the block locations after balancing.

blabeldsBalanced = blockedImageDatastore(blockedImages,BlockLocationSet=locationSet);

Recalculate the pixel label occurrences for the balanced data set.

labelCountsBalanced = countEachLabel(blabeldsBalanced, ...
    Classes=classNames,PixelLabelIDs=pixelLabelID);

Compare the original, unbalanced label counts with the label counts after balancing.

figure
h1 = histogram(Categories=labelCounts.Name, ...
    BinCounts=labelCounts.PixelCount)
h1 = 
  Histogram with properties:

              Data: [0x0 categorical]
            Values: [314849 159787 1031235 25313]
    NumDisplayBins: 4
        Categories: {'sky'  'grass'  'building'  'sidewalk'}
      DisplayOrder: 'manual'
     Normalization: 'count'
      DisplayStyle: 'bar'
         FaceColor: 'auto'
         EdgeColor: [0 0 0]

  Use GET to show all properties

title(h1.Parent,"Original Labels")

figure
h2 = histogram(Categories=labelCountsBalanced.Name, ...
    BinCounts=labelCountsBalanced.PixelCount)
h2 = 
  Histogram with properties:

              Data: [0x0 categorical]
            Values: [131906 241546 81006 143167]
    NumDisplayBins: 4
        Categories: {'sky'  'grass'  'building'  'sidewalk'}
      DisplayOrder: 'manual'
     Normalization: 'count'
      DisplayStyle: 'bar'
         FaceColor: 'auto'
         EdgeColor: [0 0 0]

  Use GET to show all properties

title(h2.Parent,"Balanced Labels")

Input Arguments


blockedImages

Labeled blocked images, specified as a blockedImage object or a vector of blockedImage objects containing pixel label images.

blockSize

Block size of read data, specified as a two-element row vector of positive integers, [numrows, numcols]. The first element specifies the number of rows in the block, and the second element specifies the number of columns.

numObservations

Number of block locations to return, specified as a positive integer.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Classes',classNames,'PixelLabelIDs',pixelLabelID

Levels

Image resolution levels, specified as a numeric scalar or an integer-valued vector of the same length as the vector of blockedImages. If you specify a scalar value, then all blocked images supply blocks at the specified resolution level.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
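
For example, assuming level 1 is the finest resolution level (as in the example above), one possible call that restricts block selection to that level is:

locationSet = balancePixelLabels(blockedImages,blockSize,numObservations, ...
    Levels=1,Classes=classNames,PixelLabelIDs=pixelLabelID);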

Classes

Set of class names, specified as a string vector or a cell array of character vectors.

You must specify this argument when blockedImages yields numeric data, such as when pixel label data is stored as an RGB image. Do not specify this argument when blockedImages yields categorical data.

Data Types: char | string

PixelLabelIDs

Numeric IDs that map labels to class names, specified as either a vector of numeric IDs, one for each class name, or an M-by-3 matrix, where M is the number of class names. If you specify a vector, its length must equal the number of class names. If you specify a matrix, each row is a three-element vector representing the RGB pixel value associated with the corresponding class name.

You must specify this argument when blockedImages yields numeric data, such as when pixel label data is stored as an RGB image. Do not specify this argument when blockedImages yields categorical data.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
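
For example, if the pixel label data is stored as RGB images, you can map each class name to an RGB pixel value with an M-by-3 matrix. The color values below are illustrative:

classNames = ["sky" "grass" "building" "sidewalk"];
pixelLabelColors = [0 0 255; 0 255 0; 255 0 0; 128 128 128]; % one RGB row per class
locationSet = balancePixelLabels(blockedImages,blockSize,numObservations, ...
    Classes=classNames,PixelLabelIDs=pixelLabelColors);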

UseParallel

Use a new or existing parallel pool, specified as a numeric or logical 1 (true) or 0 (false). If no parallel pool is active, then a new pool is opened based on the default parallel settings. The DataSource property of all input blockedImage objects should be valid paths on each of the parallel workers.

This syntax requires the Parallel Computing Toolbox™.

Data Types: logical
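
For example, to distribute the label counting across a parallel pool (Parallel Computing Toolbox required), one possible call is:

locationSet = balancePixelLabels(blockedImages,blockSize,numObservations, ...
    Classes=classNames,PixelLabelIDs=pixelLabelID,UseParallel=true);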

Output Arguments


blockLocations

Block locations, returned as a blockLocationSet object. The object contains numObservations block locations, each of size blockSize, selected so that the resulting blocks balance the class representation.

Algorithms

To balance pixel labels, the function oversamples the minority classes in the input images. The function determines the minority class by calculating the overall pixel label counts for the complete data set. The algorithm follows these steps.

  1. The function divides the images in the input image array into macro blocks, whose size is a multiple of the blockSize input value.

  2. The function counts the pixel labels for all classes in each macro block. Then, it selects the macro block with the greatest occurrence of minority classes using weighted random selection (see the sketch after this list).

  3. The algorithm uses a random block location within the selected macro block to perform oversampling. The origin of the block location must always be fully within the limits of the macro block.

  4. The function updates the overall label counts based on the pixel label counts of the classes found for the selected macro block.

  5. The function includes the newly sampled (oversampled) labels when computing the new minority class.

  6. This process repeats until the number of processed block locations equals the numObservations input value.
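
The sketch below illustrates the weighted random selection in step 2 under simplifying assumptions. Here, macroBlockCounts is a hypothetical numMacroBlocks-by-numClasses matrix of per-block label counts and overallCounts is the running 1-by-numClasses total for the data set; the code shows only the selection and count-update idea, not the actual implementation of the function.

% Weight each class by the inverse of its overall count, so rare
% (minority) classes contribute most to a macro block's score.
classWeights = 1./max(overallCounts,1);            % guard against division by zero
blockScores = macroBlockCounts*classWeights(:);    % one score per macro block

% Weighted random selection: macro blocks rich in minority classes are
% more likely to be chosen.
selectionProb = blockScores/sum(blockScores);
selectedBlock = find(rand <= cumsum(selectionProb),1);

% Update the running label counts with the chosen macro block so that the
% minority class is recomputed before the next block location is drawn.
overallCounts = overallCounts + macroBlockCounts(selectedBlock,:);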

Version History

Introduced in R2020a
