Get Started with SOLOv2 for Instance Segmentation
Perform instance segmentation using the Computer Vision Toolbox™ Model for SOLOv2 Instance Segmentation support package. To learn more about instance segmentation, see Get Started with Instance Segmentation Using Deep Learning. Use the Computer Vision Toolbox Model for SOLOv2 Instance Segmentation support package for the tasks in these sections.
To segment object instances in an image using a pretrained SOLOv2 network, or to perform inference on a test image using a trained SOLOv2 network, see the Segment Image with Pretrained SOLOv2 Network section.
To configure and train a SOLOv2 network to perform transfer learning on your own data, see the Perform Transfer Learning with SOLOv2 section.
The Segmenting Objects by LOcations version 2 (SOLOv2) model for instance segmentation offers the advantages of a lightweight, scalable, and memory-efficient architecture [1]. SOLOv2 achieved state-of-the-art performance on the COCO instance segmentation benchmark, outperforming previous models. Because of its multiscale feature pyramid network (FPN), the model can process inputs of various resolutions and capture object details across a wide range of object sizes. SOLOv2 does not require an external region proposal network; instead, it directly estimates object centers and their associated masks through anchor point localization and mask segmentation modeling.
Install Support Package
You can install the Computer Vision Toolbox Model for SOLOv2 Instance Segmentation from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. The support package also requires Deep Learning Toolbox™ and Computer Vision Toolbox. Processing image data on a GPU requires a supported GPU device and Parallel Computing Toolbox™.
Segment Image with Pretrained SOLOv2 Network
Use the process in this section to segment a test image using a pretrained SOLOv2 network with default settings, or to perform inference using a trained SOLOv2 network.
At inference, a fully convolutional network (FCN) backbone of the SOLOv2 network extracts a set of feature maps of various spatial resolutions, or levels, from the input image. The network feeds the extracted feature maps into parallel category and mask branches to generate the final predictions: semantic categories (classes) and instance masks. You can overlay the predicted instance segmentation masks on the image to visualize each object instance with its corresponding class label.
You can perform inference on a test image with default network options using a pretrained SOLOv2 network.
Load the image or image datastore that you want to segment into the workspace. The SOLOv2 model supports RGB and grayscale images.
I = imread("kobi.png");
Create a solov2 object to configure a pretrained SOLOv2 network with a ResNet-50 or ResNet-18 backbone as the feature extractor. To increase inference speed, at the possible cost of detecting fewer objects, specify the lightweight ResNet-18 backbone with a reduced number of features, "light-resnet18-coco".

model = solov2("light-resnet18-coco");
Perform instance segmentation by using the segmentObjects object function on the pretrained network, specifying that the function return the object masks, labels, and detection scores.

[masks,labels,scores] = segmentObjects(model,I);
Visualize the results by using the insertObjectMask function.

maskedImage = insertObjectMask(I,masks);
imshow(maskedImage)
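If you load an image datastore instead of a single image, you can pass the datastore to segmentObjects to segment every image it contains. This is a minimal sketch, assuming a folder of test images (the folder name is hypothetical) and that the Threshold name-value argument is available to filter low-confidence detections:

% Batch inference on a folder of images (folder name is hypothetical).
imds = imageDatastore("testImages/");

% Segment all images. The results are saved to disk and returned as a
% datastore. Threshold (assumed available) filters low-confidence objects.
dsResults = segmentObjects(model,imds,Threshold=0.6);

% Read the masks, labels, and scores predicted for the first image.
firstResult = read(dsResults);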
Perform Transfer Learning with SOLOv2
To modify a network to detect additional classes, or to adjust other network parameters, you can perform transfer learning. This section shows how to prepare your training data, configure the SOLOv2 model, and train the network to perform transfer learning.
Configure Training Data
To train a SOLOv2 detector, specify your labeled ground truth training data as a datastore using the trainingData input argument of the trainSOLOV2 function. You must set up your data so that calling the read and readall functions on the datastore returns a cell array with four columns. These descriptions give the format of the data in each column.
RGB or grayscale image: RGB or grayscale images that serve as network inputs, specified as H-by-W-by-3 or H-by-W numeric arrays, respectively. For example, the modified CamVid data set [2] contains RGB images with objects of interest such as vehicles, traffic lights, and pedestrians.

Ground truth bounding boxes: Bounding boxes for objects in the images, specified as an M-by-4 matrix with rows of the form [x y w h], where M is the number of object instances in the image. For example, the bboxes variable for a sample image with nine labeled objects:

bboxes =

     1   178    94   133
   178   173   115   126
    63   181    54    68
   320   169    15    42
   383   173    12    39
   359   167    14    41
   141   131    12    30
    55    86    75   117
   146   167    14    43

Instance labels: Label of each instance, specified as a NumObjects-by-1 vector of strings or a NumObjects-by-1 cell array of character vectors, where NumObjects is the number of labeled objects in the image. For example, the labels variable for the same sample image:

labels =

  9×1 categorical array

     car
     car
     car
     person
     person
     person
     traffic light
     bus
     person

Instance masks: Masks for instances of objects, specified as an H-by-W-by-NumObjects logical array, where each page contains the binary mask for one object instance. For example, if the variable im contains the training image, masks contains the instance masks, and numObjects is the number of labeled objects, display the mask data over the training image:

imOverlay = insertObjectMask(im,masks,Color=lines(numObjects));
imshow(imOverlay)
The datastore must return your data as a 1-by-4 cell array of the form {RGB image, Bounding boxes, Labels, Masks}. You can create a datastore in the required format using these steps, as shown in the sketch after the list:
1. Create an ImageDatastore that returns RGB or grayscale image data.
2. Create a boxLabelDatastore that returns bounding box data and instance labels as a two-element cell array.
3. Create an ImageDatastore and specify a custom read function that returns mask data as a binary matrix.
4. Combine the three datastores using the combine function.
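For example, this minimal sketch builds a combined training datastore. The folder names, the gTruth table, and the readInstanceMasks helper are hypothetical placeholders for your own ground truth data:

% Datastore of training images (folder name is hypothetical).
imds = imageDatastore("trainingImages/");

% gTruth is assumed to be a table with one row per image, where the
% "boxes" column contains M-by-4 bounding boxes and the "labels" column
% contains the corresponding instance labels.
blds = boxLabelDatastore(gTruth(:,["boxes","labels"]));

% Datastore of mask files. The custom read function (a hypothetical
% helper) must return an H-by-W-by-NumObjects logical array.
maskds = imageDatastore("trainingMasks/",ReadFcn=@readInstanceMasks);

% Combine the datastores so that read returns a 1-by-4 cell array of the
% form {RGB image, Bounding boxes, Labels, Masks}.
trainingData = combine(imds,blds,maskds);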
For more information, see Datastores for Deep Learning (Deep Learning Toolbox).
Train the SOLOv2 Network
To configure a SOLOv2 network for training, specify the class names when you
create a solov2
object. You can optionally specify additional network properties, such as the
network input size to use during training and inference. For example, specify a
SOLOv2 network that uses ResNet-50 as the base network to detect the classes in
ClassNames
during training.
ClassNames = ["person","traffic light","car","bus"]; Network = solov2("resnet50-coco",ClassNames);
Specify the network training options using the trainingOptions (Deep Learning Toolbox) function. To learn more about using trainingOptions to fine-tune network parameters for training, see Set Up Parameters and Train Convolutional Neural Network (Deep Learning Toolbox).
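For example, this is a minimal sketch of a training options configuration. The solver and the specific values are illustrative assumptions, not recommendations from this page; tune them for your data set:

% Illustrative options: SGDM solver with a small learning rate and a
% mini-batch size suited to memory-intensive instance segmentation.
options = trainingOptions("sgdm", ...
    InitialLearnRate=0.001, ...
    MiniBatchSize=4, ...
    MaxEpochs=10, ...
    VerboseFrequency=10);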
To train the network, pass your training data, the configured solov2 object, and the trainingOptions function output to the trainSOLOV2 function. The function returns a trained SOLOv2 network.

trainedNetwork = trainSOLOV2(trainingData,Network,options);
To perform inference on a test image I using the trained network, pass the trained network as input to the segmentObjects object function. For more details, see the Segment Image with Pretrained SOLOv2 Network section.
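For example, assuming I is a test image in the workspace:

[masks,labels,scores] = segmentObjects(trainedNetwork,I);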
For a detailed example of a custom training workflow, see the Perform Instance Segmentation Using SOLOv2 example.
Evaluate Instance Segmentation Results
Evaluate the quality of the instance segmentation results using the evaluateInstanceSegmentation
function. Ensure that your ground truth
datastore is set up so that calling the datastore with the read
function returns a cell array
with at least two elements in the format {masks labels}
.
To calculate the prediction metrics, specify the output of the segmentObjects function and your ground truth data as inputs to the evaluateInstanceSegmentation function. The function calculates metrics such as the confusion matrix and average precision, and returns them in an instanceSegmentationMetrics object.
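A minimal sketch of this workflow, assuming dsTest is a datastore of test images and dsTruth is a ground truth datastore that returns data in the format {masks labels}:

% Segment the test images. The results are saved to disk and returned as
% a datastore.
dsResults = segmentObjects(trainedNetwork,dsTest);

% Compare the predicted masks and labels against the ground truth.
metrics = evaluateInstanceSegmentation(dsResults,dsTruth);

% Inspect the data set level metrics, such as average precision.
metrics.DataSetMetrics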
References
[1] Wang, Xinlong, Rufeng Zhang, Tao Kong, Lei Li, and Chunhua Shen. “SOLOv2: Dynamic and Fast Instance Segmentation.” ArXiv, October 23, 2020. https://doi.org/10.48550/arXiv.2003.10152.
[2] Brostow, Gabriel J., Julien Fauqueur, and Roberto Cipolla. "Semantic Object Classes in Video: A High-Definition Ground Truth Database." Pattern Recognition Letters 30, no. 2 (January 2009): 88–97. https://doi.org/10.1016/j.patrec.2008.04.005.
More About
- Get Started with Instance Segmentation Using Deep Learning
- Get Started with Image Preprocessing and Augmentation for Deep Learning
- Deep Learning in MATLAB (Deep Learning Toolbox)
- Datastores for Deep Learning (Deep Learning Toolbox)
- Data Sets for Deep Learning (Deep Learning Toolbox)