Convert Ground Truth Labeling Data for Object Re-Identification
This example shows how to convert a groundTruth object to the re-identification training data format.
Overview
Re-identification (ReID) plays a vital role in visual object tracking, addressing temporary occlusion or objects leaving the camera's field of view, which complicates consistent tracking in real-world scenarios. To train a ReID network created using the reidentificationNetwork (Computer Vision Toolbox) object, you must process the ground truth data so that the training data consists only of the people within the ground truth bounding boxes. These cropped images must have consistent labeling for each object. In this example, you convert a fully labeled ground truth video to the required ReID training format.
Load Ground Truth Labeling Data
To convert ground truth data into a format usable for training a ReID network, ensure that the groundTruth object has the required format. The ground truth for each object should have a rectangular region of interest (ROI) and a numeric attribute for the object ID. To learn how to label data for object tracking and generate the ground truth data, see the Automate Ground Truth Labeling for Object Tracking and Re-Identification (Computer Vision Toolbox) example. In this example, the ROI is labeled as Person.
Download the video containing the ground truth data, and load the groundTruth object.
helperDownloadLabelVideo();
Downloading Pedestrian Tracking Video (90 MB)
load("groundTruth.mat","gTruth");
Convert Ground Truth for Object Re-Identification
After you fully label and export the ground truth data from a labeler, use the objectDetectorTrainingData (Computer Vision Toolbox) function to directly create an imageDatastore and boxLabelDatastore (Computer Vision Toolbox).
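For reference, a minimal sketch of that conversion is shown below. The helperCropImagesWithGroundtruth function later in this example uses the same call with SamplingFactor=1 so that every labeled frame is extracted; the variable names and the WriteLocation value here are illustrative only.
% Extract every labeled video frame and its box labels from the groundTruth
% object. imdsFrames holds the full frames; bldsBoxes holds the Person ROIs.
[imdsFrames,bldsBoxes] = objectDetectorTrainingData(gTruth, ...
    SamplingFactor=1,WriteLocation="videoFrames");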
Process the ground truth for training and store the input images for the network to use. Use the helperCropImagesWithGroundtruth helper function to crop all of the labeled data from the video frames using the groundTruth object. The function also resizes the cropped images to 256-by-128 pixels and organizes the labels into individual folders under the root directory trainingDataFolder.
trainingDataFolder = fullfile("trainingData");
imageFrameWriteLoc = fullfile("videoFrames");
dataSize = [256 128];

if ~isfolder(trainingDataFolder)
    helperCropImagesWithGroundtruth(gTruth,trainingDataFolder,imageFrameWriteLoc,dataSize);
end
Write images extracted for training to folder: videoFrames
Writing 150 images extracted from PedestrianLabelingVideo.avi...Completed.
Cleaning up videoFrames directory.
Done.
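After the helper function runs, each identity has its own subfolder of 256-by-128 JPEG crops under trainingDataFolder, where the folder names match the numeric object IDs from the ground truth attributes. You can list the subfolders to confirm the layout:
% List the per-identity subfolders created by the helper function.
dir(trainingDataFolder)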
Load the cropped and organized training images into an imageDatastore object. To use all of the data in trainingDataFolder, specify the IncludeSubfolders name-value argument as true. To use the corresponding folder names as the training data labels, specify the LabelSource name-value argument as "foldernames".
imds = imageDatastore(trainingDataFolder,IncludeSubfolders=true,LabelSource="foldernames");
Display a set of image frames from the training data using the montage (Image Processing Toolbox) function.
rng(0)
previewImages = cell(2,4);
for i = 1:4
    previewIdx = randi(numel(imds.Files));
    previewImages{1,i} = readimage(imds,previewIdx);
    previewImages{2,i} = imds.Labels(previewIdx);
end
montage(previewImages(1,:),Size=[1 4],ThumbnailSize=dataSize)
Display the labels for each image from left to right.
strcat("ID = ",string(previewImages(2,:)))
ans = 1×4 string
"ID = 7" "ID = 8" "ID = 1" "ID = 8"
To verify the accuracy of the labels, review the images in the corresponding ID folders in trainingDataFolder.
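You can also tabulate how many cropped images belong to each identity, which helps you spot IDs with too few training samples:
% Count the number of cropped training images for each identity label.
countEachLabel(imds)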
Next Steps
After you convert the ground truth labeling data to the required format described above, you can use it to train a ReID network with the trainReidentificationNetwork (Computer Vision Toolbox) function. To learn how to configure, train, and evaluate a ReID network, see the Reidentify People Throughout a Video Sequence Using ReID Network (Computer Vision Toolbox) example.
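As a rough sketch of that next step, you create a reidentificationNetwork object and pass the labeled imageDatastore to trainReidentificationNetwork. The constructor arguments, backbone name, and training options shown here are assumptions for illustration, not documented defaults; follow the linked example for the authoritative workflow.
% Sketch only: the arguments below are illustrative assumptions.
personIDs = categories(imds.Labels);        % identity classes from the folder names
reidNet = reidentificationNetwork("resnet50",personIDs);      % assumed constructor form
options = trainingOptions("adam",MaxEpochs=10,MiniBatchSize=64);
reidNet = trainReidentificationNetwork(imds,reidNet,options);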
Supporting Functions
helperDownloadLabelVideo
Download the pedestrian labeling video.
function helperDownloadLabelVideo
videoURL = "https://ssd.mathworks.com/supportfiles/vision/data/PedestrianLabelingVideo.avi";
if ~exist("PedestrianLabelingVideo.avi","file")
    disp("Downloading Pedestrian Tracking Video (90 MB)")
    websave("PedestrianLabelingVideo.avi",videoURL);
end
end
helperCropImagesWithGroundtruth
Crop all of the source images in the ground truth data gTruth using its bounding box labels. Store the cropped images in organized subdirectories in dataFolder.
function helperCropImagesWithGroundtruth(gTruth,dataFolder,imageFrameWriteLoc,dataSize)
% Use objectDetectorTrainingData to convert the groundTruth data into an
% imageDatastore and boxLabelDatastore.
if ~isfolder(imageFrameWriteLoc)
    mkdir(imageFrameWriteLoc)
end
[imds,blds] = objectDetectorTrainingData(gTruth,SamplingFactor=1,WriteLocation=imageFrameWriteLoc);
combinedTrainingDs = combine(imds,blds);
labelData = timetable2table(gTruth.LabelData);
writeall(combinedTrainingDs,imageFrameWriteLoc, ...
    WriteFcn=@(data,info,format)helperWriteCroppedData(data,info,format,labelData,dataFolder,dataSize))

% Remove the video frame images.
fprintf(1,"\nCleaning up %s directory.\n",imageFrameWriteLoc);
rmdir(imageFrameWriteLoc,"s")
fprintf(1,"\nDone.\n");
end
helperWriteCroppedData
Crop, resize, and store image ROIs from a combined datastore.
function helperWriteCroppedData(data,info,~,labelData,dataFolder,dataSize)
num = 1;
% Determine which video frame this read corresponds to.
imageIdx = info.ReadInfo{1,2}.CurrentIndex;
frame = num2str(imageIdx);
% Get the object ID attribute for each labeled ROI in this frame.
imageLabelData = struct2table(labelData{imageIdx,2}{:});
attributeIDs = imageLabelData{:,2};
for i = 1:size(data{1,2},1)
    % Create a subfolder named after the object ID, if it does not exist.
    personID = string(attributeIDs(i));
    personIDFolder = fullfile(dataFolder,personID);
    if ~isfolder(personIDFolder)
        mkdir(personIDFolder)
    end
    imgPath = fullfile(personIDFolder,strcat(frame,"_",num2str(num,'%02.f'),".jpg"));
    % Crop the ROI, resize it to the network input size, and save it.
    roi = data{1,2}(i,:);
    croppedImage = imcrop(data{1,1},roi);
    if ~isempty(croppedImage)
        resizedImg = imresize(croppedImage,dataSize);
        imwrite(resizedImg,imgPath);
        num = num + 1;
    end
end
end