Main Content

Stereo Visual Simultaneous Localization and Mapping

This example shows how to process image data from a stereo camera to build a map of an outdoor environment and estimate the trajectory of the camera. It then shows how to modify the code to support code generation using MATLAB® Coder™. The example uses a version of the ORB-SLAM2 [1] algorithm, which is feature-based and supports stereo cameras.

Visual simultaneous localization and mapping (vSLAM), refers to the process of calculating the position and orientation of a camera with respect to its surroundings, while simultaneously mapping the environment. The process uses only visual inputs from the camera. Applications for vSLAM include augmented reality, robotics, and autonomous driving.

vSLAM can be performed by using just a monocular camera. However, since depth cannot be accurately calculated using a single camera, the scale of the map and the estimated trajectory is unknown and drifts over time. In addition, to bootstrap the system, multiple views are required to produce an initial map as it cannot be triangulated from the first frame. Using a stereo camera solves these problems and provides a more reliable vSLAM solution.

Code generation requires a MATLAB Coder license.

Overview of Processing Pipeline

The pipeline for stereo vSLAM is very similar to the monocular vSLAM pipeline in the Monocular Visual Simultaneous Localization and Mapping example. The major difference is that in the Map Initialization stage 3-D map points are created from a pair of stereo images of the same stereo pair instead of two images of different frames.

monocularMappingStereo_example.png

  • Map Initialization: The pipeline starts by initializing the map of 3-D points from a pair of stereo images using the disparity map. The left image is stored as the first key frame.

  • Tracking: Once a map is initialized, for each new stereo pair, the pose of the camera is estimated by matching features in the left image to features in the last key frame. The estimated camera pose is refined by tracking the local map.

  • Local Mapping: If the current left image is identified as a key frame, new 3-D map points are computed from the disparity of the stereo pair. At this stage, bundle adjustment is used to minimize reprojection errors by adjusting the camera pose and 3-D points.

  • Loop Closure: Loops are detected for each key frame by comparing it against all previous key frames using the bag-of-features approach. Once a loop closure is detected, the pose graph is optimized to refine the camera poses of all the key frames.

Download and Explore the Input Stereo Image Sequence

The data used in this example are from the UTIAS Long-Term Localization and Mapping Dataset provided by University of Toronto Institute for Aerospace Studies. You can download the data to a directory using a web browser or by running the following code:

ftpObj       = ftp('asrl3.utias.utoronto.ca');
tempFolder   = fullfile(tempdir);
dataFolder   = [tempFolder, '2020-vtr-dataset/UTIAS-In-The-Dark/'];
zipFileName  = [dataFolder, 'run_000005.zip'];
folderExists = exist(dataFolder, 'dir');

% Create a folder in a temporary directory to save the downloaded file
if ~folderExists  
    mkdir(dataFolder); 
    disp('Downloading run_000005.zip (818 MB). This download can take a few minutes.') 
    mget(ftpObj,'/2020-vtr-dataset/UTIAS-In-The-Dark/run_000005.zip', tempFolder);

    % Extract contents of the downloaded file
    disp('Extracting run_000005.zip (818 MB) ...') 
    unzip(zipFileName, dataFolder); 
end

Use two imageDatastore objects to store the stereo images.

imgFolderLeft  = [dataFolder,'/images/left/'];
imgFolderRight = [dataFolder,'/images/right/'];
imdsLeft       = imageDatastore(imgFolderLeft);
imdsRight      = imageDatastore(imgFolderRight);

% Inspect the first pair of images
currFrameIdx   = 1;
currILeft      = readimage(imdsLeft, currFrameIdx);
currIRight     = readimage(imdsRight, currFrameIdx);
imshowpair(currILeft, currIRight, 'montage');

Figure contains an axes object. The hidden axes object contains an object of type image.

Map Initialization

The ORB-SLAM pipeline starts by initializing the map that holds 3-D world points. This step is crucial and has a significant impact on the accuracy of final SLAM result. Initial ORB feature point correspondences are found using matchFeatures between two images of a stereo pair. The matched pairs should satisfy the following constraints:

  • The horizontal shift between the two corresponding feature points in the rectified stereo pair image is less than the maximum disparity. You can determine the approximate maximum disparity value from the stereo anaglyph of the stereo pair image. For more information, see Choosing Range of Disparity

  • The vertical shift between the two corresponding feature points in the rectified stereo pair image is less than a threshold.

  • The scales of the matched features are nearly identical.

The 3-D world locations corresponding to the matched feature points are determined as follows:

  • Use Choosing Range of Disparity to compute the disparity map for each pair of stereo images by using semi-global matching (SGM) method.

  • Use reconstructScene to compute the 3-D world point coordinates from the disparity map.

  • Find the locations in the disparity map that correspond to the feature points and their 3-D world locations.

% Set random seed for reproducibility
rng(0);

% Load the initial camera pose. The initial camera pose is derived based 
% on the transformation between the camera and the vehicle:
% http://asrl.utias.utoronto.ca/datasets/2020-vtr-dataset/text_files/transform_camera_vehicle.t
initialPoseData = load("initialPose.mat");
initialPose     = initialPoseData.initialPose;

% Construct the reprojection matrix for 3-D reconstruction.
% The intrinsics for the dataset can be found at the following page:
% http://asrl.utias.utoronto.ca/datasets/2020-vtr-dataset/text_files/camera_parameters.txt
focalLength     = [387.777 387.777];     % specified in pixels
principalPoint  = [257.446 197.718];     % specified in pixels [x, y]
baseline        = 0.239965;              % specified in meters
imageSize       = size(currILeft,[1,2]); % in pixels [mrows, ncols]
intrinsics      = cameraIntrinsics(focalLength, principalPoint, imageSize);

reprojectionMatrix = [1, 0, 0, -principalPoint(1); 
    0, 1, 0, -principalPoint(2);
    0, 0, 0, focalLength(1);
    0, 0, 1/baseline, 0];

% In this example, the images are already undistorted and rectified. In a general workflow, 
% uncomment the following code to undistort and rectify the images.
% currILeft     = undistortImage(currILeft, intrinsics);
% currIRight    = undistortImage(currIRight, intrinsics);
% stereoParams  = stereoParameters(intrinsics, intrinsics, eye(3), [-baseline, 0 0]);
% [currILeft, currIRight] = rectifyStereoImages(currILeft, currIRight, stereoParams, OutputView="full");

% Detect and extract ORB features from the rectified stereo images
scaleFactor = 1.2;
numLevels   = 8;
numPoints = 600;
[currFeaturesLeft,  currPointsLeft]   = helperDetectAndExtractFeatures(im2gray(currILeft), scaleFactor, numLevels, numPoints); 
[currFeaturesRight, currPointsRight]  = helperDetectAndExtractFeatures(im2gray(currIRight), scaleFactor, numLevels, numPoints);

% Match feature points between the stereo images and get the 3-D world positions 
disparityRange = [0 48];  % specified in pixels
[xyzPoints, matchedPairs] = helperReconstructFromStereo(im2gray(currILeft), im2gray(currIRight), ...
    currFeaturesLeft, currFeaturesRight, currPointsLeft, currPointsRight, reprojectionMatrix, initialPose, disparityRange);

Data Management and Visualization

After the map is initialized using the first stereo pair, you can use imageviewset and worldpointset to store the first key frames and the corresponding map points:

% Create an empty imageviewset object to store key frames
vSetKeyFrames = imageviewset;

% Create an empty worldpointset object to store 3-D map points
mapPointSet   = worldpointset;

% Add the first key frame
currKeyFrameId = 1;
vSetKeyFrames = addView(vSetKeyFrames, currKeyFrameId, initialPose, Points=currPointsLeft,...
    Features=currFeaturesLeft.Features);

% Add 3-D map points
[mapPointSet, stereoMapPointsIdx] = addWorldPoints(mapPointSet, xyzPoints);

% Add observations of the map points
mapPointSet = addCorrespondences(mapPointSet, currKeyFrameId, stereoMapPointsIdx, matchedPairs(:, 1));

% Update view direction and depth
mapPointSet = updateLimitsAndDirection(mapPointSet, stereoMapPointsIdx, vSetKeyFrames.Views);

% Update representative view
mapPointSet = updateRepresentativeView(mapPointSet, stereoMapPointsIdx, vSetKeyFrames.Views);

% Visualize matched features in the first key frame
featurePlot = helperVisualizeMatchedFeaturesStereo(currILeft, currIRight, currPointsLeft, ...
    currPointsRight, matchedPairs);

% Visualize initial map points and camera trajectory
mapPlot     = helperVisualizeMotionAndStructureStereo(vSetKeyFrames, mapPointSet);

% Show legend
showLegend(mapPlot);

Initialize Place Recognition Database

Loop detection is performed using the bags-of-words approach. A visual vocabulary represented as a bagOfFeaturesDBoW object is created offline with the ORB descriptors extracted from a large set of images in the dataset by calling:

bag = bagOfFeaturesDBoW(imds);

where imds is an imageDatastore object storing the training images.

The loop closure process incrementally builds a database, represented as an dbowLoopDetector object, that stores the visual word-to-image mapping based on the bag of ORB features.

% Load the bag of features data created offline
bag         = bagOfFeaturesDBoW("bagOfFeatures.bin.gz");

% Initialize the place recognition database
loopDatabase    = dbowLoopDetector(bag);

% Add features of the first two key frames to the database
addVisualFeatures(loopDatabase, currKeyFrameId, currFeaturesLeft);

Tracking

The tracking process is performed using every pair and determines when to insert a new key frame.

% ViewId of the last key frame
lastKeyFrameId    = currKeyFrameId;

% Index of the last key frame in the input image sequence
lastKeyFrameIdx   = currFrameIdx; 

% Indices of all the key frames in the input image sequence
addedFramesIdx    = lastKeyFrameIdx;

currFrameIdx      = 2;
isLoopClosed      = false;

Each frame is processed as follows:

  1. ORB features are extracted for each new stereo pair of images and then matched (using matchFeatures), with features in the last key frame that have known corresponding 3-D map points.

  2. Estimate the camera pose with the Perspective-n-Point algorithm using estworldpose.

Given the camera pose, project the map points observed by the last key frame into the current frame and search for feature correspondences using matchFeaturesInRadius.

  1. With 3-D to 2-D correspondences in the current frame, refine the camera pose by performing a motion-only bundle adjustment using bundleAdjustmentMotion.

  2. Project the local map points into the current frame to search for more feature correspondences using matchFeaturesInRadius and refine the camera pose again using bundleAdjustmentMotion.

  3. The last step of tracking is to decide if the current frame should be a new key frame. A frame is a key frame if both of the following conditions are satisfied:

  • At least 5 frames have passed since the last key frame or the current frame tracks fewer than 80 map points.

  • The map points tracked by the current frame are fewer than 90% of points tracked by the reference key frame.

If the current frame is to become a key frame, continue to the Local Mapping process. Otherwise, start Tracking for the next frame.

% Main loop
isLastFrameKeyFrame = true;
while ~isLoopClosed && currFrameIdx <= numel(imdsLeft.Files)

    currILeft  = readimage(imdsLeft, currFrameIdx);
    currIRight = readimage(imdsRight, currFrameIdx);

    currILeftGray = im2gray(currILeft);
    currIRightGray = im2gray(currIRight);

    [currFeaturesLeft, currPointsLeft]    = helperDetectAndExtractFeatures(currILeftGray, scaleFactor, numLevels, numPoints);
    [currFeaturesRight, currPointsRight]  = helperDetectAndExtractFeatures(currIRightGray, scaleFactor, numLevels, numPoints);

    % Track the last key frame
    % trackedMapPointsIdx:  Indices of the map points observed in the current left frame 
    % trackedFeatureIdx:    Indices of the corresponding feature points in the current left frame
    [currPose, trackedMapPointsIdx, trackedFeatureIdx] = helperTrackLastKeyFrame(mapPointSet, ...
        vSetKeyFrames.Views, currFeaturesLeft, currPointsLeft, lastKeyFrameId, intrinsics, scaleFactor);
    
    if isempty(currPose) || numel(trackedMapPointsIdx) < 30
        currFrameIdx = currFrameIdx + 1;
        continue
    end
    
    % Track the local map  and check if the current frame is a key frame.
    % localKeyFrameIds:   ViewId of the connected key frames of the current frame
    numSkipFrames     = 5;
    numPointsKeyFrame = 80;
    [localKeyFrameIds, currPose, trackedMapPointsIdx, trackedFeatureIdx, isKeyFrame] = ...
        helperTrackLocalMap(mapPointSet, vSetKeyFrames, trackedMapPointsIdx, ...
        trackedFeatureIdx, currPose, currFeaturesLeft, currPointsLeft, intrinsics, scaleFactor, ...
        isLastFrameKeyFrame, lastKeyFrameIdx, currFrameIdx, numSkipFrames, numPointsKeyFrame);

    % Match feature points between the stereo images and get the 3-D world positions
    [xyzPoints, matchedPairs] = helperReconstructFromStereo(currILeftGray, currIRightGray, currFeaturesLeft, ...
        currFeaturesRight, currPointsLeft, currPointsRight, reprojectionMatrix, currPose, disparityRange);

    % Visualize matched features in the stereo image
    updatePlot(featurePlot, currILeft, currIRight, currPointsLeft, currPointsRight, trackedFeatureIdx, matchedPairs);
    
    if ~isKeyFrame && currFrameIdx < numel(imdsLeft.Files)
        currFrameIdx = currFrameIdx + 1;
        isLastFrameKeyFrame = false;
        continue
    else
        [untrackedFeatureIdx, ia] = setdiff(matchedPairs(:, 1), trackedFeatureIdx);
        xyzPoints = xyzPoints(ia, :);
        isLastFrameKeyFrame = true;
    end
    
    % Update current key frame ID
    currKeyFrameId  = currKeyFrameId + 1;

Local Mapping

Local mapping is performed for every key frame. When a new key frame is determined, add it to the key frames and update the attributes of the map points observed by the new key frame. To ensure that mapPointSet contains as few outliers as possible, a valid map point must be observed in at least 3 key frames.

New map points are created by triangulating ORB feature points in the current key frame and its connected key frames. For each unmatched feature point in the current key frame, search for a match with other unmatched points in the connected key frames using matchFeatures. The local bundle adjustment refines the pose of the current key frame, the poses of connected key frames, and all the map points observed in these key frames.

    % Add the new key frame    
    [mapPointSet, vSetKeyFrames] = helperAddNewKeyFrame(mapPointSet, vSetKeyFrames, ...
        currPose, currFeaturesLeft, currPointsLeft, trackedMapPointsIdx, trackedFeatureIdx, localKeyFrameIds);
        
    % Remove outlier map points that are observed in fewer than 3 key frames
    if currKeyFrameId == 2
        triangulatedMapPointsIdx = [];
    end
    
    mapPointSet = helperCullRecentMapPointsStereo(mapPointSet, ...
        trackedMapPointsIdx, triangulatedMapPointsIdx, stereoMapPointsIdx);
    
    % Add new map points computed from disparity 
    [mapPointSet, stereoMapPointsIdx] = addWorldPoints(mapPointSet, xyzPoints);
    mapPointSet = addCorrespondences(mapPointSet, currKeyFrameId, stereoMapPointsIdx, ...
        untrackedFeatureIdx);
    
    % Create new map points by triangulation
    minNumMatches = 20;
    minParallax   = 0.35;
    [mapPointSet, vSetKeyFrames, triangulatedMapPointsIdx, stereoMapPointsIdx] = helperCreateNewMapPointsStereo( ...
        mapPointSet, vSetKeyFrames, currKeyFrameId, intrinsics, scaleFactor, minNumMatches, minParallax, ...
        untrackedFeatureIdx, stereoMapPointsIdx);

    % Local bundle adjustment
    [refinedViews, dist] = connectedViews(vSetKeyFrames, currKeyFrameId, MaxDistance=2);
    refinedKeyFrameIds = refinedViews.ViewId;

    % Always fix the first two key frames
    fixedViewIds = refinedKeyFrameIds(dist==2);
    fixedViewIds = fixedViewIds(1:min(10, numel(fixedViewIds)));

    % Refine local key frames and map points
    [mapPointSet, vSetKeyFrames, mapPointIdx] = bundleAdjustment(...
        mapPointSet, vSetKeyFrames, [refinedKeyFrameIds; currKeyFrameId], intrinsics, ...
        FixedViewIDs=fixedViewIds, PointsUndistorted=true, AbsoluteTolerance=1e-7,...
        RelativeTolerance=1e-16, Solver='preconditioned-conjugate-gradient', MaxIteration=10);

    % Update view direction and depth
    mapPointSet = updateLimitsAndDirection(mapPointSet, mapPointIdx, ...
        vSetKeyFrames.Views);

    % Update representative view
    mapPointSet = updateRepresentativeView(mapPointSet, mapPointIdx, ...
        vSetKeyFrames.Views);

    % Visualize 3-D world points and camera trajectory
    updatePlot(mapPlot, vSetKeyFrames, mapPointSet);

Loop Closure

The loop closure step takes the current key frame processed by the local mapping process and tries to detect and close the loop. Loop candidates are identified by querying images in the database that are visually similar to the current key frame using evaluateImageRetrieval. A candidate key frame is valid if it is not connected to the last key frame and three of its neighbor key frames are loop candidates.

When a valid loop closure candidate is found, compute the relative pose between the loop closure candidate frame and the current key frame using estgeotform3d. Then add the loop connection with the relative pose and update mapPointSet and vSetKeyFrames.

    % Check loop closure after some key frames have been created    
    if currKeyFrameId > 50
        
        % Minimum number of feature matches of loop edges
        loopEdgeNumMatches = 50;
        
        % Detect possible loop closure key frame candidates
        [isDetected, validLoopCandidates] = helperCheckLoopClosure(vSetKeyFrames, currKeyFrameId, ...
            loopDatabase, currFeaturesLeft, loopEdgeNumMatches);
        
        isTooCloseView = currKeyFrameId - max(validLoopCandidates) < 100;
        if isDetected && ~isTooCloseView
            % Add loop closure connections
            [isLoopClosed, mapPointSet, vSetKeyFrames] = helperAddLoopConnectionsStereo(...
                mapPointSet, vSetKeyFrames, validLoopCandidates, currKeyFrameId, ...
                currFeaturesLeft, currPointsLeft, loopEdgeNumMatches);
        end
    end
    
    % If no loop closure is detected, add current features into the database
    if ~isLoopClosed
        addVisualFeatures(loopDatabase,  currKeyFrameId, currFeaturesLeft);
    end
    
    % Update IDs and indices
    lastKeyFrameId  = currKeyFrameId;
    lastKeyFrameIdx = currFrameIdx;
    addedFramesIdx  = [addedFramesIdx; currFrameIdx]; 
    currFrameIdx    = currFrameIdx + 1;
end % End of main loop

Figure contains an axes object. The hidden axes object with title Matched Features in Current Frame contains 4 objects of type image, line. One or more of the lines displays its values using only markers

Figure Point Cloud Player contains an axes object. The axes object with xlabel X, ylabel Y contains 12 objects of type line, text, patch, scatter. These objects represent Map points, Estimated trajectory.

Finally, apply pose graph optimization over the essential graph in vSetKeyFrames to correct the drift. The essential graph is created internally by removing connections with fewer than minNumMatches matches in the covisibility graph. After pose graph optimization, update the 3-D locations of the map points using the optimized poses.

% Optimize the poses
vSetKeyFramesOptim = optimizePoses(vSetKeyFrames, minNumMatches, Tolerance=1e-16);

% Update map points after optimizing the poses
mapPointSet = helperUpdateGlobalMap(mapPointSet, vSetKeyFrames, vSetKeyFramesOptim);

updatePlot(mapPlot, vSetKeyFrames, mapPointSet);

% Plot the optimized camera trajectory
optimizedPoses  = poses(vSetKeyFramesOptim);
plotOptimizedTrajectory(mapPlot, optimizedPoses)

% Update legend
showLegend(mapPlot);

Compare with the Ground Truth

You can compare the optimized camera trajectory with the ground truth to evaluate the accuracy of the solution. The downloaded data contains a gps.txt file that stores the GPS location for each frame. You can convert the GPS location from geographic to local Cartesian coordinates using the helperReadGroundTruth function:

gpsLocations = helperReadGroundTruth(dataFolder)

In this example, you can also load the converted GPS data from an M-file.

% Load GPS data
gpsData     = load("gpsLocation.mat");
gpsLocation = gpsData.gpsLocation;

% Transform GPS locations to the reference coordinate system
gTruth = helperTransformGPSLocations(gpsLocation, optimizedPoses);

% Plot the GPS locations
plotActualTrajectory(mapPlot, gTruth(addedFramesIdx, :));

% Show legend
showLegend(mapPlot);

Figure Point Cloud Player contains an axes object. The axes object with xlabel X, ylabel Y contains 4 objects of type scatter, line. These objects represent Map points, Estimated trajectory, Optimized trajectory, GPS trajectory.

Dense Reconstruction from Stereo Images

Given the refined camera poses, you can perform dense reconstruction from the stereo images corresponding to the key frames.

% Create an array of pointCloud objects to store the 3-D world points
% associated with the key frames
ptClouds =  repmat(pointCloud(zeros(1, 3)), numel(addedFramesIdx), 1);

for i = 1: numel(addedFramesIdx)
    ILeft  = readimage(imdsLeft, addedFramesIdx(i));
    IRight = readimage(imdsRight, addedFramesIdx(i));

    % Reconstruct scene from disparity
    disparityMap = disparitySGM(im2gray(ILeft), im2gray(IRight), DisparityRange=disparityRange);
    xyzPoints    = reconstructScene(disparityMap, reprojectionMatrix);

    % Ignore the upper half of the images which mainly show the sky
    xyzPoints(1:floor(imageSize(1)/2), :, :) = NaN;

    % Ignore the lower part of the images which show the vehicle
    xyzPoints(imageSize(1)-50:end, :, :) = NaN;

    xyzPoints  = reshape(xyzPoints, [imageSize(1)*imageSize(2), 3]);

    % Get color from the color image
    colors = reshape(ILeft, [imageSize(1)*imageSize(2), 3]);
    
    % Remove world points that are too far away from the camera
    validIndex = xyzPoints(:, 3) > 0 & xyzPoints(:, 3) < 100/reprojectionMatrix(4, 3);  
    xyzPoints  = xyzPoints(validIndex, :);
    colors     = colors(validIndex, :);
    
    % Transform world points to the world coordinates
    currPose   = optimizedPoses.AbsolutePose(i);
    xyzPoints  = transformPointsForward(currPose, xyzPoints);
    ptCloud    = pointCloud(xyzPoints, Color=colors);

    % Downsample the point cloud
    ptClouds(i) = pcdownsample(ptCloud, random=0.5); 
end

% Concatenate the point clouds
pointCloudsAll = pccat(ptClouds);

% Visualize the point cloud
figure
ax = pcshow(pointCloudsAll,VerticalAxis="y", VerticalAxisDir="down");
xlabel('X')
ylabel('Y')
zlabel('Z')
title('Dense Reconstruction')

% Display the bird's eye view of the scene
view(ax, [0 0 1]);
camroll(ax, 90);

Figure contains an axes object. The axes object with title Dense Reconstruction, xlabel X, ylabel Y contains an object of type scatter.

Code Generation

Because code generation does not support ImageDatastore objects, read the input images, convert them to grayscale, and store them in a cell array.

for i=1:numel(imdsLeft.Files)
    imagesLeft{i} = im2gray(readimage(imdsLeft,i));
end

for i=1:numel(imdsRight.Files)
    imagesRight{i} = im2gray(readimage(imdsRight,i));
end

Load the initial camera pose into the workspace. The initial camera pose has been derived based on the transformation between the camera and the vehicle. Get the forward 3-D rigid transformation of the camera from the initial pose data.

initialPoseData = load("initialPose.mat");
initialPose = initialPoseData.initialPose;
% Get the forward 3-D rigid transformation of the camera
forwardTransformation = initialPose.A;

Use the helperStereoVisualSLAMCodegen function to generate C++ code for deployment on a host computer. To meet the requirements of MATLAB Coder, you must restructure the code to isolate the algorithm from the visualization code. The helperStereoVisualSLAMCodegen function encapsulates the algorithmic process of map initialization, tracking, local mapping, and loop closure. The function takes two cell arrays of images, imagesLeft and imagesRight, and the initial camera pose as input arguments, and outputs 3-D worldpointset, estimated camera poses, and indices of all the key frames.

Use the Compilation Directive %#codegen (MATLAB Coder) command to compile the helperStereoVisualSLAMCodegen function into a MEX file. Note that, by default, the generated MEX file has the same name as the original MATLAB function with "_mex" appended as a suffix: helperStereoVisualSLAMCodegen_mex. To save the MEX file with a custom filename, use the -o option. You can specify the -report option to generate a compilation report that shows the original MATLAB code and the associated files created during code generation. You can also create a temporary directory where MATLAB Coder can store the generated files. For more options related to MEX file generation, see options (MATLAB Coder) on the codegen page.

cpuConfig = coder.config("mex");
cpuConfig.TargetLang = "C++";
% Generate C++ code
codegen -config cpuConfig helperStereoVisualSLAMCodegen -args {imagesLeft,imagesRight,forwardTransformation}
Code generation successful.

Use the helperStereoVisualSLAMCodegen_mex function to obtain the estimated and optimized camera poses based on imagesLeft, imagesRight, and forwardTransformation.

% Set the random number generation seed to default, for reproducibility
rng("default");
% Run the generated MEX file
stereoSlamOut = helperStereoVisualSLAMCodegen_mex(imagesLeft,imagesRight,forwardTransformation);

Plot the estimated trajectory and actual trajectory of the camera by specifying stereoSlamOut as an input argument to the helperVisualizeStereoVisualSlam helper function.

% Visualize the results
helperVisualizeStereoVisualSlam(stereoSlamOut);

Figure Point Cloud Player contains an axes object. The axes object with xlabel X, ylabel Y contains 4 objects of type scatter, line. These objects represent Map points, Estimated trajectory, Optimized trajectory, GPS trajectory.

Supporting Functions

This example includes these helper functions as separate files:

helperDetectAndExtractFeaturesDetects and extracts ORB features from the image.

helperReconstructFromStereoReconstructs the scene from a stereo image using the disparity map.

helperFindValidFeaturePairsMatches features between a pair of stereo images.

helperCullRecentMapPointsStereoCulls recently added map points.

helperAddLoopConnectionsStereoAdds connections between the current key frame and the valid loop candidate key frames.

helperAddNewKeyFrameAdds key frames to the key frame set.

helperCheckLoopClosureDetects loop candidate key frames by retrieving visually similar images from the feature database.

helperCreateNewMapPointsStereoCreates new map points by triangulating matched feature points between the current key frame and the connected key frames.

helperStereoVisualSLAMCodegenContains the algorithm of stereo SLAM for a code generation workflow.

helperVisualizeStereoVisualSlamVisualizes the output of a code generation workflow.

helperVisualizeMotionAndStructureStereoShows map points and camera trajectory.

helperVisualizeMatchedFeaturesStereoShows the matched features in a frame.

helperTrackLocalMapRefines the camera pose by tracking the local map.

helperTrackLastKeyFrameEstimates the camera pose by tracking the last key frame.

The example also uses these helper functions:

helperUpdateGlobalMapUpdates the 3-D locations of map points after pose graph optimization.

function mapPointSet = helperUpdateGlobalMap(mapPointSet,vSetKeyFrames,vSetKeyFramesOptim)

posesOld = vSetKeyFrames.Views.AbsolutePose;
posesNew = vSetKeyFramesOptim.Views.AbsolutePose;
positionsOld = mapPointSet.WorldPoints;
positionsNew = positionsOld;
indices = 1:mapPointSet.Count;

% Update world location of each map point based on the new absolute pose of 
% the corresponding major view
for i = 1:mapPointSet.Count
    majorViewIds = mapPointSet.RepresentativeViewId(i);
    tform = rigidtform3d(posesNew(majorViewIds).A/posesOld(majorViewIds).A);
    positionsNew(i,:) = transformPointsForward(tform,positionsOld(i,:));
end
mapPointSet = updateWorldPoints(mapPointSet,indices,positionsNew);
end

helperReadGroundTruthReads ground truth data and converts it to Cartesian coordinates.

function gTruth = helperReadGroundTruth(dataFolder) %#ok<DEFNU>
opts = delimitedTextImportOptions("NumVariables",5);

% Specify range and delimiter
opts.DataLines = [1 Inf];
opts.Delimiter = ",";

% Specify column names and types
opts.VariableNames = ["Var1","Var2","VarName3","VarName4","VarName5"];
opts.SelectedVariableNames = ["VarName3","VarName4","VarName5"];
opts.VariableTypes = ["string","string","double","double","double"];

% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";

% Specify variable properties
opts = setvaropts(opts, ["Var1" "Var2"],"WhitespaceRule","preserve");
opts = setvaropts(opts, ["Var1" "Var2"],"EmptyFieldRule","auto");

% Import the data
gps = readtable([dataFolder "/gps.txt"],opts);
gps = table2array(gps);

index = helperImportTimestampFile(dataFolder);

origin = [gps(1,1) gps(1,2) gps(1,3)];
[x,y,z] = latlon2local(gps(index,1),gps(index,2),gps(index,3),origin);
gTruth = [x y z];
end

helperImportTimestampFileImports the timestamps of GPS data and images.

function index = helperImportTimestampFile(dataFolder)
% Read GPS and image timestamp files
% Set up the import Options and import the data
opts = delimitedTextImportOptions("NumVariables",2);

% Specify range and delimiter
opts.DataLines = [1 Inf];
opts.Delimiter = ",";

% Specify column names and types
opts.VariableNames = ["VarName1","VarName2"];
opts.VariableTypes = ["double","double"];

% Specify file-level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";

% Import the GPS and image data
timestampsgps = readtable([dataFolder "/timestamps_gps.txt"], opts);
timestampsimg = readtable([dataFolder "timestamps_images.txt"], opts);

% Convert to output type
timestampsgps = table2array(timestampsgps);
timestampsimg = table2array(timestampsimg);

% Synchronize the timestamps of the GPS data and images
index = interp1(timestampsgps(:,2),1:size(timestampsgps,1),timestampsimg(:,2),"nearest","extrap");
end

helperTransformGPSLocationsTransforms GPS locations to the reference coordinate system.

function gTruth = helperTransformGPSLocations(gpsLocations,optimizedPoses)

initialYawGPS  = atan((gpsLocations(100,2) - gpsLocations(1,2))/ ...
    (gpsLocations(100,1) - gpsLocations(1,1)));
initialYawSLAM = atan((optimizedPoses.AbsolutePose(50).Translation(2) - ...
    optimizedPoses.AbsolutePose(1).Translation(2))/ ...
    (optimizedPoses.AbsolutePose(59).Translation(1) - ...
    optimizedPoses.AbsolutePose(1).Translation(1)));

relYaw = initialYawGPS - initialYawSLAM;
relTranslation = optimizedPoses.AbsolutePose(1).Translation;

initialTform = rotationVectorToMatrix([0 0 relYaw]);
for i = 1:size(gpsLocations, 1)
    gTruth(i, :) =  initialTform * gpsLocations(i, :)' + relTranslation';
end
end

References

[1] Mur-Artal, Raul, and Juan D. Tardós. “ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras.” IEEE Transactions on Robotics 33, no. 5 (October 2017): 1255–62. https://doi.org/10.1109/TRO.2017.2705103.