
monovslam

Visual simultaneous localization and mapping (vSLAM) with monocular camera

Since R2023b

    Description

    Use the monovslam object to perform visual simultaneous localization and mapping (vSLAM) with a monocular camera. The object extracts Oriented FAST and Rotated BRIEF (ORB) features from incrementally read images, and then tracks those features to estimate camera poses, identify key frames, and reconstruct a 3-D environment. The vSLAM algorithm also searches for loop closures using the bag-of-features algorithm, and then optimizes the camera poses using pose graph optimization. To learn more about visual SLAM, see Implement Visual SLAM in MATLAB.

    Note

    To use the IMU fusion capabilities of the monovslam object, you must have a Navigation Toolbox™ license.

    Creation

    Description

    vslam = monovslam(intrinsics) creates a monocular visual SLAM object, vslam, by using the camera intrinsic parameters.

    The monovslam object does not account for lens distortion. You can use the undistortImage function to undistort images before adding them to the object.
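    For example, this minimal sketch (assuming calibIntrinsics is a cameraIntrinsics object that includes the distortion coefficients estimated during calibration, and I is a raw camera image) undistorts an image before adding it to the object:

    % Remove lens distortion before passing the image to the vSLAM object.
    % calibIntrinsics and I are assumed to already exist in the workspace.
    J = undistortImage(I,calibIntrinsics);
    addFrame(vslam,J)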

    The object represents 3-D map points and camera poses in world coordinates. The object assumes the camera pose of the first key frame is an identity rigidtform3d transform.

    Note

    The monovslam object runs on multiple threads internally, which can delay the processing of an image frame added by using the addFrame function. Additionally, because the object runs on multiple threads, the frame it is currently processing can differ from the most recently added frame.


    vslam = monovslam(intrinsics,imuParameters) creates a monocular visual SLAM object, vslam, that fuses camera images with inertial measurement unit (IMU) measurements, using the camera intrinsic parameters intrinsics and the IMU parameters imuParameters. Use of this syntax requires Navigation Toolbox™.

    vslam = monovslam(intrinsics,Name=Value) sets properties using one or more name-value arguments. For example, MaxNumPoints=850 sets the maximum number of ORB feature points to extract from each image to 850.
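    For example, this sketch creates a visual SLAM object with custom feature extraction and key frame settings. The property values shown are illustrative, not recommendations:

    % Create a monocular visual SLAM object with custom property values.
    intrinsics = cameraIntrinsics([535.4 539.2],[320.1 247.6],[480 640]);
    vslam = monovslam(intrinsics,MaxNumPoints=850,ScaleFactor=1.2, ...
        NumLevels=8,TrackFeatureRange=[30,120],SkipMaxFrames=20);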

    Input Arguments


    Camera intrinsic parameters, specified as a cameraIntrinsics object.

    IMU parameters, specified as a factorIMUParameters (Navigation Toolbox) object. The object includes information about the IMU's noise, bias and sample rate. Use of this object requires Navigation Toolbox.
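    For example, this sketch constructs IMU parameters with placeholder noise values (replace them with values appropriate to your sensor) and passes them to monovslam:

    % Define the IMU sample rate and noise characteristics (placeholder values).
    imuParams = factorIMUParameters(SampleRate=100,GyroscopeNoise=0.1, ...
        GyroscopeBiasNoise=3e-3,AccelerometerNoise=0.3, ...
        AccelerometerBiasNoise=1e-3,ReferenceFrame="ENU");
    vslam = monovslam(intrinsics,imuParams);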

    Properties


    Feature Extraction

    This property is read-only.

    Scale factor for image decomposition, stored as a scalar greater than 1. The scale factor is also referred to as the pyramid decimation ratio. Increasing the value of ScaleFactor reduces the number of pyramid levels, but also reduces computation time. Decreasing this value (down to just over 1) increases the number of pyramid levels, which can improve tracking performance at the cost of computation speed. The scale value at each level of decomposition is ScaleFactor^(level-1), where level is any value in the range [0, NumLevels-1]. Given an input image of size M-by-N, the image size at each level of decomposition is Mlevel-by-Nlevel, where:

    Mlevel = M / ScaleFactor^(level-1)
    Nlevel = N / ScaleFactor^(level-1)

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint64 | uint32

    This property is read-only.

    Number of decomposition levels, specified as a scalar greater than or equal to 1. Increase this value to extract keypoints from the image at more levels of decomposition. Along with the ScaleFactor value, NumLevels controls the number of pyramid levels on which the object evaluates feature points.

    The image size at each decomposition level limits the number of levels at which you can extract keypoints. The image size at a level of decomposition must be at least 63-by-63 for keypoint detection. The maximum level of decomposition is calculated as

    levelmax = floor((log(min(M,N)) - log(63)) / log(ScaleFactor)) + 1

    If either the default value or the specified value of NumLevels is greater than levelmax, the object modifies NumLevels to levelmax and returns a warning.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint64 | uint32
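    For example, this sketch computes the maximum usable number of decomposition levels for a 480-by-640 image with a scale factor of 1.2:

    % Largest pyramid level that still yields an image of at least 63-by-63 pixels.
    M = 480; N = 640; scaleFactor = 1.2;
    levelMax = floor((log(min(M,N)) - log(63))/log(scaleFactor)) + 1   % levelMax = 12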

    This property is read-only.

    Maximum number of ORB feature points uniformly extracted from each image, specified as a positive integer. Typical values are in the range [800, 2000], depending on the resolution of the image. If the number of extracted features is less than the value of MaxNumPoints, the object uses all of the extracted feature points.

    Tracking

    This property is read-only.

    Key frame feature point range, stored as a two-element vector of the form [lowerLimit,upperLimit]. This property specifies the minimum and maximum number of tracked feature points a frame must contain for the object to identify it as a key frame. The TrackFeatureRange and SkipMaxFrames properties enable you to control how frequently the tracking process adds key frames.

    The success of tracking depends on the number of tracked points in the current frame, with one of these results:

    • Tracking is lost — The number of tracked feature points in the current frame is less than the lowerLimit set by the TrackFeatureRange property. This indicates that the image does not contain enough features, or that the camera is moving too fast. To improve the tracking, you can increase the upperLimit value of the TrackFeatureRange property and decrease the value of the SkipMaxFrames property to add key frames more frequently.

    • Tracking is successful — The object identifies the current frame as a key frame. The number of tracked feature points in the current frame is in the range set by TrackFeatureRange.

    • Tracking adds key frames too frequently — The number of tracked feature points in the current frame is greater than the upperLimit set by the TrackFeatureRange property. This indicates that the camera is moving very slowly, which produces an unnecessary number of key frames. To improve the tracking, you can reduce the frequency of adding key frames by increasing the value of the SkipMaxFrames property.

    For more details, see the addFrame object function.
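    For example, this sketch (with illustrative property values) widens the tracked-feature range, limits the number of skipped frames, and checks the tracking status after adding a frame:

    % Tune key frame selection and monitor tracking. I is an image frame
    % assumed to already exist in the workspace.
    vslam = monovslam(intrinsics,TrackFeatureRange=[20,150],SkipMaxFrames=10);
    addFrame(vslam,I)
    status = checkStatus(vslam);   % query the current tracking status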

    This property is read-only.

    Maximum number of skipped frames, stored as a positive integer. When the number of tracked features is consistently greater than the upperLimit set by the TrackFeatureRange property, use the SkipMaxFrames property to control the frequency of adding new key frames. The object identifies the current frame as a key frame when the number of frames skipped since the most recently added key frame equals the value of SkipMaxFrames.

    This property is read-only.

    Minimum number of matched feature points between loop closure key frames, stored as a positive integer.

    Custom bag of features for loop detection, specified as a bagOfFeaturesDBoW object. The bagOfFeaturesDBoW object enables you to create a custom bag of visual words from feature descriptors, use the built-in vocabulary, or load a custom vocabulary from a file.
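    For example, this sketch loads a custom vocabulary from a file (the file name is hypothetical) and uses it for loop detection:

    % Use a custom DBoW vocabulary file for loop closure detection.
    % "myVocabulary.bin.gz" is a hypothetical vocabulary file.
    bag = bagOfFeaturesDBoW("myVocabulary.bin.gz");
    vslam = monovslam(intrinsics,CustomBagOfFeatures=bag);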

    Progress information display, specified as [] (or false), 1 (or true), 2, or 3. When the object creates log files, it displays the paths to their locations at the command line.

    Verbose value | Display description | Display location
    [] or false   | Display is turned off |
    1 or true     | Stages of vSLAM execution | Command window
    2             | Stages of vSLAM execution, with details on how each frame is processed, such as the artifacts used to initialize the map | Log file in a temporary folder
    3             | Stages of vSLAM execution, artifacts used to initialize the map, poses and map points before and after bundle adjustment, and loop closure optimization data | Log file in a temporary folder
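    For example, this sketch enables detailed logging so that the object reports how each frame is processed:

    % Write per-frame processing details to a log file in a temporary folder.
    vslam = monovslam(intrinsics,Verbose=2);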

    IMU Fusion

    IMU sensor transform, specified as a rigidtform3d object. The object describes the rotation and translation of the IMU sensor in the camera's 3-D coordinate system.

    Number of estimated camera poses to trigger camera-IMU alignment, specified as an integer. The process of aligning camera and IMU data is initiated after a specific number of camera poses have been estimated. This alignment serves two primary purposes: first, to estimate a scale factor that translates the up-to-scale results from a monocular camera into actual world units (meters), and second, to synchronize the IMU and camera frames, effectively eliminating the influence of gravity on accelerometer data. The timing for this alignment, determined by a threshold for the number of camera poses, is key to its success. A threshold set too low may not provide enough data for accurate calibration, while one set too high risks incorporating noise from measurement drift into the calibration. For a more in-depth understanding of this calibration technique, see the estimateGravityRotationAndPoseScale (Navigation Toolbox) function.

    Subset of pose estimates, specified as a scalar in the range of (0,1]. This value specifies a fraction of the number of recent pose estimates, calculated as round(NumPosesThreshold*AlignmentFraction), for use in the camera-IMU alignment process. It effectively filters out initial, potentially noisy pose estimates, ensuring only the most relevant data contributes to the alignment for improved accuracy.
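    For example, with NumPosesThreshold=40 and AlignmentFraction=0.9, the object uses the round(40*0.9) = 36 most recent pose estimates for alignment. This sketch assumes imuParams is a factorIMUParameters object and camera2IMU is a rigidtform3d object describing the camera-to-IMU transform:

    % Trigger camera-IMU alignment after 40 estimated poses and use the
    % 36 most recent of them (placeholder configuration).
    vslam = monovslam(intrinsics,imuParams,CameraToIMUTransform=camera2IMU, ...
        NumPosesThreshold=40,AlignmentFraction=0.9);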

    Object Functions

    addFrame         Add image frame to visual SLAM object
    hasNewKeyFrame   Check if new key frame added in visual SLAM object
    checkStatus      Check status of visual SLAM object
    isDone           End-of-processing status for visual SLAM object
    mapPoints        Build 3-D map of world points
    poses            Absolute camera poses of key frames
    plot             Plot 3-D map points and estimated camera trajectory in visual SLAM
    reset            Reset visual SLAM object

    Examples


    Perform monocular visual simultaneous localization and mapping (vSLAM) using the data from the TUM RGB-D Benchmark. You can download the data to a temporary directory using a web browser or by running this code:

    baseDownloadURL = "https://cvg.cit.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household.tgz"; 
    dataFolder = fullfile(tempdir,"tum_rgbd_dataset",filesep); 
    options = weboptions(Timeout=Inf);
    tgzFileName = dataFolder+"fr3_office.tgz";
    folderExists = exist(dataFolder,"dir");
    
    % Create a folder in a temporary directory to save the downloaded file
    if ~folderExists  
        mkdir(dataFolder) 
        disp("Downloading fr3_office.tgz (1.38 GB). This download can take a few minutes.") 
        websave(tgzFileName,baseDownloadURL,options); 
        
        % Extract contents of the downloaded file
        disp("Extracting fr3_office.tgz (1.38 GB) ...") 
        untar(tgzFileName,dataFolder); 
    end

    Create an imageDatastore object to store all the RGB images.

    imageFolder = dataFolder+"rgbd_dataset_freiburg3_long_office_household/rgb/";
    imds = imageDatastore(imageFolder);

    Specify your camera intrinsic parameters, and use them to create a monocular visual SLAM object.

    intrinsics = cameraIntrinsics([535.4 539.2],[320.1 247.6],[480 640]);
    vslam = monovslam(intrinsics,TrackFeatureRange=[30,120]);

    Process each image frame, and visualize the camera poses and 3-D map points. Note that the monovslam object runs several algorithm parts on separate threads, which can introduce latency in the processing of an image frame added by using the addFrame function.

    for i = 1:numel(imds.Files)
        addFrame(vslam,readimage(imds,i))
    
        if hasNewKeyFrame(vslam)
            % Display 3-D map points and camera trajectory
            plot(vslam);
        end
    
        % Get current status of system
        status = checkStatus(vslam);
    end 

    Figure: 3-D map points and estimated camera trajectory.

    Plot intermediate results and wait until all images are processed.

    while ~isDone(vslam)
        if hasNewKeyFrame(vslam)
            plot(vslam);
        end
    end

    Figure: 3-D map points and estimated camera trajectory.

    After all the images are processed, you can collect the final 3-D map points and camera poses for further analysis.

    xyzPoints = mapPoints(vslam);
    [camPoses,addedFramesIdx] = poses(vslam);
    
    % Reset the system
    reset(vslam)

    Compare the estimated camera trajectory with the ground truth to evaluate the accuracy.

    % Load ground truth
    gTruthData = load("orbslamGroundTruth.mat");
    gTruth     = gTruthData.gTruth;
    
    % Evaluate tracking accuracy
    metrics = compareTrajectories(camPoses, gTruth(addedFramesIdx), AlignmentType="similarity");
    disp(['Absolute RMSE for key frame location (m): ', num2str(metrics.AbsoluteRMSE(2))]);
    Absolute RMSE for key frame location (m): 0.093645
    
    % Plot the absolute translation error at each key frame
    figure
    ax = plot(metrics, "absolute-translation");
    view(ax, [2.70 -49.20]); 

    Figure: Absolute translation error at each key frame, showing the estimated and ground truth trajectories.

    Perform monocular visual-inertial SLAM using data from the Blackbird data set (NYC Subway Winter). Download the MAT file, which contains the images, camera intrinsics, and IMU measurements.

    uavDataTarFile =  matlab.internal.examples.downloadSupportFile('shared_nav_vision/data','BlackbirdVIOData.tar');  
    
    % Extract the file.
    outputFolder = fileparts(uavDataTarFile);
    if (~exist(fullfile(outputFolder,"BlackbirdVIOData"),"dir"))
        untar(uavDataTarFile,outputFolder);
    end
    
    uavData = load(fullfile(outputFolder,"BlackbirdVIOData","data.mat"));
    
    images     = uavData.images;
    timeStamps = uavData.timeStamps;

    Set up the IMU noise parameters, camera intrinsics, and the camera-to-IMU extrinsic transform.

    imuParams = factorIMUParameters(SampleRate=100,GyroscopeNoise=0.1, ...
                GyroscopeBiasNoise=3e-3,AccelerometerNoise=0.3, ...
                AccelerometerBiasNoise=1e-3,ReferenceFrame="ENU");
    intrinsics = uavData.intrinsics;
    camera2IMU = rigidtform3d(tform(uavData.camToIMUTransform));

    Set up the monocular visual-inertial SLAM object.

    vslam = monovslam(intrinsics,imuParams,MaxNumPoints=2000,SkipMaxFrames=2,TrackFeatureRange=[20,100], ...
            CameraToIMUTransform=camera2IMU,NumPosesThreshold=40,AlignmentFraction=0.9, ...
            CustomBagOfFeatures=bagOfFeaturesDBoW('BlackBirdBoW.bin.gz'),LoopClosureThreshold=100);

    Process all of the image frames and IMU measurements, and visualize the camera poses and 3-D map points. The sequence does not contain any movement before the 150th frame.

    startFrameIdx = 150;
    
    for i = startFrameIdx:length(images)
    
        if (i > startFrameIdx)
            imuMeasurements = helperExtractIMUMeasurements(uavData, i-1, i);
        else
            imuMeasurements.gyro  = [];
            imuMeasurements.accel = [];
        end
    
        addFrame(vslam, images{i}, imuMeasurements.gyro, imuMeasurements.accel);
    
        if hasNewKeyFrame(vslam)
            plot(vslam);
        end
    end

    Note that the monovslam object runs several algorithm parts on separate threads, which can introduce a latency in processing of an image frame added by using the addFrame function.

    % Plot intermediate results and wait until all images are processed
    while ~isDone(vslam)
        if hasNewKeyFrame(vslam)
            plot(vslam);
        end
    end

    Figure: 3-D map points and estimated camera trajectory.

    The helper function below extracts the IMU measurements collected between two camera frames, based on their timestamps.

    function imuMeasurements = helperExtractIMUMeasurements(data, startFrameIdx, currFrameIdx)
    
        timeStamps = data.timeStamps;
        
        startTimeStamp = timeStamps.imageTimeStamps(startFrameIdx);
        currTimeStamp  = timeStamps.imageTimeStamps(currFrameIdx);
        
        [~,startIMUIdx] = min(abs(timeStamps.imuTimeStamps - startTimeStamp));
        [~,currIMUIdx]  = min(abs(timeStamps.imuTimeStamps - currTimeStamp));
        
        imuMeasurements.accel = data.accelReadings(startIMUIdx:(currIMUIdx-1),:);
        imuMeasurements.gyro  = data.gyroReadings(startIMUIdx:(currIMUIdx-1),:);
    
    end

    References

    [1] Mur-Artal, Raul, J. M. M. Montiel, and Juan D. Tardos. “ORB-SLAM: A Versatile and Accurate Monocular SLAM System.” IEEE Transactions on Robotics 31, no. 5 (October 2015): 1147–63. https://doi.org/10.1109/TRO.2015.2463671.

    [2] Galvez-López, D., and J. D. Tardos. “Bags of Binary Words for Fast Place Recognition in Image Sequences.” IEEE Transactions on Robotics 28, no. 5 (October 2012): 1188–97. https://doi.org/10.1109/TRO.2012.2197158.


    Version History

    Introduced in R2023b
