trainNerfacto

Create and train Nerfacto Neural Radiance Field (NeRF) model

Since R2026a

    Description

    Add-On Required: This feature requires the Computer Vision Toolbox Interface for Nerfstudio Library add-on.

    nerf = trainNerfacto(trainingData,cameraPoses,intrinsics,outputFolder) creates and trains a Nerfacto Neural Radiance Field (NeRF) model [1] from the Nerfstudio Library [2] on a collection of images taken from different camera poses. The function also creates a model folder, which contains the model weights and other parameters, at the path that you specify using the outputFolder argument.

    Note

    This feature requires the Computer Vision Toolbox Interface for Nerfstudio Library add-on, a Deep Learning Toolbox™ license, a Parallel Computing Toolbox™ license, and a CUDA® enabled NVIDIA® GPU with at least 16 GB of available GPU memory.

    nerf = trainNerfacto(pretrainedModelFolder) resumes training of a pretrained Nerfacto NeRF model.

    nerf = trainNerfacto(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of input arguments from previous syntaxes. For example, MaxIterations=1000 specifies to perform a maximum of 1000 training iterations on the NeRF model.
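    As a hedged sketch of the resume syntax, this code continues training a model from an earlier run and extends the iteration budget. The folder name is an illustrative placeholder, not part of the function interface:

    ```matlab
    % Resume training from an earlier run. The folder name is an
    % illustrative placeholder; use the ModelFolder of your trained model.
    pretrainedModelFolder = fullfile(pwd,"nerfTUMTrainingOutput");

    % Continue training, allowing up to 2000 total iterations.
    nerf = trainNerfacto(pretrainedModelFolder,MaxIterations=2000);
    ```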

    Examples

    Training Data Requirements

    To train a NeRF model [1] from the Nerfstudio Library [2] on a scene, your training set must contain:

    • At least 100 images of the scene from multiple overlapping views of the region of interest.

    • Camera poses associated with each image.

    • Intrinsic parameters of the camera used to capture the images.

    To estimate the intrinsic parameters of the camera through calibration, use the Camera Calibrator app. For information on estimating camera poses and related geometric information, see the Structure from Motion from Multiple Views example.
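    If you already know the calibration results, you can construct the intrinsics object directly with the cameraIntrinsics object. This minimal sketch uses placeholder focal length, principal point, and image size values, not real calibration results:

    ```matlab
    % Construct camera intrinsics from known calibration values
    % (the numbers below are placeholders, not calibration results).
    focalLength    = [535 532];    % [fx fy] in pixels
    principalPoint = [323 240];    % [cx cy] in pixels
    imageSize      = [480 640];    % [mrows ncols]
    intrinsics = cameraIntrinsics(focalLength,principalPoint,imageSize);
    ```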

    Download Training Data

    The training data is stored in the tum_rgbd_data.zip file, which contains the sfmTrainingDataTUMRGBD folder. The folder contains the cameraInfo.mat file, which stores the camera poses associated with each image and the intrinsic parameters of the camera, and the images folder, which contains indoor images from the TUM RGB-D dataset [3].

    Download and extract the tum_rgbd_data.zip file into the current directory by running this code.

    if ~exist("sfmTrainingDataTUMRGBD","dir")
        websave("tum_rgbd_data.zip","https://ssd.mathworks.com/supportfiles/3DReconstruction/tum_rgbd_data.zip");
        unzip("tum_rgbd_data.zip",pwd);
    end

    Set Up Training Data

    Specify the path to the images folder, and then create an imageDatastore object using the path.

    trainingImageFolder = fullfile("sfmTrainingDataTUMRGBD","images");
    imds = imageDatastore(trainingImageFolder);
    disp("Number of images: " + length(imds.Files))
    Number of images: 104
    

    Show the first training image.

    figure
    sampleTrainingImage = preview(imds);
    imshow(sampleTrainingImage)
    title("Sample Training Image")

    Figure contains an axes object. The hidden axes object with title Sample Training Image contains an object of type image.

    Load the camera poses corresponding to the training images, and the intrinsic parameters of the camera used to capture the images.

    camInfo = load(fullfile("sfmTrainingDataTUMRGBD","cameraInfo.mat"));
    trainIntrinsics = camInfo.intrinsics;
    trainPoses = camInfo.cameraPoses;

    Display the camera poses. Observe that there are 104 camera poses, corresponding to the 104 training images in the image datastore imds.

    disp(trainPoses)
      1×104 rigidtform3d array with properties:
    
        Dimensionality
        Translation
        R
        A
    

    Display the camera intrinsics. The trainNerfacto function assumes that all images were taken by a single camera, so all training images share the same intrinsic parameters.

    disp(trainIntrinsics)
      cameraIntrinsics with properties:
    
                 FocalLength: [535.1307 532.1860]
              PrincipalPoint: [323.3722 239.7986]
                   ImageSize: [480 640]
            RadialDistortion: [-0.0032 0.0192]
        TangentialDistortion: [-0.0029 -0.0023]
                        Skew: 0
                           K: [3×3 double]
    

    Train NeRF Model

    Use the trainNerfacto function to create and train a NeRF model on the training images across up to 1000 iterations. Store intermediate model checkpoints and configurations in the nerfTUMTrainingOutput folder. Training requires about 30 minutes on a Linux machine with an NVIDIA GeForce RTX 3090 GPU that has 24 GB of memory.

    outputFolder = fullfile(pwd,"nerfTUMTrainingOutput");
    nerf = trainNerfacto(imds,trainPoses,trainIntrinsics,outputFolder,MaxIterations=1000);

    Write Novel Views from Trained NeRF Model to Image Files

    Write the views captured by the first 20 camera poses to a local folder, generatedTrainingViewsTUM, by using the writeViews function.

    imdsGen = writeViews(nerf,trainPoses(1:20),"generatedTrainingViewsTUM");

    Display a side-by-side comparison between the training images and the generated images of the scene at a camera pose. Each comparison includes these metrics that quantify the similarity between the training images and the generated images:

    • Structural Similarity Index (SSIM) — Value in the range [0, 1], where a value of 1 indicates perfect similarity.

    • Peak Signal-to-Noise Ratio (PSNR) — Values are in decibels, where higher values indicate better similarity.

    Use the imageIdx variable to select the camera pose to compare.

    numImg = numel(imdsGen.Files);
    imageIdx = 10;
    
    % Load image generated by the trained NeRF
    imageGen = readimage(imdsGen, imageIdx);
    
    % Load the corresponding real image used in training
    imageReal = readimage(imds,imageIdx);
    
    % Compute image similarity metrics - SSIM and PSNR
    ssimVal = ssim(rgb2gray(imageGen), rgb2gray(imageReal));
    psnrVal = psnr(imageGen, imageReal);
    
    % Visualize the images and their similarity maps
    figure
    montage({imageReal,imageGen}, BorderSize=[2 2])
    title("[Real | Generated] Image " + imageIdx + ": SSIM="+num2str(ssimVal,2) + "; PSNR="+num2str(psnrVal,4))
    truesize;

    Tips to Improve Training Results

    Although the generated images are photorealistic, there are minor differences in the image texture and high-frequency details between the generated images and the training images. To achieve better quality results:

    • Capture a larger set of high resolution images.

    • Ensure the image set has good lighting.

    • Ensure the image set has no motion blur or moving objects.

    • Retrain the model with a higher maximum number of training iterations (at the cost of longer computation time).

    References

    [1] Mildenhall, Ben, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.” In Computer Vision – ECCV 2020, edited by Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm. Springer International Publishing, 2020. https://doi.org/10.1007/978-3-030-58452-8_24.

    [2] Tancik, Matthew, Ethan Weber, Evonne Ng, et al. “Nerfstudio: A Modular Framework for Neural Radiance Field Development.” Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings, ACM, July 23, 2023, 1–12. https://doi.org/10.1145/3588432.3591516.

    [3] Sturm, Jürgen, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. “A Benchmark for the Evaluation of RGB-D SLAM Systems.” 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 2012, 573–80. https://doi.org/10.1109/IROS.2012.6385773.

    Input Arguments

    Training image collection, specified as an imageDatastore object. The image collection must contain three-channel uint8 RGB images in JPEG or PNG format. All images must be in the same format.

    Tip

    To improve the quality of the trained Nerfacto NeRF model, ensure that your training image collection:

    • Contains at least 100 images that capture the subject from multiple overlapping views with good lighting

    • Contains no motion blur or moving objects

    Camera poses of the training images, specified as an M-by-1 vector of rigidtform3d objects, where M is the number of images in the training image collection. Each element of the vector specifies the camera pose of the corresponding training image.

    Tip

    For sequential training images, such as video frames, use the monovslam object to obtain the camera poses for each training image. To learn more about how to estimate camera poses from non-sequential training images, see Structure from Motion from Multiple Views.
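    For sequential frames, the workflow might look like this sketch, which assumes the standard monovslam interface (addFrame and poses) and an imageDatastore of ordered video frames captured by a camera with known intrinsics:

    ```matlab
    % Estimate camera poses for sequential frames with monocular visual SLAM.
    % imds is an imageDatastore of ordered frames; intrinsics is a
    % cameraIntrinsics object for the capturing camera.
    vslam = monovslam(intrinsics);
    while hasdata(imds)
        addFrame(vslam,read(imds));
    end
    camPoses = poses(vslam);   % rigidtform3d array of estimated key-frame poses
    ```

    Note that visual SLAM typically returns poses only for key frames, which can be fewer than the total number of frames; pair each returned pose with its corresponding image before training.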

    Intrinsic parameters of the camera used to capture the training images, specified as a cameraIntrinsics object. Camera intrinsic parameters include the focal length, principal point, and skew of the camera.

    Tip

    Use the Camera Calibrator app to obtain the camera intrinsic parameters from your training images. For an example of how to obtain camera intrinsic parameters using the app, see Using the Single Camera Calibrator App.

    Output model folder path, specified as a string scalar or character vector. The function stores the trained NeRF model and the associated configuration file in this folder. The folder that you specify must be empty and writable. If a folder that you specify in the path does not exist, the trainNerfacto function creates that folder.

    Data Types: char | string

    Pretrained model folder path, specified as a string scalar or a character vector. This path must match the path in the ModelFolder property of the nerfacto object for the pretrained model.

    Data Types: char | string

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: trainNerfacto(trainingData,cameraPoses,intrinsics,outputFolder,MaxIterations=1000) specifies to perform a maximum of 1000 training iterations.

    Maximum number of training iterations, specified as a positive integer.

    Tip

    This hyperparameter has the greatest impact on the quality of the trained Nerfacto NeRF model. Increasing this value improves the quality of the scene reconstruction that the Nerfacto NeRF model generates, but increases the training time.

    Data Types: single | double

    Checkpoint saving interval, specified as a positive integer. This argument specifies the interval, in training iterations, at which the function saves model checkpoints during training.

    Data Types: single | double

    Keep all saved checkpoints, specified as a logical 0 (false) or 1 (true). If you specify this value as false, the function saves only the most recent checkpoint and deletes all other checkpoints once training is complete.

    Data Types: logical

    Number of images to sample in each training iteration, specified as a positive integer. By default, pixels are randomly sampled from all training images. Reducing this value reduces memory consumption during training.

    Data Types: single | double

    Refine camera poses during training, specified as a logical 1 (true) or 0 (false). Specifying this value as true improves the training results if the initial camera poses of the training data are inaccurate. If the initial camera poses are highly accurate, setting this value to false can improve the training results.

    Data Types: logical

    Output Arguments

    Trained Nerfacto NeRF model, returned as a nerfacto object.

    Tips

    Training a Nerfacto NeRF model using high-resolution training images can cause out-of-memory errors. To resolve this, reduce the size of your training images using the imresize function, then obtain the camera poses and camera intrinsic parameters of the resized images.
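    As a hedged sketch, downscaling the images by a factor s also requires scaling the focal length and principal point of the intrinsics by s (camera poses are unchanged by resizing). The variable names here are illustrative:

    ```matlab
    % Downscale a training image and adjust intrinsics accordingly.
    % s is an illustrative scale factor; trainIntrinsics is the original
    % cameraIntrinsics object for the full-resolution images.
    s = 0.5;
    imageSmall = imresize(imageLarge,s);
    intrinsicsSmall = cameraIntrinsics(s*trainIntrinsics.FocalLength, ...
        s*trainIntrinsics.PrincipalPoint, ...   % approximate; exact scaling
        round(s*trainIntrinsics.ImageSize));    % depends on pixel-center convention
    ```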

    Algorithms

    By training a NeRF model on a set of sparse input images and camera poses, you can create an internal representation of a 3D scene. You can use a trained NeRF model to generate a dense, colored point cloud, and render images from novel viewpoints.

    A NeRF model represents a scene as a continuous 5D vector-valued function instantiated as a neural network with weights Ω. The function accepts the position and direction of a light ray and returns the color and density of 3D points along the ray.

    When generating a novel view, the NeRF model first uses the neural network to calculate the color and density of 3D points along each ray from a virtual camera. The ray is defined as r(t) = o + td, where o is the origin coordinates of the ray, and d is the unit vector in the ray direction defined by the camera angles θ and ϕ. The neural network returns the color c(t) if the ray r(t) hits a particle at distance t along the ray. The model then uses volumetric rendering to project the output colors and densities into an image.

    During training, a NeRF model first samples a batch of rays that consist of multiple spatial locations and viewing directions. The neural network FΩ then predicts the color and density at 3D points along each ray, and the NeRF model uses volumetric rendering to project the output colors and densities onto an image. The model calculates the training loss by matching the image pixels of the rendered image against the captured images in the training data.
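    Concretely, the volumetric rendering step in the original NeRF formulation [1] integrates color along each ray, weighting by the accumulated transmittance:

    ```latex
    C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt,
    \qquad
    T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)
    ```

    Here σ is the volume density, c is the view-dependent color, and t_n and t_f are the near and far integration bounds. In practice, the model approximates the integral by sampling points along each ray.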


    Version History

    Introduced in R2026a