dlgradient

Compute gradients for custom training loops using automatic differentiation

Description

Use dlgradient to compute derivatives using automatic differentiation for custom training loops.

Tip

For most deep learning tasks, you can use a pretrained network and adapt it to your own data. For an example showing how to use transfer learning to retrain a convolutional neural network to classify a new set of images, see Train Deep Learning Network to Classify New Images. Alternatively, you can create and train networks from scratch using layerGraph objects with the trainNetwork and trainingOptions functions.

If the trainingOptions function does not provide the training options that you need for your task, then you can create a custom training loop using automatic differentiation. To learn more, see Define Deep Learning Network for Custom Training Loops.

[dydx1,...,dydxk] = dlgradient(y,x1,...,xk) returns the gradients of y with respect to the variables x1 through xk.

Call dlgradient from inside a function passed to dlfeval. See Compute Gradient Using Automatic Differentiation and Use Automatic Differentiation In Deep Learning Toolbox.
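The sketch below shows this calling pattern. The helper name sumOfSquares and the input values are illustrative only. The helper returns the scalar y = sum(x.^2) and its gradient, which is 2*x. Save the code as a script, because MATLAB requires local functions to appear at the end of a script.

```matlab
% Create a dlarray so that dlfeval can trace operations on it
x = dlarray([1;2;3]);

% dlgradient runs inside sumOfSquares, which dlfeval evaluates
[y,dydx] = dlfeval(@sumOfSquares,x);
disp(extractdata(dydx))   % gradient of sum(x.^2) is 2*x, that is [2;4;6]

function [y,dydx] = sumOfSquares(x)
    % Hypothetical helper: scalar output traced from the dlarray input
    y = sum(x.^2,1);
    dydx = dlgradient(y,x);
end
```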

[dydx1,...,dydxk] = dlgradient(y,x1,...,xk,Name,Value) returns the gradients and specifies additional options using one or more name-value pairs. For example, dydx = dlgradient(y,x,'RetainData',true) retains the intermediate values of the trace for reuse in subsequent dlgradient calls within the same dlfeval call. This syntax can save time, but uses more memory. For more information, see Tips.

Examples

Rosenbrock's function is a standard test function for optimization. The rosenbrock.m helper function computes the function value and uses automatic differentiation to compute its gradient.

type rosenbrock.m
function [y,dydx] = rosenbrock(x)

y = 100*(x(2) - x(1).^2).^2 + (1 - x(1)).^2;
dydx = dlgradient(y,x);

end

To evaluate Rosenbrock's function and its gradient at the point [-1,2], create a dlarray of the point and then call dlfeval on the function handle @rosenbrock.

x0 = dlarray([-1,2]);
[fval,gradval] = dlfeval(@rosenbrock,x0)
fval = 
  1x1 dlarray

   104

gradval = 
  1x2 dlarray

   396   200
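As a hand check of the output above, differentiating Rosenbrock's function analytically gives the same values at (x1,x2) = (-1,2). Here x1 and x2 stand for x(1) and x(2) in rosenbrock.m.

```latex
\begin{aligned}
f(x_1,x_2) &= 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2 \\
\frac{\partial f}{\partial x_1} &= -400\,x_1\,(x_2 - x_1^2) - 2\,(1 - x_1) \\
\frac{\partial f}{\partial x_2} &= 200\,(x_2 - x_1^2) \\
\text{At } (x_1,x_2) = (-1,2):\quad x_2 - x_1^2 &= 1, \\
f &= 100 \cdot 1^2 + 2^2 = 104, \\
\frac{\partial f}{\partial x_1} &= -400\,(-1)(1) - 2\,(2) = 396, \\
\frac{\partial f}{\partial x_2} &= 200\,(1) = 200.
\end{aligned}
```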

Alternatively, define Rosenbrock's function as a function of two inputs, x1 and x2.

type rosenbrock2.m
function [y,dydx1,dydx2] = rosenbrock2(x1,x2)

y = 100*(x2 - x1.^2).^2 + (1 - x1).^2;
[dydx1,dydx2] = dlgradient(y,x1,x2);

end

Call dlfeval to evaluate rosenbrock2 on two dlarray arguments representing the inputs -1 and 2.

x1 = dlarray(-1);
x2 = dlarray(2);
[fval,dydx1,dydx2] = dlfeval(@rosenbrock2,x1,x2)
fval = 
  1x1 dlarray

   104

dydx1 = 
  1x1 dlarray

   396

dydx2 = 
  1x1 dlarray

   200

Plot the gradient of Rosenbrock's function for several points in the unit square. First, initialize the arrays representing the evaluation points, the function values, and the gradients.

[X1 X2] = meshgrid(linspace(0,1,10));
X1 = dlarray(X1(:));
X2 = dlarray(X2(:));
Y = dlarray(zeros(size(X1)));
DYDX1 = Y;
DYDX2 = Y;

Evaluate the function in a loop. Plot the result using quiver.

for i = 1:length(X1)
    [Y(i),DYDX1(i),DYDX2(i)] = dlfeval(@rosenbrock2,X1(i),X2(i));
end
quiver(extractdata(X1),extractdata(X2),extractdata(DYDX1),extractdata(DYDX2))
xlabel('x1')
ylabel('x2')

[Figure: quiver plot of the gradient of Rosenbrock's function at the evaluation points in the unit square, with axes labeled x1 and x2.]
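Calling dlfeval once per point is straightforward but slow when there are many points. The sketch below shows one possible alternative, not the approach used above. It assumes that every point is independent and evaluates all points in a single dlfeval call. Because each term of sum(y,1) depends only on its own point, the gradient of the sum with respect to x1 and x2 contains the per-point gradients. The helper name rosenbrockBatch is hypothetical.

```matlab
% Evaluation points in the unit square, as column vectors
[X1,X2] = meshgrid(linspace(0,1,10));
X1 = dlarray(X1(:));
X2 = dlarray(X2(:));

% One dlfeval call computes the gradients at all 100 points
[DYDX1,DYDX2] = dlfeval(@rosenbrockBatch,X1,X2);
quiver(extractdata(X1),extractdata(X2),extractdata(DYDX1),extractdata(DYDX2))
xlabel('x1')
ylabel('x2')

function [dydx1,dydx2] = rosenbrockBatch(x1,x2)
    % Hypothetical helper: Rosenbrock's function evaluated elementwise, one value per point
    y = 100*(x2 - x1.^2).^2 + (1 - x1).^2;
    % Terms are independent, so the gradient of their sum is the per-point gradient
    [dydx1,dydx2] = dlgradient(sum(y,1),x1,x2);
end
```

This trades the loop for one larger trace, so it uses more memory per call but avoids the per-point overhead of dlfeval.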

Input Arguments

y is the variable to differentiate, specified as a scalar dlarray object. For differentiation, y must be a traced function of dlarray inputs (see Traced dlarray) and must consist of supported functions for dlarray (see List of Functions with dlarray Support).

Example: 100*(x(2) - x(1).^2).^2 + (1 - x(1)).^2

Example: relu(X)

x1,...,xk are the variables in the function, each specified as a dlarray object, or a cell array, structure, or table containing dlarray objects, or any combination of such arguments recursively. For example, an argument can be a cell array containing a cell array that contains a structure containing dlarray objects.

If you specify x1,...,xk as a table, the table must contain the following variables:

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Value of parameter, specified as a cell array containing a dlarray.

Example: dlarray([1 2;3 4])

Data Types: single | double | logical | struct | cell
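The sketch below illustrates the structure case. The hypothetical function linearLoss takes its learnable parameters in a structure, and dlgradient then returns a structure of gradients with the same fields. The parameter and data values are made up for illustration. For w = 2, b = 0.5, and data [1;2;3], the predictions are [2.5;4.5;6.5]. The gradient of the sum of squared predictions is therefore 2*(2.5*1 + 4.5*2 + 6.5*3) = 62 with respect to w, and 2*(2.5 + 4.5 + 6.5) = 27 with respect to b.

```matlab
% Learnable parameters stored in a structure of dlarray objects
params.w = dlarray(2);
params.b = dlarray(0.5);
data = dlarray([1;2;3]);

[loss,grads] = dlfeval(@linearLoss,params,data);
disp(extractdata(grads.w))   % 62
disp(extractdata(grads.b))   % 27

function [loss,grads] = linearLoss(params,data)
    % Hypothetical linear model with a sum-of-squares loss
    pred = params.w.*data + params.b;
    loss = sum(pred.^2,1);
    % grads has the same fields as params: grads.w and grads.b
    grads = dlgradient(loss,params);
end
```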

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: dydx = dlgradient(y,x,'RetainData',true) retains the intermediate values of the trace for reuse in subsequent dlgradient calls.

RetainData is a flag to retain trace data during the function call, specified as false (default) or true. When this argument is false, a dlarray discards the derivative trace immediately after computing a derivative. When this argument is true, a dlarray retains the derivative trace until the end of the dlfeval call that evaluates the dlgradient call. The true setting is useful only when the dlfeval call contains more than one dlgradient call. It causes the software to use more memory, but can save time when multiple dlgradient calls use at least part of the same trace.

When 'EnableHigherDerivatives' is true, then intermediate values are retained and the 'RetainData' option has no effect.

Example: dydx = dlgradient(y,x,'RetainData',true)

Data Types: logical
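The sketch below uses a hypothetical helper, twoGradients, to compute two gradients from one shared forward computation within a single dlfeval call. Setting 'RetainData' to true on the first dlgradient call keeps the trace for the second call, which can save time at the cost of memory. With x = 3, the expected values are d(3x^2)/dx = 6x = 18 and d(x^2 + x)/dx = 2x + 1 = 7.

```matlab
x = dlarray(3);
[g1,g2] = dlfeval(@twoGradients,x);
disp([extractdata(g1) extractdata(g2)])   % 18 and 7

function [g1,g2] = twoGradients(x)
    % Hypothetical helper: h is a shared intermediate value used by both outputs
    h = x.^2;
    y1 = 3*h;
    y2 = h + x;
    % Keep the trace so that the second dlgradient call can reuse it
    g1 = dlgradient(y1,x,'RetainData',true);
    % Last gradient in this dlfeval call, so the default setting is sufficient
    g2 = dlgradient(y2,x);
end
```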

EnableHigherDerivatives is a flag to enable higher-order derivatives, specified as one of the following:

  • true – Enable higher-order derivatives. Trace the backward pass so that the returned gradients can be used in further computations, including subsequent calls to the dlgradient function. If 'EnableHigherDerivatives' is true, then intermediate values are retained and the 'RetainData' option has no effect.

  • false – Disable higher-order derivatives. Do not trace the backward pass. Use this option when you need to compute first-order derivatives only, as this is usually quicker and requires less memory.

When using the dlgradient function inside an AcceleratedFunction object, the default value is true. Otherwise, the default value is false.

For examples showing how to train models that require calculating higher-order derivatives, see:

Data Types: logical
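The sketch below computes a second derivative using a hypothetical helper, secondDerivative. The first dlgradient call sets 'EnableHigherDerivatives' to true so that its output is itself traced, and the second call differentiates that output. For y = x^3 at x = 3, the expected values are 3x^2 = 27 and 6x = 18.

```matlab
x = dlarray(3);
[dydx,d2ydx2] = dlfeval(@secondDerivative,x);
disp([extractdata(dydx) extractdata(d2ydx2)])   % 27 and 18

function [dydx,d2ydx2] = secondDerivative(x)
    % Hypothetical helper: first and second derivatives of x^3
    y = x.^3;
    % Trace the backward pass so that dydx can be differentiated again
    dydx = dlgradient(y,x,'EnableHigherDerivatives',true);
    % dydx is a scalar traced dlarray, so it is a valid first argument here
    d2ydx2 = dlgradient(dydx,x);
end
```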

Output Arguments

dydx1,...,dydxk are the gradients, each returned as a dlarray object, or a cell array, structure, or table containing dlarray objects, or any combination of such arguments recursively. The size and data type of dydx1,...,dydxk are the same as those of the associated input variables x1,...,xk.

Limitations

  • The dlgradient function does not support calculating higher-order derivatives when using dlnetwork objects containing custom layers with a custom backward function.

  • The dlgradient function does not support calculating higher-order derivatives when using dlnetwork objects containing the following layers:

    • gruLayer

    • lstmLayer

    • bilstmLayer

  • The dlgradient function does not support calculating higher-order derivatives that depend on the following functions:

    • gru

    • lstm

    • embed

    • prod

    • interp1

More About

Traced dlarray

During the computation of a function, a dlarray internally records the steps taken in a trace, enabling reverse mode automatic differentiation. The trace occurs within a dlfeval call. See Automatic Differentiation Background.

Tips

  • A dlgradient call must be inside a function. To obtain a numeric value of a gradient, you must evaluate the function using dlfeval, and the argument to the function must be a dlarray. See Use Automatic Differentiation In Deep Learning Toolbox.

  • To enable the correct evaluation of gradients, the y argument must use only supported functions for dlarray. See List of Functions with dlarray Support.

  • If you set the 'RetainData' name-value pair argument to true, the software preserves tracing for the duration of the dlfeval function call instead of erasing the trace immediately after the derivative computation. This preservation can cause a subsequent dlgradient call within the same dlfeval call to be executed faster, but uses more memory. For example, in training an adversarial network, the 'RetainData' setting is useful because the two networks share data and functions during training. See Train Generative Adversarial Network (GAN).

  • When you need to calculate first-order derivatives only, ensure that the 'EnableHigherDerivatives' option is false, as this is usually quicker and requires less memory.

Introduced in R2019b