Checking Validity of Gradients or Jacobians
Optimization solvers often compute more quickly and reliably when you provide first
derivatives of objective and nonlinear constraint functions. However, you can easily
make a mistake in programming these derivatives. To guard against incorrect derivatives,
check the output of the programmed derivative against a finite-difference approximation
by using the checkGradients
function.
Check Gradient of Objective Function
Consider the objective Rastrigin's function
The following function computes Rastrigin's function and its gradient correctly.
function [f,g] = ras(x) f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2)); if nargout > 1 g(2) = 2*x(2) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2)); g(1) = 2*x(1) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2)); end end
Check that the ras
function correctly computes the gradient
near a random initial point. To see details of the calculation, set the
Display
name-value parameter to
"on"
.
rng default x0 = randn(1,2); valid = checkGradients(@ras,x0,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 7.32446e-08. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
The badras
function computes the gradient incorrectly.
function [f,g] = badras(x) f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2)); if nargout > 1 g(2) = 2*x(2) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2)); g(1) = 2*x(1) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2)); end end
checkGradients
correctly reports that the
badras
function computes gradients incorrectly.
valid = checkGradients(@badras,x0,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 0.494224. Supplied derivative element (1,1): 92.9841 Finite-difference derivative element (1,1): 47.0291 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = logical 0
Use the correct gradient to find a local minimum of
ras(x)
.
rng default x0 = randn(1,2); options = optimoptions("fminunc",SpecifyObjectiveGradient=true); [x,fval,exitflag,output] = fminunc(@ras,x0,options)
Local minimum found. Optimization completed because the size of the gradient is less than the value of the optimality tolerance. x = 0.9975 0.9975 fval = 11.9949 exitflag = 1 output = struct with fields: iterations: 9 funcCount: 13 stepsize: 5.7421e-05 lssteplength: 1 firstorderopt: 1.2426e-05 algorithm: 'quasi-newton' message: 'Local minimum found.↵↵Optimization completed because the size of the gradient is less than↵the value of the optimality tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The first-order optimality measure, 2.482746e-07, is less ↵than options.OptimalityTolerance = 1.000000e-06.'
The solver takes 9 iterations and only 13 function evaluations to reach a local minimum. Without the gradient, the solver takes more function evaluations.
[x,fval,exitflag,output] = fminunc(@ras,x0) % No options
Local minimum found. ... output = struct with fields: iterations: 9 funcCount: 39 ...
This time, the solver takes 39 function evaluations to reach essentially the same solution as before.
Check Jacobian of Vector Objective Function
The lsqnonlin
, lsqcurvefit
, and
fsolve
functions use vector-valued objective functions.
(The objective function that these solvers try to minimize is the sum of squares of
the vector-valued objective.) A Jacobian is the first
derivative of a vector-valued objective function. The Jacobian J
of a function F(x)
with m
components, where
the variable x
has n
components, is an
m
-by-n
matrix. J(i,j)
is the partial derivative of Fi(x)
with
respect to xj
.
For example, the vecfun
code calculates the gradient for a
function of four parameters (a
, b
,
c
, and d
in the code) with a number of
variables that depends on the input data t
. The
t
vector corresponds to a set of times, and each time leads
to two entries in the objective function F(x,t)
.
function [F,J] = vecfun(x,t) t = t(:); % Reshape t to a column vector a = x(1); b = x(2); c = x(3); d = x(4); nt = length(t); F = [a*exp(-b*t) + c,... c*exp(-d*t)]; % numel(F) = 2*nt if nargout > 1 J = zeros(2*nt,4); J(1:nt,1) = exp(-b*t); % First nt components corresponding to a*exp(-b*t) + c J(1:nt,2) = (-t.*a.*exp(-b*t)); J(1:nt,3) = ones(nt,1); J(nt+1:2*nt,3) = exp(-d*t); % Second nt components corresponding to c*exp(-d*t) J(nt+1:2*nt,4) = (-t.*c.*exp(-d*t)); end end
Validate the gradient calculation in vecfun
.
t = [-1/2 1/2 1]; % Three times x = [1 2 3 4]; % Arbitrary x point valid = checkGradients(@(x)vecfun(x,t),x,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.51598e-08. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
Create artificial response data corresponding to the vecfun
function at a set of times. Use the Jacobian to help fit parameters of the function
to the data.
t = linspace(1,5); x0 = [1 2 3 4]; rng default x01 = rand(1,4); y = vecfun(x0,t); y = y + 0.1*randn(size(y)); % Add noise to response options = optimoptions("lsqcurvefit",SpecifyObjectiveGradient=true); [x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y,[],[],options)
Local minimum possible. lsqcurvefit stopped because the final change in the sum of squares relative to its initial value is less than the value of the function tolerance. x = 0.9575 1.4570 2.9894 3.7289 resnorm = 2.2076 exitflag = 3 output = struct with fields: firstorderopt: 1.0824e-04 iterations: 11 funcCount: 12 cgiterations: 0 algorithm: 'trust-region-reflective' stepsize: 0.0111 message: 'Local minimum possible.↵↵lsqcurvefit stopped because the final change in the sum of squares relative to ↵its initial value is less than the value of the function tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative sum of squares (r) is changing↵by less than options.FunctionTolerance = 1.000000e-06.' bestfeasible: [] constrviolation: []
The returned solution vector, [0.9575,1.4570,2.9894,3.7289]
, is
close to the original vector [1,2,3,4]
. The solver takes just 12
function evaluations. Without the Jacobian, the solver takes more
evaluations.
[x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y) % No options
Local minimum possible. ... output = struct with fields: firstorderopt: 1.0825e-04 iterations: 11 funcCount: 60 ...
This time, the solver takes 60 function evaluations to reach essentially the same solution as before.
Modify Finite-Difference Options and checkGradients
Arguments
Sometimes checkGradients
can give an incorrect result:
For a correct gradient,
checkGradients
can give a false report of an invalid check. Typically, this occurs because the function has a relatively large second derivative, so the finite-difference estimate is inaccurate. The false report can also occur because the value of theTolerance
name-value argument is too small.For an incorrect gradient,
checkGradients
can give a false report of a valid check. Typically, this false report occurs because the value of theTolerance
name-value argument is too large, or because the results contain a coincidentally matching derivative.
To check the gradient more carefully when you get a false report of an invalid
check, change the finite differencing options. For example,
checkGradients
incorrectly reports that the
ras
gradient is incorrect starting from the point
[-2,4]
.
valid = checkGradients(@ras,[-2,4],Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.88497e-06. Supplied derivative element (1,2): 6.21515 Finite-difference derivative element (1,2): 6.21516 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = logical 0
Set the FiniteDifferenceType
option to
"central"
and test again.
opts = optimoptions("fmincon",FiniteDifferenceType="central"); valid = checkGradients(@ras,[-2,4],opts,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.10182e-09. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
Central finite differences are typically more accurate. In this case, the relative difference between the computed gradient and the central finite-difference estimate is about 1e-9. The default forward finite difference gives a relative difference of about 2e-6.
Check the validity of the gradient by loosening the tolerance from the default 1e-6 to 1e-5.
valid = checkGradients(@ras,[-2,4],Tolerance=1e-5,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.88497e-06. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
In this case, checkGradients
correctly reports that the
gradient is valid. The looser tolerance does not cause the badras
function to pass the check.
valid = checkGradients(@badras,[-2,4],Tolerance=1e-5,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 0.400823. Supplied derivative element (1,2): 4.4368 Finite-difference derivative element (1,2): 6.21516 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-05). ____________________________________________________________ valid = logical 0
Check Derivatives of Nonlinear Constraints
The unitdisk2
function correctly calculates a constraint
function and its gradient to keep x
inside the disk of radius
1.
function [c,ceq,gc,gceq] = unitdisk2(x) c = x(1)^2 + x(2)^2 - 1; ceq = [ ]; if nargout > 2 gc = [2*x(1);2*x(2)]; gceq = []; end end
The unitdiskb
function incorrectly calculates the gradient of
the constraint function.
function [c,ceq,gc,gceq] = unitdiskb(x) c = x(1)^2 + x(2)^2 - 1; ceq = [ ]; if nargout > 2 gc = [x(1);x(2)]; % Gradient incorrect: off by a factor of 2 gceq = []; end end
Set the IsConstraint
name-value argument to
true
and check the gradient at the point x0 =
randn(1,2)
.
rng default x0 = randn(1,2); valid = checkGradients(@unitdisk2,x0,IsConstraint=true,Display="on")
____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.53459e-08. checkGradients successfully passed. ____________________________________________________________ valid = 1×2 logical array 1 1
checkGradients
correctly reports that the gradient of
unitdisk2
is valid.
valid = checkGradients(@unitdiskb,x0,IsConstraint=true,Display="on")
____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1. Supplied derivative element (2,1): 1.8324 Finite-difference derivative element (2,1): 3.66479 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = 1×2 logical array 0 1
checkGradients
correctly reports that the gradient of
unitdiskb
is not valid.
Check Gradients in Script
You can use the checkGradients
function to stop a script when
a finite-difference approximation does not match the corresponding gradient
function.
rng default x0 = randn(1,2); opts = optimoptions("fmincon",FiniteDifferenceType="central"); assert(checkGradients(@ras,x0,opts,Display="on")) assert(all(checkGradients(@unitdisk2,x0,opts,... IsConstraint=true,Display="on"))) [x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdisk2)
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 7.52301e-10. checkGradients successfully passed. ____________________________________________________________ ____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.44672e-11. checkGradients successfully passed. ____________________________________________________________ Local minimum found that satisfies the constraints. Optimization completed because the objective function is non-decreasing in feasible directions, to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance. x = 0.4987 0.4987 fval = 10.4987
Run the same script with the incorrect unitdiskb
constraint
function. Note that the script stops early.
rng default x0 = randn(1,2); opts = optimoptions("fmincon",FiniteDifferenceType="central"); assert(checkGradients(@ras,x0,opts,Display="on")) assert(all(checkGradients(@unitdiskb,x0,opts,... IsConstraint=true,Display="on"))) [x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdiskb)
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 7.52301e-10. checkGradients successfully passed. ____________________________________________________________ ____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1. Supplied derivative element (2,1): 1.8324 Finite-difference derivative element (2,1): 3.66479 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ Error using assert Assertion failed.