Main Content

检查梯度或雅可比矩阵的有效性

当您提供目标和非线性约束函数的一阶导数时,优化求解器通常可以计算得更快、更可靠。然而,在编写这些衍生产品的代码时很容易犯错误。为了防止出现错误的导数,请使用 checkGradients 函数检查编程导数的输出是否与有限差分近似一致。

检查目标函数的梯度

考虑目标拉斯特里金函数

Ras(x)=20+x12+x2210cos(2πx1+2πx2).

以下函数正确计算了拉斯特里金函数及其梯度。

function [f,g] = ras(x)
f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2));
if nargout > 1
    g(2) = 2*x(2) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2));
    g(1) = 2*x(1) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2));
end
end

检查 ras 函数是否正确计算随机初始点附近的梯度。要查看计算的详细信息,请将 Display 名称值参数设置为 "on"

rng default
x0 = randn(1,2);
valid = checkGradients(@ras,x0,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 7.32446e-08.

checkGradients successfully passed.
____________________________________________________________


valid =

  logical

   1

badras 函数错误地计算了梯度。

function [f,g] = badras(x)
f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2));
if nargout > 1
    g(2) = 2*x(2) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2));
    g(1) = 2*x(1) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2));
end
end

checkGradients 正确报告 badras 函数错误地计算梯度。

valid = checkGradients(@badras,x0,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 0.494224.
  Supplied derivative element (1,1): 92.9841
  Finite-difference derivative element (1,1): 47.0291

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-06).
____________________________________________________________


valid =

  logical

   0

使用正确的梯度找到 ras(x) 的局部最小值。

rng default
x0 = randn(1,2);
options = optimoptions("fminunc",SpecifyObjectiveGradient=true);
[x,fval,exitflag,output] = fminunc(@ras,x0,options)
Local minimum found.

Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.

x =

    0.9975    0.9975


fval =

   11.9949


exitflag =

     1


output = 

  struct with fields:

       iterations: 9
        funcCount: 13
         stepsize: 5.7421e-05
     lssteplength: 1
    firstorderopt: 1.2426e-05
        algorithm: 'quasi-newton'
          message: 'Local minimum found.↵↵Optimization completed because the size of the gradient is less than↵the value of the optimality tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The first-order optimality measure, 2.482746e-07, is less ↵than options.OptimalityTolerance = 1.000000e-06.'

求解器只需 9 次迭代和 13 次函数计算即可达到局部最小值。如果没有梯度,求解器将需要进行更多的函数计算。

[x,fval,exitflag,output] = fminunc(@ras,x0) % No options
Local minimum found.
...
output = 

  struct with fields:

       iterations: 9
        funcCount: 39
...

这一次,求解器需要进行 39 次函数计算才能得出与之前基本相同的解。

检查向量目标函数的雅可比矩阵

lsqnonlinlsqcurvefitfsolve 函数使用向量值目标函数。(这些求解器试图最小化的目标函数是向量值目标的平方和。)雅可比矩阵是向量值目标函数的一阶导数。函数 F(x)m 分量,其中变量 xn 个分量,其雅可比矩阵 J 是一个 m×n 矩阵。J(i,j)Fi(x) 关于 xj 的偏导数。

例如,vecfun 代码计算四个参数(代码中的 abcd)函数的梯度,其中变量数量取决于输入数据 tt 向量对应一组时间,每个时间导致目标函数 F(x,t) 中的两个条目。

function [F,J] = vecfun(x,t)
t = t(:); % Reshape t to a column vector
a = x(1);
b = x(2);
c = x(3);
d = x(4);
nt = length(t);
F = [a*exp(-b*t) + c,...
    c*exp(-d*t)]; % numel(F) = 2*nt
if nargout > 1
    J = zeros(2*nt,4);
    J(1:nt,1) = exp(-b*t); % First nt components corresponding to a*exp(-b*t) + c
    J(1:nt,2) = (-t.*a.*exp(-b*t));
    J(1:nt,3) = ones(nt,1);
    J(nt+1:2*nt,3) = exp(-d*t); % Second nt components corresponding to c*exp(-d*t)
    J(nt+1:2*nt,4) = (-t.*c.*exp(-d*t));
end
end

验证 vecfun 中的梯度计算。

t = [-1/2 1/2 1]; % Three times
x = [1 2 3 4]; % Arbitrary x point
valid = checkGradients(@(x)vecfun(x,t),x,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.51598e-08.

checkGradients successfully passed.
____________________________________________________________


valid =

  logical

   1

在一组时间创建与 vecfun 函数相对应的人工响应数据。使用雅可比矩阵帮助将函数与数据拟合。

t = linspace(1,5);
x0 = [1 2 3 4];
rng default
x01 = rand(1,4);
y = vecfun(x0,t);
y = y + 0.1*randn(size(y)); % Add noise to response
options = optimoptions("lsqcurvefit",SpecifyObjectiveGradient=true);
[x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y,[],[],options)
Local minimum possible.

lsqcurvefit stopped because the final change in the sum of squares relative to 
its initial value is less than the value of the function tolerance.

x =

    0.9575    1.4570    2.9894    3.7289


resnorm =

    2.2076


exitflag =

     3


output = 

  struct with fields:

      firstorderopt: 1.0824e-04
         iterations: 11
          funcCount: 12
       cgiterations: 0
          algorithm: 'trust-region-reflective'
           stepsize: 0.0111
            message: 'Local minimum possible.↵↵lsqcurvefit stopped because the final change in the sum of squares relative to ↵its initial value is less than the value of the function tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative sum of squares (r) is changing↵by less than options.FunctionTolerance = 1.000000e-06.'
       bestfeasible: []
    constrviolation: []

返回的解向量 [0.9575,1.4570,2.9894,3.7289] 接近原始向量 [1,2,3,4]。该求解器仅需 12 次函数计算。如果没有雅可比矩阵,求解器将进行更多次求值。

[x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y) % No options
Local minimum possible.
...
output = 

  struct with fields:

      firstorderopt: 1.0825e-04
         iterations: 11
          funcCount: 60
...

这一次,求解器需要进行 60 次函数计算才能得出与之前基本相同的解。

修改有限差分选项和 checkGradients 参数

有时 checkGradients 可能会给出不正确的结果:

  • 对于正确的梯度,checkGradients 可以给出无效检查的错误报告。通常,出现这种情况是因为该函数具有相对较大的二阶导数,因此有限差分估计不准确。由于 Tolerance name-value 参量的值太小,也会出现误报。

  • 对于不正确的梯度,checkGradients 可能会给出有效检查的错误报告。通常,这种错误报告的出现是因为 Tolerance 名称值参量的值太大,或者因为结果包含巧合匹配的导数。

当您收到无效检查的错误报告时,为了更仔细地检查梯度,请更改有限差分选项。例如,checkGradients 错误地报告从点 [-2,4] 开始的 ras 梯度不正确。

valid = checkGradients(@ras,[-2,4],Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.88497e-06.
  Supplied derivative element (1,2): 6.21515
  Finite-difference derivative element (1,2): 6.21516

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-06).
____________________________________________________________


valid =

  logical

   0

FiniteDifferenceType 选项设置为 "central" 并再次测试。

opts = optimoptions("fmincon",FiniteDifferenceType="central");
valid = checkGradients(@ras,[-2,4],opts,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.10182e-09.

checkGradients successfully passed.
____________________________________________________________


valid =

  logical

   1

中心有限差分通常更准确。在这种情况下,计算的梯度和中心有限差分估计之间的相对差异约为 1e-9。默认的正向有限差分给出约 2e-6 的相对差异。

通过将容差从默认的 1e-6 放宽到 1e-5 来检查梯度的有效性。

valid = checkGradients(@ras,[-2,4],Tolerance=1e-5,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.88497e-06.

checkGradients successfully passed.
____________________________________________________________


valid =

  logical

   1

在这种情况下,checkGradients 正确报告梯度是有效的。较宽松的容差不会导致 badras 函数通过检查。

valid = checkGradients(@badras,[-2,4],Tolerance=1e-5,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 0.400823.
  Supplied derivative element (1,2): 4.4368
  Finite-difference derivative element (1,2): 6.21516

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-05).
____________________________________________________________


valid =

  logical

   0

检查非线性约束的导数

unitdisk2 函数正确计算了约束函数及其梯度,以使 x 保持在半径为 1 的圆盘内。

function [c,ceq,gc,gceq] = unitdisk2(x)
c = x(1)^2 + x(2)^2 - 1;
ceq = [ ];

if nargout > 2
    gc = [2*x(1);2*x(2)];
    gceq = [];
end
end

unitdiskb 函数错误地计算了约束函数的梯度。

function [c,ceq,gc,gceq] = unitdiskb(x)
c = x(1)^2 + x(2)^2 - 1;
ceq = [ ];

if nargout > 2
    gc = [x(1);x(2)]; % Gradient incorrect: off by a factor of 2
    gceq = [];
end
end

IsConstraint 名称值参量设置为 true 并检查点 x0 = randn(1,2) 处的梯度。

rng default
x0 = randn(1,2);
valid = checkGradients(@unitdisk2,x0,IsConstraint=true,Display="on")
____________________________________________________________

Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.53459e-08.

checkGradients successfully passed.
____________________________________________________________


valid =

  1×2 logical array

   1   1

checkGradients 正确报告 unitdisk2 的梯度是有效的。

valid = checkGradients(@unitdiskb,x0,IsConstraint=true,Display="on")
____________________________________________________________

Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.
  Supplied derivative element (2,1): 1.8324
  Finite-difference derivative element (2,1): 3.66479

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-06).
____________________________________________________________


valid =

  1×2 logical array

   0   1

checkGradients 正确报告 unitdiskb 的梯度无效。

检查脚本中的渐变

当有限差分近似与相应的梯度函数不匹配时,您可以使用 checkGradients 函数来停止脚本。

rng default
x0 = randn(1,2);
opts = optimoptions("fmincon",FiniteDifferenceType="central");
assert(checkGradients(@ras,x0,opts,Display="on"))
assert(all(checkGradients(@unitdisk2,x0,opts,...
    IsConstraint=true,Display="on")))
[x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdisk2)
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 7.52301e-10.

checkGradients successfully passed.
____________________________________________________________

____________________________________________________________

Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.44672e-11.

checkGradients successfully passed.
____________________________________________________________


Local minimum found that satisfies the constraints.

Optimization completed because the objective function is non-decreasing in 
feasible directions, to within the value of the optimality tolerance,
and constraints are satisfied to within the value of the constraint tolerance.

x =

    0.4987    0.4987


fval =

   10.4987

使用不正确的 unitdiskb 约束函数运行相同的脚本。请注意,脚本提前停止。

rng default
x0 = randn(1,2);
opts = optimoptions("fmincon",FiniteDifferenceType="central");
assert(checkGradients(@ras,x0,opts,Display="on"))
assert(all(checkGradients(@unitdiskb,x0,opts,...
    IsConstraint=true,Display="on")))
[x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdiskb)
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 7.52301e-10.

checkGradients successfully passed.
____________________________________________________________

____________________________________________________________

Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.
  Supplied derivative element (2,1): 1.8324
  Finite-difference derivative element (2,1): 3.66479

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-06).
____________________________________________________________

Error using assert
Assertion failed.

另请参阅

相关主题