检查梯度或雅可比矩阵的有效性
当您提供目标和非线性约束函数的一阶导数时,优化求解器通常可以计算得更快、更可靠。然而,在编写这些衍生产品的代码时很容易犯错误。为了防止出现错误的导数,请使用 checkGradients 函数检查编程导数的输出是否与有限差分近似一致。
检查目标函数的梯度
考虑目标拉斯特里金函数
以下函数正确计算了拉斯特里金函数及其梯度。
function [f,g] = ras(x) f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2)); if nargout > 1 g(2) = 2*x(2) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2)); g(1) = 2*x(1) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2)); end end
检查 ras 函数是否正确计算随机初始点附近的梯度。要查看计算的详细信息,请将 Display 名称值参数设置为 "on"。
rng default x0 = randn(1,2); valid = checkGradients(@ras,x0,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 7.32446e-08. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
badras 函数错误地计算了梯度。
function [f,g] = badras(x) f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2)); if nargout > 1 g(2) = 2*x(2) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2)); g(1) = 2*x(1) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2)); end end
checkGradients 正确报告 badras 函数错误地计算梯度。
valid = checkGradients(@badras,x0,Display="on")____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 0.494224. Supplied derivative element (1,1): 92.9841 Finite-difference derivative element (1,1): 47.0291 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = logical 0
使用正确的梯度找到 ras(x) 的局部最小值。
rng default x0 = randn(1,2); options = optimoptions("fminunc",SpecifyObjectiveGradient=true); [x,fval,exitflag,output] = fminunc(@ras,x0,options)
Local minimum found.
Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.
x =
0.9975 0.9975
fval =
11.9949
exitflag =
1
output =
struct with fields:
iterations: 9
funcCount: 13
stepsize: 5.7421e-05
lssteplength: 1
firstorderopt: 1.2426e-05
algorithm: 'quasi-newton'
message: 'Local minimum found.↵↵Optimization completed because the size of the gradient is less than↵the value of the optimality tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The first-order optimality measure, 2.482746e-07, is less ↵than options.OptimalityTolerance = 1.000000e-06.'求解器只需 9 次迭代和 13 次函数计算即可达到局部最小值。如果没有梯度,求解器将需要进行更多的函数计算。
[x,fval,exitflag,output] = fminunc(@ras,x0) % No optionsLocal minimum found.
...
output =
struct with fields:
iterations: 9
funcCount: 39
...这一次,求解器需要进行 39 次函数计算才能得出与之前基本相同的解。
检查向量目标函数的雅可比矩阵
lsqnonlin、lsqcurvefit 和 fsolve 函数使用向量值目标函数。(这些求解器试图最小化的目标函数是向量值目标的平方和。)雅可比矩阵是向量值目标函数的一阶导数。函数 J 有 F(x) 分量,其中变量 m 有 x 个分量,其雅可比矩阵 n 是一个 m×n 矩阵。J(i,j) 是 Fi(x) 关于 xj 的偏导数。
例如,vecfun 代码计算四个参数(代码中的 a、b、c 和 d)函数的梯度,其中变量数量取决于输入数据 t。t 向量对应一组时间,每个时间导致目标函数 F(x,t) 中的两个条目。
function [F,J] = vecfun(x,t) t = t(:); % Reshape t to a column vector a = x(1); b = x(2); c = x(3); d = x(4); nt = length(t); F = [a*exp(-b*t) + c,... c*exp(-d*t)]; % numel(F) = 2*nt if nargout > 1 J = zeros(2*nt,4); J(1:nt,1) = exp(-b*t); % First nt components corresponding to a*exp(-b*t) + c J(1:nt,2) = (-t.*a.*exp(-b*t)); J(1:nt,3) = ones(nt,1); J(nt+1:2*nt,3) = exp(-d*t); % Second nt components corresponding to c*exp(-d*t) J(nt+1:2*nt,4) = (-t.*c.*exp(-d*t)); end end
验证 vecfun 中的梯度计算。
t = [-1/2 1/2 1]; % Three times x = [1 2 3 4]; % Arbitrary x point valid = checkGradients(@(x)vecfun(x,t),x,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.51598e-08. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
在一组时间创建与 vecfun 函数相对应的人工响应数据。使用雅可比矩阵帮助将函数与数据拟合。
t = linspace(1,5); x0 = [1 2 3 4]; rng default x01 = rand(1,4); y = vecfun(x0,t); y = y + 0.1*randn(size(y)); % Add noise to response options = optimoptions("lsqcurvefit",SpecifyObjectiveGradient=true); [x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y,[],[],options)
Local minimum possible.
lsqcurvefit stopped because the final change in the sum of squares relative to
its initial value is less than the value of the function tolerance.
x =
0.9575 1.4570 2.9894 3.7289
resnorm =
2.2076
exitflag =
3
output =
struct with fields:
firstorderopt: 1.0824e-04
iterations: 11
funcCount: 12
cgiterations: 0
algorithm: 'trust-region-reflective'
stepsize: 0.0111
message: 'Local minimum possible.↵↵lsqcurvefit stopped because the final change in the sum of squares relative to ↵its initial value is less than the value of the function tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative sum of squares (r) is changing↵by less than options.FunctionTolerance = 1.000000e-06.'
bestfeasible: []
constrviolation: []返回的解向量 [0.9575,1.4570,2.9894,3.7289] 接近原始向量 [1,2,3,4]。该求解器仅需 12 次函数计算。如果没有雅可比矩阵,求解器将进行更多次求值。
[x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y) % No optionsLocal minimum possible.
...
output =
struct with fields:
firstorderopt: 1.0825e-04
iterations: 11
funcCount: 60
...这一次,求解器需要进行 60 次函数计算才能得出与之前基本相同的解。
修改有限差分选项和 checkGradients 参量
有时 checkGradients 可能会给出不正确的结果:
对于正确的梯度,
checkGradients可以给出无效检查的错误报告。通常,出现这种情况是因为该函数具有相对较大的二阶导数,因此有限差分估计不准确。由于Tolerancename-value 参量的值太小,也会出现误报。对于不正确的梯度,
checkGradients可能会给出有效检查的错误报告。通常,这种错误报告的出现是因为Tolerance名称值参量的值太大,或者因为结果包含巧合匹配的导数。
当您收到无效检查的错误报告时,为了更仔细地检查梯度,请更改有限差分选项。例如,checkGradients 错误地报告从点 ras 开始的 [-2,4] 梯度不正确。
valid = checkGradients(@ras,[-2,4],Display="on")____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.88497e-06. Supplied derivative element (1,2): 6.21515 Finite-difference derivative element (1,2): 6.21516 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = logical 0
将 FiniteDifferenceType 选项设置为 "central" 并再次测试。
opts = optimoptions("fmincon",FiniteDifferenceType="central"); valid = checkGradients(@ras,[-2,4],opts,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.10182e-09. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
中心有限差分通常更准确。在这种情况下,计算的梯度和中心有限差分估计之间的相对差异约为 1e-9。默认的正向有限差分给出约 2e-6 的相对差异。
通过将容差从默认的 1e-6 放宽到 1e-5 来检查梯度的有效性。
valid = checkGradients(@ras,[-2,4],Tolerance=1e-5,Display="on")____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.88497e-06. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
在这种情况下,checkGradients 正确报告梯度是有效的。较宽松的容差不会导致 badras 函数通过检查。
valid = checkGradients(@badras,[-2,4],Tolerance=1e-5,Display="on")____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 0.400823. Supplied derivative element (1,2): 4.4368 Finite-difference derivative element (1,2): 6.21516 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-05). ____________________________________________________________ valid = logical 0
检查非线性约束的导数
unitdisk2 函数正确计算了约束函数及其梯度,以使 x 保持在半径为 1 的圆盘内。
function [c,ceq,gc,gceq] = unitdisk2(x) c = x(1)^2 + x(2)^2 - 1; ceq = [ ]; if nargout > 2 gc = [2*x(1);2*x(2)]; gceq = []; end end
unitdiskb 函数错误地计算了约束函数的梯度。
function [c,ceq,gc,gceq] = unitdiskb(x) c = x(1)^2 + x(2)^2 - 1; ceq = [ ]; if nargout > 2 gc = [x(1);x(2)]; % Gradient incorrect: off by a factor of 2 gceq = []; end end
将 IsConstraint 名称值参量设置为 true 并检查点 x0 = randn(1,2) 处的梯度。
rng default x0 = randn(1,2); valid = checkGradients(@unitdisk2,x0,IsConstraint=true,Display="on")
____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.53459e-08. checkGradients successfully passed. ____________________________________________________________ valid = 1×2 logical array 1 1
checkGradients 正确报告 unitdisk2 的梯度是有效的。
valid = checkGradients(@unitdiskb,x0,IsConstraint=true,Display="on")____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1. Supplied derivative element (2,1): 1.8324 Finite-difference derivative element (2,1): 3.66479 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = 1×2 logical array 0 1
checkGradients 正确报告 unitdiskb 的梯度无效。
检查脚本中的渐变
当有限差分近似与相应的梯度函数不匹配时,您可以使用 checkGradients 函数来停止脚本。
rng default x0 = randn(1,2); opts = optimoptions("fmincon",FiniteDifferenceType="central"); assert(checkGradients(@ras,x0,opts,Display="on")) assert(all(checkGradients(@unitdisk2,x0,opts,... IsConstraint=true,Display="on"))) [x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdisk2)
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied
and finite-difference derivatives = 7.52301e-10.
checkGradients successfully passed.
____________________________________________________________
____________________________________________________________
Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied
and finite-difference derivatives = 1.44672e-11.
checkGradients successfully passed.
____________________________________________________________
Local minimum found that satisfies the constraints.
Optimization completed because the objective function is non-decreasing in
feasible directions, to within the value of the optimality tolerance,
and constraints are satisfied to within the value of the constraint tolerance.
x =
0.4987 0.4987
fval =
10.4987使用不正确的 unitdiskb 约束函数运行相同的脚本。请注意,脚本提前停止。
rng default x0 = randn(1,2); opts = optimoptions("fmincon",FiniteDifferenceType="central"); assert(checkGradients(@ras,x0,opts,Display="on")) assert(all(checkGradients(@unitdiskb,x0,opts,... IsConstraint=true,Display="on"))) [x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdiskb)
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 7.52301e-10. checkGradients successfully passed. ____________________________________________________________ ____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1. Supplied derivative element (2,1): 1.8324 Finite-difference derivative element (2,1): 3.66479 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ Error using assert Assertion failed.