检查梯度或雅可比矩阵的有效性
当您提供目标和非线性约束函数的一阶导数时,优化求解器通常可以计算得更快、更可靠。然而,在编写这些衍生产品的代码时很容易犯错误。为了防止出现错误的导数,请使用 checkGradients
函数检查编程导数的输出是否与有限差分近似一致。
检查目标函数的梯度
考虑目标拉斯特里金函数
以下函数正确计算了拉斯特里金函数及其梯度。
function [f,g] = ras(x) f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2)); if nargout > 1 g(2) = 2*x(2) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2)); g(1) = 2*x(1) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2)); end end
检查 ras
函数是否正确计算随机初始点附近的梯度。要查看计算的详细信息,请将 Display
名称值参数设置为 "on"
。
rng default x0 = randn(1,2); valid = checkGradients(@ras,x0,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 7.32446e-08. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
badras
函数错误地计算了梯度。
function [f,g] = badras(x) f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2)); if nargout > 1 g(2) = 2*x(2) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2)); g(1) = 2*x(1) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2)); end end
checkGradients
正确报告 badras
函数错误地计算梯度。
valid = checkGradients(@badras,x0,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 0.494224. Supplied derivative element (1,1): 92.9841 Finite-difference derivative element (1,1): 47.0291 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = logical 0
使用正确的梯度找到 ras(x)
的局部最小值。
rng default x0 = randn(1,2); options = optimoptions("fminunc",SpecifyObjectiveGradient=true); [x,fval,exitflag,output] = fminunc(@ras,x0,options)
Local minimum found. Optimization completed because the size of the gradient is less than the value of the optimality tolerance. x = 0.9975 0.9975 fval = 11.9949 exitflag = 1 output = struct with fields: iterations: 9 funcCount: 13 stepsize: 5.7421e-05 lssteplength: 1 firstorderopt: 1.2426e-05 algorithm: 'quasi-newton' message: 'Local minimum found.↵↵Optimization completed because the size of the gradient is less than↵the value of the optimality tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The first-order optimality measure, 2.482746e-07, is less ↵than options.OptimalityTolerance = 1.000000e-06.'
求解器只需 9 次迭代和 13 次函数计算即可达到局部最小值。如果没有梯度,求解器将需要进行更多的函数计算。
[x,fval,exitflag,output] = fminunc(@ras,x0) % No options
Local minimum found. ... output = struct with fields: iterations: 9 funcCount: 39 ...
这一次,求解器需要进行 39 次函数计算才能得出与之前基本相同的解。
检查向量目标函数的雅可比矩阵
lsqnonlin
、lsqcurvefit
和 fsolve
函数使用向量值目标函数。(这些求解器试图最小化的目标函数是向量值目标的平方和。)雅可比矩阵是向量值目标函数的一阶导数。函数 F(x)
有 m
分量,其中变量 x
有 n
个分量,其雅可比矩阵 J
是一个 m
×n
矩阵。J(i,j)
是 Fi(x)
关于 xj
的偏导数。
例如,vecfun
代码计算四个参数(代码中的 a
、b
、c
和 d
)函数的梯度,其中变量数量取决于输入数据 t
。t
向量对应一组时间,每个时间导致目标函数 F(x,t)
中的两个条目。
function [F,J] = vecfun(x,t) t = t(:); % Reshape t to a column vector a = x(1); b = x(2); c = x(3); d = x(4); nt = length(t); F = [a*exp(-b*t) + c,... c*exp(-d*t)]; % numel(F) = 2*nt if nargout > 1 J = zeros(2*nt,4); J(1:nt,1) = exp(-b*t); % First nt components corresponding to a*exp(-b*t) + c J(1:nt,2) = (-t.*a.*exp(-b*t)); J(1:nt,3) = ones(nt,1); J(nt+1:2*nt,3) = exp(-d*t); % Second nt components corresponding to c*exp(-d*t) J(nt+1:2*nt,4) = (-t.*c.*exp(-d*t)); end end
验证 vecfun
中的梯度计算。
t = [-1/2 1/2 1]; % Three times x = [1 2 3 4]; % Arbitrary x point valid = checkGradients(@(x)vecfun(x,t),x,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.51598e-08. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
在一组时间创建与 vecfun
函数相对应的人工响应数据。使用雅可比矩阵帮助将函数与数据拟合。
t = linspace(1,5); x0 = [1 2 3 4]; rng default x01 = rand(1,4); y = vecfun(x0,t); y = y + 0.1*randn(size(y)); % Add noise to response options = optimoptions("lsqcurvefit",SpecifyObjectiveGradient=true); [x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y,[],[],options)
Local minimum possible. lsqcurvefit stopped because the final change in the sum of squares relative to its initial value is less than the value of the function tolerance. x = 0.9575 1.4570 2.9894 3.7289 resnorm = 2.2076 exitflag = 3 output = struct with fields: firstorderopt: 1.0824e-04 iterations: 11 funcCount: 12 cgiterations: 0 algorithm: 'trust-region-reflective' stepsize: 0.0111 message: 'Local minimum possible.↵↵lsqcurvefit stopped because the final change in the sum of squares relative to ↵its initial value is less than the value of the function tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative sum of squares (r) is changing↵by less than options.FunctionTolerance = 1.000000e-06.' bestfeasible: [] constrviolation: []
返回的解向量 [0.9575,1.4570,2.9894,3.7289]
接近原始向量 [1,2,3,4]
。该求解器仅需 12 次函数计算。如果没有雅可比矩阵,求解器将进行更多次求值。
[x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y) % No options
Local minimum possible. ... output = struct with fields: firstorderopt: 1.0825e-04 iterations: 11 funcCount: 60 ...
这一次,求解器需要进行 60 次函数计算才能得出与之前基本相同的解。
修改有限差分选项和 checkGradients
参数
有时 checkGradients
可能会给出不正确的结果:
对于正确的梯度,
checkGradients
可以给出无效检查的错误报告。通常,出现这种情况是因为该函数具有相对较大的二阶导数,因此有限差分估计不准确。由于Tolerance
name-value 参量的值太小,也会出现误报。对于不正确的梯度,
checkGradients
可能会给出有效检查的错误报告。通常,这种错误报告的出现是因为Tolerance
名称值参量的值太大,或者因为结果包含巧合匹配的导数。
当您收到无效检查的错误报告时,为了更仔细地检查梯度,请更改有限差分选项。例如,checkGradients
错误地报告从点 [-2,4]
开始的 ras
梯度不正确。
valid = checkGradients(@ras,[-2,4],Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.88497e-06. Supplied derivative element (1,2): 6.21515 Finite-difference derivative element (1,2): 6.21516 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = logical 0
将 FiniteDifferenceType
选项设置为 "central"
并再次测试。
opts = optimoptions("fmincon",FiniteDifferenceType="central"); valid = checkGradients(@ras,[-2,4],opts,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.10182e-09. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
中心有限差分通常更准确。在这种情况下,计算的梯度和中心有限差分估计之间的相对差异约为 1e-9。默认的正向有限差分给出约 2e-6 的相对差异。
通过将容差从默认的 1e-6 放宽到 1e-5 来检查梯度的有效性。
valid = checkGradients(@ras,[-2,4],Tolerance=1e-5,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.88497e-06. checkGradients successfully passed. ____________________________________________________________ valid = logical 1
在这种情况下,checkGradients
正确报告梯度是有效的。较宽松的容差不会导致 badras
函数通过检查。
valid = checkGradients(@badras,[-2,4],Tolerance=1e-5,Display="on")
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 0.400823. Supplied derivative element (1,2): 4.4368 Finite-difference derivative element (1,2): 6.21516 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-05). ____________________________________________________________ valid = logical 0
检查非线性约束的导数
unitdisk2
函数正确计算了约束函数及其梯度,以使 x
保持在半径为 1 的圆盘内。
function [c,ceq,gc,gceq] = unitdisk2(x) c = x(1)^2 + x(2)^2 - 1; ceq = [ ]; if nargout > 2 gc = [2*x(1);2*x(2)]; gceq = []; end end
unitdiskb
函数错误地计算了约束函数的梯度。
function [c,ceq,gc,gceq] = unitdiskb(x) c = x(1)^2 + x(2)^2 - 1; ceq = [ ]; if nargout > 2 gc = [x(1);x(2)]; % Gradient incorrect: off by a factor of 2 gceq = []; end end
将 IsConstraint
名称值参量设置为 true
并检查点 x0 = randn(1,2)
处的梯度。
rng default x0 = randn(1,2); valid = checkGradients(@unitdisk2,x0,IsConstraint=true,Display="on")
____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.53459e-08. checkGradients successfully passed. ____________________________________________________________ valid = 1×2 logical array 1 1
checkGradients
正确报告 unitdisk2
的梯度是有效的。
valid = checkGradients(@unitdiskb,x0,IsConstraint=true,Display="on")
____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1. Supplied derivative element (2,1): 1.8324 Finite-difference derivative element (2,1): 3.66479 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ valid = 1×2 logical array 0 1
checkGradients
正确报告 unitdiskb
的梯度无效。
检查脚本中的渐变
当有限差分近似与相应的梯度函数不匹配时,您可以使用 checkGradients
函数来停止脚本。
rng default x0 = randn(1,2); opts = optimoptions("fmincon",FiniteDifferenceType="central"); assert(checkGradients(@ras,x0,opts,Display="on")) assert(all(checkGradients(@unitdisk2,x0,opts,... IsConstraint=true,Display="on"))) [x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdisk2)
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 7.52301e-10. checkGradients successfully passed. ____________________________________________________________ ____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1.44672e-11. checkGradients successfully passed. ____________________________________________________________ Local minimum found that satisfies the constraints. Optimization completed because the objective function is non-decreasing in feasible directions, to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance. x = 0.4987 0.4987 fval = 10.4987
使用不正确的 unitdiskb
约束函数运行相同的脚本。请注意,脚本提前停止。
rng default x0 = randn(1,2); opts = optimoptions("fmincon",FiniteDifferenceType="central"); assert(checkGradients(@ras,x0,opts,Display="on")) assert(all(checkGradients(@unitdiskb,x0,opts,... IsConstraint=true,Display="on"))) [x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdiskb)
____________________________________________________________ Objective function derivatives: Maximum relative difference between supplied and finite-difference derivatives = 7.52301e-10. checkGradients successfully passed. ____________________________________________________________ ____________________________________________________________ Nonlinear inequality constraint derivatives: Maximum relative difference between supplied and finite-difference derivatives = 1. Supplied derivative element (2,1): 1.8324 Finite-difference derivative element (2,1): 3.66479 checkGradients failed. Supplied derivative and finite-difference approximation are not within 'Tolerance' (1e-06). ____________________________________________________________ Error using assert Assertion failed.