fmincon optimization: is the first-order optimality very sensitive to changes in the step tolerance?
SA-W
2023-11-27
I use fmincon interior-point algorithm to fit parameters to a pde.
Here are my basic settings:
opts = optimoptions('fmincon', ...
'StepTolerance', 1e-12, ...
'FunctionTolerance', 1e-12, ...
'OptimalityTolerance', 1e-12, ...
'MaxIterations', 250,...
'SpecifyObjectiveGradient', true, ...
'CheckGradients', false);
lb = zeros(9,1);
ub = 4 + zeros(9,1);
Aineq = ... ; % entries have magnitude ~1e2
bineq = zeros(9,1);
problem = createOptimProblem( ...
params.solverName, ...
'objective', myFun, ...
'x0', startVec, ...
'lb',lb, ...
'ub',ub, ...
'Aineq', Aineq, ...
'bineq', bineq, ...
'options', opts);
%create multistart object
ms = MultiStart('Display', 'iter', ...
'UseParallel', true, ...
'StartPointsToRun', 'all', ...
'FunctionTolerance', 0);
% run
run(ms, problem, myStartPoints)
There are nine parameters, and I have lower and upper bounds as well as linear inequality constraints.
I scaled the matrix Aineq by 1e2 manually so that fmincon pays more attention to feasibility. I am aware that this comes with poor convergence and other drawbacks, but it has worked quite well so far. The reason for choosing such tight tolerances (1e-12) is to work around flat regions of the objective function, if any.
Using these options, I get the following output from multistart:
The solution of all 10 runs is
x = [0 0.00838947 0.0167789 0.0251684 0.0335579 0.0419473 0.0503368 0.0587263 0.0673571]
All solutions have exitflag=2 (probably because of the brutal scaling) and the same value of the objective function. Also, the first-order optimality is small.
However, run index = 4, for instance, converged to the same solution, but the first-order optimality is rather big compared to the other ones.
This becomes even more apparent if I relax all my tolerances (step, function, and first-order optimality) to the default value of 1e-6:
The solution is nearly the same as before
x = [0 0.00838945 0.0167789 0.0251684 0.0335578 0.0419473 0.0503367 0.0587262 0.067357]
however, the first-order optimality is higher by several orders of magnitude, while the solution is nearly unchanged, which is also indicated by the sum of squares.
These high optimality values make the solution less trustworthy.
How is it possible that the optimalities are so different if the sum of squares as well as the solution are practically identical?
7 Comments
Harald
2023-11-27
Hi,
I suppose you are solving the PDE numerically? In that case, please consider
This explains why you may also want to set FiniteDifferenceStepSize and the reasons behind doing so. Perhaps this will also help with your original question.
Best wishes,
Harald
SA-W
2023-11-27
Hello,
Yes, I solve the PDE numerically, but I provide an analytical gradient to fmincon (see my options). I am not sure what happens with the Hessian, but for the gradient there is no finite differencing.
SA-W
2023-11-27
I know. I am just looking for a qualitative assessment.
So given the results I show, can we qualitatively say that the objective is likely to be very flat at the solution? Or something else?
Torsten
2023-11-27
I suggest you compute the objective function near the point that MATLAB computes as optimal by changing each parameter separately while holding the other parameters constant and see what kind of curves you get.
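A minimal sketch of such a one-parameter sweep, reusing myFun from the question (xOpt stands for the solution returned by the solver; the step range and the plotting are only placeholders):
% Sweep each parameter around the fitted solution while holding the others fixed
% xOpt : solution returned by fmincon/MultiStart (placeholder name)
% myFun: objective handle from the question (scalar sum of squares as first output)
delta = linspace(-1e-3, 1e-3, 41);          % local neighborhood around the optimum
fvals = zeros(numel(delta), numel(xOpt));
for k = 1:numel(xOpt)
    for j = 1:numel(delta)
        xTrial     = xOpt;
        xTrial(k)  = xOpt(k) + delta(j);    % perturb only parameter k
        fvals(j,k) = myFun(xTrial);
    end
end
plot(delta, fvals)                          % one curve per parameter
xlabel('\delta'); ylabel('objective'); legend("x_" + (1:numel(xOpt)))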
Accepted Answer
Matt J
2023-11-27
Edited: Matt J
2023-11-27
So given the results I show, can we qualitatively say that the objective is likely to be very flat at the solution? Or something else?
Well, it basically means that a small change in x (near the stopping point) produces a large change in the gradient. The function would seem to have very high curvatures there, or possibly has a discontinuous first derivative.
25 Comments
SA-W
2023-11-27
All linear constraints are active at the solution, and they are continuously differentiable. Also, I think the objective function is continuously differentiable near the solution.
The function would seem to have very high curvatures there
How would you proceed to figure this out? The parameter space is nine-dimensional and visualizing anything is probably hard.
Matt J
2023-11-27
Edited: Matt J
2023-11-27
Here's a qualitatively similar example. You can see the first order optimalities are again very different without changing the solution very much.
for tol=[1e-12,1e-6]
opts = optimoptions('fmincon', ...
'StepTolerance', tol, ...
'FunctionTolerance', tol, ...
'OptimalityTolerance', tol, ...
'MaxIterations', inf,'MaxFunctionEvaluations', inf,...
'SpecifyObjectiveGradient', false, ...
'CheckGradients', false,'Display','none');
f=@(z)sum( z.^2.*(z<0)+z.*(z>=0));
[x,fval,eflag,stats]=fmincon(@(x) f(x-[1,2,3])+3,rand(1,3)-0.5,[],[],[],[],[],[],[],opts)
end
x = 1×3
1.0000 2.0000 3.0000
fval = 3.0000
eflag = 2
stats = struct with fields:
iterations: 38
funcCount: 244
constrviolation: 0
stepsize: 1.6412e-12
algorithm: 'interior-point'
firstorderopt: 0.7603
cgiterations: 47
message: 'Local minimum possible. Constraints satisfied.↵↵fmincon stopped because the size of the current step is less than↵the value of the step size tolerance and constraints are ↵satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative changes in all elements of x are↵less than options.StepTolerance = 1.000000e-12, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
bestfeasible: [1×1 struct]
x = 1×3
1.0000 2.0000 3.0000
fval = 3.0000
eflag = 2
stats = struct with fields:
iterations: 23
funcCount: 147
constrviolation: 0
stepsize: 1.0967e-06
algorithm: 'interior-point'
firstorderopt: 8.0766e-05
cgiterations: 33
message: 'Local minimum possible. Constraints satisfied.↵↵fmincon stopped because the size of the current step is less than↵the value of the step size tolerance and constraints are ↵satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative changes in all elements of x are↵less than options.StepTolerance = 1.000000e-06, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
bestfeasible: [1×1 struct]
Matt J
2023-11-27
How would you proceed to figure this out?
You would compute the Hessian and see if it has any large entries or large eigenvalues.
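As a sketch of that check, reusing the variables from the question (fmincon can return its final Hessian approximation as the seventh output; for the interior-point algorithm this is only an approximation, and xBest stands for the MultiStart solution):
% Re-run fmincon from the MultiStart solution to recover the (approximate)
% Hessian there, then inspect its spectrum
[x, fval, exitflag, output, lambda, grad, hessian] = ...
    fmincon(myFun, xBest, Aineq, bineq, [], [], lb, ub, [], opts);
ev = eig((hessian + hessian.')/2);   % symmetrize before computing eigenvalues
largestEntry = max(abs(hessian(:)))
eigRange     = [min(ev), max(ev)]
condNumber   = cond(hessian)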
SA-W
2023-11-27
You can see the first order optimalities are again very different without changing the solution very much.
Why is this the case here? The function is very smooth and the first derivative is continuous, right?
SA-W
2023-11-27
The discontinuity did not cause any problems in the optimization (same result). That said, it may not be a problem in practice? I mean as long as the converged solution is the same, I do not have to worry about the discontinuity.
Matt J
2023-11-27
The interior point algorithm theoretically assumes a twice continuously differentiable function and constraints, but much of the time it does work when the continuity is only piecewise.
SA-W
2023-11-27
You would compute the Hessian and see if it has any large entries or large eigenvalues.
hessian =
1.0e+07 *
0.0546 0.0014 -0.0286 -0.0145 -0.0104 -0.0072 -0.0181 0.0244 -0.0016
0.0014 0.0042 0.0019 0.0014 0.0009 0.0018 0.0041 -0.0150 -0.0007
-0.0286 0.0019 0.0166 0.0086 0.0063 0.0046 0.0138 -0.0253 0.0021
-0.0145 0.0014 0.0086 0.0052 0.0016 0.0106 -0.0200 0.0355 -0.0284
-0.0104 0.0009 0.0063 0.0016 0.0064 -0.0186 0.0739 -0.1343 0.0742
-0.0072 0.0018 0.0046 0.0106 -0.0186 0.1034 -0.3381 0.6104 -0.3670
-0.0181 0.0041 0.0138 -0.0200 0.0739 -0.3381 1.1674 -2.1213 1.2382
0.0244 -0.0150 -0.0253 0.0355 -0.1343 0.6104 -2.1213 3.8685 -2.2429
-0.0016 -0.0007 0.0021 -0.0284 0.0742 -0.3670 1.2382 -2.2429 1.3263
This is the Hessian for the solution I mentioned in my question. The entries of the Hessian differ only by one order of magnitude.
The smallest eigenvalue is 0.0939 and the largest 6.5e+7. This makes the condition number of the Hessian large, but this probably has to do with the units of my parameters.
The function would seem to have very high curvatures there,
Do you think this could be the case here? Admittedly, I have never looked at the Hessian so far but, compared to the parameters (between 0 and 1), the second derivatives are very large.
Matt J
2023-11-27
Edited: Matt J
2023-11-27
The curvatures don't seem large enough to account for the difference we see. However, looking at the Hessian won't detect a discontinuous first derivative. As illustrated below, the Hessian explodes only if you evaluate it right at the point of discontinuity.
f=@(z)z.^2.*(z<0)+z.*(z>=0);
x=linspace(-2,2,1e6);
dx=x(2)-x(1);
Hess=gradient(gradient(f(x),dx),dx);
plot(x,Hess,'.-'); xlabel x; ylabel('\nabla^2 f(x)')
SA-W
2023-11-27
The curvatures don't seem large enough to account for the difference we see.
They are on the order of 1e7. Is that not big? Or what was your criterion for that statement?
Matt J
2023-11-27
Edited: Matt J
2023-11-27
I take it back. It probably is big enough. The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing, assuming the first order optimality numbers are comparable to the gradient magnitudes at the solution.
SA-W
2023-11-27
The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing,
Maybe I am missing something. The changes in the first-order optimality are at least a factor of ~1e5 (they went from 1e-5 to 1e0, roughly). So what did you mean by
~1e-1, which is largely what you are seeing
?
Matt J
2023-11-27
Edited: Matt J
2023-11-27
they reduced from 1e-5 to 1e0, roughly
To me, it looks like the average first-order optimality measure across the different multistart runs is initially about 1e-6 and then, after relaxing the tolerances, the average is about 1e-1. Therefore, the change is,
change = 1e-1 - 1e-6
change = 0.1000
which is still ~1e-1.
SA-W
2023-11-27
The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing,
Did you mean ~1e-6*1e7 = ~1e+1? Of course, exact numbers are nonsense here, but just to make sure you did not refer to something else with ~1e-1.
More important from a practical viewpoint: If I know my objective has large curvatures, is it reasonable to choose a smaller step size? The only benefit I see from this is that the first-order optimality is closer to zero, which makes the results more trustworthy, but the effect on the solution is negligible.
Matt J
2023-11-28
Edited: Matt J
2023-11-28
Did you mean ~1e-6*1e7 = ~1e+1?
Yes, that's what I really meant.
More important from a practical viewpoint: If I know my objective has large curvatures, is it reasonable to choose a smaller step size?
You mean a smaller StepTolerance? It can definitely make a difference. Here's an example where we make the curvatures both small (p=2) and large (p=7), and you can see that, for the latter, the tolerance makes a big difference in how close you come to the true solution at x=0:
x0=[787.5362 617.2179 596.9010 784.4406 5.9046];
for p=[2,7]
p,
for tol=[1e-6,1e-12];
opts = optimoptions('fmincon', ...
'StepTolerance', tol, ...
'FunctionTolerance', tol, ...
'OptimalityTolerance', tol, ...
'MaxIterations', inf,'MaxFunctionEvaluations', inf,...
'SpecifyObjectiveGradient', false, ...
'CheckGradients', false,'Display','none');
[x,fval,eflag,stats]=fmincon(@(x) norm(x+5,p)^p,x0,...
[],[],[],[],[0,0,0,0,0],[],[],opts);
distance_to_true_solution = norm(x)
end
end
p = 2
distance_to_true_solution = 4.4737e-07
distance_to_true_solution = 4.4723e-09
p = 7
distance_to_true_solution = 0.5628
distance_to_true_solution = 8.1778e-14
SA-W
2023-11-28
Edited: SA-W
2023-11-28
Here's an example, where we make the curvatures both small p=2 and large p=7, and you can see that, for the latter, the tolerance makes a big difference in how close you come to the true solution at x=0:
What we see here is what I expected for small curvatures (flat objectives) too. There, the gradients are small, so the optimizer may think it has arrived at a minimum and stop too early. Tightening the step tolerance may help it get closer to the minimum.
From that, can we say that a small step tolerance makes sense if I have no idea about the magnitude of the second derivatives? A small step tolerance leads to more iterations, but let's forget about that.
Matt J
2023-11-28
Edited: Matt J
2023-11-28
There, the gradients are small, so the optimizer may think it has arrived at a minimum and stop too early. Tightening the step tolerance may help it get closer to the minimum.
No, I don't think so. In that situation, you would want to tighten the optimality tolerance and maybe also the function tolerance.
From that, can we say that a small step tolerance makes sense if I have no idea about the magnitude of the second derivatives?
Yes, but it may not be enough. If you have no idea about the second derivatives and they happen to be very low, you may need to make the optimality tolerance extra-small as well (see above). Generally speaking, tightening all of the tolerances always reduces the risk of premature stopping.
SA-W
2023-11-28
Yes, but it may not be enough. If you have very low curvatures, you may need to make the optimality tolerance extra-small as well (see above). Generally speaking, tightening all of the tolerances always reduces the risk of premature stopping.
That said, would you say setting all tolerances to 1e-12 is reasonable? My parameters are all greater than zero and less than 5, and I expect the sum of squares to be between 1 and 100.
Matt J
2023-11-28
Edited: Matt J
2023-11-28
1e-12 is what I usually use. But you really can't look to the stopping tolerances and exit statistics alone to gain confidence in an optimization's performance. You should be testing on several simulated versions of the problem where you know the desired answer, and seeing how close the optimizer gets.
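A sketch of such a synthetic self-test, reusing the MultiStart setup from the question (simulateData and buildObjective are placeholders for your own forward PDE solve and objective construction, and trueParams must be feasible with respect to Aineq/bineq):
% Hypothetical self-test: pick ground-truth parameters, generate synthetic data
% from the forward model, then check how close the optimizer gets
trueParams  = 0.5 + 3*rand(9,1);              % known "answer" inside the bounds [0,4]
data        = simulateData(trueParams);       % placeholder: forward-solve the PDE
myFunTest   = buildObjective(data);           % placeholder: sum-of-squares objective
problemTest = createOptimProblem('fmincon', ...
    'objective', myFunTest, 'x0', startVec, ...
    'lb', lb, 'ub', ub, 'Aineq', Aineq, 'bineq', bineq, ...
    'options', opts);
[xBest, fBest] = run(ms, problemTest, myStartPoints);
fprintf('max abs parameter error: %g\n', max(abs(xBest(:) - trueParams)));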
SA-W
2023-11-28
I always perform numerical experiments first where I know the exact solution, and run a multistart simulation with several start points. If this works, I move on to the real experiment and again perform a multistart run.
My sanity check is to verify that all local solver runs converged to the same solution. Since I do not know the exact solution, I cannot do much more. Or can you recommend some nice checks here? I mean, if you get a solution from an optimization where you do not know the exact solution, how do you convince yourself that the solution is correct if the exit flags are not trustworthy enough?
Matt J
2023-11-28
I don't do anything. I rely on experience with the numerical tests for my confidence.
SA-W
2023-11-28
Edited: SA-W
2023-11-28
Do you know of tools (File Exchange, toolbox functions, ...) to post-process optimization results?
Often, people calculate confidence intervals, R-coefficients, correlation matrices, ... Of course, one could implement this oneself, but if there are already packaged solutions, why not use them.
Matt J
2023-11-28
Edited: Matt J
2023-11-28
There is this thread,
As you will see in the comments though, it will probably require you to do your own Hessian calculation. It must be the Hessian of the Lagrangian, not the objective function, although I guess if you only have linear constraints, they will be the same thing.
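For what it's worth, a rough sketch of the usual least-squares post-processing from the Hessian, under the assumptions that hessian is the Hessian of the plain sum-of-squares objective at the solution x, sse is the final sum of squares, m is the number of residuals, and active constraints are ignored:
% Approximate parameter covariance from the Hessian of a sum-of-squares objective
% (the factor 2 assumes the objective is the plain sum of squares, not 1/2 of it)
n      = numel(x);                           % number of parameters (9 here)
sigma2 = sse/(m - n);                        % residual variance estimate
C      = 2*sigma2*inv(hessian);              % approximate covariance of the parameters
se     = sqrt(diag(C));                      % standard errors
ci     = [x(:) - 1.96*se, x(:) + 1.96*se];   % rough 95% confidence intervals
corrM  = C ./ (se*se.');                     % correlation matrix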
SA-W
2023-11-28
It must be the Hessian of the Lagrangian, not the objective function, although I guess if you only have linear constraints, they will be the same thing.
Yes, I think so too.
Do you think it makes sense to calculate correlations, etc., with a Hessian that has a condition number of ~1e7?
More Answers (0)