fmincon optimization: is the first-order optimality very sensitive to changes in the step tolerance?
SA-W
2023-11-27
I use fmincon interior-point algorithm to fit parameters to a pde.
Here are my basic settings:
opts = optimoptions('fmincon', ...
'StepTolerance', 1e-12, ...
'FunctionTolerance', 1e-12, ...
'OptimalityTolerance', 1e-12, ...
'MaxIterations', 250,...
'SpecifyObjectiveGradient', true, ...
'CheckGradients', false);
lb = zeros(9,1);
ub = 4 + zeros(9,1);
Aineq = ... ; % entries have magnitude ~1e2
bineq = zeros(9,1);
problem = createOptimProblem( ...
params.solverName, ...
'objective', myFun, ...
'x0', startVec, ...
'lb',lb, ...
'ub',ub, ...
'Aineq', Aineq, ...
'bineq', bineq, ...
'options', opts);
%create multistart object
ms = MultiStart('Display', 'iter', ...
'UseParallel', true, ...
'StartPointsToRun', 'all', ...
'FunctionTolerance', 0);
% run
run(ms, problem, myStartPoints)
There are nine parameters, and I have lower and upper bounds as well as linear inequality constraints.
I scaled the matrix Aineq by 1e2 manually so that fmincon pays more attention to feasibility. I am aware that this comes with poor convergence and other drawbacks, but it has worked quite well so far. The reason for choosing such tight tolerances (1e-12) is to work around flat regions of the objective function, if any.
Using these options, I get the following output from multistart:
The solution of all 10 runs is
x = [0 0.00838947 0.0167789 0.0251684 0.0335579 0.0419473 0.0503368 0.0587263 0.0673571]
All solutions have exitflag=2 (probably because of the brutal scaling) and the same value of the objective function. Also, the first-order optimality is small.
However, run index = 4, for instance, converged to the same solution, but the first-order optimality is rather big compared to the other ones.
This becomes even more apparent if I relax all my tolerances (step, function, and first-order optimality) to the default value of 1e-6:
The solution is nearly the same as before
x = [0 0.00838945 0.0167789 0.0251684 0.0335578 0.0419473 0.0503367 0.0587262 0.067357]
however, the first-order optimality is higher by several orders of magnitude, while the solution is nearly unchanged, which is also indicated by the sum of squares.
These high optimality values make the solution less trustworthy.
How is it possible that the optimalities are so different if the sum of squares as well as the solution are practically identical?
7 Comments
Harald
2023-11-27
Hi,
I suppose you are solving the PDE numerically? In that case, please consider
This explains why you may also want to set FiniteDifferenceStepSize and the reasons behind doing so. Perhaps this will also help with your original question.
Best wishes,
Harald
SA-W
2023-11-27
Hello,
Yes, I solve the PDE numerically, but I provide an analytical gradient to fmincon (see my options). I am not sure what happens with the Hessian, but for the gradient there is no finite differencing.
SA-W
2023-11-27
I know. I am just looking for a qualitative assessment.
So given the results I show, can we qualitatively say that the objective is likely to be very flat at the solution? Or something else?
Torsten
2023-11-27
I suggest you compute the objective function near the point that MATLAB computes as optimal by changing each parameter separately while holding the other parameters constant and see what kind of curves you get.
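A minimal sketch of such a one-parameter sweep, reusing myFun from the question (xOpt stands for the solution returned by the solver; the step range and the plotting are only placeholders):
% Sweep each parameter around the fitted solution while holding the others fixed
% xOpt : solution returned by fmincon/MultiStart (placeholder name)
% myFun: objective handle from the question (scalar sum of squares as first output)
delta = linspace(-1e-3, 1e-3, 41);          % local neighborhood around the optimum
fvals = zeros(numel(delta), numel(xOpt));
for k = 1:numel(xOpt)
    for j = 1:numel(delta)
        xTrial     = xOpt;
        xTrial(k)  = xOpt(k) + delta(j);    % perturb only parameter k
        fvals(j,k) = myFun(xTrial);
    end
end
plot(delta, fvals)                          % one curve per parameter
xlabel('\delta'); ylabel('objective'); legend("x_" + (1:numel(xOpt)))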
Accepted Answer
Matt J
2023-11-27
Edited: Matt J
2023-11-27
So given the results I show, can we qualitatively say that the objective is likely to be very flat at the solution? Or something else?
Well, it basically means that a small change in x (near the stopping point) produces a large change in the gradient. The function would seem to have very high curvatures there, or possibly has a discontinuous first derivative.
25 Comments
SA-W
2023-11-27
All linear constraints are active at the solution, and they are continuously differentiable. Also, I think the objective function is continuously differentiable near the solution.
The function would seem to have very high curvatures there
How would you proceed to figure this out? The parameter space is nine-dimensional and visualizing anything is probably hard.
Matt J
2023-11-27
Edited: Matt J
2023-11-27
Here's a qualitatively similar example. You can see the first order optimalities are again very different without changing the solution very much.
for tol=[1e-12,1e-6]
opts = optimoptions('fmincon', ...
'StepTolerance', tol, ...
'FunctionTolerance', tol, ...
'OptimalityTolerance', tol, ...
'MaxIterations', inf,'MaxFunctionEvaluations', inf,...
'SpecifyObjectiveGradient', false, ...
'CheckGradients', false,'Display','none');
f=@(z)sum( z.^2.*(z<0)+z.*(z>=0));
[x,fval,eflag,stats]=fmincon(@(x) f(x-[1,2,3])+3,rand(1,3)-0.5,[],[],[],[],[],[],[],opts)
end
x = 1×3
1.0000 2.0000 3.0000
fval = 3.0000
eflag = 2
stats = struct with fields:
iterations: 38
funcCount: 244
constrviolation: 0
stepsize: 1.6412e-12
algorithm: 'interior-point'
firstorderopt: 0.7603
cgiterations: 47
message: 'Local minimum possible. Constraints satisfied.↵↵fmincon stopped because the size of the current step is less than↵the value of the step size tolerance and constraints are ↵satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative changes in all elements of x are↵less than options.StepTolerance = 1.000000e-12, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
bestfeasible: [1×1 struct]
x = 1×3
1.0000 2.0000 3.0000
fval = 3.0000
eflag = 2
stats = struct with fields:
iterations: 23
funcCount: 147
constrviolation: 0
stepsize: 1.0967e-06
algorithm: 'interior-point'
firstorderopt: 8.0766e-05
cgiterations: 33
message: 'Local minimum possible. Constraints satisfied.↵↵fmincon stopped because the size of the current step is less than↵the value of the step size tolerance and constraints are ↵satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative changes in all elements of x are↵less than options.StepTolerance = 1.000000e-06, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
bestfeasible: [1×1 struct]
Matt J
2023-11-27
How would you proceed to figure this out?
You would compute the Hessian and see if it has any large entries or large eigenvalues.
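As a sketch of that check, reusing the variables from the question (fmincon can return its final Hessian approximation as the seventh output; for the interior-point algorithm this is only an approximation, and xBest stands for the MultiStart solution):
% Re-run fmincon from the MultiStart solution to recover the (approximate)
% Hessian there, then inspect its spectrum
[x, fval, exitflag, output, lambda, grad, hessian] = ...
    fmincon(myFun, xBest, Aineq, bineq, [], [], lb, ub, [], opts);
ev = eig((hessian + hessian.')/2);   % symmetrize before computing eigenvalues
largestEntry = max(abs(hessian(:)))
eigRange     = [min(ev), max(ev)]
condNumber   = cond(hessian)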
SA-W
2023-11-27
You can see the first order optimalities are again very different without changing the solution very much.
Why is this the case here? The function is very smooth and the first derivative is continuous, right?
SA-W
2023-11-27
The discontinuity did not cause any problems in the optimization (same result). That said, it may not be a problem in practice? I mean as long as the converged solution is the same, I do not have to worry about the discontinuity.
Matt J
2023-11-27
The interior point algorithm theoretically assumes a twice continuously differentiable function and constraints, but much of the time it does work when the continuity is only piecewise.
SA-W
2023-11-27
You would compute the Hessian and see if it has any large entries or large eigenvalues.
hessian =
1.0e+07 *
0.0546 0.0014 -0.0286 -0.0145 -0.0104 -0.0072 -0.0181 0.0244 -0.0016
0.0014 0.0042 0.0019 0.0014 0.0009 0.0018 0.0041 -0.0150 -0.0007
-0.0286 0.0019 0.0166 0.0086 0.0063 0.0046 0.0138 -0.0253 0.0021
-0.0145 0.0014 0.0086 0.0052 0.0016 0.0106 -0.0200 0.0355 -0.0284
-0.0104 0.0009 0.0063 0.0016 0.0064 -0.0186 0.0739 -0.1343 0.0742
-0.0072 0.0018 0.0046 0.0106 -0.0186 0.1034 -0.3381 0.6104 -0.3670
-0.0181 0.0041 0.0138 -0.0200 0.0739 -0.3381 1.1674 -2.1213 1.2382
0.0244 -0.0150 -0.0253 0.0355 -0.1343 0.6104 -2.1213 3.8685 -2.2429
-0.0016 -0.0007 0.0021 -0.0284 0.0742 -0.3670 1.2382 -2.2429 1.3263
This is the Hessian for the solution I mentioned in my question. The entries of the Hessian differ only by one order of magnitude.
The smallest eigenvalue is 0.0939 and the largest 6.5e+7. This makes the condition number of the Hessian large, but this probably has to do with the units of my parameters.
The function would seem to have very high curvatures there,
Do you think this could be the case here? Admittedly, I have never looked at the Hessian so far but, compared to the parameters (between 0 and 1), the second derivatives are very large.
Matt J
2023-11-27
Edited: Matt J
2023-11-27
The curvatures don't seem large enough to account for the difference we see. However, looking at the Hessian won't detect a discontinuous first derivative. As illustrated below, the Hessian explodes only if you evaluate it right at the point of discontinuity.
f=@(z)z.^2.*(z<0)+z.*(z>=0);
x=linspace(-2,2,1e6);
dx=x(2)-x(1);
Hess=gradient(gradient(f(x),dx),dx);
plot(x,Hess,'.-'); xlabel x; ylabel('\nabla^2 f(x)')
SA-W
2023-11-27
The curvatures don't seem large enough to account for the difference we see.
They are on the order of 1e7. Is that not big? Or what was your criterion for that statement?
Matt J
2023-11-27
Edited: Matt J
2023-11-27
I take it back. It probably is big enough. The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing, assuming the first order optimality numbers are comparable to the gradient magnitudes at the solution.
SA-W
2023-11-27
The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing,
Maybe I am missing something. The changes in the first-order optimality are at least a factor of ~1e5 (they went from 1e-5 to 1e0, roughly). So what did you mean by
~1e-1, which is largely what you are seeing
?
Matt J
2023-11-27
Edited: Matt J
2023-11-27
they reduced from 1e-5 to 1e0, roughly
To me, it looks like the average first-order optimality measure across the different multistart runs is initially about 1e-6 and then, after relaxing the tolerances, the average is about 1e-1. Therefore, the change is,
change = 1e-1 - 1e-6
change = 0.1000
which is still ~1e-1.
SA-W
2023-11-27
The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing,
Did you mean ~1e-6*1e7 = ~1e+1? Of course, exact numbers are nonsense here, but just to make sure you did not refer to something else with ~1e-1.
More important from a practical viewpoint: If I know my objective has large curvatures, is it reasonable to choose a smaller step size? The only benefit I see from this is that the first-order optimality is closer to zero, which makes the results more trustworthy, but the effect on the solution is negligible.
Matt J
2023-11-28
Edited: Matt J
2023-11-28
Did you mean ~1e-6*1e7 = ~1e+1?
Yes, that's what I really meant.
More important from a practical viewpoint: If I know my objective has large curvatures, is it reasonable to choose a smaller step size?
You mean a smaller StepTolerance? It can definitely make a difference. Here's an example where we make the curvatures both small (p=2) and large (p=7), and you can see that, for the latter, the tolerance makes a big difference in how close you come to the true solution at x=0:
x0=[787.5362 617.2179 596.9010 784.4406 5.9046];
for p=[2,7]
p,
for tol=[1e-6,1e-12];
opts = optimoptions('fmincon', ...
'StepTolerance', tol, ...
'FunctionTolerance', tol, ...
'OptimalityTolerance', tol, ...
'MaxIterations', inf,'MaxFunctionEvaluations', inf,...
'SpecifyObjectiveGradient', false, ...
'CheckGradients', false,'Display','none');
[x,fval,eflag,stats]=fmincon(@(x) norm(x+5,p)^p,x0,...
[],[],[],[],[0,0,0,0,0],[],[],opts);
distance_to_true_solution = norm(x)
end
end
p = 2
distance_to_true_solution = 4.4737e-07
distance_to_true_solution = 4.4723e-09
p = 7
distance_to_true_solution = 0.5628
distance_to_true_solution = 8.1778e-14
SA-W
2023-11-28
Edited: SA-W
2023-11-28
Here's an example, where we make the curvatures both small p=2 and large p=7, and you can see that, for the latter, the tolerance makes a big difference in how close you come to the true solution at x=0:
What we see here is what I expected for small curvatures (flat objectives) too. There, the gradients are small, so the optimizer may think it has arrived at a minimum and stop too early. Tightening the step tolerance may help it get closer to the minimum.
From that, can we say that a small step tolerance makes sense if I have no idea about the magnitude of the second derivatives? A small step tolerance leads to more iterations, but let's forget about that.
Matt J
2023-11-28
Edited: Matt J
2023-11-28
There, the gradients are small, so the optimizer may think it has arrived at a minimum and stop too early. Tightening the step tolerance may help it get closer to the minimum.
No, I don't think so. In that situation, you would want to tighten the optimality tolerance and maybe also the function tolerance.
From that, can we say that a small step tolerance makes sense if I have no idea about the magnitude of the second derivatives?
Yes, but it may not be enough. If you have no idea about the second derivatives and they happen to be very low, you may need to make the optimality tolerance extra-small as well (see above). Generally speaking, tightening all of the tolerances always reduces the risk of premature stopping.
SA-W
2023-11-28
Yes, but it may not be enough. If you have very low curvatures, you may need to make the optimality tolerance extra-small as well (see above). Generally speaking, tightening all of the tolerances always reduces the risk of premature stopping.
That said, would you say setting all tolerances to 1e-12 is reasonable? My parameters are all greater than zero and less than 5, and I expect the sum of squares to be between 1 and 100.
Matt J
2023-11-28
Edited: Matt J
2023-11-28
1e-12 is what I usually use. But you really can't look to the stopping tolerances and exit statistics alone to gain confidence in an optimization's performance. You should be testing on several simulated versions of the problem where you know the desired answer, and seeing how close the optimizer gets.
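A sketch of such a synthetic self-test, reusing the MultiStart setup from the question (simulateData and buildObjective are placeholders for your own forward PDE solve and objective construction, and trueParams must be feasible with respect to Aineq/bineq):
% Hypothetical self-test: pick ground-truth parameters, generate synthetic data
% from the forward model, then check how close the optimizer gets
trueParams  = 0.5 + 3*rand(9,1);              % known "answer" inside the bounds [0,4]
data        = simulateData(trueParams);       % placeholder: forward-solve the PDE
myFunTest   = buildObjective(data);           % placeholder: sum-of-squares objective
problemTest = createOptimProblem('fmincon', ...
    'objective', myFunTest, 'x0', startVec, ...
    'lb', lb, 'ub', ub, 'Aineq', Aineq, 'bineq', bineq, ...
    'options', opts);
[xBest, fBest] = run(ms, problemTest, myStartPoints);
fprintf('max abs parameter error: %g\n', max(abs(xBest(:) - trueParams)));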
SA-W
2023-11-28
I always perform numerical experiments first where I know the exact solution, and run a multistart simulation with several start points. If this works, I move on to the real experiment and again perform a multistart run.
My sanity check is to verify that all local solver runs converged to the same solution. Since I do not know the exact solution, I cannot do much more. Or can you recommend some nice checks here? I mean, if you get a solution from an optimization where you do not know the exact solution, how do you convince yourself that the solution is correct if the exit flags are not trustworthy enough?
Matt J
2023-11-28
I don't do anything. I rely on experience with the numerical tests for my confidence.
SA-W
2023-11-28
Edited: SA-W
2023-11-28
Do you know of tools (File Exchange, toolbox functions, ...) to post-process optimization results?
Often, people calculate confidence intervals, R-coefficients, correlation matrices, ... Of course, one could implement this oneself, but if there are already packaged solutions, why not use them.
Matt J
2023-11-28
Edited: Matt J
2023-11-28
There is this thread,
As you will see in the comments though, it will probably require you to do your own Hessian calculation. It must be the Hessian of the Lagrangian, not the objective function, although I guess if you only have linear constraints, they will be the same thing.
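For what it's worth, a rough sketch of the usual least-squares post-processing from the Hessian, under the assumptions that hessian is the Hessian of the plain sum-of-squares objective at the solution x, sse is the final sum of squares, m is the number of residuals, and active constraints are ignored:
% Approximate parameter covariance from the Hessian of a sum-of-squares objective
% (the factor 2 assumes the objective is the plain sum of squares, not 1/2 of it)
n      = numel(x);                           % number of parameters (9 here)
sigma2 = sse/(m - n);                        % residual variance estimate
C      = 2*sigma2*inv(hessian);              % approximate covariance of the parameters
se     = sqrt(diag(C));                      % standard errors
ci     = [x(:) - 1.96*se, x(:) + 1.96*se];   % rough 95% confidence intervals
corrM  = C ./ (se*se.');                     % correlation matrix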
SA-W
2023-11-28
It must be the Hessian of the Lagrangian, not the objective function, although I guess if you only have linear constraints, they will be the same thing.
Yes, I think so too.
Do you think it makes sense to calculate correlations, etc., with a Hessian that has a condition number of ~1e7?
More Answers (0)