fmincon: less optimal results with user supplied analytic gradient?

Question

Mathias 2013-10-13

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/90090-fmincon-less-optimal-results-with-user-supplied-analytic-gradient

评论： Matt J 2013-10-15

Hi there, I have a question concerning the fmincon optimization procedure in generel: I am minimizing a very complicated (unfortunately not necessarily convex and continuous) functional using the fmincon, currently along with the interior-point-algorithm.

Until some days ago, I used finite differencing to estimate the gradients. Since I derived an analytic expression for the functional, I went through the tedious work of deriving an analytic expression for the gradient, mainly to accelerate the optimization. In fact, the computation is significantly faster, but the final function value is bigger, i.e. less optimal, which I don't understand. I already compared the finite-differencing and analytic gradients, but this does not seem to be the problem (accuracy ~ 1e-6).

Do you know if there are some parameters, I should particularly adjust to tune the algorithm towards using analytic gradients, or can you imaging, where the problem might be? Thanks a lot for your help. And by the way: thanks a lot to the community in general. Although this is my first active participation, I have been extensively and successfully using the forums for years... :-)

Best Mathias

4 个评论
显示 2更早的评论隐藏 2更早的评论

Mathias 2013-10-13

编辑：Mathias 2013-10-13

在 MATLAB Online 中打开

Hey, thanks for the reply,

the bad continuity is unfortunately problem specific, but its not that a big deal. Using finite differencing, I was already able to get almost perfect results, and the function only becomes badly conditioned (noncontinuous) at the border of the search space.

For example the function values I am obtaining are:

 Finite Differencing:  9.62976
                                            First-order      Norm of
 Iter F-count            f(x)  Feasibility   optimality         step
    0     139    1.190988e+01    0.000e+00    1.334e-02
    1     278    1.190547e+01    0.000e+00    1.327e-02    7.001e-02
  ...
  111   15789    9.629760e+00    0.000e+00    2.130e+02    2.617e-08
  112   15933    9.629760e+00    0.000e+00    2.130e+02    1.309e-08
  113   16079    9.629760e+00    0.000e+00    2.130e+02    1.636e-09
 Optimization stopped because the relative changes in all elements of x are
 less than options.TolX = 1.000000e-10, and the relative maximum constraint
 violation, 0.000000e+00, is less than options.TolCon = 2.500000e-01.
 Optimization Metric                                         Options
 max(abs(delta_x./x)) =   7.82e-11                    TolX =   1e-10  (selected)
relative max(constraint violation) =   0.00e+00    TolCon =   3e-01 (selected)

and for the analytic gradient:

 Analytic Gradients:  10.15389
                                            First-order      Norm of
 Iter F-count            f(x)  Feasibility   optimality         step
    0       1    1.190988e+01    0.000e+00    1.331e-02
    1       2    1.190547e+01    0.000e+00    1.327e-02    6.679e-02
   ...
   28      98    1.015389e+01    0.000e+00    1.464e-02    4.946e-06
   29     108    1.015389e+01    0.000e+00    1.464e-02    1.082e-06
   30     119    1.015389e+01    0.000e+00    1.464e-02    1.183e-07
 Optimization stopped because the relative changes in all elements of x are
 less than options.TolX = 1.000000e-10, and the relative maximum constraint
 violation, 0.000000e+00, is less than options.TolCon = 2.500000e-01.
 Optimization Metric                                         Options
 max(abs(delta_x./x)) =   5.06e-11                    TolX =   1e-10  (selected)
 relative max(constraint violation) =   0.00e+00    TolCon =   3e-01 (selected)

I really don't expect an identical output, but at least something comparable, and in particular, I would expect the analytic gradient to yield better results. I also checked the gradient and neighborhood, where the analytic gradient optimization stops, there seems to be no problem such as non-continuous areas nearby. From my point of view, the algorithm simply does not expect the function value to be significantly decreased anymore.

EDIT:

Especially, if you compute the gradient where the optimization stops, you still have entrys, which significantly differ from zero

 [~,df] = f(G_opt);
 max(abs(df))
 ans =
    0.0147

Matt J 2013-10-13

编辑：Matt J 2013-10-13

the function only becomes badly conditioned (noncontinuous) at the border of the search space.

Not sure what you mean by the "search space". Even when constraints are specified, fmincon searches over all of R^n, which has no borders. Constraints are only satisfied asymptotically.

Anyway, we still need to see more information. How much do the G_opts differ in each case? Why should I consider 0.0147 a large final value for the gradient? Maybe it started at 1e7. Also, you said the finite differencing gave "amost perfect results", yet in what you show above it stopped with a much larger max gradient 2.130e+02.

Similarly, why should I consider the difference between 9.62976 and 10.15389 significant? Maybe the typical/starting value of the function was 1e20.

It would help to actually show your objective function and constraints, and the code you used to initialize and call fmincon.

Mathias 2013-10-13

编辑：Mathias 2013-10-13

在 MATLAB Online 中打开

Okay, yeah sorry, that was stupid not providing any knowledge about the problem. The functional f depending on N variables, G varies within the Bounds

G in [-Gmax,...,+Gmax]^N, Gmax = 25.0

The G_opt differes by something like 13.2403, which is more than a quarter of the full range of possible values given by Gmax. In addition, I added some lines of the iteration output in my first reply, to give some idea about the first iteration function values.

Ohh, and the optimization has only these bound constraints

 lb = -Gmax;
 ub = +Gmax;

No further constraints.

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Matt J 2013-10-13

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/90090-fmincon-less-optimal-results-with-user-supplied-analytic-gradient#answer_99529

编辑：Matt J 2013-10-13

My best appraisal, without seeing the details of your objective function, is that you have unbounded (or very large) second derivatives at/near your solution. That's the only thing that explains to me why the algorithm starts to take very tiny Newton-like steps even in regions where the gradients are still pretty large. Note that this happens both with the finite differencing and analytical case. It doesn't appear that either version successfully finished the optimization.

8 个评论
显示 6更早的评论隐藏 6更早的评论

Mathias 2013-10-13

在 MATLAB Online 中打开

Hey, okay I just checked that by having a look at the Hessian output of fmincon: in fact there are some very big entries in the matrix, probably causing the tiny steps. However, I tested another approach on how to compute the Hessian

 options.Hessian     = 'lbfgs';
 % instead of
 % options.Hessian     = 'bfgs';

This brought a huge benefit to the optimization:

 Analytic Gradients:  9.61685
                                            First-order      Norm of
 Iter F-count            f(x)  Feasibility   optimality         step
    0       1    1.190988e+01    0.000e+00    1.331e-02
    1       2    1.190547e+01    0.000e+00    1.327e-02    6.679e-02
   ...
   43     139    9.616847e+00    0.000e+00    8.225e-03    9.500e-07
   44     148    9.616847e+00    0.000e+00    8.225e-03    4.156e-07
 Optimization stopped because the relative changes in all elements of x are
 less than options.TolX = 1.000000e-10, and the relative maximum constraint
 violation, 0.000000e+00, is less than options.TolCon = 2.500000e-01.
 Optimization Metric                                         Options
 max(abs(delta_x./x)) =   8.99e-11                    TolX =   1e-10  (selected)
relative max(constraint violation) =   0.00e+00    TolCon =   3e-01 (selected)

To me that looks as if the optimization finally stopped where it is actually supposed to stop. I was also considering deriving an analytic expression for the Hessian as well, but since the results already do look quite good, I guess I won't have to. From your point of view, does the optimality measure indicate, that this is a local optimum?

Thanks a lot for the helpful hint, you made my sunday! :-)

Mathias 2013-10-14

在 MATLAB Online 中打开

Hey,

concerning your comments:

I checked the 'fin-diff-grads' algorithm before but the benefit was not as good, in addition the computation becomes quite slow then
The TolCon was just chosen to be 1% of Gmax, but you're right, I might as well just use the default value instead

However, I got another slight improvement by setting the subproblem algorithm:

options.SubproblemAlgorithm = 'cg';

With that the final output becomes

 First-order      Norm of
 Iter F-count            f(x)  Feasibility   optimality         step
    0       1    1.190988e+01    0.000e+00    1.331e-02
    1       2    1.190547e+01    0.000e+00    1.327e-02    6.679e-02
   ...
   56     127    9.590544e+00    0.000e+00    8.244e-03    9.437e-06
   57     135    9.590544e+00    0.000e+00    8.244e-03    5.161e-07
 Optimization stopped because the relative changes in all elements of x are
 less than options.TolX = 1.000000e-20, and the relative maximum constraint
 violation, 0.000000e+00, is less than options.TolCon = 1.000000e-06.
 Optimization Metric                                         Options
 max(abs(delta_x./x)) =   5.68e-21                    TolX =   1e-20 (selected)
relative max(constraint violation) =   0.00e+00    TolCon =   1e-06 (selected)

I am not quite sure about that, maybe the optimality measure just has some problem with the fact that some variables move towards the edges of the non-continuous regions, as the local minimum is located there. If the Hessian is calculated at that point (numerically), the optimality measure may become relatively small or again the big entries in the Hessian make the optimizer stop due to tiny step sizes.

I am not sure, if the process of deriving the Hessian analytically is worth the effort, since the results already looks quite good to me.

Mathias 2013-10-15

Yeah, the objective function is unfortunately 1500 lines of code, it is a highly complicated and highly nonlinear thing... I will try to have a more detailed look on that tomorrow or the day after, maybe then I will be able to ask some more concrete questions, right now I am not really sure how to proceed. But thank you very much for the moment, you already brought some helpful insight into the problems, that I am probably having...

Matt J 2013-10-15

A mathematical description of the problem then, showing what you're trying to minimize and why/where the discontinuities occur.

请先登录，再进行评论。

fmincon: less optimal results with user supplied analytic gradient?

4 个评论
显示 2更早的评论隐藏 2更早的评论

回答（1 个）

8 个评论
显示 6更早的评论隐藏 6更早的评论

另请参阅

类别

标签

产品

Community Treasure Hunt

fmincon: less optimal results with user supplied analytic gradient?

4 个评论 显示 2更早的评论隐藏 2更早的评论

回答（1 个）

8 个评论 显示 6更早的评论隐藏 6更早的评论

另请参阅

类别

标签

产品

Community Treasure Hunt

4 个评论
显示 2更早的评论隐藏 2更早的评论

8 个评论
显示 6更早的评论隐藏 6更早的评论