Quadprog message: The problem is non-convex
18 views (last 30 days)
Show older comments
S.
2020-1-23
Hi,
First of all, I am using MATLAB R2017b and the 2019 Optimization Toolbox. I am using quadprog (not surprisingly ;)) to solve a QP problem. In reality, my problem is convex; however, I get a message from quadprog in the command window that my problem is non-convex. I am not using lower and upper bounds, but inequality constraints. Furthermore, I checked whether my Hessian has negative eigenvalues. This was indeed the case: I had one or two really small negative eigenvalues (on the order of -1e-20). However, I am using quadprog for multiple time steps, whereby the linear term f (in my case F) is updated and therefore changes a little bit after every time step. If I run the code for, say, 10 time steps, I get the message that the problem is non-convex 8 times, and 2 times that the problem is convex and a minimum has been found. This is, in my opinion, quite weird, because the Hessian stays the same every time step (and with that the one or two really small negative eigenvalues).
Does anyone have a potential solution to this issue, or an idea of its cause? If you think you need more information, please let me know.
G = (G+G')/2;   % symmetrize the Hessian before passing it to quadprog
tic;
[X,fval,exitflag,output,lambda] = quadprog(G,F,Aineq_SparseCondensed,bineq_SparseCondensed,[],[],[],[],[],options);
toc
Thanks in advance!
Kind regards
1 Comment
Walter Roberson
2020-3-9
You cannot use the 2019 toolboxes with 2017 MATLAB without expecting inexplicable errors.
Accepted Answer
Matt J
2020-1-23
Edited: Matt J
2020-1-23
You cannot submit a problem that is borderline convex to quadprog if it is using an algorithm that expects convexity. If you do, quadprog can see the problem as convex or non-convex arbitrarily, depending on floating-point numerical errors that essentially contribute random noise to the calculations it is doing.
I would guess that your Hessian is supposed to be more distinctly positive definite, so you should review the way you set up the problem.
42 Comments
S.
2020-1-24
Thanks for your fast response.
The problem is not borderline convex, but completely convex in reality. The problem starts to occur when the size of the Hessian becomes bigger. Furthermore, if I use, for example, lower and upper bounds instead of the inequality constraints, the problem is reported as convex, which is strange, because the function of both is almost the same. If I use both lb and ub and the inequality constraints, the solving time is extremely long and the solution is wrong.
Do you maybe know a systematic approach I could follow to find out what causes this problem (for example, to determine whether numerical issues cause it), or settings in quadprog I could try? Furthermore, how could I make my Hessian more positive definite by just changing the setup of my problem? Could you give an example?
Matt J
2020-1-24
Edited: Matt J
2020-1-24
The problem is not borderline convex, but completely convex in reality... Furthermore, if I use, for example, lower and upper bounds instead of the inequality constraints, the problem is reported as convex
No, whether a quadratic problem is strongly convex (which you want, so that numerical behavior is good) is dictated entirely by the eigenvalues of your Hessian. Adding or removing bounds and other linear inequality constraints does not influence that.
However, the ability of quadprog to recognize that your problem is strongly convex can be impaired by floating point errors when your Hessian matrix is nearly singular. Moreover, changing the constraints can conceivably affect the floating point errors that are encountered and what software flags they trigger. For example, under some particular constraints, the algorithm might try to step in a direction that is in the null space of your Hessian. Along such directions, the quadratic objective function should have zero curvature, but due to floating point errors, the curvature that gets calculated by the algorithm may be slightly negative and so it will raise a flag that the problem is non-convex.
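For instance, here is a minimal MATLAB sketch (with a made-up rank-deficient Hessian, not your actual data) of how rounding alone can make the computed curvature along a null-space direction come out slightly negative:
Qmat = orth(randn(3));                 % a random orthonormal basis
H = Qmat*diag([1e6 1 0])*Qmat';        % Hessian with an exact zero eigenvalue
H = (H+H')/2;                          % symmetrize, as quadprog recommends
d = Qmat(:,3);                         % direction in the null space of H
curvature = d'*H*d                     % 0 in exact arithmetic; in floating point,
                                       % a tiny value (~1e-10) of arbitrary sign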
Do you maybe know a systematic approach I could follow to find out what causes this problem
In addition to checking the Hessian eigenvalues as you have done (they should all be strictly positive), you should check its reciprocal condition number rcond(H). If it is very close to zero, it means H is nearly singular and hence not strongly convex, for numerical purposes.
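For example (a minimal check, assuming the Hessian is stored in G as in the code you posted):
G = (G+G')/2;        % symmetrize before testing
e = eig(full(G));
[min(e), max(e)]     % every eigenvalue should be strictly positive
rcond(full(G))       % values near eps (~2.2e-16) mean numerically singular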
Furthermore, how could I make my Hessian more positive definite by just changing the setup of my problem? Could you give an example?
Often, a near-singular Hessian arises because there aren't enough equations to determine the optimum uniquely, or because there are redundant variables. For example, in a linear least squares problem min_x ||A*x - b||^2, the Hessian 2*transpose(A)*A will be singular if A does not have full column rank. Poor choices of units can also cause this behavior. For example, the quadratic function f(x) = 1e12*x1^2 + x2^2 is theoretically strongly convex, but numerically it has a very singular-looking Hessian:
>> H=[1e12, 0; 0 1]*2; rcond(H),
ans =
1.0000e-12
But if I just change the units of x via the change of variables y1 = 1e6*x1, y2 = x2, then the new objective f(y) = y1^2 + y2^2 has a distinctly non-singular Hessian:
>> H=eye(2)*2; rcond(H)
ans =
1
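The same rescaling trick can be applied to a whole QP. A hedged sketch, with a diagonal scaling built from the Hessian (the names S, H, f, Aineq, bineq are illustrative):
S  = diag(1./sqrt(diag(H)));     % one simple choice of variable scaling
Hs = S'*H*S;  fs = S'*f;         % the QP in the scaled variable y, where x = S*y
y  = quadprog(Hs,fs,Aineq*S,bineq);
x  = S*y;                        % map the solution back to the original units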
S.
2020-2-11
Edited: S.
2020-2-11
Thanks for your response and the clarification. Sorry for my delayed response.
However, I have still some additional questions and will provide you with more information.
In your second reply, you mention that if the reciprocal condition number is very close to zero, the matrix is nearly singular. Could you indicate what ''very close to zero'' means, because it is very subjective? Is there a boundary for the order of magnitude in which the reciprocal condition number should lie? For example: if the reciprocal condition number is smaller than 1e-10, then the Hessian is nearly singular. Furthermore, is looking at the condition number also useful? If yes, from which order of magnitude can it be said that the matrix is ill-conditioned?
The optimization problem I am trying to solve is of the form
minimize sum_{i=1..N} [ transpose(z_i|k - z_r)*Q*(z_i|k - z_r) + transpose(u_i|k - u_r)*R*(u_i|k - u_r) ]
with z_i|k = H*x_i|k, u_i|k the predicted inputs, and x_i|k the predicted states.
In this optimal control problem, Q and R are both symmetric positive definite matrices, with Q a diagonal matrix (all diagonal entries have the value 100) and R a diagonal matrix (all diagonal entries have the value 0.1). Furthermore, the mean of the vector z_r is around 5 and the mean of the vector u_r is around 0.1. The predicted inputs are constrained to lie between 0 and 10. Therefore, I think the units are fine.
Moreover, the future states can be written as a function of the future inputs, and after some substitution the optimization problem becomes a QP in the predicted inputs with Hessian 2*transpose(Z)*Omega*Z,
with x_i|k = Z*u_i|k + z_k_hat, Z a banded null-space matrix (it should keep a specific structure), Omega a diagonal matrix with transpose(H)*Q*H and R on the diagonal entries, C_1 a vector with -2*transpose(z_r)*Q*H and -2*transpose(u_r)*R, and z_k_hat a matrix [A; A^2; ...; A^N], where A represents the dynamics of the system.
The condition number of Z is 1e7.
Do you have any ideas/suggestions for how I can reformulate this model? It is hard for me to know where to start, and I think the units are fine.
Matt J
2020-2-11
Edited: Matt J
2020-2-11
What are the condition numbers of Omega and transpose(Z)*Omega*Z? In any case, with a condition number of 1e7, it would appear that Z has some essentially linearly dependent columns. One thing that could help is if you attach these matrices (Z, Q, Omega, H, R) in a .mat file, so that others can examine them.
S.
2020-2-16
The condition numbers are as follows:
For Omega: Inf
For transpose(Z)*Omega*Z: 7.4375e+19
For Z: 5.6311e+08
Z is too big (7.5 MB). How could I send it?
Q, Ω, H and R can be found in Optimization2.zip.
Could you still answer the first question, please (about the reciprocal condition number and the condition number)?
Matt J
2020-2-16
If Z is too big, just attach its columnwise max and min:
zmax=max(abs(Z),[],1);
zmin=min(abs(Z),[],1);
S.
2020-2-17
Edited: S.
2020-2-17
This question:
In your second reply, you mention that if the reciprocal condition number is very close to zero, the matrix is nearly singular. Could you indicate what ''very close to zero'' means, because it is very subjective? Is there a boundary for the order of magnitude in which the reciprocal condition number should lie? For example: if the reciprocal condition number is smaller than 1e-10, then the Hessian is nearly singular. Furthermore, is looking at the condition number also useful? If yes, from which order of magnitude can it be said that the matrix is ill-conditioned?
I have now also attached the zmax and zmin of the Z matrix. So all the matrices have been added.
If you or somebody else could take a look at the matrices and point out something that could be improved, or the cause of the problem, that would be really nice.
Matt J
2020-2-17
Edited: Matt J
2020-2-17
Is there a boundary for the order of magnitude in which the reciprocal condition number should lie?
You won't be able to identify a sharp boundary that applies to your situation, but a condition number of less than 1000 would definitely be considered good, and a matrix with a condition number of Inf, like your Omega matrix, can definitely be considered singular. So your objective function is clearly not strongly convex. Since Omega is a diagonal matrix, I would simply investigate why its maximum diagonal element is so much larger than its minimum diagonal element...
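A quick way to do that (a minimal sketch, assuming Omega is the diagonal matrix from your condensed problem):
d = full(diag(Omega));
[min(abs(d)), max(abs(d))]   % for a diagonal matrix, their ratio is the condition number
find(d == 0)                 % any exact zeros would explain cond(Omega) == Inf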
To get a quantitative perspective on the condition number, you should understand that it is the factor by which percent numerical errors get magnified when solving a set of linear equations involving the matrix. So, if quadprog reaches a step where it must solve a set of linear equations with condition number K, and there is a percent error delta in the equation data, then the results will have percent error K*delta. Since you have a condition number of K=1e19, you would need a percent error in the input data of delta=1e-19 % (which is beyond what double floating-point precision can do) in order to have only a 1% error in the solution.
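You can see this magnification directly in MATLAB with a toy 2x2 system (illustrative numbers only, not your problem data):
A = [1 0; 0 1e-12];              % cond(A) = 1e12
xTrue = [1; 1];
b = A*xTrue;
bNoisy = b + 1e-14*randn(2,1);   % data error of roughly 1e-12 percent of norm(b)
x = A\bNoisy;
norm(x - xTrue)/norm(xTrue)      % about 1e-2: the tiny data error got magnified ~1e12 times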
S.
2020-3-8
Hi,
Thanks for your response.
I am also using the Gurobi Optimizer. Gurobi uses the barrier method and quadprog the interior-point method. However, quadprog handles ill-conditioned matrices better than Gurobi. Where quadprog gives me correct results for small and medium-size Hessians, Gurobi already gives bad results (and the warning that the range of the quadratic objective terms is large) for the same QP problem. Only for very large, ill-conditioned Hessians does quadprog also report that the problem is non-convex. Do you maybe have an explanation for this: that even though they use essentially the same algorithm, quadprog handles the same QP problem better?
Matt J
2020-3-8
No, I can't, but I don't think it matters. One shouldn't try to solve ill-conditioned problems; one should make them well-conditioned.
S.
2020-3-8
I tried to make it well-conditioned by scaling the matrices, but that didn't succeed.
Is there more information to be found about the interior-point method that quadprog specifically uses (so not general information, but specific to quadprog)?
S.
2020-3-9
This is about nonlinear optimization and the fmincon function, not the barrier method?
Is there, by the way, a relationship between an ill-conditioned Hessian and the number of iterations needed to converge?
It seems that, because the problem is ill-conditioned, the solver needs a lot of iterations to find the global minimum?
Matt J
2020-3-9
Edited: Matt J
2020-3-9
This is about nonlinear optimization and the fmincon function, not the barrier method?
The link I gave you should take you specifically to the section on the interior-point method.
Is there, by the way, a relationship between an ill-conditioned Hessian and the number of iterations needed to converge? It seems that, because the problem is ill-conditioned, the solver needs a lot of iterations to find the global minimum?
That is certainly true for gradient descent methods (e.g. conjugate gradient, steepest descent). See, for example,
Some methods will automatically rescale the problem so that the condition number is small. Unconstrained Newton's method, for example, always converges in a single iteration on a quadratic objective. This assumes, however, that the Hessian, while ill-conditioned, is still reasonably non-singular. As we've discussed, your problem is essentially singular to within double precision.
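To illustrate with a toy MATLAB example (not your problem data): on a quadratic objective 0.5*x'*H*x + f'*x, one Newton step from any starting point lands exactly on the minimizer, even for a badly conditioned (but nonsingular) H:
H = diag([1e8, 1]);  f = [1; -2];   % ill-conditioned but numerically nonsingular
x0 = randn(2,1);                    % arbitrary starting point
g  = H*x0 + f;                      % gradient at x0
x1 = x0 - H\g;                      % a single Newton step
norm(H*x1 + f)                      % ~0: x1 already satisfies the optimality condition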
S.
2020-3-11
Edited: S.
2020-3-11
Does the interior-point method of quadprog make use of gradient descent methods?
And does quadprog generally use many iterations that are each cheap (little time), or few iterations that are each expensive (a lot of time)? I think the latter, based on my results, but I am not sure. If so, is there an explanation for that?
Furthermore, I found the following on the website: ''The algorithm has different implementations for a sparse Hessian matrix H and for a dense matrix. Generally, the sparse implementation is faster on large, sparse problems, and the dense implementation is faster on dense or small problems.''
Why is this the case?
And, maybe if you know, what is the difference between the interior-point method and GPAD (a gradient projection method)? I know, for example, that the interior-point method needs the Hessian online to solve an MPC problem, whereas with GPAD this step can be performed offline.
Thanks in advance.
Matt J
2020-3-11
Edited: Matt J
2020-3-11
And does quadprog generally use many iterations that are each cheap (little time), or few iterations that are each expensive (a lot of time)? I think the latter, based on my results, but I am not sure. If so, is there an explanation for that?
If there is, I am sure it is in the algorithm description that I gave you the link for. It should be possible to deduce the computational expense of the different steps of the algorithm from that.
''The algorithm has different implementations for a sparse Hessian matrix H and for a dense matrix. Generally, the sparse implementation is faster on large, sparse problems, and the dense implementation is faster on dense or small problems.'' Why is this the case?
Matrix arithmetic is faster for large matrices if they are sparse and you manipulate them in sparse form, as MATLAB's sparse type does.
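You can see the effect with a quick experiment (illustrative sizes; sprandsym generates a random sparse symmetric matrix):
n  = 5000;
As = sprandsym(n, 1e-3);    % sparse storage: a matrix-vector product costs ~nnz(As) operations
x  = randn(n,1);
tic; As*x; toc              % fast
Af = full(As);              % the same matrix in dense storage
tic; Af*x; toc              % ~n^2 operations: much slower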
And maybe if you know, what is the difference between interior-point method and GPAD (Gradient Projection method)?
Well, for one thing, GPAD is not confined to travel through the interior of the feasible set as the interior-point method is (hence its name).
S.
2020-3-11
If there is, I am sure it is in the algorithm description that I gave you the link for. It should be possible to deduce the computational expense of the different steps of the algorithm from that.
The problem is that the math is quite hard for me to understand. So I have no idea.
''The algorithm has different implementations for a sparse Hessian matrix H and for a dense matrix. Generally, the sparse implementation is faster on large, sparse problems, and the dense implementation is faster on dense or small problems.'' Why is this the case?
Shouldn't it be dense, small problems instead of dense or small problems?
Well, for one thing, GPAD is not confined to travel through the interior of the feasible set as the interior-point method is (hence its name).
What is the advantage/disadvantage of this?
Matt J
2020-3-12
Edited: Matt J
2020-3-12
The problem is that the math is quite hard for me to understand. So I have no idea.
I gave you the wrong link earlier. It is now corrected. In any case, you will see in the description that there are various steps within each iteration requiring the solution of linear equations. Those can be computationally burdensome, depending on the size and sparsity of your Hessian and constraints.
Shouldn't it be dense, small problems instead of dense or small problems?
I don't think so. If your Hessian is large and dense, it would not be a good idea to use a sparse matrix data-type to represent it. You would want to use a dense type.
What is the advantage/disadvantage of this?
Roughly speaking, it lets you decompose the optimization into a series of unconstrained minimizations, which are easier than constrained ones. The solutions of those optimization problems lie in the interior of the feasible set, so you can find them just by setting gradients to zero.
S.
2020-3-18
What is the reason that quadprog can only solve convex problems and not non-convex problems? Is there another function that can solve non-convex QP problems?
What gives the best results? Using lower and upper bounds, or incorporating these bounds into inequality constraints? What are the differences? And the advantages and disadvantages of each method?
Matt J
2020-3-18
Edited: Matt J
2020-3-18
What is the reason that quadprog can only solve convex problems and not non-convex problems? Is there another function that can solve non-convex QP problems?
It is only quadprog's interior-point algorithm that requires convexity. The trust-region-reflective method doesn't require convexity, but it does have certain restrictions on the kinds of constraints you can have (only equality constraints or bounds, but not both).
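For example, a minimal sketch of selecting that algorithm for a bound-constrained QP (H, f, lb, ub stand for your problem data):
options = optimoptions('quadprog','Algorithm','trust-region-reflective');
x0 = zeros(size(f));                               % this algorithm accepts a start point
x  = quadprog(H,f,[],[],[],[],lb,ub,x0,options);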
You can always use fmincon. However, the ill-conditioning of your problem can have undesirable effects no matter what solver you use. It generally slows convergence, for one thing. It also makes the result sensitive to floating point errors. Effectively, you have a non-unique set of optimal points when your problem is ill-conditioned. Which solution you get can vary greatly from computer to computer depending on its CPU architecture and the way it does floating point arithmetic.
It is still not clear to me why you got nowhere with the investigation of your Omega matrix and why cond(Omega)=Inf. This has to mean that the diagonal elements are very different in scale, and might be expressed in a poor choice of units. It should be very easy to determine whether this is the case.
What gives the best results? Using lower and upper bounds, or incorporating these bounds into inequality constraints? What are the differences? And the advantages and disadvantages of each method?
Bounds are simpler for many solvers to deal with than more general inequalities. If you lump the bounds in with the other linear inequalities, the solver does not know that some of your constraints are simpler than others and won't take advantage of that.
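Concretely, for the 0 <= x <= 10 bounds you mentioned earlier, the two formulations look like this (a sketch; H and f stand for your quadratic and linear terms):
n  = numel(f);
lb = zeros(n,1);  ub = 10*ones(n,1);
x1 = quadprog(H,f,[],[],[],[],lb,ub);      % bounds passed as bounds
A  = [eye(n); -eye(n)];  b = [ub; -lb];
x2 = quadprog(H,f,A,b);                    % same feasible set, written as 2n general inequalities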
S.
2020-3-18
In my case, no extra inequality constraints are present, just the lower and upper bounds written as inequality constraints, so each constraint is equally simple/complex. Does the solver then still not see that each inequality constraint is just an upper or lower bound, and that they are therefore equally simple/complex?
Is a possible advantage of using lower and upper bounds also that a solution may be found faster, because the space in which the algorithm can search is predefined, whereas with inequality constraints the space is not directly restricted?
Furthermore, an advantage of inequality constraints is that it is possible to have a lower/upper bound on a linear combination of the decision variables, which is not possible with lower and upper bounds.
Matt J
2020-3-18
Edited: Matt J
2020-3-18
Just the lower and upper bounds written as inequality constraints, so each constraint is equally simple/complex. Does the solver then still not see that each inequality constraint is just an upper or lower bound, and that they are therefore equally simple/complex?
Not as far as I'm aware. It could probably deduce that from a pre-analysis of your Aineq matrix, but I don't think it would bother. It has no reason to expect you to express bounds through Aineq,bineq when you have the option of doing so more directly through lb,ub.
Is a possible advantage of using lower and upper bounds also that a solution may be found faster, because the space in which the algorithm can search is predefined, whereas with inequality constraints the space is not directly restricted?
For some algorithms, yes. Some of the legacy algorithms in the toolbox, however, do restrict their search to the polyhedron specified by Aineq*x<=bineq (the active-set methods do this, I think). It is, of course, harder to do that for general linear inequalities than for simple bounds.
Furthermore, an advantage of inequality constraints is that it is possible to have a lower/upper bound on a linear combination of the decision variables, which is not possible with lower and upper bounds.
That's not a reason, though, to lump your bounds into Aineq, bineq. You can use lb,ub to specify the bounds, but still use Aineq,bineq for the more complicated constraints that can't be represented through lb,ub.
S.
2020-3-18
Edited: S.
2020-3-18
For some algorithms, yes. Some of the legacy algorithms in the toolbox, however, do restrict their search to the polyhedron specified by Aineq*x<=bineq (the active-set methods do this, I think). It is, of course, harder to do that for general linear inequalities than for simple bounds
What exactly is a polyhedron? I read a lot about it, but the interpretation is not really clear to me. For example, people also talk about polyhedral input and state constraints. Are input and state constraints always polyhedral? What does polyhedral exactly mean in that context?
Does the interior-point-convex algorithm also belong to the ''some algorithms, yes''?
So, what you are saying is that the specification of the polyhedron is harder/costs more time for linear inequality constraints than for bounds? And therefore my sentence, ''because the space in which the algorithm can search is predefined and for inequality constraints the space is not directly restricted'', is not correct. It is just that the shape of the polyhedron is restricted faster by the algorithm for bounds than for linear inequality constraints.
That's not a reason, though, to lump your bounds into Aineq, bineq. You can use lb,ub to specify the bounds, but still use Aineq,bineq for the more complicated constraints that can't be represented through lb,ub.
But it is still a kind of advantage with respect to bounds, because with bounds, constraints on linear combinations cannot be specified.
Matt J
2020-3-18
Edited: Matt J
2020-3-18
What exactly is a polyhedron? I read a lot about it, but the interpretation is not really clear to me. For example, people also talk about polyhedral input and state constraints.
I was abusing terminology somewhat. A polyhedron is a 3D shape with flat sides. A convex polyhedron can always be described by a set of linear inequalities Aineq*x<=bineq. An N-dimensional shape described by a set of linear inequalities Aineq*x<=bineq is called a convex polytope. It is a generalization of a polyhedron from 3D to N-D.
Does the interior-point-convex algorithm also belong to the ''some algorithms, yes''?
I believe so. I haven't seen documentation to confirm it, but an interior-point algorithm by definition is one that tries to confine its search to the interior of the feasible set.
So, what you are saying is that the specification of the polyhedron is harder/costs more time for linear inequality constraints than for bounds?
There are various advantages to constraints consisting of simple bounds, which different algorithms may take advantage of in different ways. For one thing, it is easier to find an interior point in a region specified by bounds. This is of course of high importance to an "interior-point" algorithm.
Incidentally, I was mistaken earlier when I said that the interior-point-convex algorithm will not pre-analyze your linear inequalities Aineq*x<=bineq to see if any of them are pure bounds. I found a section of the documentation that says otherwise:
"The algorithm first tries to simplify the problem by removing redundancies and simplifying constraints. The tasks performed during the presolve step can include the following:...Check if any linear inequality constraint involves only one variable. If so, check for feasibility, and then change the linear constraint to a bound. "
But as you can see, the fact that it does this step confirms that quadprog does give bound constraints special handling.
S.
2020-3-20
So, if there are polyhedral input and state constraints, the constraints describe a space, ''the polyhedron'', in which the algorithm should search for a solution.
And if your Hessian is ill-conditioned and convergence is therefore slow, the ''interior-point'' algorithm cannot find a solution in this polyhedron and jumps around in the solution space. Why does it then jump around? Because the solution does not change with every step the interior-point algorithm makes in the space?
And what is the reason that a Hessian should be positive definite?
Matt J
2020-3-20
Edited: Matt J
2020-3-20
All good questions, but not really on topic for this thread. If you think your original question about why your problem is seen as non-convex by MATLAB has been answered, I think you should Accept-click the Answer and start a new thread focused on your new question(s).
S.
2020-3-23
That's true. My questions were going a bit off track. I will open a new thread. You can find it here:
One more question that is relevant regarding quadprog: it seems that quadprog, i.e. the (conventional) interior-point algorithm, is not exploiting the sparsity and structure of the sparse QP formulation. Do you have an explanation for that?
In MPC, the computational complexity should scale linearly with the prediction horizon N. However, my results show that the complexity scales quadratically with N.
S.
2020-3-23
Edited: S.
2020-3-23
Yes, I did. For example, if the Hessian is H = 2*transpose(Z)*Omega*Z, then Z and Omega were constructed as sparse(Omega) and sparse(Z).
However, another point I realize now: for the dense matrices, I passed the matrices in sparse format. Is that a big deal, that I did not pass them as full()?
S.
2020-3-23
OK, but do you know why the (conventional) interior-point algorithm of quadprog is not exploiting the sparsity and structure of the sparse QP formulation? Or possible reasons?
Matt J
2020-3-23
OK, but do you know why the (conventional) interior-point algorithm of quadprog is not exploiting the sparsity and structure of the sparse QP formulation?
I can't be certain that it isn't. I have not run your code myself.
S.
2020-3-23
Well, if I look at my plot, the computation time scales quadratically with N instead of linearly with N. So it is not exploiting it fully, but only partly. If the algorithm did not exploit the sparse structure at all, the computation time would scale cubically with N, i.e. the algorithm would require O(N^3*(number of inputs + number of states)^3) operations.
If it is not exploiting it fully, is there maybe some possible explanation you can think of? It does not necessarily have to be the actual cause; a possible explanation would already help me further.
Matt J
2020-3-23
Edited: Matt J
2020-3-24
Well, the details of your test are invisible to me, so any guess is going to be very arbitrary. However, one issue might be in the way that you are calculating computation time. Are you calculating the time per iteration, or is it the total time to a solution? It sounds like the computation time per iteration is expected to be O(N), but if the number of required iterations is also O(N), it would make sense that you observe O(N^2) total computation time.
S.
2020-3-23
I'm calculating the computation time as the time required to solve one QP (and this for multiple time steps). So the computation time is not the time of solving one QP divided by the number of iterations needed to reach that solution (and that for multiple time steps).
I said it wrong earlier: quadprog is not exploiting the sparsity at all, as the time scales cubically with N (see the attached figure). I also added the scripts; Optimalisatieprobleem2.m is the main script.
Hopefully you can now make a better guess.
Matt J
2020-3-23
So the computation time is not the time of solving one QP divided by the number of iterations to come to that solution
You should measure that first.
Matt J
2020-3-23
By putting tic..toc around the call to quadprog and dividing by the number of iterations.
Matt J
2020-3-23
The "output" argument from quadprog will contain the number of iterations used, as well as other info.
[x,fval,exitflag,output] = quadprog(___)
I will not have time to look at the scripts anytime soon, but it is irrelevant for now. The computation time per iteration is the first thing you should check.
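Putting those pieces together, a sketch of the measurement (buildQP is a hypothetical stand-in for however you assemble the QP for a given horizon N):
Ns = [10 20 40 80];                           % prediction horizons to test
for k = 1:numel(Ns)
    [G,F,Aineq,bineq] = buildQP(Ns(k));       % hypothetical problem builder
    t = tic;
    [x,~,~,output] = quadprog(G,F,Aineq,bineq,[],[],[],[],[],options);
    tPerIter(k) = toc(t)/output.iterations;   % this per-iteration time is what should scale with N
end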
S.
2020-3-23
OK, then I now know how to do that.
So I have to check whether the computation time per iteration scales with O(N), instead of whether the computation time of one QP scales with O(N). So it could also be that just the number of required iterations scales with O(N)?
Furthermore, one very last question :): if another solver gets the same optimized values x, how can the fval be different? For example, quadprog and Gurobi (another solver) give me the same x and fval, but GPAD gives me the same x with an fval that is a factor of 10 bigger than that of quadprog and Gurobi.
More Answers (0)