Transform linear inequality constraints into bound constraints for optimization solvers

Suppose there are linear inequality constraints A*y >= b, where y are the parameters optimized by fmincon, lsqnonlin, or friends.
Since linear inequality constraints generally cannot be enforced at intermediate iterations, I want to transform them into bound constraints, which can be enforced at all iterations.
General idea:
If A is dense and has m rows (i.e., there are m constraints), then we can make a "change of variables"
z = A*y, where the components are z_i = a_i*y and a_i are the rows of A.
This leads to simple bound constraints z >= b. Does that make sense?
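To make the "change of variables" idea concrete, here is a minimal sketch in Python (the thread is MATLAB, but the algebra is identical); the 2x2 matrix A and the vectors below are made-up numbers, and the construction assumes A is square and invertible:

```python
# Hedged sketch: for a SQUARE, invertible A, the substitution z = A*y turns
# A*y >= b into the simple bounds z >= b; y is recovered as y = inv(A)*z.
# Toy example with A = [[1, 0], [-1, 1]], whose inverse is [[1, 0], [1, 1]].

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A     = [[1.0, 0.0], [-1.0, 1.0]]   # constraint matrix in A*y >= b
A_inv = [[1.0, 0.0], [ 1.0, 1.0]]   # its inverse, computed by hand

b = [0.0, 0.0]
z = [0.5, 2.0]                       # any z >= b is feasible by construction
y = matvec(A_inv, z)                 # recover the original parameters

# the original inequality constraints hold automatically
assert all(zi >= bi for zi, bi in zip(matvec(A, y), b))
print(y)                             # [0.5, 2.5]
```

The catch, discussed in the answers below, is that this only works when A is square and invertible, i.e., when the number of constraints does not exceed the number of parameters.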
Specific problem:
My parameters y represent interpolation values corresponding to interpolation points x, and the resulting interpolation function is supposed to be convex on the interpolation domain. For the specific case of linear interpolation, @Matt J showed that convexity is realized by increasing slopes, i.e., non-negative second-order divided differences
(y(i+1)-y(i))/(x(i+1)-x(i)) - (y(i)-y(i-1))/(x(i)-x(i-1)) >= 0, i = 2,...,n-1
(which represents a set of linear inequality constraints A*y >= 0).
This can be transformed into simpler bounds z(i) >= 0 on the new parameters z by the change of variables
z(i) = (y(i+1)-y(i))/(x(i+1)-x(i)) - (y(i)-y(i-1))/(x(i)-x(i-1)), i = 2,...,n-1,
with the remaining two degrees of freedom (e.g., y(1) and y(2)) kept as free, unbounded parameters.
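A small Python sketch of this reparameterization (for illustration only; the grid x and the values below are made-up numbers): given the two free values and the non-negative second divided differences z, the original interpolation values y can be rebuilt by accumulating slopes, and convexity holds by construction:

```python
# Hedged sketch: parameterize a piecewise-linear interpolant by its second
# divided differences z (>= 0 for convexity) plus the two free values y0, y1.

def rebuild(x, y0, y1, z):
    """Invert the change of variables: recover y from y0, y1 and the
    second divided differences z[i-1] at the interior points x[i]."""
    y = [y0, y1]
    for i, zi in enumerate(z, start=1):
        h_prev = x[i] - x[i - 1]
        h_next = x[i + 1] - x[i]
        slope_prev = (y[i] - y[i - 1]) / h_prev
        # zi = slope_next - slope_prev  =>  slope_next = slope_prev + zi
        y.append(y[i] + (slope_prev + zi) * h_next)
    return y

x = [0.0, 1.0, 2.0, 3.0]
y = rebuild(x, 1.0, 1.0, [2.0, 2.0])   # z >= 0  =>  slopes nondecreasing
print(y)  # [1.0, 1.0, 3.0, 7.0] -- convex: slopes are 0, 2, 4
```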
What I am really working with are smoother B-spline basis functions of higher degree using spapi function. @Bruno Luong showed that the linear inequality matrix A is given by
% k: spline order, x: points where values y live
B = spapi(k,x,eye(length(x)));
Bdd = fnder(B,2);
A = Bdd.coefs'
and
is a sufficient condition for convexity.
I am wondering whether a change of variables (like for linear interpolation functions) can be applied here to transform the inequalities into bounds?
Any opinions are greatly appreciated!

1 Comment

For example, it is possible to convert a set of linear inequality constraints into a bound constrained problem by the introduction of what are called slack variables.
That is, if you have an inequality
A*x >= b
then you CAN write it as an equality
A*x == b + y
where y is a new (slack) variable that is constrained to be non-negative. If you have multiple constraints, then you would introduce one slack variable for each inequality constraint.
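A minimal Python sketch of this slack construction (Python rather than the thread's MATLAB, purely for illustration; the matrix and vectors are made-up numbers). For a feasible x, the slack y is simply the residual A*x - b:

```python
# Hedged sketch: rewrite A*x >= b as the equality A*x == b + y with y >= 0,
# introducing one slack variable per inequality constraint.

def slack(A, x, b):
    Ax = [sum(a * xj for a, xj in zip(Ai, x)) for Ai in A]
    return [axi - bi for axi, bi in zip(Ax, b)]

A = [[1.0, 1.0], [2.0, -1.0]]
b = [1.0, 0.0]
x = [1.0, 1.0]                 # feasible point: A*x = [2.0, 1.0] >= b

y = slack(A, x, b)             # then A*x == b + y holds with y >= 0
print(y)                       # [1.0, 1.0]
```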
Such slack variables are a common artifice in mathematics, used for example to solve the minimum sum of absolute values problem (versus the traditional linear least squares) but also used to formulate optimization problems using Lagrange multipliers. You can also view the question of whether a point lies inside a convex polytope, in terms of slack variables, one for each facet of the polytope. I can think of at least one other example.
However, you are asking to solve the problem in a different way, by use of a transformation. For example, given a vector x, which must form a monotonic increasing sequence, then we can transform the problem such that y=diff(x)>=0. And this is not always so easy.
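The monotonicity example above can be sketched as a transformation (a Python illustration with made-up numbers): the nondecreasing sequence x is parameterized by its first element plus bound-constrained increments, and cumulative sums invert the change of variables:

```python
# Hedged sketch: a nondecreasing sequence x is equivalent to its first
# element x0 plus increments y = diff(x) subject only to the bounds y >= 0.
from itertools import accumulate

def to_increments(x):
    return [b - a for a, b in zip(x, x[1:])]   # y = diff(x), must be >= 0

def from_increments(x0, y):
    return list(accumulate(y, initial=x0))     # x = x0 + cumsum(y)

x = [1.0, 2.0, 2.0, 5.0]
y = to_increments(x)            # [1.0, 0.0, 3.0] -- simple bounds y >= 0
assert from_increments(x[0], y) == x
print(y)
```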


Accepted Answer

No, you cannot. It is well known that any generic linear constraints
Aeq * x = beq
A * x <= b
lo <= x <= up
are equivalent to the "standard form"
D * y = e
y >= 0
after some linear transformation of x to y.
In general, simple box constraints are NOT equivalent to the generic constraints (in the two forms expressed above).
It is not a proof, but geometrically, box constraints describe a polytope with parallel faces, whereas the other two forms describe generic polytopes.
You cannot simply do a variable change, for instance, if the number of inequality constraints is larger than the number of decision parameters.
In any case, solvers such as linprog, quadprog, and lsqlin do such transformations behind the scenes for us, and users should not worry about doing them beforehand.
Here is a video explaining the standard form and the conversion of some constraint types using slack variables: https://www.youtube.com/watch?v=db979cMaQ0M
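One step of the conversion to standard form can be sketched in Python (for illustration; the numbers are made up). For x >= 0 and A*x <= b, appending identity columns for the slack s gives D*y = b with y = [x; s] >= 0:

```python
# Hedged sketch: convert "A*x <= b, x >= 0" to standard form "D*y = b, y >= 0"
# by appending one slack variable per row, i.e., D = [A | I].

def augment(A):
    m = len(A)
    return [row + [1.0 if i == j else 0.0 for j in range(m)]
            for i, row in enumerate(A)]

A = [[1.0, 2.0], [3.0, 1.0]]
b = [4.0, 6.0]
x = [1.0, 1.0]                                  # feasible: A*x = [3, 4] <= b
s = [bi - sum(a * xj for a, xj in zip(Ai, x)) for Ai, bi in zip(A, b)]

D = augment(A)                                  # [[1, 2, 1, 0], [3, 1, 0, 1]]
y = x + s                                       # y >= 0 componentwise
Dy = [sum(d * yj for d, yj in zip(Di, y)) for Di in D]
print(Dy)                                       # equals b: [4.0, 6.0]
```

Handling free variables additionally requires splitting them into positive and negative parts, which is part of why the standard form and box form are not interchangeable.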

14 Comments

I see. My motivation for doing this is that, currently, I scale the matrix A by a large number c so that fmincon pays "enough" attention to the linear constraints. Whether this is a good idea is a different question.
You cannot simply do a variable change, for instance, if the number of inequality constraints is larger than the number of decision parameters.
If we have n=numel(y) parameters (interpolation values), then we have n-2 inequality constraints, size(A,1) = n-2, to implement convexity of the spline.
However, if I understand correctly, you don't see any option to eliminate the inequality constraints and only work with simpler bounds, or some other clever transformation?
You can: just select a basis for the second derivative, for instance a B-spline (Bernstein) basis of appropriate order. Take the second antiderivative of this basis (call fnint twice), then add a global linear function (a 2-dimensional space). That forms the basis of the splines.
Then you can do whatever fitting you like with your data using this basis, requiring that the (n-2) coefficients of the decomposition be positive, while the (2) coefficients of the linear part stay free. This is more or less what BSFK does, as I showed you a few posts back, IIRC.
Sounds like the approach I sketched in the question, inspired by Matt.
Can you show these steps in code, please, and how to reconstruct the original params from the new coefficients?
% Convex Data
n = 5;
x = 1:n;
y = exp(x); % x.^3;
k = 4; % Order: cubic spline
fref = spapi(k, x, y); % Reference spline
tref = aptknt(x, k); % Reference knots
% Knots for the second derivative basis
tdd = tref(3:end-2);
% New function basis
N = cell(1, n-2);
% Generate basis functions
alphas = eye(n-2);
for i = 1:n-2
    % Create spline using ith unit vector
    Ndd = spmak(tdd, alphas(:, i)');
    % Integrate twice to get function basis
    N{i} = fnint(fnint(Ndd));
end
% Collocation matrix
B = zeros(n, n);
fnvals = zeros(n-2, n);
for j = 1:n-2
    fnvals(j, :) = fnval(N{j}, x);
end
B(:, 1:n-2) = fnvals';
B(:, n-1) = x(:); % linear term
B(:, n) = 1; % constant term
% Solve for coefficients of basis
coef = B \ y(:);
% ############ Plot data check #################
xq = linspace(min(x), max(x), 100); % Query points
yq = zeros(1, numel(xq)); % Preallocate output
% First contribution
for j = 1:n-2
    yq = yq + fnval(N{j}, xq) * coef(j);
end
yq = yq + coef(n-1) * xq + coef(n); % Add linear and constant terms
f1 = plot(xq, fnval(fref, xq)); hold on;
f2 = plot(xq, yq);
f3 = plot(xq, yq - fnval(fref, xq));
legend([f1, f2, f3], 'spapi - reference bases', 'new bases', 'error');
I implemented your suggestion, however, with a basis produced by spmak. That was my only idea for producing a spline without using your BSFK hammer or coding de Boor myself. Is what I did always valid, and is the chosen basis appropriate?
I add noise and enforce convexity to show the effect
% Convex Data
n = 50;
x = linspace(1,5,n);
y = exp(x); % x.^3;
sigma = 4;
y = y + sigma*randn(size(y));
k = 4; % Order: cubic spline
fref = spapi(k, x, y); % Reference spline
tref = aptknt(x, k); % Reference knots
% Knots for the second derivative basis
tdd = tref(3:end-2);
% New function basis
N = cell(1, n-2);
% Generate basis functions
alphas = eye(n-2);
for i = 1:n-2
    % Create spline using ith unit vector
    Ndd = spmak(tdd, alphas(:, i)');
    % Integrate twice to get function basis
    N{i} = fnint(fnint(Ndd));
end
% Collocation matrix
B = zeros(n, n);
fnvals = zeros(n-2, n);
for j = 1:n-2
    fnvals(j, :) = fnval(N{j}, x);
end
B(:, 1:n-2) = fnvals';
B(:, n-1) = x(:); % linear term
B(:, n) = 1; % constant term
% lower bounds to enforce positive second derivative
lb = [zeros(1,n-2), -Inf(1,2)];
% Solve for coefficients of basis with constraints
coef = lsqlin(B, y(:), [],[],[],[],lb,[]);
Minimum found that satisfies the constraints. Optimization completed because the objective function is non-decreasing in feasible directions, to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance.
% ############ Plot data check #################
xq = linspace(min(x), max(x), 400); % Query points
yq = zeros(1, numel(xq)); % Preallocate output
% First contribution
for j = 1:n-2
    yq = yq + fnval(N{j}, xq) * coef(j);
end
yq = yq + coef(n-1) * xq + coef(n); % Add linear and constant terms
f1 = plot(xq, fnval(fref, xq)); hold on;
f2 = plot(xq, yq);
f3 = plot(xq, yq - fnval(fref, xq));
legend([f1, f2, f3], 'spapi - reference bases', 'new bases', 'error');
The idea is, of course, to create a data set (x,y) sampled from a noise-free function like x^2. This y represents the start vector (re-parameterized in the new basis) of the optimization. Since lower bounds can be enforced at all iterations, I should never end up with a y-vector which has significant noise in it.
Maybe I am missing something here and did not understand what you wanted to show with your example. I think what you show is function approximation, but I always resort to interpolation.
I only showed that the hypothetical constraint on the second derivative in the original basis,
A*c1 >= 0
can be converted to
c2 >= 0
in the second basis; as I understand it, c1 and c2 are the coefficients in the two respective spline bases.
If you don't have to enforce A*c1 >= 0, then indeed nothing changes in the result, since both are interpolants over the same spanned space, just in two different bases.
The math looks consistent to me.
Thanks, last question: if we want to fix s(x(1)) = 0, can this constraint be implemented via lb/ub, or do we have to introduce a single equality constraint on the new coefficients c2? In the original basis c1, this was easy to do via lb(1) = ub(1) = 0.
If you shift the x vector to
xx = x - x(1)
then the n-2 basis functions all have value B(x(1)) = 0; you can then remove the constant function from the basis, or alternatively force the corresponding lower/upper bounds to be 0 during the least-squares fit.
I mostly work with cubic splines and currently also implemented the constraint s'''(x(end)) >=0 in addition to s''(x) >=0 on [x(1), x(end)] to ensure convexity beyond x(end).
Is there a way to express the third derivative constraint at x(end) in terms of the new basis (as a linear inequality constraint)? Or, even better, is there another basis that allows both constraints to be expressed by bounds only?
Is there a way to express the third derivative constraint at x(end) in terms of the new basis (as a linear inequality constraint)?
Uh, just compute the third derivative of the basis at x(end); it gives you the inequality constraint on the coefficients (the third derivative at any point is a linear operator).
Or, even better, is there another basis that allows both constraints to be expressed by bounds only?
No. You have already used the new basis to constrain the second derivative; there is nothing left (the constant and linear terms are free, but they have no third derivative). As I said in an earlier post, you only have a chance of choosing an appropriate basis when the number of inequalities is smaller than or equal to the dimension of the problem. If you add more constraints, they cannot be converted to bound constraints.
The slack-variable trick works well for LP objective functions, but it is rarely helpful for other objectives. For example, the Weyl-Minkowski representation shown by Matt can be viewed as slack variables (lambda + mu), but they are not associated with a change of basis functions (only a bijective transformation is a change of basis); you need the Weyl-Minkowski formula to compute a point x in the admissible polyhedron, then compute the fit objective from x. That is OK, but the number of slack variables must stay reasonable to keep the problem manageable on a computer.
Overall, transforming to bounds looks like a sterile approach to me: just use linear inequality constraints directly within the numerical solver rather than doing all the non-trivial transformations to convert them to bound constraints. It might shed some light theoretically, but it has no numerical advantage IMO. That said, I'm happy to be proven wrong on this view.
No. You have already used the new basis to constrain the second derivative; there is nothing left (the constant and linear terms are free, but they have no third derivative). As I said in an earlier post, you only have a chance of choosing an appropriate basis when the number of inequalities is smaller than or equal to the dimension of the problem. If you add more constraints, they cannot be converted to bound constraints.
Makes total sense.
The slack-variable trick works well for LP objective functions, but it is rarely helpful for other objectives. For example, the Weyl-Minkowski representation shown by Matt can be viewed as slack variables (lambda + mu), but they are not associated with a change of basis functions (only a bijective transformation is a change of basis); you need the Weyl-Minkowski formula to compute a point x in the admissible polyhedron, then compute the fit objective from x. That is OK, but the number of slack variables must stay reasonable to keep the problem manageable on a computer.
I agree that this new basis only makes sense if we can get rid of all inequality constraints. The single constraint on the third derivative is the only additional constraint I can think of for my problems. Given that, do you think a combined scheme (new basis for the second derivative + Weyl-Minkowski representation for the third derivative) is manageable? We only have a halfspace for the constraints. Maybe this does not even work for a single constraint (A must be a square matrix instead); then you can quickly sort this idea out, and I am happy with what we discussed here.

I can't see how Weyl-Minkowski could easily be combined with other ideas.
Brute-force enumeration of all combinations (vertices and extreme rays) rarely works when implemented on a computer. Its value is on the theoretical side.


More Answers (1)

Yes, the set A*y >= 0 is a cone and so can equivalently be expressed as y = sum_j lambda_j * r_j, where the r_j are the so-called extreme rays of the cone and the lambda_j are non-negative coefficients. So, in theory, you could re-express the problem with the lambda_j as the unknowns, which require only non-negativity constraints. However, finding the extreme rays for a given matrix A can be computationally demanding, depending on the dimension of the problem.
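To illustrate the ray representation on a toy case (Python, with a made-up 2-D constraint matrix): for A = [[1, 0], [-1, 1]], i.e., y1 >= 0 and y2 >= y1, the extreme rays are r1 = (0, 1) and r2 = (1, 1), each making one constraint row active:

```python
# Hedged sketch: every point of the toy cone {y1 >= 0, y2 >= y1} is a
# non-negative combination of its two extreme rays, so the reparameterized
# problem in the lambda_j has only bound constraints lambda_j >= 0.

rays = [(0.0, 1.0), (1.0, 1.0)]

def from_rays(lambdas):
    """y = sum_j lambda_j * r_j with lambda_j >= 0."""
    return [sum(l * r[i] for l, r in zip(lambdas, rays)) for i in range(2)]

y = from_rays([1.0, 2.0])               # lambda = (1, 2)
print(y)                                # [2.0, 3.0]
assert y[0] >= 0 and y[1] - y[0] >= 0   # A*y >= 0 holds by construction
```

In this 2-D example the rays can be read off by hand; as the comments below note, in higher dimensions their number can grow exponentially.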

9 Comments

The number of unknowns is small, not more than 50.
Could you show how to determine the extreme rays r_j? Are they fully determined by A?
For a more general polyhedron A*y >= b, the Weyl-Minkowski theorem says that any y in this set can be expressed in the form
y = sum_i mu_i * v_i + sum_j lambda_j * r_j, with mu_i, lambda_j >= 0 and sum_i mu_i = 1,
where the v_i are the vertices of the polyhedron. Unfortunately, this requires an equality constraint on the mu_i and so is not as neat a case.
I always have b = 0 (i.e., A*y >= 0) and, as I said, a very small number of params. Maybe this info helps to detail your answer on how to get the r_j?
The number of unknowns is small, not more than 50.
That's probably going to be way too large. For N unknowns, there can be O(2^N) extreme rays. I think you're barking up the wrong tree.
One can remove the condition lambda_i <= 1 in the Weyl-Minkowski theorem.
So it also becomes a standard form with a single equality constraint.
The trade-off is that one must enumerate all the vertices v_i of the polytope, and this strategy is absolutely out of the question numerically for problems of non-toy dimension, above 4 or 5 maybe.
Yes, but there are no v_i and no equality constraints in this case, because the constraint set A*y>=0, y>=0 is just a cone. Still, I think that enumerating the r_j is also going to be exponentially hard.
Just out of curiosity and if it is not too much effort, would you show for a small n the procedure to determine the r_j?
For a small example, you can look here:
There is also 3rd party code for higher dimensional cases. The following is one offering, though I've never used it myself:
In the theorem on page 23 of this document, the extreme directions (ray directions) can be seen as vertices of an almost similar form of the original polyhedron.
To find the vertices of a polyhedron, you need to pick many combinations of vectors (50) from a large number of vectors (96 = 2*48). This brute force would be impossible to achieve.

