How do I set up fmincon correctly for parallel computation? "Supplied objective function must return a scalar value"

I have an expensive objective function that takes multiple parameters, while the constraints are constant. It looks something like this:
[out1, out2] = obj_func(in1, in2)
in1 has 1000 different cases. For each value of in1, I want to minimize out1 by adjusting in2, while out2 has to be < 1 in all cases. Because obj_func takes a while to run, I want to parallelize the computation, either by calculating several gradients at the same time or by solving the problem for several values of in1 at once.
Below is my simplified code:
nPoints = length(INVARS(:,1));
odInputNames = {'DISA'; 'ALT'; 'MACH'};
% Number of batches depending on given batch size:
sizeBatch = 20;
nBatch = ceil( nPoints/sizeBatch ); % Last batch contains the remaining points
% Define constraints
CONSTR = struct(...
'param_A', '...',...
'param_B', '...',...
'param_C', '...');
CONSTR_REQ = [ 1800; ...
1.10; ...
1.15 ];
for i = 1:nBatch
% Create batchwise input vectors
for iODVar = 1:length(odInputNames)
if i < nBatch
OD.(odInputNames{iODVar}) = INVARS(sizeBatch*(i-1)+1:sizeBatch*i, iODVar);
else
OD.(odInputNames{iODVar}) = INVARS(sizeBatch*(i-1)+1:end, iODVar); % Last batch has fewer points
end
end
% Setup of FMINCON Optimizer
% ---------------------------------------------------------------------
A = [];
b = [];
Aeq = [];
beq = [];
lb = ones(size(OD.DISA)) * 80;
ub = ones(size(OD.DISA)) * 120;
options = optimoptions(@fmincon, ...
'Algorithm', 'active-set',...
'MaxIter', 200,...
'ConstraintTolerance', 1e-8,...
'OptimalityTolerance', 1e-8,...
'Display', 'off',...
'UseParallel', true);
x0 = ones(size(OD.DISA)) * 85;
% Declare constraint results to be shared between obj_fun and con_fun
constr_fields = fieldnames(CONSTR);
con_res = zeros(length(constr_fields), length(OD.DISA));
% Execute Optimizer
[PLA_max, obj_min,exitflag,~,~] = fmincon( @(x)obj_fun(x), x0, A, b, Aeq, beq, lb, ub, @(x)con_fun(), options );
% Objective:
function [obj] = obj_fun( PLA )
OD.PLA = PLA;
[OD.OUT, OD.IN] = expensive_function( DES, OD );
obj = - OD.OUT.param_obj;
for i=1:length(constr_fields)
con_res(i,:) = eval(['OD.' CONSTR.(constr_fields{i})]);
end
end
% Constraints:
function [c,ceq] = con_fun()
exp_CONSTR_REQ = repmat(CONSTR_REQ, 1, length(OD.DISA));
c = (con_res - exp_CONSTR_REQ); % Non-Equality constraint ( !<= 0 )
ceq = [];
end
It works fine when I solve one case at a time, but when I try to parallelize it I get the error below. How can I fix it?
Error using fmincon (line 348)
Supplied objective function must return a scalar value.

2 Comments

Why is lb > ub ?
lb = ones(size(OD.DISA)) * 120;
ub = ones(size(OD.DISA)) * 80;
Why does the error message report something about fminunc ? It is not referenced in your code.
Sorry, made a mistake while simplifying the code. Both mistakes have been fixed in the post now.


Accepted Answer

I want to parallelize the computation, either by calculating several gradients at the same time, or solve the problem for several in1.
The UseParallel option, which you are already invoking, does parallelize the gradient computation. If there is an analytical algorithm for computing the gradient, however, that will often be quicker.
The error about the objective function returning a non-scalar should be easy to investigate: just call the objective function on the initial point x0. If obj_fun(x0) gives you a non-scalar value, then you have a bug, and you must trace it.
If you are deliberately having it return a non-scalar, then you have a misconception about how fmincon works. fmincon cannot perform multiple optimizations in a single call.
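A quick check along these lines (a sketch, using obj_fun and x0 from the post):

```matlab
% Evaluate the objective once at the initial point before handing it to fmincon.
f0 = obj_fun(x0);
assert(isscalar(f0), ...
    'obj_fun returned a value of size %s; fmincon needs a scalar.', ...
    mat2str(size(f0)));
```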

25 Comments

I want to parallelize the optimization process itself, not just the gradient calculation. Different setups can be run at the same time because expensive_function is capable of receiving multiple sets of inputs and returning independently calculated outputs with minimal time penalty, and I was trying to use this feature to solve multiple optimizations simultaneously.
I do understand now that fmincon cannot do that, so I am experimenting with parfor or something like that to call multiple fmincon instead, but so far without success.
Pose the optimization problem as the sum of the objective functions that you want to minimize.
That worked! Sort of...
So now fmincon is able to iterate multiple sets of inputs at the same time and attempts to minimize the sum of the resulting objective parameters. However, for some reason x is stuck at the initial value and fmincon just keeps calling the objective function with the same inputs. For each set of inputs the x would be different, so I can't simply bundle them up like with obj here.
My objective function now looks like this:
function [obj] = obj_fun( PLA )
OD.PLA = PLA; % PLA is an array
[OD.OUT, OD.IN] = expensive_function( DES, OD );
obj = - sum(OD.OUT.param_obj);
for i=1:length(constr_fields)
con_res(i,:) = eval(['OD.' CONSTR.(constr_fields{i})]);
end
end
I thought it had to do with the non-linear constraints, but even without them fmincon is still stuck. How can I fix it?
Why do you return
obj = - sum(OD.OUT.param_obj);
and not
obj = sum(OD.OUT.param_obj);
?
I am actually maximizing param_obj there, hence the negative sign.
OD.PLA = PLA; % PLA is an array
Shouldn't DES be an array as well? In my suggestion, I indicated that there should be a different DES for every term in the objective function.
Are you saying you can't reproduce your old results even when all the DES are set equal to each other des_i = des_j ?
DES here is simply a struct loaded from a .mat file, which is necessary for expensive_function. It should not have been an input argument because it's the same regardless of OD. Sorry for the confusion. I've moved it inside the function and now expensive_function requires only OD, which is a struct with parameters as arrays. For example
OD.in1 = [0, 1, 2, 3]
OD.in2 = [10, 9, 8, 7]
OD.PLA = [x1, x2, x3, x4] % At the start they're all set to x0
expensive_function then calculates each column of combinations of OD, for example [in1, in2] = [0, 10], which produces say an param_obj of 99, so the output would look like this
[OD.OUT, OD.IN] = expensive_function( OD );
obj = - sum(OD.OUT.param_obj); % OD.OUT.param_obj = [99, 15, 66, -2]
I want to find the x1, x2, x3, and x4 that result in the maximum OD.OUT.param_obj.
Unfortunately even with DES out of way, x is still not being adjusted. So from the output message of expensive_function, I can see that all x values are stuck at x0 and the iteration isn't going anywhere.
I could only get it to work when all arrays here were single doubles instead. Then expensive_function was called several times with different OD.PLA until OD.OUT.param_obj (also a double) reaches maximum.
I could only get it to work when all arrays here were single doubles instead.
I still don't know what this means. Originally your x1 was an array of size(OD.DISA). Unless you are saying that OD.DISA was a scalar, then PLA was never a "single double".
Also, it is not clear what was working before compared to now. With your original objective function f(x), what values of in1,in2 gave you a reasonable optimization result? And what happens with your new objective function F(x1,x2,x3,x4) when you set all four pairs of in1,in2 to the same pair that worked for the optimization of f()?
And have you verified that F(x1,x2,x3,x4) returns the same value as f(x1) + f(x2) + f(x3) + f(x4) up to floating point round-offs? That is a basic validation step that should be done before any optimization is attempted.
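Such a check could be sketched as follows (obj_fun_batch and obj_fun_single are hypothetical wrappers around the batched and single-case objectives):

```matlab
% Compare the batched objective F against the sum of per-case objectives f.
x = [80; 85; 90; 95];                      % arbitrary test point inside the bounds
Fbatch = obj_fun_batch(x);                 % one vectorized call
Fsum = 0;
for k = 1:numel(x)
    Fsum = Fsum + obj_fun_single(x(k), k); % case k evaluated on its own
end
relErr = abs(Fbatch - Fsum) / max(abs(Fsum), eps);
fprintf('relative difference: %g\n', relErr);
```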
Oh, the code in my post was already the vectorized version. To clarify again, the code below worked, just took very long. It found the PLA that resulted in the minimal obj before moving on to the next set of OD.in.
in1 = [0, 1, 2]; % This is read from a file
in2 = [10, 9, 8] % This is read from a file
x0 = 75;
for i = 1:3
OD.in1 = in1(i); % OD.in1 is a scalar
OD.in2 = in2(i);
[PLA_max, obj_min,exitflag,~,~] = fmincon( @(x)obj_fun(x), x0, A, b, Aeq, beq, lb, ub, @(x)con_fun(), options );
end
function [obj] = obj_fun( PLA )
OD.PLA = PLA; % PLA is a scalar
[OD.OUT, OD.IN] = expensive_function(OD);
obj = - OD.OUT.param_obj; % OD.OUT.param_obj is a double
end
While this vectorized version would just keep iterating the same PLA.
in1 = [0, 1, 2]; % This is read from a file
in2 = [10, 9, 8] % This is read from a file
x0 = [75, 75, 75];
OD.in1 = in1; % OD.in1 is an array
OD.in2 = in2;
[PLA_max, obj_min,exitflag,~,~] = fmincon( @(x)obj_fun(x), x0, A, b, Aeq, beq, lb, ub, @(x)con_fun(), options );
function [obj] = obj_fun( PLA )
OD.PLA = PLA; % PLA is an array
[OD.OUT, OD.IN] = expensive_function(OD);
obj = - sum(OD.OUT.param_obj);
% OD.OUT.param_obj is an array of doubles that each correspond to each
% combination of [in1(1), in2(1)], [in1(2), in2(2)] and so on
end
The values of in1 and in2 are read from a data file, so they're always the same. The difference is only if I load it into OD.in1 one by one in a for-loop, or 20 at the same time in batches.
In terms of F(x1,x2,x3,x4) == f(x1) + f(x2) + f(x3) + f(x4). There is a relative difference to the scale of 10e-8. I suspect the reason is that there is also an iteration scheme in expensive_function, and when you supply an array of in1 instead of a scaler, it also bundles the overall error up in some way. It is a complex legacy code that I have not much control over, but I can assure you f(x) is built for vectorized inputs and has been used and tested for many years.
What happens if you use
function [obj] = obj_fun( PLA )
in1 = OD.in1;
in2 = OD.in2;
tmp = zeros(size(PLA));
for i = 1:numel(PLA)
OD.PLA = PLA(i);
OD.in1 = in1(i);
OD.in2 = in2(i);
[OD.OUT, OD.IN] = expensive_function(OD);
tmp(i) = OD.OUT.param_obj;
end
obj = - sum(tmp);
OD.in1 = in1;
OD.in2 = in2;
% OD.OUT.param_obj is an array of doubles that each correspond to each
% combination of [in1(1), in2(1)], [in1(2), in2(2)] and so on
end
That might work, but it would take almost the same amount of time, because it's still calling expensive_function for each combination of OD, right?
It's just a test whether your "expensive_function" works correctly.
If the code
function [obj] = obj_fun( PLA )
in1 = OD.in1;
in2 = OD.in2;
tmp = zeros(size(PLA));
for i = 1:numel(PLA)
OD.PLA = PLA(i);
OD.in1 = in1(i);
OD.in2 = in2(i);
[OD.OUT, OD.IN] = expensive_function(OD);
tmp(i) = OD.OUT.param_obj;
end
obj = - sum(tmp);
OD.in1 = in1;
OD.in2 = in2;
% OD.OUT.param_obj is an array of doubles that each correspond to each
% combination of [in1(1), in2(1)], [in1(2), in2(2)] and so on
end
works and the code
function [obj] = obj_fun( PLA )
OD.PLA = PLA; % PLA is an array
[OD.OUT, OD.IN] = expensive_function(OD);
obj = - sum(OD.OUT.param_obj);
% OD.OUT.param_obj is an array of doubles that each correspond to each
% combination of [in1(1), in2(1)], [in1(2), in2(2)] and so on
end
does not work, the problem must lie in your "expensive_function".
It seems that evaluating "expensive_function" with array inputs for in1, in2 and PLA gives different results in param_obj than calling "expensive_function" for each array element separately.
I would test this outside the optimization environment.
Wait, I figured it out. The PLA was not adjusted by fmincon until obj_fun was called 10+ times for some reason. After that the optimization finally happened. That's why before I thought it's stuck. My bad.
I'm glad you got it working, but if you have now achieved a solution through this answer, please Accept-click it.
Ok, it worked but sadly not faster at all, probably because you can't optimize 20 independent variables with only one objective...
Is there any way to run the optimizations in parallel while sharing calls to the same objective function? So expensive_function processes 20 PLAs at the same time, but optimizer 1 only adjusts PLA(1) based on param_obj(1), and so on. Mathematically it should be easy, but I don't know how to do that in MATLAB. It doesn't even need to be fmincon if there are other options.
I only see the option to call fmincon 20 times in a parfor loop for 1 PLA at a time. This doesn't use the option to evaluate your expensive_function for several PLA inputs, but it should be faster than using 20 PLAs in 1 call to "fmincon". But maybe 20 is still too little to see an advantage of the parallel option taking into account the overhead of parallelization.
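A rough sketch of that parfor approach, reusing names from the post (lb, ub, and x0 are assumed here to be scalars, one per case, and case_obj is a hypothetical wrapper):

```matlab
% One independent 1-D fmincon run per case; parfor distributes them over workers.
PLA_max = zeros(1, nPoints);
obj_min = zeros(1, nPoints);
parfor k = 1:nPoints
    ODk = struct('in1', in1(k), 'in2', in2(k));  % inputs for case k only
    [PLA_max(k), obj_min(k)] = fmincon(@(x) case_obj(x, ODk), x0, ...
        [], [], [], [], lb, ub, [], options);
end

function obj = case_obj(PLA, ODk)
    % Scalar wrapper around the poster's expensive_function.
    ODk.PLA = PLA;
    [OUT, ~] = expensive_function(ODk);
    obj = -OUT.param_obj;
end
```

Note that 'UseParallel' should be false in options here, since the parallelism now lives in the parfor loop.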
Ok, it worked but sadly not faster at all, probably because you can't optimize 20 independent variables with only one objective...
Certainly you can. Even without UseParallel, it is unclear why you don't see speed-up based on what you've told us. You yourself claimed that expensive_function() could process 20 PLA's faster as a batch than with 20 separate calls. That alone should have offered some improvement.
Maybe calling "fmincon" 20 times with one solution variable is faster than calling it once with 20 solution variables.
Like solving 20 (independent) ODEs in one call to an ode integrator will most probably be slower than solving 1 ODE at a time in 20 calls to the ode integrator.
Come to think of it, Torsten is right, but primarily because you are not using the SpecifyObjectiveGradient and SpecifyConstraintGradient options. If you have a way of supplying the objective and constraint gradients analytically, you should do so, and it should make things faster.
If you have to rely on the default finite difference approach, then batch computations will not help you. fmincon calls the objective on the order of 2*N times per iteration, where N is the number of independent variables. So, if you try to optimize in batches of N PLAs, the number of function calls required per iteration also grows by a factor of N.
If you have a way of supplying the objective and constraint gradients analytically, you should do so, and it should make things faster.
Even if you don't have an analytical formula for the derivatives, you could try to approximate them with your own finite difference computation. In this special case, where your objective function is the separable sum F(x) = -sum(E(x)), the finite difference derivatives can be computed by perturbing all components at once,
dF/dx_i ≈ -( E_i(x + δ) - E_i(x) ) / δ
where E() is the vector-valued output of expensive_function() and x + δ adds δ to every component. This results in only two calls to expensive_function() per gradient instead of N. Similar things apply to your nonlinear constraint functions.
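A sketch of that idea (obj_fun_with_grad and delta are hypothetical names; hook it up via @(x) obj_fun_with_grad(x, OD, delta) with 'SpecifyObjectiveGradient', true):

```matlab
% Batched forward-difference gradient for a separable objective
% F(x) = -sum(E(x)), where E is the vector output of expensive_function
% and entry i depends only on x(i).
function [obj, grad] = obj_fun_with_grad(PLA, OD, delta)
    OD.PLA = PLA;
    [OUT0, ~] = expensive_function(OD);     % one batched call at x
    OD.PLA = PLA + delta;
    [OUT1, ~] = expensive_function(OD);     % one batched call at x + delta
    obj  = -sum(OUT0.param_obj);
    grad = -(OUT1.param_obj(:) - OUT0.param_obj(:)) / delta;
end
```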
Even without UseParallel, it is unclear why you don't see speed-up based on what you've told us. You yourself claimed that expensive_function() could process 20 PLA's faster as a batch than with 20 separate calls. That alone should have offered some improvement.
It's like you and Torsten said. "expensive_function" processes 20 PLAs as a batch a lot faster than with 20 separate calls. However, expensive_function was called more times than when solving for one PLA only, so in the end it didn't really help.
Seems like the only options left are making better initial guesses for PLA and approximating the derivatives. So you're saying I should make another function to calculate E(x_i) and E(x_i + δ) and supply the resulting gradient array to fmincon? Should I just set δ to be something very small, or is there a better way to do it?
Seems like the only options left are making better initial guesses for PLA and approximating the derivatives.
And also Torsten's suggestion of using a parfor loop. That might be the simplest.
So you're saying I should make another function to calculate E(x_i) and E(x_i + δ)
The gradient computations must be done inside your objective functions as described here,
Obviously, you are free to out-source parts of that computation with calls to other functions, if you wish.
Should I just set δ to be something very small, or is there a better way to do it?
You will need to experiment with δ. The CheckGradients fmincon option can help you see if the derivatives you've provided agree with fmincon's own numerical approximation.
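For reference, the relevant options could be combined like this (a sketch; the step size is a starting point for experiments, not a recommendation):

```matlab
options = optimoptions(@fmincon, ...
    'SpecifyObjectiveGradient', true, ...   % obj_fun returns [f, grad]
    'CheckGradients', true, ...             % compare against fmincon's own FD estimate
    'FiniteDifferenceStepSize', 1e-6);      % reference step for the comparison
```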


More Answers (1)

There are other acceleration strategies that may be superior to parallelization.
I notice in your examples that the in1,in2 pairs change by small amounts. You also appear to believe that the final solutions for PLA will be close to each other, otherwise why initialize every optimization with the same x0? That being the case, it may be worthwhile to go back to the strategy of optimizing one PLA at a time, but using the optimized PLA from the N-th optimization as the x0 for the next one. This might cut down on the number of iterations you need to get each solution and reduce the number of calls to expensive_function.
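The warm-start idea might look like this (a sketch built on the scalar loop from the post):

```matlab
% Solve the cases one at a time, seeding each run with the previous solution.
x0 = 85;                             % initial guess for the first case only
PLA_max = zeros(1, nPoints);
for k = 1:nPoints
    OD.in1 = in1(k);
    OD.in2 = in2(k);
    PLA_max(k) = fmincon(@(x) obj_fun(x), x0, A, b, Aeq, beq, lb, ub, ...
        @(x) con_fun(), options);
    x0 = PLA_max(k);                 % warm start: neighbouring cases are similar
end
```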

1 Comment

The in1, in2 shown here are not the real values. In reality there are more variables and some vary quite a lot. I set x0 to the same value only because the final optimized values are hard to predict (in general between 80 and 105), but you're right. I should try to give PLA a better initial guess.


MATLAB release: R2024a
