fmincon check gradients fail

10 次查看(过去 30 天)
I'm trying to provide vectorized gradients for my constraints of my 'nonlincon' function to reduce the run time.
My optimization problem has 11 variables from x1 to x11 and my 'nonlincon' calculates 6 constraints so my gradient matrix is 11 by 6
i tried to mimic how fmincon calculates delta for finite differences.
but when i use check gradients with gradient constraint options the check gradients fails no matter what value i use for finite difference step size.
is there something wrong with how i am calculating the gradients or my delta?
this is the options i used with fmincon
options_fmincon = optimoptions('fmincon','ConstraintTolerance', 0.001,'Diagnostics','on','Display','iter-detailed','PlotFcn',{@optimplotx,@optimplotfunccount,@optimplotfval,@optimplotconstrviolation,@optimplotstepsize,@optimplotfirstorderopt},'OptimalityTolerance',0.001,'SpecifyConstraintGradient',true,'CheckGradients',true,'FiniteDifferenceStepSize',0.01,'TypicalX',ones(1,11));
The code is like this :
% fmincon Nonlinear constraints function
function [C, CEQ, Jacobian, DCEQ] = NonLinCon(X, LUT, IN_SPEC)
DOF.M(1).A = X(1);
DOF.M(2).A = X(2);
DOF.M(3).A = X(3);
DOF.M(4).A = X(4);
DOF.M(5).A = X(5);
DOF.M(1).B = X(6);
DOF.M(2).B = X(7);
DOF.M(3).B = X(8);
DOF.M(4).B = X(9);
DOF.M(5).B = X(10);
DOF.IB = X(11)*1e-3;
DOF.C = IN_SPEC.C;
DOF.D = IN_SPEC.D;
DOF.E = IN_SPEC.E;
DOF.F = IN_SPEC.F;
DOF.G = IN_SPEC.G;
DOF.H = IN_SPEC.H;
DOF.M(6).A = DOF.M(5).A;
DOF.M(6).B = DOF.M(5).B;
[~, OUT_SPEC] = calculatespecs(LUT,DOF);
if nargout > 2
T_X = ones(1,11);
v = ones(1,11);
delta = v.*sign(X).*max(abs(X),T_X);
Xp = X;
DOF_p = DOF;
DOF_p.M(1).A = [Xp(1) + delta(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1)];
DOF_p.M(2).A = [Xp(2), Xp(2) + delta(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2)];
DOF_p.M(3).A = [Xp(3), Xp(3), Xp(3) + delta(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3)];
DOF_p.M(4).A = [Xp(4), Xp(4), Xp(4), Xp(4) + delta(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4)];
DOF_p.M(5).A = [Xp(5), Xp(5), Xp(5), Xp(5), Xp(5) + delta(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5)];
DOF_p.M(1).B = [Xp(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6) + delta(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6)];
DOF_p.M(2).B = [Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7) + delta(7), Xp(7), Xp(7), Xp(7), Xp(7)];
DOF_p.M(3).B = [Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8) + delta(8), Xp(8), Xp(8), Xp(8)];
DOF_p.M(4).B = [Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9) + delta(9), Xp(9), Xp(9)];
DOF_p.M(5).B = [Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10) + delta(10), Xp(10)];
DOF_p.IB = [Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3 + delta(11)];
DOF_p.M(6).A = DOF_p.M(5).A;
DOF_p.M(6).B = DOF_p.M(5).B;
[~, OUT_SPEC_p] = calculatespecs(LUT,DOF_p);
Jacobian(:,1) = abs((double(OUT_SPEC_p.A) - double(OUT_SPEC.A)) ./ delta);
Jacobian(:,2) = abs((double(OUT_SPEC_p.B) - double(OUT_SPEC.B)) ./ delta);
Jacobian(:,3) = abs((log10(double(OUT_SPEC_p.C)) - log10(double(OUT_SPEC.C))) ./ delta);
Jacobian(:,4) = abs((double(OUT_SPEC_p.D) - double(OUT_SPEC.D)) ./ delta);
Jacobian(:,5) = abs((double(OUT_SPEC_p.E) - double(OUT_SPEC.E)) ./ delta);
Jacobian(:,6) = abs((double(OUT_SPEC_p.F) - double(OUT_SPEC.F)) ./ delta);
DCEQ = [];
end
C(1) = IN_SPEC.A ./ double(OUT_SPEC.A) - 1;
C(2) = IN_SPEC.B ./ double(OUT_SPEC.B) - 1;
C(3) = log10(IN_SPEC.C) ./ log10(double(OUT_SPEC.C)) - 1;
C(4) = IN_SPEC.D ./ double(OUT_SPEC.D) - 1;
C(5) = IN_SPEC.E ./ double(OUT_SPEC.E) - 1;
C(6) = IN_SPEC.F ./ double(OUT_SPEC.F) - 1;
CEQ = [];
end
  4 个评论
mahmoud tarek
mahmoud tarek 2020-3-26
I tried to use tic toc at the start and end of my NonLinCon and Obj functions to calculate the time of one call and this is what i got:
First for NonLinCon with only 2 output arguments:
The function is called multiple times per iteration. The call time varies between 0.010 to 0.2 seconds.
Second for NonLinCon with 4 output arguments:
The call time varies between 0.01 to 0.06 seconds.
My Obj function has no gradient function.
The run time of my Obj function with one output argument varies between 0.000002 to 0.000759 seconds
Matt J
Matt J 2020-3-26
编辑:Matt J 2020-3-26
The call time varies between 0.010 to 0.2 seconds.
Don't you mean 0.01 to 0.02?
The function is called multiple times per iteration
You need to count how many times in an iteration it calls wtih 2 arguments (when it doesn't need the gradient) and how many times it calls with 4 (I assume once per iteration). If NonLinCon gets called 20 times per iteration with just 2 arguments, then accelerating the gradient calculation won't have much of an impact, because the gradient calculation represents a small fraction of the computation per iteration.
It may be helpful to move your finite difference operations to a separate function, called from within NonLinCon. You can then use the profiler
to see how much times is spent in gradient calculation versus other parts of NonLinCon.

请先登录,再进行评论。

采纳的回答

Matt J
Matt J 2020-3-14
编辑:Matt J 2020-3-14
CheckGradient uses central differences, whereas you appear to be using right handed differences. Are you certain that your constraints are differentiable?
  3 个评论
mahmoud tarek
mahmoud tarek 2020-3-17
i tried to calculate the gradients using central differences to match check gradients but still fails
T_X = ones(1,11);
v = 0.01*ones(1,11); % FiniteDifferenceStepSize
delta = v.*max(abs(X),T_X); % Delta for central finite differences
Xp = X;
DOF_p = DOF;
DOF_p.M(1).A = [Xp(1) + delta(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1)];
DOF_p.M(2).A = [Xp(2), Xp(2) + delta(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2)];
DOF_p.M(3).A = [Xp(3), Xp(3), Xp(3) + delta(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3)];
DOF_p.M(4).A = [Xp(4), Xp(4), Xp(4), Xp(4) + delta(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4)];
DOF_p.M(5).A = [Xp(5), Xp(5), Xp(5), Xp(5), Xp(5) + delta(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5)];
DOF_p.M(1).B = [Xp(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6) + delta(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6)];
DOF_p.M(2).B = [Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7) + delta(7), Xp(7), Xp(7), Xp(7), Xp(7)];
DOF_p.M(3).B = [Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8) + delta(8), Xp(8), Xp(8), Xp(8)];
DOF_p.M(4).B = [Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9) + delta(9), Xp(9), Xp(9)];
DOF_p.M(5).B = [Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10) + delta(10), Xp(10)];
DOF_p.IB = [Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3 + delta(11)];
DOF_p.M(6).A = DOF_p.M(5).L;
DOF_p.M(6).B = DOF_p.M(5).RHO;
[~, OUT_SPEC_p] = calculatespecs(LUT,DOF_p);
Xn = X;
DOF_n = DOF;
DOF_n.M(1).A = [Xn(1) - delta(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1)];
DOF_n.M(2).A = [Xn(2), Xn(2) - delta(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2)];
DOF_n.M(3).A = [Xn(3), Xn(3), Xn(3) - delta(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3)];
DOF_n.M(4).A = [Xn(4), Xn(4), Xn(4), Xn(4) - delta(4), Xn(4), Xn(4), Xn(4), Xn(4), Xn(4), Xn(4), Xn(4)];
DOF_n.M(5).A = [Xn(5), Xn(5), Xn(5), Xn(5), Xn(5) - delta(5), Xn(5), Xn(5), Xn(5), Xn(5), Xn(5), Xn(5)];
DOF_n.M(1).B = [Xn(6), Xn(6), Xn(6), Xn(6), Xn(6), Xn(6) - delta(6), Xn(6), Xn(6), Xn(6), Xn(6), Xn(6)];
DOF_n.M(2).B = [Xn(7), Xn(7), Xn(7), Xn(7), Xn(7), Xn(7), Xn(7) - delta(7), Xn(7), Xn(7), Xn(7), Xn(7)];
DOF_n.M(3).B = [Xn(8), Xn(8), Xn(8), Xn(8), Xn(8), Xn(8), Xn(8), Xn(8) - delta(8), Xn(8), Xn(8), Xn(8)];
DOF_n.M(4).B = [Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9) - delta(9), Xn(9), Xn(9)];
DOF_n.M(5).B = [Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10) - delta(10), Xn(10)];
DOF_n.IB = [Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3 - delta(11)];
DOF_n.M(6).A = DOF_n.M(5).A;
DOF_n.M(6).B = DOF_n.M(5).B;
[~, OUT_SPEC_n] = calculatespecs(LUT,DOF_n);
%%%%%%%%%%% Jacobian using central finite differences %%%%%%%%%%%%%
Jacobian(:,1) = (double(OUT_SPEC_p.A) - double(OUT_SPEC_n.A)) ./ (2.*delta);
Jacobian(:,2) = (double(OUT_SPEC_p.B) - double(OUT_SPEC_n.B)) ./ (2.*delta);
Jacobian(:,3) = (log10(double(OUT_SPEC_p.C)) - log10(double(OUT_SPEC_n.C))) ./ (2.*delta);
Jacobian(:,4) = (double(OUT_SPEC_p.D) - double(OUT_SPEC_n.D)) ./ (2.*delta);
Jacobian(:,5) = (double(OUT_SPEC_p.E) - double(OUT_SPEC_n.E)) ./ (2.*delta);
Jacobian(:,6) = (double(OUT_SPEC_p.F) - double(OUT_SPEC_n.F)) ./ (2.*delta);
Matt J
Matt J 2020-3-17
编辑:Matt J 2020-3-17
I recommend plotting your Jacobian elements as a function of delta. They should converge to something as delta gets smaller. The first question is if they do converge. The second question is, if they do converge, how does the limit compare to Matlab's Jacobian calculation result and/or the result of jacobianest (from the link I gave earlier). Do this analysis for all the elements.

请先登录,再进行评论。

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Systems of Nonlinear Equations 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by