fmincon check gradients fail

Question

mahmoud tarek 2020-3-14

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/510863-fmincon-check-gradients-fail

编辑： Matt J 2020-3-26

I'm trying to provide vectorized gradients for my constraints of my 'nonlincon' function to reduce the run time.

My optimization problem has 11 variables from x1 to x11 and my 'nonlincon' calculates 6 constraints so my gradient matrix is 11 by 6

i tried to mimic how fmincon calculates delta for finite differences.

but when i use check gradients with gradient constraint options the check gradients fails no matter what value i use for finite difference step size.

is there something wrong with how i am calculating the gradients or my delta?

this is the options i used with fmincon

  options_fmincon = optimoptions('fmincon','ConstraintTolerance', 0.001,'Diagnostics','on','Display','iter-detailed','PlotFcn',{@optimplotx,@optimplotfunccount,@optimplotfval,@optimplotconstrviolation,@optimplotstepsize,@optimplotfirstorderopt},'OptimalityTolerance',0.001,'SpecifyConstraintGradient',true,'CheckGradients',true,'FiniteDifferenceStepSize',0.01,'TypicalX',ones(1,11));

The code is like this :

% fmincon Nonlinear constraints function
function [C, CEQ, Jacobian, DCEQ] = NonLinCon(X, LUT, IN_SPEC)
 DOF.M(1).A = X(1);
 DOF.M(2).A = X(2);
 DOF.M(3).A = X(3);
 DOF.M(4).A = X(4);
 DOF.M(5).A = X(5);
 DOF.M(1).B = X(6);
 DOF.M(2).B = X(7);
 DOF.M(3).B = X(8);
 DOF.M(4).B = X(9);
 DOF.M(5).B = X(10);
 DOF.IB = X(11)*1e-3;
DOF.C = IN_SPEC.C;          
DOF.D = IN_SPEC.D;
DOF.E = IN_SPEC.E;
DOF.F = IN_SPEC.F;
DOF.G = IN_SPEC.G;
DOF.H = IN_SPEC.H;
DOF.M(6).A = DOF.M(5).A;
DOF.M(6).B = DOF.M(5).B;
[~, OUT_SPEC] = calculatespecs(LUT,DOF);
if nargout > 2
    T_X = ones(1,11);
   v = ones(1,11);
   delta = v.*sign(X).*max(abs(X),T_X);
        Xp = X;
        DOF_p = DOF;
 DOF_p.M(1).A = [Xp(1) + delta(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1)];
 DOF_p.M(2).A = [Xp(2), Xp(2) + delta(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2)];
 DOF_p.M(3).A = [Xp(3), Xp(3), Xp(3) + delta(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3)];
 DOF_p.M(4).A = [Xp(4), Xp(4), Xp(4), Xp(4) + delta(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4)];
 DOF_p.M(5).A = [Xp(5), Xp(5), Xp(5), Xp(5), Xp(5) + delta(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5)];
 DOF_p.M(1).B = [Xp(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6) + delta(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6)];
 DOF_p.M(2).B = [Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7) + delta(7), Xp(7), Xp(7), Xp(7), Xp(7)];
 DOF_p.M(3).B = [Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8) + delta(8), Xp(8), Xp(8), Xp(8)];
 DOF_p.M(4).B = [Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9) + delta(9), Xp(9), Xp(9)];
 DOF_p.M(5).B = [Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10) + delta(10), Xp(10)];
 DOF_p.IB = [Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3 + delta(11)];
 
 DOF_p.M(6).A = DOF_p.M(5).A;
 DOF_p.M(6).B = DOF_p.M(5).B;
        
        [~, OUT_SPEC_p] = calculatespecs(LUT,DOF_p);
       
        Jacobian(:,1) = abs((double(OUT_SPEC_p.A) - double(OUT_SPEC.A)) ./ delta);
        Jacobian(:,2) = abs((double(OUT_SPEC_p.B) - double(OUT_SPEC.B)) ./ delta);
        Jacobian(:,3) = abs((log10(double(OUT_SPEC_p.C)) - log10(double(OUT_SPEC.C))) ./ delta);
        Jacobian(:,4) = abs((double(OUT_SPEC_p.D) - double(OUT_SPEC.D)) ./ delta);
        Jacobian(:,5) = abs((double(OUT_SPEC_p.E) - double(OUT_SPEC.E)) ./ delta);
        Jacobian(:,6) = abs((double(OUT_SPEC_p.F) - double(OUT_SPEC.F)) ./ delta);
    DCEQ = [];
end
C(1) = IN_SPEC.A ./ double(OUT_SPEC.A) - 1;
C(2) = IN_SPEC.B ./ double(OUT_SPEC.B) - 1;
C(3) = log10(IN_SPEC.C) ./ log10(double(OUT_SPEC.C)) - 1;
C(4) = IN_SPEC.D ./ double(OUT_SPEC.D) - 1;
C(5) = IN_SPEC.E ./ double(OUT_SPEC.E) - 1;
C(6) = IN_SPEC.F ./ double(OUT_SPEC.F) - 1;
CEQ = [];
    
end
        
       

4 个评论
显示 2更早的评论隐藏 2更早的评论

mahmoud tarek 2020-3-26

I tried to use tic toc at the start and end of my NonLinCon and Obj functions to calculate the time of one call and this is what i got:

First for NonLinCon with only 2 output arguments:

The function is called multiple times per iteration. The call time varies between 0.010 to 0.2 seconds.

Second for NonLinCon with 4 output arguments:

The call time varies between 0.01 to 0.06 seconds.

My Obj function has no gradient function.

The run time of my Obj function with one output argument varies between 0.000002 to 0.000759 seconds

Matt J 2020-3-26

编辑：Matt J 2020-3-26

The call time varies between 0.010 to 0.2 seconds.

Don't you mean 0.01 to 0.02?

The function is called multiple times per iteration

You need to count how many times in an iteration it calls wtih 2 arguments (when it doesn't need the gradient) and how many times it calls with 4 (I assume once per iteration). If NonLinCon gets called 20 times per iteration with just 2 arguments, then accelerating the gradient calculation won't have much of an impact, because the gradient calculation represents a small fraction of the computation per iteration.

It may be helpful to move your finite difference operations to a separate function, called from within NonLinCon. You can then use the profiler

https://www.mathworks.com/help/matlab/matlab_prog/profiling-for-improving-performance.html

to see how much times is spent in gradient calculation versus other parts of NonLinCon.

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Matt J 2020-3-14

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/510863-fmincon-check-gradients-fail#answer_420142

编辑：Matt J 2020-3-14

CheckGradient uses central differences, whereas you appear to be using right handed differences. Are you certain that your constraints are differentiable?

3 个评论
显示 1更早的评论隐藏 1更早的评论

mahmoud tarek 2020-3-17

在 MATLAB Online 中打开

i tried to calculate the gradients using central differences to match check gradients but still fails

T_X = ones(1,11);
   v = 0.01*ones(1,11);                       % FiniteDifferenceStepSize
   delta = v.*max(abs(X),T_X);           % Delta for central finite differences
Xp = X;
        DOF_p = DOF;
 DOF_p.M(1).A = [Xp(1) + delta(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1)];
 DOF_p.M(2).A = [Xp(2), Xp(2) + delta(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2)];
 DOF_p.M(3).A = [Xp(3), Xp(3), Xp(3) + delta(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3)];
 DOF_p.M(4).A = [Xp(4), Xp(4), Xp(4), Xp(4) + delta(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4)];
 DOF_p.M(5).A = [Xp(5), Xp(5), Xp(5), Xp(5), Xp(5) + delta(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5)];
 DOF_p.M(1).B = [Xp(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6) + delta(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6)];
 DOF_p.M(2).B = [Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7) + delta(7), Xp(7), Xp(7), Xp(7), Xp(7)];
 DOF_p.M(3).B = [Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8) + delta(8), Xp(8), Xp(8), Xp(8)];
 DOF_p.M(4).B = [Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9) + delta(9), Xp(9), Xp(9)];
 DOF_p.M(5).B = [Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10) + delta(10), Xp(10)];
 DOF_p.IB = [Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3 + delta(11)];
 DOF_p.M(6).A = DOF_p.M(5).L;
 DOF_p.M(6).B = DOF_p.M(5).RHO;
        
        [~, OUT_SPEC_p] = calculatespecs(LUT,DOF_p);
        
 Xn = X;
        DOF_n = DOF;
 DOF_n.M(1).A = [Xn(1) - delta(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1)];
 DOF_n.M(2).A = [Xn(2), Xn(2) - delta(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2)];
 DOF_n.M(3).A = [Xn(3), Xn(3), Xn(3) - delta(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3)];
 DOF_n.M(4).A = [Xn(4), Xn(4), Xn(4), Xn(4) - delta(4), Xn(4), Xn(4), Xn(4), Xn(4), Xn(4), Xn(4), Xn(4)];
 DOF_n.M(5).A = [Xn(5), Xn(5), Xn(5), Xn(5), Xn(5) - delta(5), Xn(5), Xn(5), Xn(5), Xn(5), Xn(5), Xn(5)];
 DOF_n.M(1).B = [Xn(6), Xn(6), Xn(6), Xn(6), Xn(6), Xn(6) - delta(6), Xn(6), Xn(6), Xn(6), Xn(6), Xn(6)];
 DOF_n.M(2).B = [Xn(7), Xn(7), Xn(7), Xn(7), Xn(7), Xn(7), Xn(7) - delta(7), Xn(7), Xn(7), Xn(7), Xn(7)];
 DOF_n.M(3).B = [Xn(8), Xn(8), Xn(8), Xn(8), Xn(8), Xn(8), Xn(8), Xn(8) - delta(8), Xn(8), Xn(8), Xn(8)];
 DOF_n.M(4).B = [Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9) - delta(9), Xn(9), Xn(9)];
 DOF_n.M(5).B = [Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10) - delta(10), Xn(10)];
 DOF_n.IB = [Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3 - delta(11)];
 DOF_n.M(6).A = DOF_n.M(5).A;
 DOF_n.M(6).B = DOF_n.M(5).B;
        
        [~, OUT_SPEC_n] = calculatespecs(LUT,DOF_n); 
 
        %%%%%%%%%%%  Jacobian using central finite differences   %%%%%%%%%%%%%
        Jacobian(:,1) = (double(OUT_SPEC_p.A) - double(OUT_SPEC_n.A)) ./ (2.*delta);
        Jacobian(:,2) = (double(OUT_SPEC_p.B) - double(OUT_SPEC_n.B)) ./ (2.*delta);
        Jacobian(:,3) = (log10(double(OUT_SPEC_p.C)) - log10(double(OUT_SPEC_n.C))) ./ (2.*delta);
        Jacobian(:,4) = (double(OUT_SPEC_p.D) - double(OUT_SPEC_n.D)) ./ (2.*delta);
        Jacobian(:,5) = (double(OUT_SPEC_p.E) - double(OUT_SPEC_n.E)) ./ (2.*delta);
       Jacobian(:,6) = (double(OUT_SPEC_p.F) - double(OUT_SPEC_n.F)) ./ (2.*delta);

Matt J 2020-3-17

编辑：Matt J 2020-3-17

I recommend plotting your Jacobian elements as a function of delta. They should converge to something as delta gets smaller. The first question is if they do converge. The second question is, if they do converge, how does the limit compare to Matlab's Jacobian calculation result and/or the result of jacobianest (from the link I gave earlier). Do this analysis for all the elements.

请先登录，再进行评论。