lsqnonlin: CheckGradients fails after multilying Jacobian with diagonal scaling matrix

3 次查看(过去 30 天)
Dear all,
function [f,J] = fun_no_scaling(params)
f=r(params); %vector of observations
J=...; %the Jacobian dr/dparams
end
This is the function which is called by lsqnonlin which returns the vector-valued function f along with the Jacobian that I compute by myself. I set the CheckGradients flag to true and the gradient check is passed sucessfully, that is, my Jacobian seems to be correct. The vector in the above function contains the difference of simulation data (depend on params) and experimental data.
Now, I scaled the problem by normalizing it (1/max) and CheckGradients fails. Here is what I did:
function [f,J] = fun_scaling(params)
f=r(params); %same as above
J=...; %same as above
%diagonal scaling matrix
D = 1/max(f) .*eye(length(f),length(f));
%scale f
f=D*f;
%The new Jacobian of the scaled problem (f=D*f), is simply
%J=d(D*r)/darams = D*dr/dparams = D*J, right?
J=D*J;
end
The Jacobian of fun_scaling and the finite-difference Jacobian from Matlab differ now significantly. However, I do not understand why.
Is my scaling matrix not constant with resect to params such that I can just re-multiply the unsclaed Jacobian with the scaling matrix?

采纳的回答

Matt J
Matt J 2023-1-13
编辑:Matt J 2023-1-13
%J=d(D*r)/darams = D*dr/dparams = D*J, right?
No, it doesn't work because D=1/max(f) is not a constant. It depends on the unknown params as well. Therefore, you cannot simply scale the Jacobian by D.
  8 个评论
Matt J
Matt J 2023-1-14
编辑:Matt J 2023-1-15
I see. But the issue is that the entries of f are of different orders, say 10 entries of 1e-1 and 10 entries of 1e3. If I do not normalize them, the optimization is biased to minimize only the larger values.
Normalizing by max(f) or any other scalar weight would not change that. They would still be orders of magnitude different from each other regardless of the weighting.
You could try something like this:
x0=...
D=1./max( abs(fun(x0)) , 1e-2 );
x=lsqnonlin( @(p)fun(p,D) , x0)
function [f,J] = fun(params,D)
if nargin<2, D=1; end
f=r(params); %same as above
J=...; %same as above
f=D(:).*f(:); %works for vector D as well
if nargout>1
J=D(:).*J;
end
end
SA-W
SA-W 2023-1-14
" Normalizing by max(f) or any other scalar weight would not change that. They would still be orders of magnitude different from each other regardless of the weighting. "
I would not use one scale factor, but two:
r=(0.4,0.2,50000,100000)
Then, I would define
w_1=1/0.4 ,
w_2=1/100000
r_scaled = (1,0.5,0.5,1)
With that, the bias should be eliminated. Right?
I will give your suggestion a try. Another idea is to scale based on the maximum of the experimental data. At least, I think so.

请先登录,再进行评论。

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Surrogate Optimization 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by