Why does fminunc not find the true global minimum?

Hi all, I need to solve the unconstrained optimization problem attached. I know that the function has its global minimum at [1 2 2 3]. However, if I set [1 2 2 3] as the starting value, the algorithm ends up at [1.1667 2.4221 2.2561 3]. I have some doubts to clarify (I'm not familiar with this topic, so sorry for the trivial questions):
1) The algorithm output reports that at iteration 0 the function value is 5.47709e-06 and at iteration 10 it is 1.41453e-06. But if I evaluate the function at [1 2 2 3] I get 1.4140e-06, and at [1.1667 2.4221 2.2561 3] I get 1.5635e-06. Why do these values differ from the starting and final function values reported in the algorithm output?
2) How can I force the algorithm to keep searching until it reaches [1 2 2 3]?
Thanks!

Accepted Answer

When you call fminunc with all six outputs
[x,fval,exitflag,output,grad,hessian] = fminunc(...)
what are the values of these outputs?
In particular, if the true Hessian is singular at the global minimum [1 2 2 3], I can imagine its finite-difference estimate, as computed by fminunc, could be numerically problematic, e.g., not positive definite.

7 Comments

This is what I get:
grad=[-1.437e-09; 7.754e-09; 4.335e-09; 0]
hessian= [5.820e-07 1.394e-06 8.276e-07 -3.397e-10;...
1.394e-06 3.484e-06 2.101e-06 -1.636e-10;...
8.276e-07 2.101e-06 1.266e-06 -1.757e-10;...
-3.397e-10 -1.636e-10 -1.757e-10 -1.321e-10]
Your function is over-parametrized: the log-likelihood depends on alpha1 and alpha2 only through the sum alpha1+alpha2, which could therefore be replaced with a single parameter. This is sure to lead to ill-conditioning. The condition number of the output hessian you've shown doesn't look too good either:
>> cond(hessian)
ans =
4.1215e+04
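As a toy illustration (this is a made-up objective, not your actual likelihood): when a function depends on two parameters only through their sum, its Hessian is exactly singular along the direction that leaves the sum unchanged, so any finite-difference estimate of it near the minimum will be badly conditioned.

```matlab
% Toy over-parametrized objective: f depends on a1, a2 only via a1 + a2
f = @(p) (p(1) + p(2) - 3)^2;
H = [2 2; 2 2];   % exact Hessian of f everywhere
rank(H)           % rank 1: the direction (1,-1) is flat, so cond(H) is infinite
```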
Also, you have a bug where you unpack the parameters: notice that alpha2 and beta1 are both assigned theta(2), while theta(4) is never used.
alpha1=theta(1);
alpha2=theta(2);
beta1=theta(2);
beta2=theta(3);
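Presumably the intended unpacking (assuming theta is ordered as [alpha1 alpha2 beta1 beta2], which is how the rest of your code appears to use it) would be:

```matlab
% Hypothetical corrected unpacking, assuming theta = [alpha1 alpha2 beta1 beta2]
alpha1 = theta(1);
alpha2 = theta(2);
beta1  = theta(3);
beta2  = theta(4);
```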
I'm not quite sure about your first comment: my problem cannot be reparameterized in the way you suggest. Your second comment, however, has solved my problem: I have corrected the code, and now if I start from [1 2 2 3] the algorithm stays there with zero iterations. Thank you very much!
Are you suggesting that the fact that the algorithm, starting from the truth, stops at the truth in zero iterations has no significant implications for my problem? What would you suggest doing in order to reach more significant conclusions?
my problem cannot be reparameterized in the way you suggest.
Forget it. Nevertheless, your code can be cleaner and more efficient. Below is my idea of what it should look like. Notice that there is a lot you can pre-compute in the interest of speed. Notice also the more modern way of passing fixed data and parameters to functions.
thetatrue = [1 2 2 3];
mu = [0 0];
sd = [1 0.3; 0.3 1];
A1 = [-ones(size(ix,1),1), -ix(:,1)];   % pre-computed design matrices
A2 = [-ones(size(ix,1),1), -ix(:,2)];
cdfun = @(x) mvncdf([A1*[x(1);x(3)], A2*[x(2);x(4)]], mu, sd);
W1 = cdfun(thetatrue);                  % pre-computed weights
W2 = 1 - W1;
options = optimset('Display','iter','MaxIter',10000,'TolX',1e-30,'TolFun',1e-30);
theta0 = [1 2 2 3];                     % starting values
[theta,fval,exitflag,output] = ...
    fminunc(@(x) log_lik(x,cdfun,W1,W2), theta0, options);

function val = log_lik(theta, cdfun, W1, W2)
    z = cdfun(theta);
    val = -sum(W1.*log(z) + W2.*log(1-z));
end
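To convince yourself that the starting point is genuinely a stationary point of the corrected objective, rather than the solver simply giving up, you can also request the gradient output and inspect the first-order optimality measure. A sketch, reusing the names defined above:

```matlab
% Diagnostic check at the solution (uses log_lik/cdfun/W1/W2/theta0 from above)
[theta,fval,exitflag,output,grad] = ...
    fminunc(@(x) log_lik(x,cdfun,W1,W2), theta0, options);
disp(output.firstorderopt)   % should be near zero at a stationary point
disp(norm(grad))             % the gradient itself should also be near zero
```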


More Answers (1)

I did not look at your data, but I doubt that the true global minimum is at [1 2 2 3] if you are really fitting to data. I would bet that you generated data from a known distribution and then fit the model to that data. You will never get a perfect match to the initial distribution, because the data you used is not perfectly distributed according to the theoretical distribution.
For instance, this toolbox example shows theoretical parameters of [1 3 2], yet the fitted model has parameters [1.0169 3.1444 2.1596], and the fitted model is at a global minimum for that data set.
Alan Weiss
MATLAB mathematical toolbox documentation

1 Comment

If the function I maximized were the sample log-likelihood for Y|X ~ f(theta), you would be right; but the function I maximize is the expected log-likelihood, and the data are just the X I'm conditioning on. I'm sure that the expected log-likelihood is maximized at [1 2 2 3] (I have theoretical results which show this). Can anyone help me answer questions 1 and 2?

