Weighted fit with lsqcurvefit and (ideally) multistart?

Question

Matt Bilyard 2019-11-12

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/490673-weighted-fit-with-lsqcurvefit-and-ideally-multistart

评论： Matt Bilyard 2019-11-21

Hi,

I'm working on a problem that involves fitting experimental data to a model comprising a system of (connected) ODEs, in order to extrapolate optimum parameters (k) that give the best fit. For general background, the data refers to levels of metabolites in cells, and the model is a simple one of the metabolic pathways that connect them. (Can't say too much more for now!).

I've been using lsqcurvefit with multistart to fit this. It works pretty decently (let me know if not, it'll be a silly copying error!). However, I was wondering if it is possible to apply a weighting function to the fitting (either using this fitting method or otherwise)? Rationale - the absolute values of each c(n) have very different absolute magnitudes and error magnitudes, simply as they had to be multiplied by different constant factors to process them. Ideally, the weights would be roughly:

- c(1), c(2), c(6): 1

- c(3), c(7): 40

- all others: 4000

Without weighting, the fit for the larger c(1), c(2), c(3), c(6) and c(7) is great - but for the smaller other ones it is less good. (Or, rather, the fit is OK, but I get wildly different k-values each time for steps involving the smaller c(n)).

Any advice would be much appreciated. I am something of a novice - basically a biologist teaching myself matlab! - so please excuse any ignorance. Also happy to hear alternative approaches of course. I've copied the raw data to fit, the objective function, and my multistart fitting below in that order. I have them as separate files, so I haven't attempted to combined them into one...

Thanks a lot,

Matt

%experimental data to fit%
time_full = [0;1;2;3;4;5;6;7;8;9;10;24;48;96;144;192;240;288;336;384;432];
label_full = [95.898177	0	0	0	0	4	0.1	0.001055	0.000768
898177	0.116654171	0.000169089	0	0	3.883345829	0.099830911	0.001055	0.000768
898177	0.221311714	0.000215784	0	0	3.778688286	0.099784216	0.001055	0.000768
898177	0.361815956	0.001337332	0	0	3.638184044	0.098662668	0.001055	0.000768
898177	0.443043182	0.001897635	0	0	3.556956818	0.098102365	0.001055	0.000768
898177	0.511783075	0.002904908	2.73612E-06	0	3.488216925	0.097095092	0.001052264	0.000768
898177	0.596086415	0.003847949	2.19614E-05	0	3.403913585	0.096152051	0.001033039	0.000768
898177	0.70422927	0.004991988	3.41544E-05	0	3.29577073	0.095008012	0.001020846	0.000768
898177	0.736165548	0.005402772	4.34905E-05	0	3.263834452	0.094597228	0.00101151	0.000768
898177	0.863634039	0.006882534	4.92657E-05	0	3.136365961	0.093117466	0.001005734	0.000768
898177	0.961691531	0.007922809	8.34772E-05	0	3.038308469	0.092077191	0.000971523	0.000768
898177	1.694849679	0.02488041	0.000242074	0.000298399	2.305150321	0.07511959	0.000812926	0.000469601
898177	2.156329216	0.046290117	0.00048092	0.00018762	1.843670784	0.053709883	0.00057408	0.00058038
898177	2.28066375	0.058965709	0.000558424	0.000402402	1.71933625	0.041034291	0.000496576	0.000365598
898177	2.263872217	0.060281286	0.000637646	0.000436729	1.736127783	0.039718714	0.000417354	0.000331271
898177	2.280749095	0.059985812	0.000590469	0.000523726	1.719250905	0.040014188	0.000464531	0.000244274
898177	2.250775532	0.059775064	0.000604293	0.00049205	1.749224468	0.040224936	0.000450707	0.00027595
898177	2.248860841	0.059972783	0.000622058	0.000422935	1.751139159	0.040027217	0.000432942	0.000345065
898177	2.28310359	0.059096809	0.000547755	0.000441093	1.71689641	0.040903191	0.000507245	0.000326907
898177	2.324286746	0.058745232	0.000581056	0.000433505	1.675713254	0.041254768	0.000473944	0.000334495
898177	2.298780141	0.059541016	0.000586692	0.000465395	1.701219859	0.040458984	0.000468308	0.000302605]

%objective function that models system of interest. k-values 1 through 7 to be determined through fitting values for each c(n) to this model%
function C=kinetics_full(k,t)
        c0=[95.898177 0 0 0 0 4 0.1 0.001055 0.000768];
        [~,C]=ode45(@(t,c) DifEq(t,c,k),t,c0);
        
        
        
        function dC=DifEq(t,c,k)
    dcdt=zeros(9,1);
    dcdt(1)=-2*k(1).*c(1)+k(7).*(c(2)+c(3)+c(4)+c(5)+c(6)+c(7)+c(8)+c(9))+k(5).*(c(4)+c(8))+k(6).*(c(5)+c(9)); 
    dcdt(2)=1.1*k(1).*c(1)-k(2).*c(2)-k(7).*c(2); 
    dcdt(3)=k(2).*c(2)-k(3).*c(3)-k(7).*c(3); 
    dcdt(4)=k(3).*c(3)-k(4).*c(4)-k(5).*c(4)-k(7).*c(4);
    dcdt(5)=k(4).*c(4)-k(6).*c(5)-k(7).*c(5);
    dcdt(6)=0.9*k(1).*c(1)-k(2).*c(6)-k(7).*c(6); 
    dcdt(7)=k(2).*c(6)-k(3).*c(7)-k(7).*c(7);
    dcdt(8)=k(3).*c(7)-k(4).*c(8)-k(5).*c(8)-k(7).*c(8);
    dcdt(9)=k(4).*c(8)-k(6).*c(9)-k(7).*c(9);
    dcdt(1)=0;
       
            
            dC=dcdt;
           
            
        end
        
    end

%multistart fitting protocol%
problem = createOptimProblem ('lsqcurvefit','objective', @kinetics_full, 'x0', randn(7,1), ...
'lb', [0,0,0,0,0,0,0.03], 'ub', [10,10,10,10,10,10,0.07],'xdata',time_full,'ydata',label_full);
ms = MultiStart('PlotFcns',@gsplotbestf);
[xmulti,errormulti] = run(ms,problem,20);

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Star Strider 2019-11-12

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/490673-weighted-fit-with-lsqcurvefit-and-ideally-multistart#answer_401159

在 MATLAB Online 中打开

Igor_Moura_GA.m

I doubt that weighting is appropriate, and I would not recommend doing that.

I recognise that code excerpt, so I am attaching an updated version of that file using the ga (genetic algorithm) function to do the parameter estimation. It uses random initial values for the population, so I definitely advise running it several times, saving the parameter vector (‘theta’) and ending fitness value (‘fval’) for each run to determine the best fit among several runs. Likely the easiest way to do that would be to add an output to the function:

function [theta,fval] = Igor_Moura_GA

so you can then call it in a for loop, and write the ‘theta’ and ‘fval’ values to a file so you do not lose them if something should go wrong.

Also, it may be best to include the initial conditions in the parameter vector, rather than assuming that the hard-coded initial conditions are the best. Append the initial conditions parameters onto the end of the parameter vector, so in that original code, with 6 parameters and 4 initial conditions, the new parameter vector would have a length of 10 and the ‘kinetics’ function would then be written:

function C = kinetics(theta,t)
c0 = theta(7:10);
[T,Cv] = ode45(@DifEq,t,c0);
    function dC=DifEq(t,c)
    dcdt=zeros(4,1);
    dcdt(1)=-theta(1).*c(1)-theta(2).*c(1);
    dcdt(2)= theta(1).*c(1)+theta(4).*c(3)-theta(3).*c(2)-theta(5).*c(2);
    dcdt(3)= theta(2).*c(1)+theta(3).*c(2)-theta(4).*c(3)+theta(6).*c(4);
    dcdt(4)= theta(5).*c(2)-theta(6).*c(4);
    dC=dcdt;
    end
C=Cv;
end

Be sure to change the ‘Parms’ (parameter number) value to reflect the augmented parameter vector length. My code will make the appropriate adjustments.

3 个评论
显示 1更早的评论隐藏 1更早的评论

Star Strider 2019-11-15

As always, my pleasure!

I developed that file for my own interest a while ago to see how well ga could work to optimise those parameters.

I only use weighting if I am using data from publications that present summary data and distribution parameters, such as mean and standard deviation. I then use those means weighted by the inverse of the variance for each observation mean to provide representative significance to each point. Weighting can distort the data otherwise.

I would see how well ga fits those points first. (It is a ‘derivative-free’ optimiser.) The only reason I would weight them otherwise would depend on the experimental conditions. For example if it was easier to acquire the data in one region (so there are more of them) than in another region (so there are less of them). In that instance, weighting could compensate for the discrepancy in the data distribution, and provide a better fit. In that example, weighting them by the distance between the points (or some other appropriate metric) could correct for the discrepancy in the number of observations. (This is a purely theoretical consideration, since I have never had to do something similar myself.) Note that any least-squares metric will give disproportionate influence to data that are farther from the calculated regression line (greater error), so experimenting with different distance metrics could be worthwhile. This is straightforward in ga, since you will decide — and code — the fitness function to use. The ga fucntion doesn’t care, so long as the fitness fucntion returns a scalar fitness value. Feel free to experiment.

Matt Bilyard 2019-11-21

Cheers for the explanation, this makes sense.

On your point:

" Note that any least-squares metric will give disproportionate influence to data that are farther from the calculated regression line"

...this is, I think, exactly the issue I'm having (and what led me to consider weighting!). In terms of absolute amount, higher-magnitude data sets (e.g. c2) all have much greater error than lower-magnitude ones (e.g. c5). So, the solver gives disproportionate weight to curves involving e.g. c2.

I'm not really familiar with fitness functions - would you have any suggestions for alternatives that I could experiment with, should I need to? (I am trying out the original suggestion currently, it's somewhat sluggish for this amount of data so it'll be a while before I know!).

请先登录，再进行评论。

Answer 2

Matt J 2019-11-12

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/490673-weighted-fit-with-lsqcurvefit-and-ideally-multistart#answer_401155

编辑：Matt J 2019-11-12

在 MATLAB Online 中打开

Just apply whatever weights you want to the objective and to your ydata,

fun=@(x,xdata)kinetics_full(x,xdata).*sqrt(weights);
ydata=label_full.*sqrt(weights);
problem = createOptimProblem ('lsqcurvefit','objective', fun, 'x0', randn(7,1), ...
'lb', [0,0,0,0,0,0,0.03], 'ub', [10,10,10,10,10,10,0.07],'xdata',time_full,'ydata',ydata);

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Weighted fit with lsqcurvefit and (ideally) multistart?

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

3 个评论
显示 1更早的评论隐藏 1更早的评论

更多回答（1 个）

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

Weighted fit with lsqcurvefit and (ideally) multistart?

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

3 个评论 显示 1更早的评论隐藏 1更早的评论

更多回答（1 个）

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

3 个评论
显示 1更早的评论隐藏 1更早的评论

0 个评论
显示 -2更早的评论隐藏 -2更早的评论