Weighted fit of ode's to data?

5 次查看(过去 30 天)
Hi,
I have a system of ODEs that models a set of sequential biological reactions. I am trying to fit these to experimental data in order to extract approx. rate constants. To date, I have a script that can do this (see below), but the fit is poor for some cases (where the absolute magnitudes of the data are far lower). I've also tried multistart approaches, without success.
It was suggested to me that adding weights to the fit (i.e. weighted regression) might help. I was wondering how, in general, I might go about doing this in Matlab, based on the below? Disclaimer: I'm not a mathematician (chemist), and I'm a total Matlab amateur, so if this is a poor idea I am also happy to hear this!
Many thanks,
Matt
function cytosinefitcell
function C=kinetics(k,t)
c0=[95.8989;0;0;0;0];
[T,Cv]=ode45(@DifEq,t,c0);
function dC=DifEq(t,c)
dcdt=zeros(5,1);
dcdt(1)= -k(1).*c(1)+k(5).*c(4)+k(6).*c(5)+k(7).*(c(2)+c(3)+c(4)+c(5));
dcdt(2)= k(1).*c(1)-k(2).*c(2)-k(7).*c(2);
dcdt(3)= k(2).*c(2)-k(3).*c(3)-k(7).*c(3);
dcdt(4)= k(3).*c(3)-k(4).*c(4)-k(5).*c(4)-k(7).*c(4);
dcdt(5)= k(4).*c(4)-k(6).*c(5)-k(7).*c(5);
dcdt(1) = 0;
dC=dcdt;
%dcdt(1) = 0 as by definition this variable is constant, i.e. steady state.
end
C=Cv;
end
%data for t
t=[0
1
2
3
4
5
6
7
8
9
10
24
48
96
144
192
240
288
336
384
432];
%data for c(1) to c(5)
c=[95.8989 0 0 0 0
95.8989 0.116654171 0.000169089 0 0
95.8989 0.221311714 0.000215784 0 0
95.8989 0.361815956 0.001337332 0 0
95.8989 0.443043182 0.001897635 0 0
95.8989 0.511783075 0.002904908 2.59348E-06 0
95.8989 0.596086415 0.003847949 2.08165E-05 0
95.8989 0.70422927 0.004991988 3.23739E-05 0
95.8989 0.736165548 0.005402772 4.12232E-05 0
95.8989 0.863634039 0.006882534 4.66973E-05 0
95.8989 0.961691531 0.007922809 7.91253E-05 0
95.8989 1.694849679 0.02488041 0.000229454 3.88541E-05
95.8989 2.156329216 0.046290117 0.000455848 2.44297E-05
95.8989 2.28066375 0.058965709 0.000529312 5.23961E-05
95.8989 2.263872217 0.060281286 0.000604404 5.68658E-05
95.8989 2.280749095 0.059985812 0.000559687 6.81934E-05
95.8989 2.250775532 0.059775064 0.00057279 6.40691E-05
95.8989 2.248860841 0.059972783 0.000589628 5.50697E-05
95.8989 2.28310359 0.059096809 0.000519199 5.7434E-05
95.8989 2.324286746 0.058745232 0.000550764 5.6446E-05
95.8989 2.298780141 0.059541016 0.000556106 6.05984E-05];
%initial guesses for k - arbitrarily set to 0 except for k(7) which has
%known range. Each "k" is really a "k_obs".
k0=[0;0;0;0;0;0;0.04];
[k,Rsdnrm,Rsd,ExFlg,OptmInfo,Lmda,Jmat]=lsqcurvefit(@kinetics,k0,t,c,[0.0001;0.0001;0.0001;0.0001;0.0001;0.0001;0.04],[100;100;100;100;100;100;0.07]);
%upper and lower bounds are arbitrary to avoid 0 (or negative) values or
%unrealistically high values. Exception is k(7), whose range is known.
%plotting graph
fprintf(1,'\tRate Constants:\n')
for k1 = 1:7
fprintf(1, '\t\tk(%d) = %8.5f\n', k1, k(k1))
end
tv = linspace(min(t), max(t));
Cfit = kinetics(k, tv);
figure(1)
plot(t, c, 'p')
hold on
hlp = plot(tv, Cfit);
hold off
grid
xlabel('Time')
ylabel('Concentration')
legend(hlp, 'mC', 'hmC', 'fC','caC', 'Location','NE')
end

采纳的回答

Star Strider
Star Strider 2019-7-12
Unless you have a good reason to weightcertain varriables, (for example you are using values from the literature that are given in means and stardard deviations or standard errors, so you would use inverse-variance weighting, for example), I would not weight them.
This uses your variables without any weighting. I used ga as the optimisation function with your data, and added the initial conditions vector ‘x0’ as the last five elements of the parameter vector. It converged in 323 generations and estimated these parameters (where the parameters are the ‘theta’ vector) in 04:56:27.933. My delay in responding is that this took nearly 5 hours to converge (such as it is).
The estimated parameters:
Rate Constants:
Theta(1) = 0.76200
Theta(2) = 21.81800
Theta(3) = 13.67813
Theta(4) = 25.60900
Theta(5) = 36.61325
Theta(6) = 15.24400
Theta(7) = 31.94681
The estimated initial conditions:
Theta(8) = 96.02097
Theta(9) = 0.01250
Theta(10) = 0.00425
Theta(11) = 0.04367
Theta(12) = 0.00500
The fitness value (norm of the residuals) is 3.8365.
It fit two of the variables relatively well, however it did not fit three of them at all well:
Weighted fit of ode's to data - 2019 07 12.png
The essential ga call I used is:
ftns = @(theta) norm(c-kinetics(theta,t));
PopSz = 500;
Parms = 12;
opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',2E3, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);
[theta,fval,exitflag,output] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],opts)
I always estimate the initial conditions along with the other parameters, so the only change I made to your ‘kinetics’ ODE function was to replace the ‘c0’ you coded with:
c0 = k(8:12);
I do not know what system you are modeling, so I cannot check if you coded your system of differential equations correctly. (I would expect a much better optimisation result.)
Please check your ‘kinetics’ system. If it is coded correctly, this result is the best we can likely hope for, so I will not re-run this. If it needs some ‘tweaking’, I would be interested in doing this again with the ‘tweaked’ system.
  2 个评论
Matt Bilyard
Matt Bilyard 2019-7-15
Hi - thanks, this is really helpful. I don't think there is a good reason to weight the variables, though I will need to check this with my colleague who actually collected the data.
I will re-check the system of equations itself, though I believe the model is in itself fine (i.e. I don't think there are any errors). However, it is definitely possible that the data may not exactly conform to this model; this is in part what we are trying to test. So, if it persistently fails to fit, then I would probably conclude this and look into what alterations to the model might be needed (and why).
The data are essentially mass spectrometry measurements of metabolite levels for a sequential reaction of metabolites: 1 --> 2 --> 3 --> 4 --> 5. The equations correspond to each of these 4 reactions, plus 2 additional reactions for re-conversion of 4 or 5 back to 1, plus 1 further reaction that models in cell division (which essentially converts all of 2-5 back to 1). This is almost certainly not clear; it's something of a complex system, I'm happy to try and provide more details when I have more time if helpful!
Many thanks,
Matt
Star Strider
Star Strider 2019-7-15
As always, my pleasure!
It is important that the model reflects the actual system. I would be interested in more details if you care to provide them, since I have some relevant experience in this sort of modeling, although likely not specific expertise in the system you are studying. I cannot promise that I would be able to add anything significant, however I would be able to experiment with the model. (I also do not understand the reason ‘c(:,1)’ is constant.) If you have a .pdf file or reference (preferably open-source) to a paper describing this system, I would be interested in reading it.
I have generally found ‘derivative free’ optimizations, such as ga, to be better than gradient-descent algorithms in problems such as you present here, so I would use it.

请先登录,再进行评论。

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Chemistry 的更多信息

产品


版本

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by