fitglm for multi-dimensional/time series data

Hi everyone,
How can I use fitglm when my response variable has more than one dimension? Specifically, my response variable is time series data with dimensions n x p, where n is the number of observations and p is the number of time points. I would like to avoid looping over the time series and calling fitglm once per time point. Is this possible?
To be clear, the result should be a time series of coefficient estimates of the univariate GLM -- I am not trying to fit a multi-variate model. Also, I would prefer to use fitglm by providing it with a modelspec in Wilkinson notation, as in the example below.
%% Generate time series data of two experimental conditions.
clear; clc
npoints = 1000;
ntrials = 20;
t = (1:npoints)/1000;
signal = (1-cos(2*pi*t))/2;
data1 = repmat(1.0*signal, [ntrials, 1])+ 0.1*randn(ntrials, npoints);
data2 = repmat(0.6*signal, [ntrials, 1])+ 0.1*randn(ntrials, npoints);
data_all = [data1; data2];
cond = [ones(ntrials,1); 2*ones(ntrials,1)];
figure; plot(t, data1, 'r', t, data2, 'k')
%% The following works, but only for a single point of the time series.
modelspec = 'Var2 ~ cond';
tbl = table(cond, data_all(:,npoints/2));
mdl = fitglm(tbl,modelspec,'Distribution','normal')
%% Looping over data points is time consuming.
tic
coeffs = zeros(2, npoints);  % preallocate: one column of [intercept; slope] per time point
for i = 1:npoints
    tbl = table(cond, data_all(:,i));
    mdl = fitglm(tbl,modelspec,'Distribution','normal');
    coeffs(:,i) = mdl.Coefficients.Estimate;
end
toc
figure; plot(t, coeffs)
%% fitglm does not accept the full time series at once.
tbl = table(cond, data_all);
mdl = fitglm(tbl,modelspec,'Distribution','normal')
%% Different notation, but same problem.
mdl = fitglm(cond, data_all, 'Distribution','normal')

Answers (1)

Jaynik 2024-7-24
Hi Niko,
The fitglm function in MATLAB is designed to work with a univariate response variable. For a multi-column response (like your n x p time series), you would typically need to fit a separate model for each time point, as you do in the loop.
One way to achieve faster computations is to use parallel computing tools provided by MATLAB, such as parfor instead of for. This allows you to perform the iterations in parallel, which can significantly speed up the computation if you have a multi-core processor.
Here is how the code can be modified:
if isempty(gcp('nocreate'))
    parpool
end
tic
coeffs = zeros(2,npoints);
parfor i = 1:npoints
    tbl = table(cond, data_all(:,i));
    mdl = fitglm(tbl,modelspec,'Distribution','normal');
    coeffs(:,i) = mdl.Coefficients.Estimate;
end
toc
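Please note that parfor requires the Parallel Computing Toolbox. By default, parfor starts a parallel pool automatically if none is open (unless that behavior is disabled in the parallel preferences), so the explicit parpool call is not strictly required; starting the pool before tic simply keeps the pool start-up time out of the measurement.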
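Independent of parallelization, part of the per-iteration cost may come from building a table and a full model object for every time point. As a rough sketch, assuming only the coefficient estimates are needed and that giving up the Wilkinson-notation modelspec is acceptable, the older glmfit function (Statistics and Machine Learning Toolbox) returns just the coefficient vector. The variable name coeffs_glmfit and the fixed random seed are illustrative choices, not part of the original code.

```matlab
% Sketch only: same simulated data as in the question, regenerated here
% so that the block runs on its own.
rng(0);                                  % fixed seed, purely for reproducibility
npoints = 1000;
ntrials = 20;
t = (1:npoints)/1000;
signal = (1 - cos(2*pi*t))/2;
data_all = [repmat(1.0*signal, [ntrials, 1]); ...
            repmat(0.6*signal, [ntrials, 1])] + 0.1*randn(2*ntrials, npoints);
cond = [ones(ntrials,1); 2*ones(ntrials,1)];

% glmfit adds the intercept column itself and returns [intercept; slope].
coeffs_glmfit = zeros(2, npoints);
for i = 1:npoints
    coeffs_glmfit(:,i) = glmfit(cond, data_all(:,i), 'normal');
end
```

Whether this is actually faster than the fitglm loop depends on the MATLAB release and machine, so it is worth timing both with tic and toc. The same loop body can be used inside parfor.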
Even though this approach speeds up the computation, it still fits a separate GLM for each time point. For a large number of time points, consider a different modeling approach that can handle multivariate time series data directly. However, this would likely involve moving away from the GLM framework.
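For the specific case in the question (normal distribution with the default identity link), the GLM coefficient estimates coincide with the ordinary least-squares solution, and the design matrix is the same at every time point. Under that assumption, one possible sketch is to build the design matrix for 'Var2 ~ cond' by hand and solve for all time points in a single backslash operation. This bypasses fitglm and the Wilkinson-notation modelspec, returns only the coefficient estimates (no standard errors or p-values), and does not carry over to other distributions or link functions. The names X and coeffs_ls and the fixed random seed are illustrative.

```matlab
% Sketch only: least-squares fit of all time points at once.
% Valid for the normal distribution with identity link, not for other GLMs.
rng(0);                                  % fixed seed, purely for reproducibility
npoints = 1000;
ntrials = 20;
t = (1:npoints)/1000;
signal = (1 - cos(2*pi*t))/2;
data_all = [repmat(1.0*signal, [ntrials, 1]); ...
            repmat(0.6*signal, [ntrials, 1])] + 0.1*randn(2*ntrials, npoints);
cond = [ones(ntrials,1); 2*ones(ntrials,1)];

% Design matrix for 'Var2 ~ cond': an intercept column plus the numeric predictor.
X = [ones(size(cond)), cond];

% Each column of data_all is treated as its own right-hand side, so
% column i of coeffs_ls holds [intercept; slope] for time point i.
coeffs_ls = X \ data_all;                % 2 x npoints

figure; plot(t, coeffs_ls)
```

A quick sanity check is to compare one column of coeffs_ls against mdl.Coefficients.Estimate from fitglm at the same time point; the two should agree up to floating-point rounding.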
Hope this helps!
