Vectorizing a repeated regression
显示 更早的评论
Hi All,
My apologies. Yet another question on how to vectorize a calculation. What I am trying to do, is recreate a script I wrote 15 years (!) ago in IDL, in Matlab, full description of the why and how here https://doi.org/10.1080/01431160500497119.
In short: I am bootstrapping a stepwise regression model, selecting the best predictor on statistics within the bootstrap. The current script results in the sort of output I am looking for. However, running it takes over an hour for just one response factor & one dataset. Taking that I have 10 response factors, and several datasets in this one study alone, I need this to become a lot quicker so it can become part of my standard data processing.
I know vectorizing is the way to go, but being a newbe, I cannot get my head around how to handle this. Someone willing to lend me a hand? I think that the actual regression is the one to be optimized, vectorizing for the "band" loop (Is that how you would refer to it?):
model = fitglm(predictors(train, [selectedbands,band]), response(train));
stats(thispredictor,band) = model.Rsquared.Ordinary;
I am not sure whether it is possible though: There are about 2000 wavebands in there. In other words: for each bootstrap iteration, 2k regressions are run. And I am running 10k iterations for each predictor.
The process it goes through (Full function in the attached file):
% Create the regression model one predictor at a time,
% as I want to bootstrap the sensitivity of each predictor
for thispredictor = 1:maxpredictors
% Bootstrap loop
for iteration = 1:iterations
% create the statistics for each iteration
stats = zeros(maxpredictors,numbands);
outdata = zeros(numbands);
% create a subset for training and testing
train = datasample(datarange, numsamples);
% Correlation at each waveband
for band = 1:numbands
% Set the dataset
predictorset = predictors(train, [selectedbands,band]);
responseset = response(train);
model = fitglm(predictors(train, [selectedbands,band]), response(train));
stats(thispredictor,band) = model.Rsquared.Ordinary;
end %band loop
% Select the best band for the model
[maxrsq,maxband] = max(stats(thispredictor,:));
bandcount(thispredictor,maxband) = bandcount(thispredictor,maxband)+1;
end %iterations loop
[frequency, peakband] = max(bandcount(thispredictor,:));
selectedbands = [selectedbands,peakband];
end %predictors loop
采纳的回答
更多回答(0 个)
类别
在 帮助中心 和 File Exchange 中查找有关 Descriptive Statistics 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!