Can I vectorize fitglm to fit many regression models instead of using a for loop?
I am trying to process over a million logistic regressions through fitglm, and I am currently using a for loop to do this, which is taking a very long time.
My data set has 38 explanatory variables, but I only want to fit each model on 6 variables at a time. I would therefore like to fit a model for every possible combination of 6 variables chosen from the 38 (38 choose 6, which is 2,760,681, or roughly 2.8 million models).
I am curious if there is a way to vectorize the fitglm function to avoid using a lengthy for loop and iterating over all possible model combinations.
Here is my code. resp contains my responses, allExplanatoryVars contains all observations (rows) for all 38 explanatory variables (columns), and variableChoices is a matrix that holds the column indices for every possible subset of size 6 out of the 38. For example, row 1 contains the values 1,2,3,4,5,6, row 2 contains the values 1,2,3,4,5,7, and the final row contains the values 33,34,35,36,37,38. For each i, row i of variableChoices selects those specific columns of the allExplanatoryVars matrix.
% Fit the logistic regression models
N = size(variableChoices, 1);
modelCoefficients = zeros(N, 7); % Intercept plus 6 slopes per model
for i = 1:N
    subSelection = allExplanatoryVars(:, variableChoices(i,:));
    mdlLogistic = fitglm(subSelection,resp,'Distribution','binomial','Link','logit');
    % Store the 7 coefficient estimates in row i of the results matrix.
    modelCoefficients(i,:) = mdlLogistic.Coefficients.Estimate.';
end
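For readers who want to reproduce the variableChoices matrix described above, here is a minimal sketch using nchoosek on a small hypothetical case (4 variables, subsets of size 2). The name smallChoices is made up for illustration. The row ordering matches the description in the question, and the same call with 1:38 and 6 would produce the full matrix, although at 2,760,681 rows it takes a noticeable amount of memory.

```matlab
% Hypothetical small case: all subsets of size 2 from 4 variables.
smallChoices = nchoosek(1:4, 2);   % 6-by-2 matrix, one subset per row

disp(smallChoices(1, :));          % first row:  1 2
disp(smallChoices(end, :));        % last row:   3 4

% Number of models in the full problem (38 choose 6)
numModels = nchoosek(38, 6);       % 2760681
```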
Is it possible to perform this task in a quicker way than looping through all i iterations? For example, can you use a vectorized approach on subSelection and resp as they are used within the fitglm function?
Thank you so much for anyone's help on this!
Answers (1)
Anagha Mittal
2024-9-11
Hi,
Unfortunately, "fitglm" cannot be vectorized across models. Each call fits a single model with an iterative algorithm, so every one of the models has to be fitted separately.
However, the fits are independent of each other, so you can speed up the task by running them in parallel with "parfor" and "parpool" (Parallel Computing Toolbox) instead of a plain "for" loop. Below is an example:
N = size(variableChoices, 1);
modelCoefficients = zeros(N, 7); % Assuming 7 coefficients (including intercept) per model (change as needed)
% Start a parallel pool only if one is not already running
if isempty(gcp('nocreate'))
    parpool;
end
% Fit models in parallel
parfor i = 1:N
    subSelection = allExplanatoryVars(:, variableChoices(i,:));
    mdlLogistic = fitglm(subSelection, resp, 'Distribution', 'binomial', 'Link', 'logit');
    % Store the coefficients as a row (you can store other necessary statistics as needed)
    modelCoefficients(i, :) = mdlLogistic.Coefficients.Estimate.';
end
delete(gcp('nocreate')); % Shut down the pool when finished
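If only the coefficient estimates are needed, one possible variation (not part of the answer above, and offered only as a sketch) is to call "glmfit" instead of "fitglm". "glmfit" returns the coefficient vector directly without building a full model object, which may reduce per-model overhead. The data below are synthetic stand-ins with 8 variables rather than 38, and the sizes are made up for illustration. The loop can be changed to "parfor" in the same way as above.

```matlab
% Synthetic stand-in data (hypothetical; replace with the real
% resp and allExplanatoryVars from the question).
rng(0);
nObs = 200;
allExplanatoryVars = randn(nObs, 8);
resp = double(rand(nObs, 1) < 0.5);        % binary 0/1 responses
variableChoices = nchoosek(1:8, 6);        % 28 subsets of size 6

N = size(variableChoices, 1);
modelCoefficients = zeros(N, 7);           % intercept plus 6 slopes
for i = 1:N                                % can be replaced by parfor
    X = allExplanatoryVars(:, variableChoices(i, :));
    % glmfit adds the intercept term by default and returns a 7-by-1 vector
    b = glmfit(X, resp, 'binomial', 'link', 'logit');
    modelCoefficients(i, :) = b.';
end
```

Timing both versions on a few thousand subsets of the real data would show whether the switch is worthwhile before committing to the full run.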
For more information on "parfor" and "parpool", refer to the following documentation links:
- https://www.mathworks.com/help/matlab/ref/parfor.html
- https://www.mathworks.com/help/parallel-computing/parpool.html
Hope this helps!