How to perform OLS regression using combinations of independent variables?
Hi!
I have been struggling for a while with the following problem.
Suppose we have y as a dependent variable and x1,...,xn as exogenous variables (n>7).
What I want to do is find which combination of exogenous variables gives the best fit for y.
So, if we have, for example, 3 exogenous variables, I would like to see which of the following regressions best fits y (assuming that I already know what statistic I will use to discriminate a "good" model from a "bad" one):
y~x1 ;
y~x2 ;
y~x3 ;
y~x1+x2 ;
y~x1+x3 ;
y~x2+x3 ;
y~x1+x2+x3
For only 3 variables, it is not that complicated (2^3-1 = 7 possibilities). The problem appears when I introduce more and more exogenous variables (e.g. 2^7-1 = 127). How can I do it (somehow automatically) for all combinations when the number of exogenous variables is large (>7)?
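To make it concrete, this is roughly the kind of loop I have in mind, just as a sketch (it assumes fitlm from the Statistics Toolbox and uses adjusted R^2 as a placeholder statistic; X would be my N-by-n matrix of exogenous variables and y the response):
% Exhaustive search over all 2^n - 1 non-empty predictor subsets.
% Assumes X (N-by-n) and y (N-by-1) exist; adjusted R^2 is just a placeholder criterion.
n = size(X, 2);
bestStat = -Inf;
bestSubset = [];
for k = 1:(2^n - 1)                  % each k encodes a subset as a bitmask
    subset = find(bitget(k, 1:n));   % predictor indices in this subset
    mdl = fitlm(X(:, subset), y);    % OLS fit with intercept
    stat = mdl.Rsquared.Adjusted;
    if stat > bestStat
        bestStat = stat;
        bestSubset = subset;
    end
end
fprintf('Best subset: %s (adjusted R^2 = %.4f)\n', mat2str(bestSubset), bestStat)
But I am not sure this brute-force approach stays reasonable once n gets large.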
Thanks for your help!
Cheers!
Answers (3)
Image Analyst
2014-11-29
Why not just use all of them and let the regression figure out how to weight the different xn?
y = alpha0 + alpha1 * x1 + alpha2 * x2 + alpha3 * x3
You can't use polyfit(), but you can use the standard least-squares formula
alpha = inv(x' * x) * x' * y; % Get estimate of the alphas.
where x is an N-row by 4-column matrix:
1, x1(1), x2(1), x3(1)
1, x1(2), x2(2), x3(2)
1, x1(3), x2(3), x3(3)
1, x1(4), x2(4), x3(4)
...
1, x1(N), x2(N), x3(N)
If one of the xn is not a good predictor, it should have a small alpha weight.
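For example, a minimal sketch, assuming your x1, x2, x3 and y are already column vectors of length N in the workspace:
% Build the N-by-4 design matrix with an intercept column and solve for alpha.
x = [ones(numel(y), 1), x1(:), x2(:), x3(:)];
alpha = x \ y;       % backslash solves the same least-squares problem as inv(x'*x)*x'*y,
                     % but is the numerically preferred way to do it in MATLAB
yhat = x * alpha;    % fitted values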
Star Strider
2014-11-29
You are describing a stepwise multiple linear regression. It is a well-known, established technique, and the statistical procedure for adding and removing variables to get the best fit is not trivial.
If you have the Statistics Toolbox, see the documentation for Stepwise Regression and specifically stepwiselm, stepwise, and stepwisefit.
With 127 candidate models (and more as the number of variables grows), and especially if you have a large data set, it is going to take some time. Have something else to do for a few minutes while the regression runs.
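As a rough sketch of the call, assuming your predictors are in an N-by-n matrix X and the response in y (the AIC criterion is just one possible choice):
% Start from the constant model and let stepwiselm add/remove linear terms automatically.
mdl = stepwiselm(X, y, 'constant', 'Upper', 'linear', 'Criterion', 'aic');
disp(mdl)   % shows which predictors ended up in the model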
Matt J
2014-11-29
Edited: Matt J
2014-11-29
As ImageAnalyst says, performing an OLS regression with the entire data set should give you the unique best regression in one step, unless your x1,...,xn are over-complete.
If they are over-complete, and you are looking for the sparsest solution, the Matching Pursuit algorithm seems to be the standard alternative to an exhaustive search. There are several implementations on the File Exchange, but I've never used any of them.
Also, the solution is not guaranteed to be the globally sparsest one; that is the price paid for not doing an exhaustive search, it seems.
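Just to illustrate the idea, a from-scratch sketch of (orthogonal) matching pursuit, not taken from any File Exchange submission; X, y, and the sparsity level k are placeholders, and the columns of X are assumed to be on comparable scales:
% Greedily pick k predictors: at each step take the column most correlated with the
% current residual, then re-fit OLS on everything selected so far.
k = 3;                              % desired number of predictors (placeholder)
residual = y - mean(y);
selected = [];
for iter = 1:k
    scores = abs(X' * residual);    % correlation of each column with the residual
    scores(selected) = -Inf;        % never pick the same predictor twice
    [~, j] = max(scores);
    selected = [selected, j];
    Xs = [ones(size(X,1), 1), X(:, selected)];
    beta = Xs \ y;                  % OLS re-fit on the selected columns plus intercept
    residual = y - Xs * beta;
end
disp(selected)                      % indices of the chosen predictors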