how to perform ols regression using combinations of independent variable?

5 views (last 30 days)
Hi!
I have been struggling for a while with the following problem.
Suppose we have y as a dependent variable and x1,...,xn as exogenous variables (n>7).
What I want to do is see which combination of exogenous variables gives the best fit for y.
So, if we have, for example, 3 exogenous variables, I would like to see which of the following regressions fits y best (assuming I already know which statistic I will use to distinguish a "good" model from a "bad" one):
y~x1 ;
y~x2 ;
y~x3 ;
y~x1+x2 ;
y~x1+x3 ;
y~x2+x3 ;
y~x1+x2+x3
For only 3 variables it is not that complicated (2^3-1 = 7 possibilities). The problem appears when I introduce more exogenous variables (with 7 there are already 2^7-1 = 127 candidate models). How can I fit all combinations automatically when the number of exogenous variables is large (>7)?
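The exhaustive search the question describes can be sketched directly. The following is a Python/NumPy illustration (the thread itself is about MATLAB, so treat this as a language-neutral sketch): enumerate every non-empty subset of regressors, fit OLS with an intercept on each, and rank the models. Adjusted R-squared is used here as the example scoring statistic; the question leaves the statistic open, so any criterion (AIC, BIC, cross-validated error) could be substituted.

```python
from itertools import combinations

import numpy as np

def best_subset_ols(X, y):
    """Fit OLS on every non-empty subset of the columns of X and
    return the column subset with the highest adjusted R-squared."""
    n, p = X.shape
    best_cols, best_score = None, -np.inf
    ss_tot = ((y - y.mean()) ** 2).sum()
    for k in range(1, p + 1):
        for cols in combinations(range(p), k):
            # intercept column plus the chosen regressors
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
            r2 = 1 - (resid @ resid) / ss_tot
            adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
            if adj_r2 > best_score:
                best_cols, best_score = cols, adj_r2
    return best_cols, best_score

# toy data: y really depends only on columns 0 and 2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)
cols, score = best_subset_ols(X, y)
print(cols, round(score, 3))
```

Note the cost: the double loop visits 2^p - 1 models, which is exactly the combinatorial explosion the question worries about; for p much beyond 15-20 this brute force becomes impractical.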
Thanks for your help!
Cheers!

Answers (3)

Image Analyst, 2014-11-29
Why not just use all of them and let the regression figure out how to weight the different xn?
y = alpha0 + alpha1 * x1 + alpha2 * x2 + alpha3 * x3
You can't use polyfit() but you can use the standard least squares formula
alpha = inv(x' * x) * x' * y; % Get estimate of the alphas.
where x is an N-by-4 matrix:
1, x1(1), x2(1), x3(1)
1, x1(2), x2(2), x3(2)
1, x1(3), x2(3), x3(3)
1, x1(4), x2(4), x3(4)
...
1, x1(N), x2(N), x3(N)
If one of the xn is not a good predictor, it should have a small alpha weight.
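A quick numeric check of this answer, written as a Python/NumPy sketch rather than MATLAB: build the N-by-4 design matrix with a leading column of ones and solve the normal equations exactly as written above. (The comment below this answer recommends a backslash-style solve instead, for numerical conditioning; the explicit inverse is shown here only to mirror the formula in the answer.)

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
x1, x2, x3 = rng.normal(size=(3, N))
# true model: intercept 1.5, weights 0.5 and -2.0; x3 is irrelevant
y = 1.5 + 0.5 * x1 - 2.0 * x2 + 0.01 * rng.normal(size=N)

# N-by-4 design matrix: intercept column followed by the regressors
X = np.column_stack([np.ones(N), x1, x2, x3])

# normal equations, as in the answer: inv(X'X) X'y
alpha = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.round(alpha, 2))  # close to [1.5, 0.5, -2.0, 0.0]
```

As the answer predicts, the irrelevant regressor x3 comes back with a weight near zero.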
1 comment
Matt J, 2014-11-29 (edited)
"You can't use polyfit() but you can use the standard least squares formula"
No, don't do that. Just do
alpha = x\y;
for better conditioning. However, I assume that the OP's case is really more complicated, and that the x matrix does not have full column rank.
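The point about conditioning can be demonstrated with a small Python/NumPy sketch (NumPy's `np.linalg.lstsq` plays the role of MATLAB's backslash here: both use an orthogonal factorization rather than forming inv(x'*x), which squares the condition number). The near-duplicate column below is a contrived assumption just to make the design matrix badly conditioned.

```python
import numpy as np

# nearly collinear columns: the third differs from the second by 1e-6,
# so X'X is very badly conditioned
n = 50
t = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), t, t + 1e-6])
y = 3.0 + 2.0 * t  # y lies exactly in the column space of X

# the analogue of MATLAB's x\y: a stable SVD/QR-based least-squares solve
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta
print(np.max(np.abs(fitted - y)))  # tiny residual despite the conditioning
```

Forming `inv(X.T @ X)` explicitly on this matrix would lose roughly twice as many digits, which is exactly why the comment says to prefer the backslash solve.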



Star Strider, 2014-11-29
You are describing a stepwise multiple linear regression. It is a well-known, established technique, and the statistical procedure for adding and removing variables to get the best fit is not trivial.
If you have the Statistics Toolbox, see the documentation for Stepwise Regression and specifically stepwiselm, stepwise, and stepwisefit.
With 127 candidate models (and exponentially more as you add variables), and especially if you have a large data set, it is going to take some time. Have something else to do for a few minutes while the regression runs.
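To make the stepwise idea concrete without the Statistics Toolbox, here is a simplified Python/NumPy sketch of forward stepwise selection: at each step, add the regressor that most reduces the residual sum of squares. This is a stand-in for the p-value-based entry/removal criteria that `stepwisefit` and `stepwiselm` actually use, so the stopping rule here (a fixed variable budget) is an assumption for illustration only.

```python
import numpy as np

def forward_stepwise(X, y, max_vars):
    """Greedy forward selection: repeatedly add the column of X that
    most reduces the residual sum of squares of an intercept+OLS fit."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    while remaining and len(selected) < max_vars:
        best_j, best_rss = None, np.inf
        for j in remaining:
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# toy data: only columns 1 and 5 matter
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))
y = 4.0 * X[:, 1] + 2.0 * X[:, 5] + 0.1 * rng.normal(size=300)
order = forward_stepwise(X, y, max_vars=2)
print(order)
```

Unlike the exhaustive 2^n - 1 search, this visits at most p variables per step, so the cost is polynomial; the trade-off, as with all stepwise methods, is that the greedy path is not guaranteed to find the globally best subset.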

Matt J, 2014-11-29 (edited)
As ImageAnalyst says, performing an OLS regression with the entire data set should give you the unique best regression in one step, unless your x1,...,xn are over-complete.
If they are over-complete, and you are looking for the sparsest solution, the Matching Pursuit algorithm seems to be the standard alternative to an exhaustive search. There are several implementations on the File Exchange, but I've never used any of them.
Also, the solution is not guaranteed to be globally sparsest - the price paid for not doing an exhaustive search, it seems.
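For a sense of what this answer is proposing, here is a minimal Python/NumPy sketch of orthogonal matching pursuit (the common refinement of matching pursuit): greedily pick the column most correlated with the current residual, then refit OLS on all columns picked so far. This is an illustrative implementation, not one of the File Exchange submissions the answer mentions.

```python
import numpy as np

def omp(X, y, n_atoms):
    """Orthogonal matching pursuit: greedily select the column most
    correlated with the residual, refitting OLS on the support each step."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_atoms):
        # normalize by column norms so scale doesn't bias the selection
        corr = X.T @ residual / np.linalg.norm(X, axis=0)
        j = int(np.argmax(np.abs(corr)))
        if j not in support:
            support.append(j)
        A = X[:, support]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
    return support, coef

# over-complete toy problem: 20 candidate regressors, only 2 active
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 20))
y = 5.0 * X[:, 3] - 4.0 * X[:, 11] + 0.05 * rng.normal(size=100)
support, coef = omp(X, y, n_atoms=2)
print(sorted(support))
```

This greedy selection is exactly why, as the answer notes, the result is not guaranteed to be the globally sparsest solution: each pick is locally best given the current residual, which is the price paid for avoiding the exhaustive search.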
