stepwiselm does not respect the 'Upper','linear' limit across multiple iterations

James Craig, 2021-10-17
Commented: James Craig, 2021-10-26
I am running a regression analysis for a public company to find out which variables matter most in determining overall sales. I use stepwiselm for the variable selection, and I don't want interaction terms. To test time lags on the different factors, I take the original raw data and make multiple calls to stepwiselm with various time lags for each factor (the data table is generated in another function, and the result is stored in the variable tab). My ultimate goal is to find the regression equation with the highest adjusted R².
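A minimal sketch of this kind of lag search may help anyone trying to reproduce the behavior. It is not the poster's actual loop: the synthetic data, the variable names `A_lag`, `B`, `C`, `Y`, and the lag range are all hypothetical. Each fit starts from the constant model, is capped with `'Upper','linear'`, records `mdl.Rsquared.Adjusted`, and flags any model whose coefficient names contain `:`, which is how MATLAB names interaction terms.

```matlab
% Minimal sketch with synthetic data; all names and the lag range are hypothetical.
rng(0);                                    % reproducible synthetic data
n = 80;
A = randn(n,1); B = randn(n,1); C = randn(n,1);
Y = 2*A + 0.5*randn(n,1);                  % response driven mainly by A

lags = 0:4;                                % hypothetical candidate lags for A
models = cell(numel(lags), 1);             % fitted model for each lag
adjR2 = zeros(numel(lags), 1);             % adjusted R^2 for each lag
hasInteraction = false(numel(lags), 1);    % true if an interaction term slipped in

for k = 1:numel(lags)
    L = lags(k);
    % Pair A(t-L) with B(t), C(t), Y(t); the first L rows have no lagged value
    tab = table(A(1:end-L), B(L+1:end), C(L+1:end), Y(L+1:end), ...
        'VariableNames', {'A_lag', 'B', 'C', 'Y'});
    % Start from the constant model and allow main effects only
    mdl = stepwiselm(tab, 'constant', 'Upper', 'linear', ...
        'ResponseVar', 'Y', 'Verbose', 0);
    models{k} = mdl;
    adjR2(k) = mdl.Rsquared.Adjusted;
    % Interaction coefficients are named 'Var1:Var2'
    hasInteraction(k) = any(contains(mdl.CoefficientNames, ':'));
end

[bestAdjR2, bestIdx] = max(adjR2);
fprintf('Best lag: %d (adjusted R^2 = %.3f)\n', lags(bestIdx), bestAdjR2);
```

Recording `hasInteraction` on every iteration shows immediately which runs, if any, end up with terms outside the upper bound.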
What I noticed is that when stepwiselm is called many times in succession (for example, more than 400 runs), it ends up including interaction terms in the final regression equation. This is the call to stepwiselm in my for loop. (I get the same result whether I use a for loop or a parfor loop.)
I collect the results of each stepwiselm call in a cell array called models (the first column holds the model and the third column holds the formula). This is one of the results that contains interaction terms:
```
>> models{445,1}

ans =

Linear regression model:
    HomeSales_4 ~ [Linear formula with 6 terms in 3 predictors]

Estimated Coefficients:
                                             Estimate         SE         tStat      pValue
                                            __________    __________    _______    _________

    (Intercept)                                 480.89        135.42     3.5511    0.0007463
    HousingStartsTotal_4                    -0.0035197     0.0013978    -2.5179     0.014445
    NewHomeOrders_4                           -0.70127       0.32902    -2.1314     0.037094
    HomeBacklog                                -0.1529      0.083674    -1.8273      0.07255
    HousingStartsTotal_4:NewHomeOrders_4    8.9666e-06    3.2585e-06     2.7518    0.0077939
    HousingStartsTotal_4:HomeBacklog        2.0849e-06    8.3874e-07     2.4858     0.015682

Number of observations: 67, Error degrees of freedom: 61
Root Mean Squared Error: 41.8
R-squared: 0.652,  Adjusted R-Squared: 0.624
F-statistic vs. constant model: 22.9, p-value = 7.53e-13

>> models{445,3}

ans =

    "HomeSales_4 ~ 1 + HousingStartsTotal_4*NewHomeOrders_4 + HousingStartsTotal_4*HomeBacklog"
```
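To list every run that ended up with interaction terms, the stored models can be scanned for coefficient names containing `:`. The sketch below is only an illustration: it builds a hypothetical two-row `models` cell with `fitlm` (one main-effects model, one with an interaction) in place of the real results, and uses only column 1 (the model).

```matlab
% Hypothetical stand-in for the real models cell; only column 1 (the model) is used.
rng(1);
n = 50;
tab = table(randn(n,1), randn(n,1), 'VariableNames', {'X1', 'X2'});
tab.Y = tab.X1 + tab.X2 + 0.3*randn(n,1);

models = cell(2, 3);
models{1,1} = fitlm(tab, 'Y ~ X1 + X2');   % main effects only
models{2,1} = fitlm(tab, 'Y ~ X1*X2');     % expands to X1 + X2 + X1:X2

% A coefficient name containing ':' marks an interaction term
hasInteraction = cellfun(@(m) any(contains(m.CoefficientNames, ':')), models(:,1));
badRuns = find(hasInteraction);
disp(badRuns)                              % rows whose model includes interactions
```

Run against the real cell array, the same check would flag row 445, and the full list of flagged rows could show whether the affected runs share a pattern, such as a particular lag combination.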
However, if I run the same regression manually (i.e., a single call with the same input data), stepwiselm does not add the interaction terms.
```
>> mdl=stepwiselm(tab2,'Upper','linear','Verbose',0)

mdl =

Linear regression model:
    HomeSales_4 ~ 1 + HousingStartsTotal_4 + NewHomeOrders_4 + HomeBacklog

Estimated Coefficients:
                            Estimate         SE         tStat       pValue
                            _________    __________    _______    __________

    (Intercept)               -82.309        43.518    -1.8914       0.06317
    HousingStartsTotal_4    0.0023333    0.00040806      5.718    3.1831e-07
    NewHomeOrders_4           0.16748      0.061989     2.7018     0.0088514
    HomeBacklog              0.048105      0.014275     3.3698     0.0012884

Number of observations: 67, Error degrees of freedom: 63
Root Mean Squared Error: 47.1
R-squared: 0.544,  Adjusted R-Squared: 0.522
F-statistic vs. constant model: 25, p-value = 8.93e-11
```
I am at a loss as to what is going on. I also tried specifying the upper model explicitly as a formula in Wilkinson notation instead of using 'Upper','linear', but I still get the same results. I would appreciate any input on this.
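For reference, an explicit main-effects upper bound in Wilkinson notation might look like the sketch below; `stepwiselm` accepts a formula for `'Upper'` as well as a named model such as `'linear'`. The synthetic data and the names `tbl`, `P1`, `P2`, `P3`, and `Y` are hypothetical, and the final `assert` simply checks that no selected term falls outside that bound.

```matlab
% Minimal sketch with synthetic data; the variable names are hypothetical.
rng(2);
n = 67;
tbl = table(randn(n,1), randn(n,1), randn(n,1), ...
    'VariableNames', {'P1', 'P2', 'P3'});
tbl.Y = tbl.P1 - tbl.P2 + 0.5*randn(n,1);

% Largest allowed model: main effects only, written as a Wilkinson formula
upperSpec = 'Y ~ 1 + P1 + P2 + P3';
mdl = stepwiselm(tbl, 'constant', 'Upper', upperSpec, 'Verbose', 0);

% Every selected term must come from the upper formula, so no ':' may appear
assert(~any(contains(mdl.CoefficientNames, ':')), ...
    'stepwiselm selected a term outside the upper bound');
disp(mdl.Formula)
```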


Answers (0)
