Regression with several dummy variables

Maria 2014-8-17
Edited: dpb 2014-8-18
I have a cell array with 20000 rows and 700 columns. Here is an example showing the first 9 columns:
C1 C2 C3 C4 C5 C6 C7 C8 C9
A={ 0 0 0 13 16 11 17 26 12 %row 1 is irrelevant
12 0 0 1 0 0 0 0 0
13 0 0 0 1 0 0 0 0
16 0 0 0 0 1 0 0 0
18 0 0 0 0 0 1 0 0
26 0 0 1 0 0 0 0 0
41 0 0 0 0 0 0 1 0}
I am trying to perform a regression.
C1 is simply an ID code; C2 is my binary dependent variable y. C3 is a dummy variable x (its elements, 0 or 1, are numbers), whose coefficient β (and, if possible, standard deviation) I want to interpret. From C4 onwards I have dummy variables (here the elements, 0 or 1, are logicals) that I also want to include in my regression to control for certain effects.
I should most likely use the fitlm or regress functions, but I have not been able to get them to work. Can someone help me? Thank you very much.
  2 comments
dpb 2014-8-17
Sounds nearly hopeless with that number of variables, but I guess you won't know until you actually try.
First, are the dummy variables coded to be independent? That is, does the large number of dummies come from fewer variables with several levels each, or are they all actually separate effects?
Maria 2014-8-17
The large number did come from fewer variables with different levels.


Answers (1)

dpb 2014-8-17
Edited: dpb 2014-8-17
Given the response to the previous question, it should be just
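y = cellfun(@double, A(2:end,2));       % y response variable (column C2 of the cell array A, header row skipped)
x = cellfun(@double, A(2:end,3:end));   % predictor variables: x (C3) plus the control dummies, logicals converted to double
x = [ones(size(x,1),1) x];              % prepend the constant term
[b,bint,~,~,stats] = regress(y,x);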
As said, it all depends on what the design matrix X, and hence X'*X, actually looks like (MATLAB doesn't actually compute X'*X, but its characteristics are what determine the covariances, estimability, etc., all of which in turn depend on the chosen codings being independent).
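Since the question also asks for the spread of the coefficient on x, here is a minimal sketch of how a standard error could be recovered from the regress outputs. The data are entirely made up (a 3-level factor `g` stands in for the control dummies, and all variable names are hypothetical), and it assumes the Statistics Toolbox is available for regress.

```matlab
% Synthetic stand-in for the question's data (all values are made up).
rng(0);                                   % reproducible random numbers
n  = 200;
x1 = double(rand(n,1) > 0.5);             % dummy of interest (like C3)
g  = randi(3, n, 1);                      % a 3-level factor behind the control dummies
D  = double([g == 2, g == 3]);            % level 1 is left out as the baseline
y  = double(0.2 + 0.3*x1 + 0.1*D(:,1) + 0.2*randn(n,1) > 0.5);   % binary response

X = [ones(n,1) x1 D];                     % constant term first, then predictors
[b, bint, ~, ~, stats] = regress(y, X);

% stats(4) is regress's estimate of the error variance, so the usual OLS
% standard errors are the square roots of the diagonal of s^2 * inv(X'*X).
se = sqrt(stats(4) * diag(inv(X' * X)));

fprintf('beta on x: %.4f, standard error: %.4f\n', b(2), se(2));
fprintf('95%% interval: [%.4f, %.4f]\n', bint(2,1), bint(2,2));
```

Here b(2) is the coefficient on x because x is the second column of X. fitlm should report the same estimates together with their standard errors in its coefficient table, provided it is not also given a hand-made column of ones (it adds its own intercept by default).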
  6 comments
Maria 2014-8-18
It gives this warning: "Warning: X is rank deficient to within machine precision. > In regress at 84". Do you know why? Thanks
dpb 2014-8-18
Edited: dpb 2014-8-18
Yep... as I suspected would be the case given the number of dummy variables, at least one column is the same as another (or a linear combination of others). I'd guess it will be very difficult to find an encoding that doesn't lead to this problem.
You can always try
rank(x)
and compare it with the number of columns, size(x,2), to get an estimate of how many problems you have...
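As a rough illustration of that rank check, the sketch below counts the redundant columns in a small made-up matrix and then looks for the two easiest culprits to spot: all-zero columns and exact duplicates. The matrix and the variable names are hypothetical, not taken from the actual data.

```matlab
% Small made-up design matrix: column 4 duplicates column 2, column 5 is all zeros.
x = [1 0 1 0 0
     1 1 0 1 0
     1 0 0 0 0
     1 1 1 1 0
     1 0 1 0 0
     1 1 0 1 0];

nDeficient = size(x,2) - rank(x);          % number of redundant columns

zeroCols = find(~any(x, 1));               % dummies that are never switched on

% unique(...,'rows') on the transpose groups identical columns together;
% firstIdx holds the first occurrence of each distinct column.
[~, firstIdx] = unique(x', 'rows', 'stable');
dupCols = setdiff(1:size(x,2), firstIdx'); % later copies of an earlier column

fprintf('%d redundant column(s); all-zero: %s; duplicates: %s\n', ...
        nDeficient, mat2str(zeroCols), mat2str(dupCols));
```

This only catches exact repeats and empty columns. A column that is a sum or difference of several others lowers the rank as well but will not show up in `dupCols`.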
I repeat the final synopsis from my initial answer --
...it all depends on what the design matrix X, and hence X'*X, actually looks like (MATLAB doesn't actually compute X'*X, but its characteristics are what determine the covariances, estimability, etc., all of which in turn depend on the chosen codings being independent).
It's that last phrase about being independent that's the rub.
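One common way the codings end up dependent, which may or may not be what is happening here, is keeping a dummy for every level of a factor alongside the constant term: each observation falls in exactly one level, so the dummies sum to the column of ones. A minimal sketch with a made-up 3-level factor (all names hypothetical):

```matlab
g     = [1 2 3 1 2 3 1 2]';                 % made-up factor with 3 levels
Dfull = double(bsxfun(@eq, g, 1:3));        % one dummy column per level

Xtrap = [ones(numel(g),1) Dfull];           % constant + all 3 level dummies
Xok   = [ones(numel(g),1) Dfull(:,2:end)];  % level 1 dropped as the baseline

% Every row of Dfull sums to 1, so the dummies add up to the constant column.
fprintf('all levels kept:  rank %d of %d columns\n', rank(Xtrap), size(Xtrap,2));
fprintf('baseline dropped: rank %d of %d columns\n', rank(Xok),   size(Xok,2));
```

With one level per factor left out, the remaining coefficients are read as differences from that baseline level.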
