Why does fitlm say my "design matrix is rank deficient to within machine precision" despite my design matrix being full rank?

9 次查看(过去 30 天)
Here's some code to generate a design matrix similar to my own (but with made up data):
% Size of the one-hot encoding part of the design matrix
numRows = 1200;
numCols = 15;
% Create a [1200 * 15] matrix of zeros
designMatrix = zeros(numRows, numCols);
% Define the number of 1s needed in each column
onesPerColumn = numRows / numCols; % Should be 80 in this case
% Create a vector with the column indices repeated the correct number of times
columnIndices = repmat(1:numCols, 1, onesPerColumn);
% Shuffle the column indices to randomize the placement of 1s
randomizedIndices = columnIndices(randperm(length(columnIndices)));
% Assign 1s to the designMatrix based on the randomized indices
for i = 1:numRows
designMatrix(i, randomizedIndices(i)) = 1;
end
% Tack on last two columns to designMatrix to finalize design matrix
designMatrix = [designMatrix,rand(numRows,2)];
My [1200 * 17] design matrix is arranged as follows: columns 1:15 are a one-hot encoding of stimulus identity and columns 16:17 are two predictors that are continuous. Specifically, each row of my one-hot encoding (columns 1:15) contains a single 1 with the other 14 columns being 0s. Each stimulus is presented the same number of times across all observations such that
sum(designMatrix(:,1:15),1)
will be a [1*15] vector of 80 (because 1200/15=80).
Despite assessing that my designMatrix is full rank with the following:
rank(designMatrix)
I receive Warning: Regression design matrix is rank deficient to within machine precision when I run fitlm, such as:
% Make up values for dependent variable
y = rand(numRows,1);
% Fit the model using fitlm
mdl = fitlm(designMatrix,y)
Interestingly, fitrlinear is able to run everything without this warning...but I don't fully understand the difference between fitrlinear and fitlm.
Another interesting thing is that mdl = fitlm(designMatrix(:,2:end),y) doesn't produce the Warning. This make me think there's an issue with constructing one-hot encoding of stimulus identity this way. However, I'm unaware of a better alternative since I want a coefficient for each stimulus in the end.
What am I overlooking? Is there an issue with my one-hot encoding? Is there another way to run fitlm that is ideal? Thank you!

回答(1 个)

John D'Errico
John D'Errico 2024-9-4
编辑:John D'Errico 2024-9-4
Funny. I was sure within a second of reading your question what you had done wrong. And I think you will probably say, Oh. Yeah. That makes sense. But what you needed to know was a little quirk about fitlm.
numRows = 1200;
numCols = 15;
% Create a [1200 * 15] matrix of zeros
designMatrix = zeros(numRows, numCols);
% Define the number of 1s needed in each column
onesPerColumn = numRows / numCols; % Should be 80 in this case
% Create a vector with the column indices repeated the correct number of times
columnIndices = repmat(1:numCols, 1, onesPerColumn);
% Shuffle the column indices to randomize the placement of 1s
randomizedIndices = columnIndices(randperm(length(columnIndices)));
% Assign 1s to the designMatrix based on the randomized indices
for i = 1:numRows
designMatrix(i, randomizedIndices(i)) = 1;
end
% Tack on last two columns to designMatrix to finalize design matrix
designMatrix = [designMatrix,rand(numRows,2)];
rank(designMatrix)
ans = 17
cond(designMatrix)
ans = 8.4793
So not only is your design martrix full rank, it is well conditioned too! Where could the problem lie?
If you read the help for fitlm, you will see this:
'Intercept' true (default) to include a constant term in the
model, or false to omit it.
And therein lies your problem. fitlm automatically adds in a constant term in the model! It appends a column of ones to your matrix.
rank([ones(1200,1),designMatrix])
ans = 17
As you can see, the matrix now is singular, with still a rank unchanged at 17.
cond([ones(numRows,1),designMatrix])
ans = 7.6044e+15
Yep. The matrix is now numerically singular. You need to tell fitlm to not do that.
y = rand(numRows,1);
mdl = fitlm(designMatrix,y,intercept = false)
mdl =
Linear regression model: y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 + x15 + x16 + x17 Estimated Coefficients: Estimate SE tStat pValue __________ ________ ________ __________ x1 0.54303 0.036075 15.053 5.5436e-47 x2 0.51032 0.036564 13.957 4.2721e-41 x3 0.51819 0.037102 13.967 3.7978e-41 x4 0.51079 0.036932 13.831 1.9476e-40 x5 0.54632 0.035861 15.234 5.4719e-48 x6 0.51591 0.036295 14.214 1.8869e-42 x7 0.57824 0.037559 15.395 6.9456e-49 x8 0.57388 0.036946 15.533 1.1742e-49 x9 0.54368 0.036334 14.964 1.7142e-46 x10 0.55035 0.036771 14.967 1.6469e-46 x11 0.54485 0.036636 14.872 5.4341e-46 x12 0.54347 0.036828 14.757 2.3026e-45 x13 0.47718 0.037328 12.783 3.8148e-35 x14 0.47759 0.036839 12.964 4.8979e-36 x15 0.50361 0.036978 13.619 2.4112e-39 x16 -0.024811 0.027533 -0.90113 0.3677 x17 -0.0094533 0.028365 -0.33327 0.73899 Number of observations: 1200, Error degrees of freedom: 1183 Root Mean Squared Error: 0.278
As far as why the matrix is always singular after appending the vector of ones, that stems from the way you designed this as a balanced experiment. Thus if you do this:
sum(designMatrix(:,1:15),2)
ans = 1200x1
1 1 1 1 1 1 1 1 1 1
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
you will get a vector of all ones. And that insures fitlm will have a hissy fit, UNLESS you tell it not to append a constant term to the model. (Personally, I always felt that was the wrong design decision to have made, but I did not write the code or the specs for fitlm. I'm sure they had valid reasons.)
Oh, your last questions ...
1. Why does fitrlinear run when fitlm has a problem? fitrlinear does NOT decide to append a constant term to the model by default!
2. Why does dropping the first column of the design matrix allow it to run with no hissy fit? Because now that column of ones (that will be appended by default) is no longer representable as a simple sum of the columns of your matrix.
isequal(ones(numRows,1),sum(designMatrix(:,1:15),2))
ans = logical
1
isequal(ones(numRows,1),sum(designMatrix(:,2:15),2))
ans = logical
0

类别

Help CenterFile Exchange 中查找有关 Support Vector Machine Regression 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by