Matrix condition in ordinal logistic regression
I have two sets of data, one raw and one cleaned. The independent variable is damage, which can take any value between 0 and 100, inclusive. The dependent variable is a user rating, which can take any whole number value between 1 and 5, inclusive. Since the IV is continuous and the DV ordinal, I am using a logistic regression to predict future responses - specifically, I am using MATLAB's built-in mnrfit function, specifying the model type as ordinal.
This worked well for the raw data, but I am getting an error when attempting the regression on the cleaned data. Specifically,
In mnrfit>ordinalFit (line 349)
In mnrfit (line 206)
Warning: Matrix is singular, close to singular or badly scaled. Results may be inaccurate. RCOND = NaN.
Error using linsolve
Matrix must be positive definite.
Error in mnrfit (line 248)
bcov = linsolve(hess,eye(size(hess)),struct('SYM',true,'POSDEF',true));
There are not huge differences between the two datasets, simply a few more zeros in the cleaned responses. The condition number of the raw response matrix is about 57, and that of the cleaned response matrix about 58. The rank of both the raw response and the cleaned response matrix is 5, which is equal to the number of columns in each matrix. Therefore, I am not sure why the regression should work with the raw data but not the cleaned data.
Could someone suggest how to proceed with troubleshooting/debugging? Or why the regression might not be working? I've included the relevant code snippets at the end of this post.
Thank you in advance!
This is the code for the original logit model, based on the raw data:
% fit logit model to all data - responses as a function of mean CDF
% expect a positive trend - i.e. that responses get larger as mean CDF
% increases
% 1: predictor variable is mean CDF; response variable is the user
% damage rating - this is the way our experiment was designed
% get unique values of mean CDF
[x,ia,ic] = unique(crowddata(:,3));
y(1:length(x),1:5) = 0;
for i = 1:length(x)
y(i,1:5) = crowddata(ia(i),4:8);
beq(i,1) = buckets(ia(i),3);
buneq(i,1) = buckets(ia(i),4);
end
% get sample sizes (total number of responses) for each mean CDF
ssize = sum(y,2);
% fit an ordinal logistic regression model
[b,dev,stat] = mnrfit(x,y,'model','ordinal');
b2 = [b(1:4)';repmat(b(5:end),1,4)];
xx = (1:1:276)';
% get probability that a mean CDF is in a category, based on our
% logit model that includes all user responses
pihat = mnrval(b,xx,'model','ordinal','interactions','off');
Below is the code for the regression model based on the cleaned data.
% new regression model using only responses cleaned based on time
% 2: predictor variable is mean CDF; response variable is the user
% damage rating - this is the way our experiment was designed
% get unique values of mean CDF
[x_clean,ia,ic] = unique(crowddata_clean(:,3));
y_clean(1:length(x_clean),1:5) = 0;
for i = 1:length(x_clean)
y_clean(i,1:5) = crowddata_clean(ia(i),4:8);
beq(i,1) = buckets(ia(i),3);
buneq(i,1) = buckets(ia(i),4);
end
% get sample sizes (total number of responses) for each mean CDF
ssize_clean = sum(y_clean,2);
% fit an ordinal logistic regression (linear) model
[bclean,dev,stat] = mnrfit(x_clean,y_clean,'model','ordinal');
% use the coefficients from the cleaned fit (bclean), not the raw fit (b)
b3 = [bclean(1:4)';repmat(bclean(5:end),1,4)];
Answers (1)
Neil Guertin
2017-9-7
That warning comes from an mldivide (backslash) operation within mnrfit. Since the reciprocal condition number of that matrix is NaN, I believe you probably have a NaN in your response matrix, or else have hit some edge case that produces NaN before this operation (perhaps with 0/0).
It should not be possible to calculate the condition number of a matrix that contains a NaN, so check that you calculated the condition number of the correct matrix (the final version of y, right before you call mnrfit). Examine both x and y to be sure that you are passing the correct data to mnrfit. You can check for NaN with
>> any(any(isnan(y)))
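A somewhat broader version of that check might look like the sketch below. It assumes the variables are named x_clean and y_clean as in the question, and it uses small made-up stand-in data so that it runs on its own. Besides NaN it also looks for Inf, negative counts, and rows whose total count is zero. A zero row total is one plausible place where a 0/0 could arise, although I have not confirmed that this is what happens inside mnrfit.

```matlab
% Hypothetical stand-in data; replace with the real x_clean and y_clean.
x_clean = [10; 20; 30; 40];
y_clean = [3 1 0 0 0;
           0 0 0 0 0;    % a row with no responses at all
           0 1 2 1 0;
           0 0 1 2 3];

% Basic sanity checks on the data passed to mnrfit
hasNaN      = any(isnan(x_clean)) || any(isnan(y_clean(:)));
hasInf      = any(isinf(x_clean)) || any(isinf(y_clean(:)));
hasNegative = any(y_clean(:) < 0);

% Total number of responses for each predictor value
rowTotals = sum(y_clean, 2);
emptyRows = find(rowTotals == 0);   % indices of rows with zero responses

fprintf('NaN: %d, Inf: %d, negative counts: %d\n', hasNaN, hasInf, hasNegative);
fprintf('Rows with zero total responses: %s\n', mat2str(emptyRows'));
```

Note that appending an all-zero row to a matrix does not change its singular values, so rank(y_clean) and cond(y_clean) would look normal even if such a row were present.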
You may want to consider removing bad datapoints or outliers altogether instead of replacing values with 0 and including them in the regression.
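As a rough sketch of what removing such points could look like: the code below drops every row that has a NaN or no responses at all before fitting. The data is made up for illustration, and the names keep, x_fit and y_fit are my own, not from the original code. Whether dropping rows is appropriate depends on how the cleaning was done.

```matlab
% Hypothetical stand-in data; replace with the real x_clean and y_clean.
x_clean = [10; 20; 30; 40];
y_clean = [3 1 0 0 0;
           0 0 0 0 0;    % no responses survived cleaning for this x
           0 1 2 1 0;
           0 0 1 2 3];

% Keep only rows with at least one response and no NaN in x or y
keep = sum(y_clean, 2) > 0 & ~any(isnan(y_clean), 2) & ~isnan(x_clean);

x_fit = x_clean(keep);
y_fit = y_clean(keep, :);

fprintf('Kept %d of %d rows\n', nnz(keep), numel(keep));

% With the real data, the fit would then be run on the filtered rows:
% [bclean, dev_clean, stat_clean] = mnrfit(x_fit, y_fit, 'model', 'ordinal');
```

Using separate output names such as dev_clean and stat_clean also avoids overwriting the results of the raw-data fit.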