Why does 'fitmnr' produce wrong CoefficientNames?
I have a 2649x4 table T looking like this
animalID Region f_n shiftDir
________ _______ _____ ________
{'1-3'} {'CA1'} {'f'} {'B'}
{'1-3'} {'CA1'} {'f'} {'N'}
{'1-3'} {'CA1'} {'f'} {'N'}
{'1-3'} {'CA1'} {'f'} {'N'}
{'1-3'} {'CA1'} {'f'} {'N'}
{'1-3'} {'CA1'} {'f'} {'N'}
{'1-3'} {'CA1'} {'f'} {'F'}
{'1-3'} {'CA1'} {'f'} {'B'}
{'1-3'} {'CA1'} {'f'} {'N'}
{'1-3'} {'CA1'} {'f'} {'B'}
...
All variables are categorical. animalID has 11 unique categories, Region has 2 unique categories (CA1 or CA3), f_n has 2 unique categories (f or n) and shiftDir has 3 categories ('B', 'F', 'N'). I wish to perform a multinomial logistic regression where 'shiftDir' is the response variable. When using the function fitmnr introduced last year, the coefficient names in the output are wrong, as is the number of predictors.
For instance, if I do
mlr2 = fitmnr(T,'shiftDir ~ Region + f_n + animalID', CategoricalPredictors='all')
I end up with the following output:
mlr2 =
Multinomial regression with nominal responses
Value SE tStat pValue
_________ __________ ___________ __________
(Intercept_B) 1.2863 0.11896 10.812 3.0111e-27
animalID_cdc_B 0.5015 0.48301 1.0383 0.29913
animalID_cfc_B -0.15888 0.35949 -0.44197 0.65851
animalID_wt1_B -1.3369 0.36385 -3.6742 0.00023859
animalID_4-2_B -0.85465 5.5979e+06 -1.5267e-07 1
animalID_4-1_B -2.0389 5.5979e+06 -3.6423e-07 1
animalID_5-3_B -0.28241 5.5979e+06 -5.045e-08 1
animalID_5-1_B 0.58524 5.5979e+06 1.0455e-07 1
animalID_5-4_B 0.042137 5.5979e+06 7.5274e-09 1
animalID_7-2_B -0.73503 5.5979e+06 -1.3131e-07 1
animalID_9-1_B 0.93652 5.5979e+06 1.673e-07 1
Region_CA3_B -0.40496 5.5979e+06 -7.2342e-08 1
f_n_n_B 1.292 0.17441 7.4078 1.284e-13
(Intercept_F) 1.8246 0.11331 16.102 2.4575e-58
animalID_cdc_F 0.49841 0.47571 1.0477 0.29477
animalID_cfc_F -0.075916 0.35126 -0.21613 0.82889
animalID_wt1_F -0.25259 0.30712 -0.82246 0.41081
animalID_4-2_F 13.344 5.9316e+06 2.2496e-06 1
animalID_4-1_F 11.524 5.9316e+06 1.9427e-06 1
animalID_5-3_F 13.386 5.9316e+06 2.2567e-06 1
animalID_5-1_F 13.784 5.9316e+06 2.3238e-06 1
animalID_5-4_F 14.044 5.9316e+06 2.3676e-06 1
animalID_7-2_F 13.099 5.9316e+06 2.2083e-06 1
animalID_9-1_F 14.029 5.9316e+06 2.3651e-06 1
Region_CA3_F -13.119 5.9316e+06 -2.2117e-06 1
f_n_n_F 0.65204 0.16938 3.8495 0.00011834
2649 observations, 5272 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 223.1291, p-value = 3.2688e-34
I am not very familiar with multinomial logistic regression, but I believe there should be fewer predictors, of the form "animalID_B, Region_B, f_n_B, ... animalID_F, Region_F, f_n_F". I am not sure why it appends the names of the categories within each predictor variable (e.g. the names of the animals), thus creating too many predictors.
Note that I also get a warning when running the regression:
Warning: Maximum likelihood estimation did not converge. Iteration limit
exceeded. You may need to merge categories to increase observed counts.
> In mnrfit>nominalFit (line 570)
In mnrfit (line 246)
In MultinomialRegression/fitter (line 317)
In classreg.regr/FitObject/doFit (line 94)
In MultinomialRegression.fit (line 672)
In fitmnr (line 121)
This message might be unrelated, because I don't get the warning when I omit the animalID predictor variable, but I still get wrong CoefficientNames, and I suspect a wrong output altogether.
I wonder if this is due to the data type inside my table. Any feedback would be appreciated.
Thank you
Answers (1)
Avadhoot
2024-4-10
From your question, I see that you are calling "fitmnr" with the "CategoricalPredictors='all'" argument. The coefficient names you are seeing follow from the predictors being treated as categorical: MATLAB expands each categorical predictor with k levels into k-1 indicator (dummy) variables, one for each level other than the reference level. This is why you see a coefficient for each non-reference category of "animalID" (and of the other variables) for each level of your response variable "shiftDir", except its reference category, whose coefficients are implicitly set to 0. This is expected behavior for categorical variables in regression models, including multinomial logistic regression.
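The coefficient names are also consistent with the MATLAB naming format, which is predictorName_levelName_responseLevel. For example, Region_CA3_B describes how Region being CA3 (relative to CA1) changes the log-odds of shiftDir being B rather than the reference category N. In general, each name indicates how one level of a predictor influences the log-odds of being in a particular category of the response variable, relative to a reference category.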
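As a quick sanity check on the number of coefficients, the counting can be sketched with the level counts stated in the question. The variable names below are illustrative, and the error degrees of freedom formula is an assumption inferred from the reported numbers rather than taken from the documentation.

```matlab
% Level counts as stated in the question
nLevels = struct('animalID', 11, 'Region', 2, 'f_n', 2);
nResponseLevels = 3;    % shiftDir: B, F, N
nObs = 2649;

% A categorical predictor with k levels contributes k-1 dummy variables
nDummies = sum(structfun(@(k) k - 1, nLevels));    % 10 + 1 + 1 = 12

% One intercept plus all dummies, for each non-reference response level
nCoef = (1 + nDummies) * (nResponseLevels - 1);    % 13 * 2 = 26

% Assumed formula for the error degrees of freedom
dfe = nObs * (nResponseLevels - 1) - nCoef;        % 5298 - 26 = 5272

fprintf('%d coefficients, %d error degrees of freedom\n', nCoef, dfe);
```

Both values agree with the output in the question: 26 coefficient rows and 5272 error degrees of freedom.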
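The convergence warning means that the maximum likelihood estimation reached its iteration limit before converging. With "animalID" in the model, the number of parameters is high relative to the observed counts in some categories, which is what the warning text hints at. The very large standard errors (about 5.6e+06) on several "animalID" and "Region" coefficients in your output are consistent with this: those coefficients are not well determined by the data. The warning disappears when you exclude "animalID" because that reduces the number of parameters considerably, so the model is simpler and easier to fit.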
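One way to follow up on the warning's hint about observed counts is to cross-tabulate the predictors against the response and against each other. The sketch below uses a small made-up data set (not the data from the question) and crosstab from Statistics and Machine Learning Toolbox. Whether the real table has empty cells, or animals recorded in only one region, is not known from the question and would need to be checked on the actual data.

```matlab
% Toy data (hypothetical), standing in for columns of the table T
animalID = {'a1';'a1';'a1';'a2';'a2';'a2';'a3';'a3'};
Region   = {'CA1';'CA1';'CA1';'CA1';'CA1';'CA1';'CA3';'CA3'};
shiftDir = {'B';'N';'F';'N';'N';'B';'N';'N'};

% Observed counts for each animal-by-response combination
animalByResponse = crosstab(animalID, shiftDir);
numZeroCells = nnz(animalByResponse == 0);

% Observed counts for each animal-by-region combination
animalByRegion = crosstab(animalID, Region);
% True when every animal appears in exactly one region
oneRegionPerAnimal = all(sum(animalByRegion > 0, 2) == 1);

fprintf('Empty animal-by-response cells: %d\n', numZeroCells);
fprintf('Every animal in a single region: %d\n', oneRegionPerAnimal);
```

With real data, the same two calls would take T.animalID, T.shiftDir and T.Region as inputs. Many empty cells, or a Region variable that is fully determined by animalID, would both make some coefficients hard to estimate.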
A solution to your problem would be to simplify the model. Consider whether all predictors are necessary or whether some can be omitted. Also consider reducing the number of levels in the categorical variables by combining categories where that makes sense.
For more information, refer to the documentation for the fitmnr function.
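Combining categories can be sketched with mergecats, which works on categorical arrays. The animal IDs, the choice of which levels to merge, and the name 'other' are all hypothetical here; which animals it makes sense to pool depends on the experiment.

```matlab
% Hypothetical animal IDs stored as a categorical array
animalID = categorical({'1-3';'4-1';'4-2';'5-1';'5-3';'1-3'});

% Merge three sparsely observed animals into one level named 'other'
animalMerged = mergecats(animalID, {'4-1','4-2','5-1'}, 'other');

% Compare the number of levels before and after merging
fprintf('%d levels before, %d levels after\n', ...
    numel(categories(animalID)), numel(categories(animalMerged)));
```

The merged variable could then replace the animalID column of the table before calling fitmnr again. The other option, dropping the predictor, corresponds to the formula 'shiftDir ~ Region + f_n', which the question reports as fitting without the warning.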
I hope this helps.