Using fitcnb to train a naive Bayes classifier when some predictors are dummy variables?
I am trying to train a naive Bayes classifier, and some of my predictors are definitely not normally distributed because they contain only 0s and 1s. When I use 'mn' it does not work at all, and when I use 'mvmn' for these predictors I get the following message: "Warning: You specified the 'mvmn' distribution for at least one predictor that does not appear in the 'CategoricalPredictors' list. 'CategoricalPredictors' will be updated to include all 'mvmn' predictors. ".
So I clearly don't fully understand the trainer, and I don't know how to account for the fact that some predictors are dummy variables.
2 Comments
Mahesh Taparia
2020-3-26
Hi
What is the dimension of your input data? Can you upload your sample code?
Don Mathis
2020-4-2
"'CategoricalPredictors' will be updated to include all 'mvmn' predictors."
It sounds like it's working fine in that case. Binary dummy variables are categorical variables.
Answers (1)
Purvaja
2025-8-26
Let’s break down what that warning really means:
When you tell MATLAB to treat some features as 'mvmn' (multivariate multinomial), it expects them to be categorical or discrete predictors. MATLAB keeps track of which predictors are categorical using the 'CategoricalPredictors' property.
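In your case, the dummy columns weren't marked as categorical, so they weren't in 'CategoricalPredictors'. Because you specified 'mvmn' for them anyway, MATLAB automatically adds every predictor that uses 'mvmn' to the categorical list. The warning is informational: the model is still trained, and those predictors are handled as categorical, which is what you want for 0/1 dummy variables.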
Here’s a small example:
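% First column is continuous (height), second is a 0/1 dummy (categorical)
X = [1.80 0;
1.65 1;
1.75 0;
1.55 1];
% Each class has two observations with different heights, so a 'normal' fit of height is possible
Y = categorical({'fit'; 'fit'; 'unfit'; 'unfit'});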
% Forcing all predictors to mvmn
distNames = {'mvmn','mvmn'};
Mdl = fitcnb(X, Y, 'DistributionNames', distNames);
This example produces the same warning you saw. Forcing a continuous predictor such as height to be categorical like this can reduce accuracy, because 'mvmn' treats every distinct height value as a separate category.
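A variant closer to the question requests 'mvmn' only for the dummy column, still without listing it in 'CategoricalPredictors'. The minimal sketch below uses made-up data (an assumption for illustration only); it should trigger the same warning, and then inspects the trained model to confirm that fitcnb added the column to its categorical list and kept 'normal' for height:

```matlab
% Made-up data for illustration: column 1 is continuous, column 2 is a 0/1 dummy
X = [1.80 0; 1.65 1; 1.75 0; 1.55 1; 1.70 1; 1.60 0];
Y = categorical({'fit'; 'fit'; 'unfit'; 'unfit'; 'fit'; 'unfit'});

% 'mvmn' is requested for column 2, but column 2 is not in 'CategoricalPredictors',
% so fitcnb warns and adds it to the categorical list itself
Mdl = fitcnb(X, Y, 'DistributionNames', {'normal', 'mvmn'});

% Inspect what the trained model actually uses
disp(Mdl.CategoricalPredictors)   % indices of predictors treated as categorical
disp(Mdl.DistributionNames)       % distribution used for each predictor
```

Apart from the warning, training proceeds normally.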
A better approach is to tell MATLAB explicitly which predictors are categorical, especially for mixed datasets:
distNames = {'normal','mvmn'}; % height ~ normal, dummy ~ categorical
Mdl = fitcnb(X, Y, 'DistributionNames', distNames, 'CategoricalPredictors', 2);
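Here, we explicitly mark the 2nd column as categorical, so fitcnb models it with 'mvmn', models the height column with 'normal', and issues no warning. Because 'mvmn' is already the default for categorical predictors and 'normal' is the default for the rest, the 'DistributionNames' argument is optional in this case.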
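The sketch below, again with made-up data, shows two ways to declare the dummy column up front so that no warning is raised. With a numeric matrix, list the column index in 'CategoricalPredictors' and rely on the defaults. With a table, storing the dummy as a logical variable should let fitcnb detect it as categorical on its own (fitcnb treats logical table variables as categorical). The variable names and values are assumptions for illustration:

```matlab
% Made-up data for illustration
height = [1.80; 1.65; 1.75; 1.55; 1.70; 1.60];
dummy  = [0; 1; 0; 1; 1; 0];
Y = categorical({'fit'; 'fit'; 'unfit'; 'unfit'; 'fit'; 'unfit'});

% Option 1: numeric matrix, dummy column declared categorical by index.
% 'DistributionNames' is omitted, so the defaults ('normal' and 'mvmn') apply.
Mdl1 = fitcnb([height dummy], Y, 'CategoricalPredictors', 2);
disp(Mdl1.DistributionNames)

% Option 2: table with the dummy stored as logical,
% which fitcnb treats as a categorical predictor automatically
Tbl = table(height, logical(dummy), Y, 'VariableNames', {'height', 'dummy', 'status'});
Mdl2 = fitcnb(Tbl, 'status');
disp(Mdl2.CategoricalPredictors)   % index of 'dummy' among the predictors

% Classify the same new observation with both models
label1 = predict(Mdl1, [1.72 1]);
label2 = predict(Mdl2, table(1.72, true, 'VariableNames', {'height', 'dummy'}));
```

Both models describe the same data with the same distributions, so they should give the same prediction; the table form is convenient when the data already lives in a table.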
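Now, about 'mn' (multinomial): it is meant for count data, such as token counts in a bag-of-words model, and it treats all predictors jointly as components of a single multinomial distribution. For that reason it has to be specified once for the whole model ('DistributionNames','mn'), not for individual predictors inside a cell array, which may be why it did not work for you. It is not the right choice for 0/1 dummy variables; use 'mvmn' (by listing those columns in 'CategoricalPredictors') instead.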
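For contrast, here is a minimal sketch of the kind of data 'mn' is designed for, using a made-up matrix of token counts (the class names and counts are assumptions for illustration). Note that 'mn' is passed once, as a single character vector, and covers all columns together:

```matlab
% Made-up bag-of-words counts: each row is a document, each column counts one token
X = [3 0 1;
     2 1 0;
     0 4 2;
     1 3 3];
Y = categorical({'sports'; 'sports'; 'politics'; 'politics'});

% One multinomial over all three token columns, so 'mn' is given as a single value
Mdl = fitcnb(X, Y, 'DistributionNames', 'mn');

% Classify a new document with 2, 0, and 1 occurrences of the three tokens
label = predict(Mdl, [2 0 1]);
disp(label)
```

The new document's counts resemble the 'sports' rows (mostly the first token), so it is classified as 'sports'.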
For more details, see the documentation:
ClassificationNaiveBayes: https://www.mathworks.com/help/stats/classificationnaivebayes.html
Hope this helps you!