Using fitcnb to train a naive Bayes classifier with some predictors that are dummy variables?

I am trying to train a naive Bayes classifier, and some of my predictors are definitely not normally distributed, since they contain only 0s and 1s. When I use 'mn' it does not work at all, and when I use 'mvmn' for these predictors I get the following warning: "Warning: You specified the 'mvmn' distribution for at least one predictor that does not appear in the 'CategoricalPredictors' list. 'CategoricalPredictors' will be updated to include all 'mvmn' predictors."
So I clearly don't fully understand the trainer, and I don't know how to account for the fact that some predictors are dummy variables.
Don Mathis, 2020-4-2
Quoting the warning: "'CategoricalPredictors' will be updated to include all 'mvmn' predictors."
It sounds like it's working fine in that case. Binary dummy variables are categorical variables.
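Following that reasoning, the simplest way to avoid the warning is to declare the dummy columns as categorical up front. The sketch below uses hypothetical data (one continuous column plus two 0/1 dummy columns) and leaves 'DistributionNames' at its default, under which fitcnb uses 'mvmn' for categorical predictors and 'normal' for the others.

```matlab
% Hypothetical data: column 1 is continuous, columns 2 and 3 are 0/1 dummies
X = [1.80 0 1;
     1.65 1 0;
     1.75 0 0;
     1.55 1 1;
     1.70 1 0;
     1.60 0 1];
Y = categorical({'fit'; 'fit'; 'unfit'; 'fit'; 'unfit'; 'unfit'});

% Declare the dummy columns as categorical. With no 'DistributionNames',
% fitcnb fits 'mvmn' to the categorical predictors and 'normal' to the rest,
% so no warning is issued.
Mdl = fitcnb(X, Y, 'CategoricalPredictors', [2 3]);

% Show which distribution was used for each predictor
disp(Mdl.DistributionNames)
```

Each class has at least two distinct heights here, so the per-class normal fit for column 1 has a positive variance.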


Answers (1)

Purvaja, 2025-8-26
Let’s break down what that warning really means:
When you tell MATLAB to treat some features as 'mvmn' (multivariate multinomial), it expects them to be categorical or discrete predictors. MATLAB keeps track of which predictors are categorical using the 'CategoricalPredictors' property.
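In your case, the predictors you assigned 'mvmn' were not listed in 'CategoricalPredictors' (when the predictor data is a numeric matrix, fitcnb assumes every predictor is continuous unless you say otherwise). Because you set 'mvmn' for them anyway, MATLAB adds every 'mvmn' predictor to the categorical list and warns you that it has done so. It is only a warning: the model is still trained, using 'mvmn' for those predictors.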
Here’s a small example:
```matlab
% First column is continuous (height), second is a 0/1 dummy variable
X = [1.80 0;
     1.65 1;
     1.75 0;
     1.55 1;
     1.70 1;
     1.60 0];
Y = categorical({'fit'; 'fit'; 'unfit'; 'fit'; 'unfit'; 'unfit'});

% Forcing all predictors to 'mvmn' without listing them in 'CategoricalPredictors'
distNames = {'mvmn','mvmn'};
Mdl = fitcnb(X, Y, 'DistributionNames', distNames);
```
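This example produces the same warning you saw. It also treats height as categorical: every distinct height value becomes its own category, so the model ignores that 1.70 lies between 1.65 and 1.75. For a genuinely continuous predictor this usually reduces accuracy.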
A better approach is to tell MATLAB explicitly which predictors are categorical, especially for mixed datasets:
```matlab
distNames = {'normal','mvmn'};   % height ~ normal, dummy ~ categorical (mvmn)
Mdl = fitcnb(X, Y, 'DistributionNames', distNames, 'CategoricalPredictors', 2);
```
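Here, the 2nd column is explicitly marked as categorical and given 'mvmn', while the first column keeps a normal distribution, so no warning is issued. You could also omit 'DistributionNames' in this call: by default fitcnb uses 'mvmn' for the predictors listed in 'CategoricalPredictors' and 'normal' for the others.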
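If your data is in a table, you can often skip 'CategoricalPredictors' entirely. For table input, fitcnb treats logical, categorical, character, string, and cellstr variables as categorical by default. The sketch below stores the dummies as logical variables; the variable names and values are hypothetical.

```matlab
% Hypothetical data: 'height' is continuous, 'smoker' and 'athlete' are dummies
height  = [1.80; 1.65; 1.75; 1.55; 1.70; 1.60];
smoker  = logical([0; 1; 0; 1; 1; 0]);
athlete = logical([1; 0; 0; 1; 0; 1]);
status  = categorical({'fit'; 'fit'; 'unfit'; 'fit'; 'unfit'; 'unfit'});
tbl = table(height, smoker, athlete, status);

% Logical table variables are detected as categorical, so neither
% 'CategoricalPredictors' nor 'DistributionNames' has to be specified.
Mdl = fitcnb(tbl, 'status');

disp(Mdl.PredictorNames)      % predictor order used by the model
disp(Mdl.DistributionNames)   % distribution fitted to each predictor
```

Storing 0/1 columns as logical also documents their meaning in the data itself, which helps when the table is reused with other fitc* functions.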
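Now, about 'mn' (multinomial): it is meant for count data, such as how often each word occurs in a document, not for categorical predictors. It also has to apply to all predictors at once: if you specify 'mn', fitcnb treats every predictor as a component of a single multinomial distribution, so 'mn' cannot appear as one entry in a cell array of distribution names alongside 'normal' or 'mvmn'. That is why it does not work for a mix of continuous and dummy predictors; use 'mvmn' for the dummies instead.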
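For contrast, here is a minimal sketch of the situation 'mn' is designed for. It assumes hypothetical bag-of-words data: each row is a document and each column counts how often one vocabulary word occurs in it. Note that 'mn' is passed as a single value rather than inside a cell array.

```matlab
% Hypothetical word counts: rows are documents, columns are vocabulary words
counts = [3 0 1 0;
          2 1 0 0;
          0 0 4 2;
          0 1 3 3];
label = categorical({'sports'; 'sports'; 'politics'; 'politics'});

% All predictors together form one multinomial distribution per class
Mdl = fitcnb(counts, label, 'DistributionNames', 'mn');

% Classify a new document given its word counts
yhat = predict(Mdl, [1 0 2 1])
```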
Hope this helps you!
