# ClassificationNaiveBayes

Naive Bayes classification for multiclass classification

## Description

ClassificationNaiveBayes is a Naive Bayes classifier for multiclass learning. Trained ClassificationNaiveBayes classifiers store the training data, parameter values, data distribution, and prior probabilities. Use these classifiers to perform tasks such as estimating resubstitution predictions (see resubPredict) and predicting labels or posterior probabilities for new data (see predict).

## Creation

Create a ClassificationNaiveBayes object by using fitcnb.

## Properties


### Predictor Properties

#### PredictorNames

Predictor names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data X.

#### ExpandedPredictorNames

Expanded predictor names, specified as a cell array of character vectors.

If the model uses dummy variable encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

#### CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

#### CategoricalLevels

Multivariate multinomial levels, specified as a cell array. The length of CategoricalLevels is equal to the number of predictors (size(X,2)).

The cells of CategoricalLevels correspond to predictors that you specify as 'mvmn' (multivariate multinomial distribution) during training. Cells that do not correspond to a multivariate multinomial predictor are empty ([]).

If predictor j is multivariate multinomial, then CategoricalLevels{j} is a list of all distinct values of predictor j in the sample. NaNs are removed from unique(X(:,j)).
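The relationship between CategoricalLevels{j} and the training data can be checked directly. The following is a minimal sketch on hypothetical toy data (the variables Xtoy and Ytoy are illustrative only) that models every predictor as 'mvmn' and compares the stored levels with the distinct non-NaN values of the first predictor.

```matlab
% Hypothetical toy data: two integer-valued predictors with levels 1, 2, 3
rng(0)                              % for reproducibility
Xtoy = randi(3,60,2);
Ytoy = repmat({'a';'b';'c'},20,1);  % three balanced classes

% Model both predictors as multivariate multinomial
Mdl = fitcnb(Xtoy,Ytoy,'DistributionNames','mvmn');

% CategoricalLevels{j} should hold the distinct non-NaN values of predictor j
j = 1;
levels = unique(Xtoy(:,j));
levels = levels(~isnan(levels));
isequal(Mdl.CategoricalLevels{j}(:),levels(:))
```

The comparison uses (:) on both sides so that it does not depend on whether the levels are stored as a row or a column vector.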

#### X

Unstandardized predictors used to train the naive Bayes classifier, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one variable. The software excludes observations that contain at least one missing value and removes the corresponding elements from Y.

### Predictor Distribution Properties

#### DistributionNames

Predictor distributions, specified as a character vector or cell array of character vectors. fitcnb uses the predictor distributions to model the predictors. This table lists the available distributions.

| Value | Description |
|---|---|
| 'kernel' | Kernel smoothing density estimate |
| 'mn' | Multinomial distribution. If you specify 'mn', then all features are components of a multinomial distribution. Therefore, you cannot include 'mn' as an element of a string array or a cell array of character vectors. For details, see Estimated Probability for Multinomial Distribution. |
| 'mvmn' | Multivariate multinomial distribution. For details, see Estimated Probability for Multivariate Multinomial Distribution. |
| 'normal' | Normal (Gaussian) distribution |

If DistributionNames is a 1-by-P cell array of character vectors, then fitcnb models feature j using the distribution in element j of the cell array.

Example: 'mn'

Example: {'kernel','normal','kernel'}

Data Types: char | string | cell
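As an illustration of the per-predictor form, the following sketch uses Fisher's iris data and an arbitrary choice of distributions (kernel densities for predictors 1 and 3, normal for predictors 2 and 4). It assumes that the trained model stores the cell array exactly as specified.

```matlab
load fisheriris
% Arbitrary example: kernel densities for predictors 1 and 3, normal for 2 and 4
dists = {'kernel','normal','kernel','normal'};
Mdl = fitcnb(meas,species,'DistributionNames',dists);

% Element j of DistributionNames is the distribution used for predictor j
Mdl.DistributionNames
```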

#### DistributionParameters

Distribution parameter estimates, specified as a cell array. DistributionParameters is a K-by-P cell array, where K is the number of classes, P is the number of predictors, and cell (k,j) contains the distribution parameter estimates for instances of predictor j in class k. The order of the rows corresponds to the order of the classes in the property ClassNames, and the order of the predictors corresponds to the order of the columns of X.

If class k has no observations for predictor j, then DistributionParameters{k,j} is empty ([]).

The elements of DistributionParameters depend on the distributions of the predictors. This table describes the values in DistributionParameters{k,j}.

| Distribution of Predictor j | Value of Cell Array for Predictor j and Class k |
|---|---|
| kernel | A KernelDistribution model. Display properties using cell indexing and dot notation. For example, to display the estimated bandwidth of the kernel density for predictor 2 in the third class, use Mdl.DistributionParameters{3,2}.BandWidth. |
| mn | A scalar representing the probability that token j appears in class k. For details, see Estimated Probability for Multinomial Distribution. |
| mvmn | A numeric vector containing the probabilities for each possible level of predictor j in class k. The software orders the probabilities by the sorted order of all unique levels of predictor j (stored in the property CategoricalLevels). For more details, see Estimated Probability for Multivariate Multinomial Distribution. |
| normal | A 2-by-1 numeric vector. The first element is the sample mean and the second element is the sample standard deviation. |
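For a normally distributed predictor, the cell (k,j) can be read directly and its first element compared with the class-conditional sample mean. The following is a minimal sketch on the iris data; the choice of class 2 and predictor 3 is arbitrary, and only the mean is compared because it does not depend on how the standard deviation is normalized.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);         % all predictors normal by default

k = 2;                              % class 2 in Mdl.ClassNames ('versicolor')
j = 3;                              % predictor 3 (petal length)
params = Mdl.DistributionParameters{k,j}   % [sample mean; sample standard deviation]

% Compare the first element with the class-conditional sample mean
inClass = strcmp(species,Mdl.ClassNames{k});
classMean = mean(meas(inClass,j))
```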

#### Kernel

Kernel smoother type, specified as the name of a kernel or a cell array of kernel names. The length of Kernel is equal to the number of predictors (size(X,2)). Kernel{j} corresponds to predictor j and contains a character vector describing the type of kernel smoother. If a cell is empty ([]), then fitcnb did not fit a kernel distribution to the corresponding predictor.

This table describes the supported kernel smoother types. I{u} denotes the indicator function.

| Value | Kernel | Formula |
|---|---|---|
| 'box' | Box (uniform) | $f(x)=0.5\,I\{\lvert x\rvert\le 1\}$ |
| 'epanechnikov' | Epanechnikov | $f(x)=0.75\left(1-x^{2}\right)I\{\lvert x\rvert\le 1\}$ |
| 'normal' | Gaussian | $f(x)=\frac{1}{\sqrt{2\pi}}\exp\left(-0.5x^{2}\right)$ |
| 'triangle' | Triangular | $f(x)=\left(1-\lvert x\rvert\right)I\{\lvert x\rvert\le 1\}$ |

Example: 'box'

Example: {'epanechnikov','normal'}

Data Types: char | string | cell
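The kernel formulas in the table can be written as anonymous functions and checked numerically: each one is a probability density, so it integrates to 1. The sketch below then requests one of the kernels when training; the function names boxK, epanK, gaussK, and triK are illustrative only.

```matlab
% Kernel smoother formulas, with I{|x| <= 1} written as (abs(x) <= 1)
boxK   = @(x) 0.5*(abs(x) <= 1);
epanK  = @(x) 0.75*(1 - x.^2).*(abs(x) <= 1);
gaussK = @(x) exp(-0.5*x.^2)/sqrt(2*pi);
triK   = @(x) (1 - abs(x)).*(abs(x) <= 1);

% Each kernel is a probability density, so each area should be 1
areas = [integral(boxK,-1,1), integral(epanK,-1,1), ...
         integral(gaussK,-Inf,Inf), integral(triK,-1,1)]

% Request a kernel smoother type when training (the predictors use 'kernel')
load fisheriris
Mdl = fitcnb(meas,species,'DistributionNames','kernel','Kernel','triangle');
Mdl.Kernel
```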

#### Support

Kernel smoother density support, specified as a cell array. The length of Support is equal to the number of predictors (size(X,2)). The cells represent the regions to which fitcnb applies the kernel density. If a cell is empty ([]), then fitcnb did not fit a kernel distribution to the corresponding predictor.

This table describes the supported options.

| Value | Description |
|---|---|
| 1-by-2 numeric row vector | The density support applies to the specified bounds, for example [L,U], where L and U are the finite lower and upper bounds, respectively. |
| 'positive' | The density support applies to all positive real values. |
| 'unbounded' | The density support applies to all real values. |

#### Width

Kernel smoother window width, specified as a numeric matrix. Width is a K-by-P matrix, where K is the number of classes in the data, and P is the number of predictors (size(X,2)).

Width(k,j) is the kernel smoother window width for the kernel smoothing density of predictor j within class k. NaNs in column j indicate that fitcnb did not fit predictor j using a kernel density.
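The shape of Width and its NaN columns can be inspected after training a model that mixes kernel and normal predictors. The following sketch uses an arbitrary mix on the iris data, with 'positive' support because all iris measurements are positive; it assumes that a single kernel name and support value apply to every kernel-distributed predictor.

```matlab
load fisheriris
% Kernel densities for predictors 1 and 3 only; normal for predictors 2 and 4
dists = {'kernel','normal','kernel','normal'};
Mdl = fitcnb(meas,species,'DistributionNames',dists, ...
    'Kernel','epanechnikov','Support','positive');

size(Mdl.Width)            % K-by-P: 3 classes by 4 predictors
isnan(Mdl.Width)           % columns 2 and 4 are NaN: no kernel fit there
```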

### Response Properties

#### ClassNames

Unique class names used in the training model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors.

ClassNames has the same data type as Y and has K elements, where K is the number of classes (or K rows, if ClassNames is a character array). (The software treats string arrays as cell arrays of character vectors.)

Data Types: categorical | char | string | logical | double | cell

#### ResponseName

Response variable name, specified as a character vector.

Data Types: char | string

#### Y

Class labels used to train the naive Bayes classifier, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each row of Y represents the observed classification of the corresponding row of X.

Y has the same data type as the data in Y used for training the model. (The software treats string arrays as cell arrays of character vectors.)

Data Types: single | double | logical | char | string | cell | categorical

### Training Properties

#### ModelParameters

Parameter values used to train the ClassificationNaiveBayes model, specified as a structure array. ModelParameters contains parameter values such as the name-value pair argument values used to train the naive Bayes classifier.

Access the fields of ModelParameters by using dot notation. For example, access the kernel support using Mdl.ModelParameters.Support.

#### NumObservations

Number of training observations in the training data stored in X and Y, specified as a numeric scalar.

#### Prior

Prior probabilities, specified as a numeric vector. The order of the elements in Prior corresponds to the elements of Mdl.ClassNames.

fitcnb normalizes the prior probabilities you set using the 'Prior' name-value pair argument, so that sum(Prior) = 1.

The value of Prior does not affect the best-fitting model. Therefore, you can reset Prior after training Mdl using dot notation.

Example: Mdl.Prior = [0.2 0.8]

Data Types: double | single
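The normalization of Prior and the ability to reset it after training can be seen in a short sketch on the iris data. The prior weights [1 1 2] and the reset values are arbitrary examples.

```matlab
load fisheriris
% Unnormalized prior weights; fitcnb rescales them so that sum(Prior) = 1
Mdl = fitcnb(meas,species,'Prior',[1 1 2]);
p = Mdl.Prior              % expected: 0.2500    0.2500    0.5000

% Prior does not change the fitted class-conditional distributions,
% so it can be reset after training by using dot notation
Mdl.Prior = [0.5 0.2 0.3];
```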

#### W

Observation weights, specified as a vector of nonnegative values with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y. fitcnb normalizes the value you set for the 'Weights' name-value pair argument, so that the weights within a particular class sum to the prior probability for that class.
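The normalization rule for W can be checked numerically by summing the stored weights within each class and comparing the sums with Prior. The sketch below uses randomly generated observation weights (w is hypothetical) and the default prior.

```matlab
load fisheriris
rng(1)                              % for reproducibility
w = rand(150,1);                    % hypothetical positive observation weights
Mdl = fitcnb(meas,species,'Weights',w);

% Within each class, the stored weights Mdl.W should sum to that class's prior
K = numel(Mdl.ClassNames);
classSums = zeros(1,K);
for k = 1:K
    classSums(k) = sum(Mdl.W(strcmp(Mdl.Y,Mdl.ClassNames{k})));
end
[classSums; Mdl.Prior(:).']         % the two rows should match
```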

### Classifier Properties

#### Cost

Misclassification cost, specified as a numeric square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.

The misclassification cost matrix must have zeros on the diagonal.

The value of Cost does not influence training. You can reset Cost after training Mdl using dot notation.

Example: Mdl.Cost = [0 0.5 ; 1 0]

Data Types: double | single
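Because Cost is used only at prediction time, its effect shows up in the expected misclassification costs that predict returns as its third output. The following sketch uses a hypothetical cost matrix and checks that the expected cost of predicting class j equals the sum over i of the posterior probability of class i times Cost(i,j).

```matlab
load fisheriris
Mdl = fitcnb(meas,species);

% Hypothetical costs: misclassifying a true virginica (row 3) costs 5
Mdl.Cost = [0 1 1; 1 0 1; 5 5 0];

% predict returns labels, posterior probabilities, and expected costs
[label,posterior,expectedCost] = predict(Mdl,meas);

% Expected cost of predicting class j: sum over i of P(class i | x)*Cost(i,j)
manualCost = posterior*Mdl.Cost;
max(abs(manualCost(:) - expectedCost(:)))   % close to 0
```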

#### HyperparameterOptimizationResults

Cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty if the 'OptimizeHyperparameters' name-value pair argument is nonempty when you create the model. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer field in the HyperparameterOptimizationOptions structure when you create the model.

| Value of Optimizer Field | Value of HyperparameterOptimizationResults |
|---|---|
| 'bayesopt' (default) | Object of class BayesianOptimization |
| 'gridsearch' or 'randomsearch' | Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst) |
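A minimal sketch of obtaining a table-valued HyperparameterOptimizationResults, assuming the structure form of HyperparameterOptimizationOptions with the fields Optimizer, MaxObjectiveEvaluations, ShowPlots, and Verbose. The budget of 10 evaluations is an arbitrary choice to keep the run short; hyperparameter optimization still takes noticeably longer than ordinary training.

```matlab
load fisheriris
rng('default')                      % for reproducibility

% Random search instead of the default Bayesian optimization
opts = struct('Optimizer','randomsearch','MaxObjectiveEvaluations',10, ...
    'ShowPlots',false,'Verbose',0);
Mdl = fitcnb(meas,species,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',opts);

% With 'randomsearch' (or 'gridsearch'), the results are stored as a table
results = Mdl.HyperparameterOptimizationResults;
class(results)
```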

#### ScoreTransform

Classification score transformation, specified as a character vector or function handle. This table summarizes the available character vectors.

| Value | Description |
|---|---|
| 'doublelogit' | $1/\left(1+e^{-2x}\right)$ |
| 'invlogit' | $\log\left(x/(1-x)\right)$ |
| 'ismax' | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
| 'logit' | $1/\left(1+e^{-x}\right)$ |
| 'none' or 'identity' | $x$ (no transformation) |
| 'sign' | $-1$ for $x<0$; $0$ for $x=0$; $1$ for $x>0$ |
| 'symmetric' | $2x-1$ |
| 'symmetricismax' | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to $-1$ |
| 'symmetriclogit' | $2/\left(1+e^{-x}\right)-1$ |

For a MATLAB® function or a function you define, use its function handle for the score transformation. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: Mdl.ScoreTransform = 'logit'

Data Types: char | string | function_handle
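The formulas in the score transformation table can be written as anonymous functions to check how they relate. The sketch below (function names logitT, invlogitT, and symmetricLogitT are illustrative only) confirms that 'invlogit' undoes 'logit' and that 'symmetriclogit' equals tanh(x/2), then selects a built-in transformation by name on a trained model.

```matlab
% Score transformations from the table, written as anonymous functions
logitT          = @(x) 1./(1 + exp(-x));
invlogitT       = @(x) log(x./(1 - x));
symmetricLogitT = @(x) 2./(1 + exp(-x)) - 1;

x = linspace(-3,3,7);

% 'invlogit' undoes 'logit'
roundTripErr = max(abs(invlogitT(logitT(x)) - x))

% 'symmetriclogit' is 2*logit(x) - 1, which equals tanh(x/2)
symErr = max(abs(symmetricLogitT(x) - tanh(x/2)))

% A built-in transformation can also be selected by name after training
load fisheriris
Mdl = fitcnb(meas,species);
Mdl.ScoreTransform = 'logit';
```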

## Object Functions

| Function | Description |
|---|---|
| compact | Reduce size of naive Bayes classifier |
| crossval | Cross-validate naive Bayes classifier |
| edge | Classification edge for naive Bayes classifier |
| logp | Log unconditional probability density for naive Bayes classifier |
| loss | Classification loss for naive Bayes classifier |
| margin | Classification margins for naive Bayes classifier |
| partialDependence | Compute partial dependence |
| plotPartialDependence | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
| predict | Classify observations using naive Bayes classifier |
| resubEdge | Resubstitution classification edge for naive Bayes classifier |
| resubLoss | Resubstitution classification loss for naive Bayes classifier |
| resubMargin | Resubstitution classification margins for naive Bayes classifier |
| resubPredict | Classify training observations (resubstitution) using naive Bayes classifier |
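The following sketch shows how several of these functions fit together on the iris data: resubstitution functions need the training data stored in the full model, compact drops that data while keeping prediction, and crossval produces a partitioned model for kfoldLoss. The three rows passed to predict are an arbitrary choice, one flower per species.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);

% Resubstitution functions use the training data stored in the full model
trainErr = resubLoss(Mdl)

% compact removes the stored training data; the compact model can still predict
CMdl = compact(Mdl);
labels = predict(CMdl,meas([1 51 101],:))   % one flower from each species

% crossval returns a cross-validated model (10-fold by default)
CVMdl = crossval(Mdl);
cvErr = kfoldLoss(CVMdl)
```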

## Examples


### Train Naive Bayes Classifier

Create a naive Bayes classifier for Fisher's iris data set. Then, specify prior probabilities after training the classifier.

Load the fisheriris data set. Create X as a numeric matrix that contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.

```matlab
load fisheriris
X = meas;
Y = species;
```

Train a naive Bayes classifier using the predictors X and class labels Y. By default, fitcnb assumes that the predictors are conditionally independent given the class and fits each predictor using a normal distribution.

```matlab
Mdl = fitcnb(X,Y)
```

```
Mdl =
ClassificationNaiveBayes
ResponseName: 'Y'
CategoricalPredictors: []
ClassNames: {'setosa'  'versicolor'  'virginica'}
ScoreTransform: 'none'
NumObservations: 150
DistributionNames: {'normal'  'normal'  'normal'  'normal'}
DistributionParameters: {3x4 cell}
```


Mdl is a trained ClassificationNaiveBayes classifier. Some of the Mdl properties appear in the Command Window.

Display the properties of Mdl using dot notation. For example, display the class names and prior probabilities.

```matlab
Mdl.ClassNames
```

```
ans = 3x1 cell
{'setosa'    }
{'versicolor'}
{'virginica' }
```

```matlab
Mdl.Prior
```

```
ans = 1×3

0.3333    0.3333    0.3333
```

The order of the class prior probabilities in Mdl.Prior corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data. Alternatively, you can set the prior probabilities when calling fitcnb by using the 'Prior' name-value pair argument.

Set the prior probabilities after training the classifier by using dot notation. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.

```matlab
Mdl.Prior = [0.5 0.2 0.3];
```

You can now use this trained classifier to perform additional tasks. For example, you can label new measurements using predict or cross-validate the classifier using crossval.
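For instance, labeling new measurements with the reweighted classifier might look like the following sketch. The two rows of newX are hypothetical flowers (sepal length, sepal width, petal length, petal width, in cm), and each row of the posterior output sums to 1.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);
Mdl.Prior = [0.5 0.2 0.3];

% Hypothetical new measurements: sepal length, sepal width,
% petal length, petal width (cm)
newX = [5.0 3.4 1.5 0.2; 6.4 2.9 4.6 1.4];
[label,posterior] = predict(Mdl,newX)
```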

### Cross-Validate Naive Bayes Classifier

Train and cross-validate a naive Bayes classifier. fitcnb implements 10-fold cross-validation by default. Then, estimate the cross-validated classification error.

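Load the ionosphere data set, which provides the predictor matrix X and the class labels Y ('b' for bad and 'g' for good radar returns). Remove the first two predictors for numerical stability; for example, the second predictor is constant (all zeros), so its estimated standard deviation under a normal model would be zero.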

```matlab
load ionosphere
X = X(:,3:end);
rng('default')  % for reproducibility
```

Train and cross-validate a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. By default, fitcnb assumes that the predictors are conditionally independent given the class and that each predictor is normally distributed within each class.

```matlab
CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on')
```

```
CVMdl =
ClassificationPartitionedModel
CrossValidatedModel: 'NaiveBayes'
PredictorNames: {1x32 cell}
ResponseName: 'Y'
NumObservations: 351
KFold: 10
Partition: [1x1 cvpartition]
ClassNames: {'b'  'g'}
ScoreTransform: 'none'
```


CVMdl is a ClassificationPartitionedModel object, that is, a cross-validated naive Bayes classifier. Alternatively, you can cross-validate a trained ClassificationNaiveBayes model by passing it to crossval.

Display the model trained on the first fold of CVMdl by using dot notation.

```matlab
CVMdl.Trained{1}
```

```
ans =
CompactClassificationNaiveBayes
ResponseName: 'Y'
CategoricalPredictors: []
ClassNames: {'b'  'g'}
ScoreTransform: 'none'
DistributionNames: {1x32 cell}
DistributionParameters: {2x32 cell}
```


Each fold is a CompactClassificationNaiveBayes model trained on 90% of the data.

The fold models in CVMdl are intended for estimating performance rather than for predicting labels of new data. Estimate the generalization error by passing CVMdl to kfoldLoss.

```matlab
genError = kfoldLoss(CVMdl)
```

```
genError = 0.1852
```

The estimated generalization error is approximately 18.5%.

You can specify a different conditional distribution for the predictors, or tune the conditional distribution parameters to reduce the generalization error.
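One way to compare conditional distributions is to cross-validate each candidate on the same partition. The following sketch assumes that passing a shared cvpartition object through the 'CVPartition' name-value argument is an acceptable substitute for 'CrossVal','on', and it compares the default normal model with kernel densities for all predictors. No particular result is implied; which model has the lower loss depends on the data.

```matlab
load ionosphere
X = X(:,3:end);

% Same stratified 10-fold partition for both models, so the losses are comparable
rng('default')
cvp = cvpartition(Y,'KFold',10);

CVNormal = fitcnb(X,Y,'ClassNames',{'b','g'},'CVPartition',cvp);
CVKernel = fitcnb(X,Y,'ClassNames',{'b','g'},'CVPartition',cvp, ...
    'DistributionNames','kernel');

losses = [kfoldLoss(CVNormal) kfoldLoss(CVKernel)]
```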

