ClassificationKNN

k-nearest neighbor classification

Description

ClassificationKNN is a nearest neighbor classification model in which you can alter both the distance metric and the number of nearest neighbors. Because a ClassificationKNN classifier stores training data, you can use the model to compute resubstitution predictions. Alternatively, use the model to classify new observations using the predict object function.

Creation

Create a ClassificationKNN model using fitcknn.

Properties

KNN Properties

`BreakTies` — Tie-breaking algorithm
`"smallest"` | `"nearest"` | `"random"`

Tie-breaking algorithm used by predict when multiple classes have the same smallest cost, specified as one of the following:

"smallest" — Use the smallest index among tied groups.
"nearest" — Use the class with the nearest neighbor among tied groups.
"random" — Use a random tiebreaker among tied groups.

By default, ties occur when multiple classes have the same number of nearest points among the k nearest neighbors. BreakTies applies when IncludeTies is false.

Change BreakTies using dot notation: mdl.BreakTies = newBreakTies.
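
As a minimal sketch of the dot-notation change above, the following code trains a model on Fisher's iris data (an illustrative data set, not something this property requires) and switches the tie-breaking rule. The even value of k is chosen only because it makes vote ties between two classes possible.

```matlab
% Train a 4-nearest neighbor classifier; an even k makes vote ties possible.
load fisheriris
Mdl = fitcknn(meas,species,NumNeighbors=4);

% Inspect the rule that the model currently uses.
initialRule = Mdl.BreakTies;

% Resolve ties in favor of the class that owns the single nearest neighbor.
Mdl.BreakTies = "nearest";
labels = predict(Mdl,meas);
```

Because IncludeTies is false by default, the new BreakTies value takes effect in the predict call.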

`CacheSize` — Size in megabytes of cache allocated for Gram matrix
`"maximal"` | positive scalar

Since R2025a

Size in megabytes of the cache allocated for the Gram matrix, specified as "maximal" or a positive scalar. The software can use CacheSize only when the Distance value begins with `fast`.

If the CacheSize value is "maximal", then during prediction, the software attempts to allocate enough memory for the Gram matrix, whose size is n-by-m, where n is the number of rows in the training predictor data X, and m is the number of rows in the test predictor data. The cache size does not have to be large enough for the Gram matrix, but must be at least large enough to hold an n-by-1 vector. Otherwise, the software uses the regular algorithm for computing the Euclidean distance.

If the Distance value begins with `fast` and CacheSize is too large or is "maximal", then the software might attempt to allocate a Gram matrix that exceeds the available memory. In this case, the software issues an error.

Change CacheSize using dot notation: mdl.CacheSize = 1e4.

Data Types: single | double | char | string
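
The following sketch shows one way the cache setting might be used. It assumes R2025a or later and uses Fisher's iris data purely for illustration. The data set has only four predictors, so no speedup should be expected here. The 100 MB value is arbitrary.

```matlab
% Assumes R2025a or later, where the "fast" metrics and CacheSize exist.
load fisheriris
Mdl = fitcknn(meas,species,Distance="fasteuclidean");

% Cap the Gram matrix cache at 100 MB (an arbitrary illustrative value).
Mdl.CacheSize = 100;
labels = predict(Mdl,meas);
```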

`Distance` — Distance metric
`"cityblock"` | `"chebychev"` | `"correlation"` | `"cosine"` | `"euclidean"` | `"hamming"` | function handle | ...

Distance metric, specified as a valid distance metric name or function handle. The allowable distance metric names depend on the neighbor-searcher method (see NSMethod).

`NSMethod` Value	Distance Metric Names
`"exhaustive"`	Any distance metric of `ExhaustiveSearcher`
`"kdtree"`	`"cityblock"`, `"chebychev"`, `"euclidean"`, or `"minkowski"`

This table includes valid distance metrics of ExhaustiveSearcher.

Distance Metric Names	Description
`"cityblock"`	City block distance.
`"chebychev"`	Chebychev distance (maximum coordinate difference).
`"correlation"`	One minus the sample linear correlation between observations (treated as sequences of values).
`"cosine"`	One minus the cosine of the included angle between observations (treated as vectors).
`"euclidean"`	Euclidean distance.
`"fasteuclidean"` (since R2025a)	Euclidean distance computed by using an alternative algorithm that saves time when the number of predictors is at least 10. In some cases, this faster algorithm can reduce accuracy. Algorithms starting with `fast` do not support sparse data. For details, see Fast Euclidean Distance Algorithm.
`"fastseuclidean"` (since R2025a)	Standardized Euclidean distance computed by using an alternative algorithm that saves time when the number of predictors is at least 10. In some cases, this faster algorithm can reduce accuracy. Algorithms starting with `fast` do not support sparse data. For details, see Fast Euclidean Distance Algorithm.
`"hamming"`	Hamming distance, percentage of coordinates that differ.
`"jaccard"`	One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
`"mahalanobis"`	Mahalanobis distance, computed using a positive definite covariance matrix `C` (see `DistParameter`).
`"minkowski"`	Minkowski distance, computed using a specified exponent (see `DistParameter`).
`"seuclidean"`	Standardized Euclidean distance. Each coordinate difference between `X` and a query point is scaled, meaning divided by a scale value `S` (see `DistParameter`).
`"spearman"`	One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
`@distfun`	Distance function handle. `distfun` has the form `function D2 = distfun(ZI,ZJ)`, with the distance calculation in the function body, where `ZI` is a `1`-by-`N` vector containing one row of `X` or `Y`, `ZJ` is an `M2`-by-`N` matrix containing multiple rows of `X` or `Y`, and `D2` is an `M2`-by-`1` vector of distances in which `D2(k)` is the distance between observations `ZI` and `ZJ(k,:)`.

Change Distance using dot notation: mdl.Distance = newDistance.

If NSMethod is "kdtree", you can use dot notation to change Distance only for the metrics "cityblock", "chebychev", "euclidean", and "minkowski".

For more information, see Distance Metrics.

Data Types: char | string | function_handle
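
To make the `distfun` contract concrete, here is a minimal sketch of a custom metric written as an anonymous function. It reimplements plain Euclidean distance only so that the result is easy to check. The iris data and the value of k are illustrative assumptions.

```matlab
% Custom metric: Euclidean distance written as a function handle.
% ZI is 1-by-N, ZJ is M2-by-N; the result is an M2-by-1 column vector.
distfun = @(ZI,ZJ) sqrt(sum((ZJ - ZI).^2,2));

% Quick check on a 3-4-5 triangle before training.
d = distfun([0 0],[3 4; 0 1]);   % 2-by-1 vector of distances

% Train with the custom metric.
load fisheriris
Mdl = fitcknn(meas,species,Distance=distfun,NumNeighbors=5);
searchMethod = Mdl.NSMethod;
```

A function handle is not one of the metrics that "kdtree" accepts, so a model trained this way is expected to report "exhaustive" in NSMethod.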

`DistanceWeight` — Distance weighting function
`"equal"` | `"inverse"` | `"squaredinverse"` | function handle

Distance weighting function, specified as one of the values in this table.

Value	Description
`"equal"`	No weighting
`"inverse"`	Weight is 1/distance
`"squaredinverse"`	Weight is 1/distance²
`@fcn`	`fcn` is a function that accepts a matrix of nonnegative distances and returns a matrix of the same size containing nonnegative distance weights. For example, `"squaredinverse"` is equivalent to `@(d)d.^(-2)`.

Change DistanceWeight using dot notation: mdl.DistanceWeight = newDistanceWeight.

Data Types: char | function_handle
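
A custom weighting function only has to map a matrix of distances to a same-size matrix of nonnegative weights. The sketch below uses a hypothetical Gaussian decay, which is not one of the built-in options, and assigns it through dot notation.

```matlab
% Hypothetical weighting function: Gaussian decay with distance.
gaussWeight = @(d) exp(-d.^2);

% The handle maps a matrix of distances to a same-size matrix of weights.
w = gaussWeight([0 1; 2 3]);

% Assign the handle to a trained model.
load fisheriris
Mdl = fitcknn(meas,species,NumNeighbors=5,Standardize=true);
Mdl.DistanceWeight = gaussWeight;
labels = predict(Mdl,meas);
```

Unlike "inverse" and "squaredinverse", this particular function stays finite when a distance is zero.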

`DistParameter` — Parameter for distance metric
positive definite covariance matrix | positive scalar | vector of positive scale values

Parameter for the distance metric, specified as one of the values described in this table.

Distance Metric	Parameter
`"mahalanobis"`	Positive definite covariance matrix `C`
`"minkowski"`	Minkowski distance exponent, a positive scalar
`"seuclidean"`	Vector of positive scale values with length equal to the number of columns of `X`

For any other distance metric, the value of DistParameter must be [].

You can alter DistParameter using dot notation: mdl.DistParameter = newDistParameter. However, if Distance is "mahalanobis" or "seuclidean", then you cannot alter DistParameter.

Data Types: single | double
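
As an illustration of the Minkowski case, the following sketch sets the exponent at training time through the Exponent name-value argument of fitcknn and then changes it with dot notation. The exponent values and the iris data are arbitrary choices.

```matlab
load fisheriris

% Exponent sets the Minkowski exponent at training time.
Mdl = fitcknn(meas,species,Distance="minkowski",Exponent=3);
trainedExponent = Mdl.DistParameter;

% For "minkowski", the exponent can be changed after training.
Mdl.DistParameter = 4;
```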

`IncludeTies` — Tie inclusion flag
`false` | `true`

Tie inclusion flag indicating whether predict includes all the neighbors whose distance values are equal to the kth smallest distance, specified as false or true. If IncludeTies is true, then the predict object function includes all of these neighbors. Otherwise, predict uses exactly k neighbors (see the BreakTies property).

Change IncludeTies using dot notation: mdl.IncludeTies = newIncludeTies.

Data Types: logical

`NSMethod` — Nearest neighbor search method
Read-only: `"kdtree"` | `"exhaustive"`

This property is read-only.

Nearest neighbor search method, returned as either "kdtree" or "exhaustive".

"kdtree" — Creates and uses a Kd-tree to find nearest neighbors.
"exhaustive" — Uses the exhaustive search algorithm. When predicting the class of a new point xnew, the software computes the distance values from all points in X to xnew to find nearest neighbors.

`NumNeighbors` — Number of nearest neighbors
positive integer value

Number of nearest neighbors in X used to classify each point during prediction, specified as a positive integer value.

Change NumNeighbors using dot notation: mdl.NumNeighbors = newNumNeighbors.

Data Types: single | double
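
Because NumNeighbors can be changed without retraining, one model can be reused to compare several values of k. The sketch below does this with resubstitution loss on Fisher's iris data. The candidate values are arbitrary, and resubstitution loss is an optimistic estimate, so cross-validation is usually a better basis for choosing k.

```matlab
load fisheriris
Mdl = fitcknn(meas,species,Standardize=true);

% Resubstitution loss for several values of k, reusing the same model.
kValues = [1 5 15];
lossByK = zeros(size(kValues));
for i = 1:numel(kValues)
    Mdl.NumNeighbors = kValues(i);
    lossByK(i) = resubLoss(Mdl);
end
```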

Other Classification Properties

`CategoricalPredictors` — Categorical predictor indices
Read-only: `[]` | vector of positive integers

This property is read-only.

Categorical predictor indices, returned as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: double

`ClassNames` — Names of classes in training data `Y`
Read-only: categorical array | character array | logical vector | numeric vector | cell array of character vectors

This property is read-only.

Names of the classes in the training data Y with duplicates removed, returned as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as Y. (The software treats string arrays as cell arrays of character vectors.)

`Cost` — Cost of misclassification
square matrix

Cost of the misclassification of a point, specified as a square matrix. Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns in Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

By default, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification.

Change a Cost matrix using dot notation: mdl.Cost = costMatrix.

Data Types: single | double
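
The row and column convention can be easier to see with a concrete matrix. In this sketch, the cost values are invented for illustration and the class order is the one that fitcknn reports for Fisher's iris data.

```matlab
load fisheriris
Mdl = fitcknn(meas,species,NumNeighbors=5);

% Rows are true classes and columns are predicted classes, in ClassNames
% order: setosa, versicolor, virginica. Illustrative costs: predicting
% versicolor (column 2) for a true virginica (row 3) costs 5 instead of 1.
costMatrix = [0 1 1; 1 0 1; 1 5 0];
Mdl.Cost = costMatrix;
labels = predict(Mdl,meas);
```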

`ExpandedPredictorNames` — Expanded predictor names
Read-only: cell array of character vectors

This property is read-only.

Expanded predictor names, returned as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

`ModelParameters` — Parameters used in training `ClassificationKNN`
Read-only: object

This property is read-only.

Parameters used in training the ClassificationKNN model, returned as an object.

`Mu` — Predictor means
Read-only: numeric vector

This property is read-only.

Predictor means, returned as a numeric vector of length numel(PredictorNames).

If you do not standardize the predictor variables when training the model using fitcknn, then Mu is empty ([]).

Data Types: single | double

`NumObservations` — Number of observations
Read-only: positive integer scalar

This property is read-only.

Number of observations used in training the ClassificationKNN model, returned as a positive integer scalar. This number can be less than the number of rows in the training data because rows containing NaN values are not part of the fit.

Data Types: double

`PredictorNames` — Predictor variable names
Read-only: cell array of character vectors

This property is read-only.

Predictor variable names, returned as a cell array of character vectors. The variable names are in the same order in which they appear in the training data X.

Data Types: cell

`Prior` — Prior probabilities for each class
numeric vector

Prior probabilities for each class, specified as a numeric vector. The order of the elements in Prior corresponds to the order of the classes in ClassNames.

Add or change a Prior vector using dot notation: mdl.Prior = priorVector.

Data Types: single | double

`ResponseName` — Response variable name
Read-only: character vector

This property is read-only.

Response variable name, returned as a character vector.

Data Types: char

`RowsUsed` — Rows used in fitting
Read-only: `[]` | logical vector

This property is read-only.

Rows of the original training data used in fitting the ClassificationKNN model, returned as a logical vector. This property is empty if all rows are used.

Data Types: logical

`ScoreTransform` — Score transformation
`"none"` | `"doublelogit"` | `"invlogit"` | `"ismax"` | `"logit"` | function handle | ...

Score transformation, specified as a character vector, string scalar, or function handle.

This table summarizes the built-in score transformations.

Value	Description
`"doublelogit"`	1/(1 + e^(-2x))
`"invlogit"`	log(x / (1 - x))
`"ismax"`	Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
`"logit"`	1/(1 + e^(-x))
`"none"` or `"identity"`	x (no transformation)
`"sign"`	-1 for x < 0; 0 for x = 0; 1 for x > 0
`"symmetric"`	2x - 1
`"symmetricismax"`	Sets the score for the class with the largest score to 1, and sets the scores for all other classes to -1
`"symmetriclogit"`	2/(1 + e^(-x)) - 1

For a MATLAB® function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Change ScoreTransform using dot notation: mdl.ScoreTransform = newScoreTransform.

Data Types: char | function_handle
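
As a small sketch of the function handle form, the following code applies a custom transform that computes the same 2x - 1 mapping as the built-in "symmetric" option. The iris data and the value of k are illustrative.

```matlab
load fisheriris
Mdl = fitcknn(meas,species,NumNeighbors=5);

% Custom transform that mirrors the built-in "symmetric" option (2x - 1).
Mdl.ScoreTransform = @(s) 2*s - 1;

% The second output of predict now contains the transformed scores.
[~,score] = predict(Mdl,meas);
```

The untransformed scores are posterior probabilities, so the transformed values fall between -1 and 1.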

`Sigma` — Predictor standard deviations
Read-only: numeric vector

This property is read-only.

Predictor standard deviations, returned as a numeric vector of length numel(PredictorNames).

If you do not standardize the predictor variables during training, then Sigma is empty ([]).

Data Types: single | double

`W` — Observation weights
Read-only: vector of nonnegative values

This property is read-only.

Observation weights, returned as a vector of nonnegative values with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y.

Data Types: single | double

`X` — Unstandardized predictor data
Read-only: numeric matrix

This property is read-only.

Unstandardized predictor data, returned as a numeric matrix. Each column of X represents one predictor (variable), and each row represents one observation.

Data Types: single | double

`Y` — Class labels
Read-only: categorical array | character array | logical vector | numeric vector | cell array of character vectors

This property is read-only.

Class labels, returned as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each value in Y is the observed class label for the corresponding row in X.

Y has the same data type as the data in Y used for training the model. (The software treats string arrays as cell arrays of character vectors.)

Hyperparameter Optimization Properties

`HyperparameterOptimizationResults` — Cross-validation optimization of hyperparameters
Read-only: `BayesianOptimization` object | table

This property is read-only.

Cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty if the OptimizeHyperparameters name-value argument is nonempty when you create the model using fitcknn. The value depends on the Optimizer setting in the HyperparameterOptimizationOptions name-value argument that you specify when you create the model:

"bayesopt" (default) — Object of class BayesianOptimization
"gridsearch" or "randomsearch" — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

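The following sketch shows how this property might be populated. It assumes that the Optimizer, MaxObjectiveEvaluations, ShowPlots, and Verbose options are available in HyperparameterOptimizationOptions, and it uses a short random search on Fisher's iris data so that the result is a table. All option values are illustrative.

```matlab
load fisheriris
rng(1)  % for reproducibility of the random search

% Keep the run short and quiet; these option values are illustrative.
opts = struct('Optimizer','randomsearch','MaxObjectiveEvaluations',10, ...
    'ShowPlots',false,'Verbose',0);
Mdl = fitcknn(meas,species,OptimizeHyperparameters="auto", ...
    HyperparameterOptimizationOptions=opts);

% With random search, the stored results are a table of evaluated settings.
results = Mdl.HyperparameterOptimizationResults;
```
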
Object Functions

`compareHoldout`	Compare accuracies of two classification models using new data
`crossval`	Cross-validate machine learning model
`edge`	Edge of k-nearest neighbor classifier
`gather`	Gather properties of Statistics and Machine Learning Toolbox object from GPU
`lime`	Local interpretable model-agnostic explanations (LIME)
`loss`	Loss of k-nearest neighbor classifier
`margin`	Margin of k-nearest neighbor classifier
`partialDependence`	Compute partial dependence
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
`predict`	Predict labels using k-nearest neighbor classification model
`resubEdge`	Resubstitution classification edge
`resubLoss`	Resubstitution classification loss
`resubMargin`	Resubstitution classification margin
`resubPredict`	Classify training data using trained classifier
`shapley`	Shapley values
`testckfold`	Compare accuracies of two classification models by repeated cross-validation

Examples

Train k-Nearest Neighbor Classifier

Train a k-nearest neighbor classifier using Fisher's iris data, where k, the number of nearest neighbors in the predictors, is 5.

Load Fisher's iris data.

load fisheriris
X = meas;
Y = species;

X is a numeric matrix that contains four measurements for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species.

Train a 5-nearest neighbor classifier. Standardize the noncategorical predictor data.

Mdl = fitcknn(X,Y,NumNeighbors=5,Standardize=true)

Mdl = 
  ClassificationKNN
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
                 Distance: 'euclidean'
             NumNeighbors: 5


  Properties, Methods

Mdl is a trained ClassificationKNN classifier.

To access the properties of Mdl, use dot notation.

Mdl.ClassNames

ans = 3×1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Mdl.Prior

ans = 1×3

    0.3333    0.3333    0.3333

Mdl.Prior contains the class prior probabilities, which you can specify using the Prior name-value argument in fitcknn. The order of the class prior probabilities corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data.

You can also reset the prior probabilities after training. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.

Mdl.Prior = [0.5 0.2 0.3];

You can pass Mdl to predict to label new measurements or crossval to cross-validate the classifier.
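
Both follow-up steps can be sketched as shown here. The "new" flower is a hypothetical observation built from the column means of the training data, and the random seed is set only to make the cross-validation partition reproducible.

```matlab
load fisheriris
Mdl = fitcknn(meas,species,NumNeighbors=5,Standardize=true);

% Classify a hypothetical new flower with average measurements.
newFlower = mean(meas);
newLabel = predict(Mdl,newFlower);

% Estimate the generalization error with 10-fold cross-validation.
rng(1)  % for a reproducible partition
CVMdl = crossval(Mdl);
cvLoss = kfoldLoss(CVMdl);
```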

Tips

The compact function reduces the size of most classification models by removing the training data properties and any other properties that are not required to predict the labels of new observations. Because k-nearest neighbor classification models require all of the training data to predict labels, you cannot reduce the size of a ClassificationKNN model.

Alternative Functionality

knnsearch finds the k-nearest neighbors of points. rangesearch finds all the points within a fixed distance. You can use these functions for classification, as shown in Classify Query Data. If you want to perform classification, then using ClassificationKNN models can be more convenient because you can train a classifier in one step (using fitcknn) and classify in other steps (using predict). Alternatively, you can train a k-nearest neighbor classification model using one of the cross-validation options in the call to fitcknn. In this case, fitcknn returns a ClassificationPartitionedModel cross-validated model object.
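
To illustrate the difference in convenience, this sketch classifies one hypothetical query point twice: first with knnsearch and a manual majority vote, and then with a ClassificationKNN model. The manual vote ignores priors, costs, and distance weights, so it should be read as a simplified stand-in for predict rather than an equivalent.

```matlab
load fisheriris
query = mean(meas);   % hypothetical query point

% Indices of the 5 nearest training rows (Euclidean distance by default).
idx = knnsearch(meas,query,K=5);

% Majority vote over the neighbor labels.
votedLabel = mode(categorical(species(idx)));

% The same classification through a ClassificationKNN model.
Mdl = fitcknn(meas,species,NumNeighbors=5);
modelLabel = predict(Mdl,query);
```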

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

The predict function supports code generation.
When you train a k-nearest neighbor classification model by using fitcknn, the following restrictions apply.
- The value of the Distance name-value argument cannot be "fasteuclidean", "fastseuclidean", or a custom distance function.
- The value of the DistanceWeight name-value argument can be a custom distance weight function, but it cannot be an anonymous function.
- The value of the ScoreTransform name-value argument cannot be an anonymous function.

For more information, see Introduction to Code Generation.
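
A typical code generation workflow saves the trained model with saveLearnerForCoder and reloads it with loadLearnerForCoder inside an entry-point function. The sketch below runs both steps in MATLAB to show the round trip. The file name is hypothetical, and the entry-point function and the codegen call are omitted.

```matlab
load fisheriris

% Use a built-in metric and no function handles, per the restrictions above.
Mdl = fitcknn(meas,species,NumNeighbors=5);

% Save the model in the form that loadLearnerForCoder expects.
modelFile = fullfile(tempdir,'knnIrisModel');   % hypothetical file name
saveLearnerForCoder(Mdl,modelFile);

% An entry-point function would contain these two steps.
loadedMdl = loadLearnerForCoder(modelFile);
labels = predict(loadedMdl,meas);
```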

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

The following object functions fully support GPU arrays:
The following object functions offer limited support for GPU arrays:
The object functions execute on a GPU if at least one of the following applies:
- The model was fitted with GPU arrays.
- The predictor data that you pass to the object function is a GPU array.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2012a

R2025a: Use fast Euclidean distances

The "fasteuclidean" and "fastseuclidean" distance metrics (Distance) accelerate the computation of Euclidean distances by using a cache and an alternative algorithm. You can set the size of the cache by using the CacheSize property.
