
classificationTreeComponent

Pipeline component for multiclass classification using binary decision trees

Since R2026a

    Description

    classificationTreeComponent is a pipeline component that performs multiclass classification using a binary decision tree. The pipeline component uses the functionality of the fitctree function during the learn phase to train the tree classification model. The component uses the functionality of the loss and predict functions during the run phase to perform classification.

    Creation

    Description

    component = classificationTreeComponent creates a pipeline component for multiclass classification using a binary decision tree.

    component = classificationTreeComponent(Name=Value) sets writable Properties using one or more name-value arguments. For example, you can specify the maximum number of decision splits, pruning criterion, and misclassification cost.

    example

    Properties


    Structural Parameters

    The software sets structural parameters when you create the component. You cannot modify structural parameters after creating the component.

    This property is read-only after the component is created.

    Observation weights flag, specified as 0 (false) or 1 (true). If UseWeights is true, the component adds a third input, "Weights", to the Inputs component property and a third input tag, 3, to the InputTags component property.

    Example: c = classificationTreeComponent(UseWeights=1)

    Data Types: logical

    Learn Parameters

    The software sets learn parameters when you create the component. You can modify learn parameters using dot notation any time before you use the learn object function. Any unset learn parameters use the corresponding default values.

    Algorithm for the best split on a categorical predictor with C categories for data and K ≥ 3 classes, specified as one of the following values.

    • "Exact" – Consider all 2^(C–1) – 1 combinations and choose the split that has the lowest impurity.

    • "PullLeft" – Start with all C categories on the right branch. Consider moving each category to the left branch to achieve the minimum impurity for the K classes among the remaining categories. From this sequence, choose the split that has the lowest impurity.

    • "PCA" – Compute a score for each category using the inner product between the first principal component of a weighted covariance matrix (of the centered class probability matrix) and the vector of class probabilities for that category. Sort the scores in ascending order, and consider all C – 1 splits. Choose the split that has the lowest impurity.

    • "OVAbyClass" – Start with all C categories on the right branch. For each class, order the categories based on their probability for that class. For the first class, consider moving each category to the left branch in order, recording the impurity criterion at each move. Repeat for the remaining classes. From this sequence, choose the split that has the lowest impurity.

    By default, the component chooses the optimal subset of algorithms for each split using the known number of classes and levels of a categorical predictor. For binary classification, the component uses "Exact".

    For more information, see Splitting Categorical Predictors in Classification Trees.
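    As a sense of scale for the exact search, a categorical predictor with C = 10 categories yields 2^(C–1) – 1 = 511 candidate splits:

    ```matlab
    % Number of candidate splits the "Exact" algorithm considers
    C = 10;                    % number of categories in the predictor
    numSplits = 2^(C-1) - 1    % returns 511
    ```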

    Example: c = classificationTreeComponent(AlgorithmForCategorical="PCA")

    Example: c.AlgorithmForCategorical = "Exact"

    Data Types: char | string

    Misclassification cost, specified as a square matrix or a structure.

    • If Cost is a square matrix, Cost(i,j) is the cost of classifying a point into class j if its true class is i.

    • If Cost is a structure S, it has two fields: S.ClassificationCosts, which contains the cost matrix; and S.ClassNames, which contains the group names and defines the class order of the rows and columns of the cost matrix.

    The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.
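    For example, because the default cost matrix is 0 on the diagonal and 1 elsewhere, you can construct it directly for three classes:

    ```matlab
    % Default misclassification cost matrix for K = 3 classes:
    % zero cost for correct classification, unit cost otherwise
    K = 3;
    defaultCost = ones(K) - eye(K)
    ```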

    Example: c = classificationTreeComponent(Cost=[0 1; 2 0])

    Example: c.Cost = [0 1; 1 0]

    Data Types: single | double | struct

    Maximum number of category levels, specified as a nonnegative scalar. The component splits a categorical predictor using the exact search algorithm if the predictor has at most MaxNumCategories levels in the split node. Otherwise, the component finds the best categorical split using one of the inexact algorithms.

    Example: c = classificationTreeComponent(MaxNumCategories=8)

    Example: c.MaxNumCategories = 15

    Data Types: single | double

    Maximum number of decision splits (or branch nodes), specified as a nonnegative scalar. The component splits MaxNumSplits or fewer branch nodes.

    The default value is n – 1, where n is the number of observations in the first data argument of learn.

    Example: c = classificationTreeComponent(MaxNumSplits=5)

    Example: c.MaxNumSplits = 10

    Data Types: single | double

    Flag to merge leaves, specified as "on" or "off".

    When MergeLeaves is "on", the component:

    • Merges leaves originating from the same parent node if doing so yields a sum of risk values greater than or equal to the risk associated with the parent node

    • Estimates the optimal sequence of pruned subtrees, but does not prune the classification tree

    Example: c = classificationTreeComponent(MergeLeaves="off")

    Example: c.MergeLeaves = "on"

    Data Types: char | string

    Minimum number of leaf node observations, specified as a positive integer scalar. Each leaf has at least MinLeafSize observations. If you specify both MinParentSize and MinLeafSize, the component uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

    Example: c = classificationTreeComponent(MinLeafSize=3)

    Example: c.MinLeafSize = 1

    Data Types: single | double

    Minimum number of branch node observations, specified as a positive integer scalar. Each branch node has at least MinParentSize observations. If you supply both MinParentSize and MinLeafSize, the component uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).
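    For example, setting MinParentSize to 8 and MinLeafSize to 6 results in an effective minimum parent size of max(8, 2*6) = 12:

    ```matlab
    % Effective minimum parent size when both parameters are specified
    MinParentSize = 8;
    MinLeafSize = 6;
    effectiveMinParentSize = max(MinParentSize, 2*MinLeafSize)   % returns 12
    ```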

    Example: c = classificationTreeComponent(MinParentSize=8)

    Example: c.MinParentSize = 12

    Data Types: single | double

    Number of bins for numeric predictors, specified as [] (empty) or a positive integer scalar.

    • If NumBins is empty ([]), the component does not bin any predictors.

    • If NumBins is a positive integer scalar, the component bins every numeric predictor into at most NumBins equiprobable bins, and then grows trees on the bin indices instead of the original data.
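    Conceptually, equiprobable binning places roughly the same number of observations in each bin. This sketch illustrates the idea using quantile-based bin edges; it is not the component's internal algorithm:

    ```matlab
    % Conceptual sketch of equiprobable binning (illustration only)
    x = randn(100,1);                         % a numeric predictor
    NumBins = 4;
    edges = quantile(x, (0:NumBins)/NumBins); % quantile-based bin edges
    binIdx = discretize(x, edges);            % bin indices replace the raw values
    ```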

    Example: c = classificationTreeComponent(NumBins=50)

    Example: c.NumBins = []

    Data Types: single | double

    Number of predictors to select at random for each split, specified as "all" or a positive integer scalar.

    Example: c = classificationTreeComponent(NumVariablesToSample=3)

    Example: c.NumVariablesToSample = "all"

    Data Types: single | double | char | string

    Algorithm used to select the best split predictor at each node, specified as one of the following values.

    • "allsplits" – Standard CART (Classification and Regression Tree) algorithm, which selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors [1].

    • "curvature" – Curvature test, which selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response [3]. The training speed is similar to that of standard CART.

    • "interaction-curvature" – Interaction test, which chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response, and minimizes the p-value of a chi-square test of independence between each pair of predictors and the response [2]. The training speed can be slower than that of standard CART.

    For "curvature" and "interaction-curvature", if all tests yield p-values greater than 0.05, the component stops splitting nodes.

    Example: c = classificationTreeComponent(PredictorSelection="curvature")

    Example: c.PredictorSelection = "interaction-curvature"

    Data Types: char | string

    Prior probabilities for each class, specified as one of the following values.

    • "empirical" – The class prior probabilities are the class relative frequencies, as determined by the second data argument of learn.

    • "uniform" – All class prior probabilities are equal to 1/K, where K is the number of classes.

    • Numeric vector – One value for each class, where each element is a class prior probability. The component normalizes the elements such that they sum to 1.

    • Structure – A structure S with two fields: S.ClassNames, which contains a list of the class names, and S.ClassProbs, which contains a vector of corresponding prior probabilities. The component normalizes the elements such that they sum to 1.

    If you set UseWeights to true, the component renormalizes the weights to add up to the value of the prior probability in the respective class.

    Example: c = classificationTreeComponent(Prior="uniform")

    Example: c.Prior = "empirical"

    Data Types: single | double | char | string | struct

    Flag to estimate the optimal sequence of pruned subtrees, specified as "on" or "off". If Prune is "on", the component estimates the optimal sequence of pruned subtrees, but grows the classification tree without pruning it. If Prune is "off" and MergeLeaves is also "off", the component grows the classification tree without estimating the optimal sequence of pruned subtrees.

    Example: c = classificationTreeComponent(Prune="off")

    Example: c.Prune = "on"

    Data Types: char | string

    Pruning criterion, specified as "error" or "impurity".

    If PruneCriterion is "error", the component splits nodes of the decision tree based on node error, or the fraction of misclassified classes at a node. If PruneCriterion is "impurity", the component splits nodes of the decision tree based on the impurity measure specified by the SplitCriterion value.

    Example: c = classificationTreeComponent(PruneCriterion="impurity")

    Example: c.PruneCriterion = "error"

    Data Types: char | string

    Split criterion, specified as "gdi" for Gini's diversity index, "twoing" for the twoing rule, or "deviance" for maximum deviance reduction (also known as cross-entropy).

    Gini's diversity index and maximum deviance reduction are both measures of impurity. A value of 0 represents a pure node with just one class. Otherwise, these values are positive.

    The twoing rule is not a purity measure of a node, but is a measure for deciding how to split a node into two. If the expression is large, the split makes each child node purer. If the expression is small, the split does not increase node purity.

    For more information, see Impurity and Node Error.
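    For reference, the Gini diversity index of a node with observed class proportions p(i) is 1 – Σ p(i)^2, which is 0 for a pure node. For example:

    ```matlab
    % Gini diversity index for a node with three classes
    p = [0.5 0.3 0.2];      % observed class proportions at the node
    gini = 1 - sum(p.^2)    % 1 - (0.25 + 0.09 + 0.04) = 0.62
    ```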

    Example: c = classificationTreeComponent(SplitCriterion="deviance")

    Example: c.SplitCriterion = "twoing"

    Data Types: char | string

    Surrogate decision splits, specified as "off", "on", "all", or a positive integer scalar.

    • If Surrogate is "off", the component does not use surrogate splits.

    • If Surrogate is "on", the component finds at most 10 surrogate splits at each branch node.

    • If Surrogate is "all", the component finds all surrogate splits at each branch node, a process that can use considerable time and memory.

    • If Surrogate is a positive integer scalar, the component finds at most the specified number of surrogate splits at each branch node.

    Example: c = classificationTreeComponent(Surrogate="on")

    Example: c.Surrogate = "all"

    Data Types: single | double | char | string

    Run Parameters

    The software sets run parameters when you create the component. You can modify the run parameters at any time. Any unset run parameters use the corresponding default values.

    Loss function, specified as a built-in loss function name or a function handle.

    • "binodeviance" – Binomial deviance

    • "classifcost" – Observed misclassification cost

    • "classiferror" – Misclassified rate in decimal

    • "exponential" – Exponential loss

    • "hinge" – Hinge loss

    • "logit" – Logistic loss

    • "mincost" – Minimal expected misclassification cost (for classification scores that are posterior probabilities)

    • "quadratic" – Quadratic loss

    To specify a custom loss function, use function handle notation. For more information on custom loss functions, see LossFun.
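    As an illustration, a weighted misclassification-rate loss could be written as a function handle. This sketch assumes the (C,S,W,Cost) custom loss signature described in LossFun, where C is the true-class indicator matrix, S is the score matrix, W holds the observation weights, and Cost is the cost matrix:

    ```matlab
    % Hypothetical custom loss: weighted fraction of misclassified observations.
    % Assumes the custom loss signature lossvalue = lossfun(C,S,W,Cost);
    % an observation is misclassified if its true class does not attain
    % the maximum score in its row of S.
    myLoss = @(C,S,W,Cost) sum(W .* ~any(C & (S == max(S,[],2)), 2)) / sum(W);
    c = classificationTreeComponent(LossFun=myLoss);
    ```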

    Example: c = classificationTreeComponent(LossFun="classiferror")

    Example: c.LossFun = "binodeviance"

    Data Types: char | string | function_handle

    Tree size, specified as "se" or "min".

    • If TreeSize is "se", the component returns the best pruning level, which corresponds to the highest pruning level with the loss within one standard deviation of the minimum.

    • If TreeSize is "min", the component returns the best pruning level, which corresponds to the pruning level with the smallest loss.

    Example: c = classificationTreeComponent(TreeSize="min")

    Example: c.TreeSize = "se"

    Data Types: char | string

    Score transformation, specified as a built-in function name or a function handle.

    The available built-in score transform functions are as follows.

    • "doublelogit" – 1/(1 + e^(–2x))

    • "invlogit" – log(x / (1 – x))

    • "ismax" – Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0

    • "logit" – 1/(1 + e^(–x))

    • "none" or "identity" – x (no transformation)

    • "sign" – –1 for x < 0, 0 for x = 0, and 1 for x > 0

    • "symmetric" – 2x – 1

    • "symmetricismax" – Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1

    • "symmetriclogit" – 2/(1 + e^(–x)) – 1

    To specify a custom score transform function, use function handle notation. The function must accept a matrix containing the original scores and return a matrix of the same size containing the transformed scores.
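    For example, a custom transform that clips scores to the range [0, 1] can be written as a function handle (the clipping choice here is purely illustrative):

    ```matlab
    % Illustrative custom score transform: clip scores to [0, 1].
    % The handle accepts a score matrix and returns a matrix of the same size.
    clipTransform = @(S) min(max(S, 0), 1);
    c = classificationTreeComponent(ScoreTransform=clipTransform);
    ```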

    Example: c = classificationTreeComponent(ScoreTransform="logit")

    Example: c.ScoreTransform = "symmetric"

    Data Types: char | string | function_handle

    Component Properties

    The software sets component properties when you create the component. You can modify the component properties (excluding HasLearnables and HasLearned) at any time. You cannot modify the HasLearnables and HasLearned properties directly.

    Component identifier, specified as a character vector or string scalar.

    Example: c = classificationTreeComponent(Name="Tree")

    Example: c.Name = "TreeClassifier"

    Data Types: char | string

    Names of the input ports, specified as a character vector, string array, or cell array of character vectors. If UseWeights is true, the component adds the input port "Weights" to Inputs.

    Example: c = classificationTreeComponent(Inputs=["X","Y"])

    Example: c.Inputs = ["X1","Y1"]

    Data Types: char | string | cell

    Names of the output ports, specified as a character vector, string array, or cell array of character vectors.

    Example: c = classificationTreeComponent(Outputs=["Class","ClassScore","LossVal"])

    Example: c.Outputs = ["X","Y","Z"]

    Data Types: char | string | cell

    Tags that enable the automatic connection of the component inputs with other components or pipelines, specified as a nonnegative integer vector. If you specify InputTags, the number of tags must match the number of inputs in Inputs. If UseWeights is true, the component adds a third input tag to InputTags.

    Example: c = classificationTreeComponent(InputTags=[0 1])

    Example: c.InputTags = [1 0]

    Data Types: single | double

    Tags that enable the automatic connection of the component outputs with other components or pipelines, specified as a nonnegative integer vector. If you specify OutputTags, the number of tags must match the number of outputs in Outputs.

    Example: c = classificationTreeComponent(OutputTags=[1 0 4])

    Example: c.OutputTags = [1 2 0]

    Data Types: single | double

    This property is read-only.

    Indicator for learnables, returned as 1 (true). A value of 1 indicates that the component contains Learnables.

    Data Types: logical

    This property is read-only.

    Indicator showing the learning status of the component, returned as 0 (false) or 1 (true). A value of 1 indicates that the learn object function has been applied to the component and the Learnables are nonempty.

    Data Types: logical

    Learnables

    The software sets learnables when you use the learn object function. You cannot modify learnables directly.

    This property is read-only.

    Trained model, returned as a CompactClassificationTree model object.

    Object Functions

    learn – Initialize and evaluate pipeline or component
    run – Execute pipeline or component for inference after learning
    reset – Reset pipeline or component
    series – Connect components in series to create pipeline
    parallel – Connect components or pipelines in parallel to create pipeline
    view – View diagram of pipeline inputs, outputs, components, and connections

    Examples


    Create a classificationTreeComponent pipeline component.

    component = classificationTreeComponent
    component = 
      classificationTreeComponent with properties:
    
                Name: "ClassificationTree"
              Inputs: ["Predictors"    "Response"]
           InputTags: [1 2]
             Outputs: ["Predictions"    "Scores"    "Loss"]
          OutputTags: [1 0 0]
    
       
    Learnables (HasLearned = false)
        TrainedModel: []
    
       
    Structural Parameters (locked)
          UseWeights: 0
    
    
    Show all parameters
    

    component is a classificationTreeComponent object that contains one learnable, TrainedModel. This property remains empty until you pass data to the component during the learn phase.

    To limit the number of splits in the tree model, set the MaxNumSplits property of the component to 7.

    component.MaxNumSplits = 7;

    Load the ionosphere data set and save the data in two tables.

    load ionosphere
    X = array2table(X);
    Y = array2table(Y);

    Use the learn object function to train the classificationTreeComponent object using the entire data set.

    component = learn(component,X,Y)
    component = 
      classificationTreeComponent with properties:
    
                Name: "ClassificationTree"
              Inputs: ["Predictors"    "Response"]
           InputTags: [1 2]
             Outputs: ["Predictions"    "Scores"    "Loss"]
          OutputTags: [1 0 0]
    
       
    Learnables (HasLearned = true)
        TrainedModel: [1×1 classreg.learning.classif.CompactClassificationTree]
    
       
    Structural Parameters (locked)
          UseWeights: 0
    
       
    Learn Parameters (locked)
        MaxNumSplits: 7
    
    
    Show all parameters
    

    Note that the HasLearned property is set to true, which indicates that the software trained the classification tree model TrainedModel. You can use component to classify new data using the run object function.
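    A call to run on the trained component might look like the following sketch, assuming run accepts the component followed by data inputs matching its Inputs property and returns one value per output port:

    ```matlab
    % Classify observations with the trained component (sketch).
    % The three outputs correspond to the "Predictions", "Scores",
    % and "Loss" output ports shown in the component display.
    [labels, scores, lossValue] = run(component, X, Y);
    ```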

    References

    [1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.

    [2] Loh, W. Y. “Regression Trees with Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, Vol. 12, 2002, pp. 361–386.

    [3] Loh, W. Y., and Y. S. Shih. “Split Selection Methods for Classification Trees.” Statistica Sinica, Vol. 7, 1997, pp. 815–840.

    Version History

    Introduced in R2026a