
classificationTreeComponent

Pipeline component for multiclass classification using binary decision trees

Since R2026a

    Description

    classificationTreeComponent is a pipeline component that performs multiclass classification using a binary decision tree. The pipeline component uses the functionality of the fitctree function during the learn phase to train the tree classification model. The component uses the functionality of the loss and predict functions during the run phase to perform classification.

    Creation

    Description

    component = classificationTreeComponent creates a pipeline component for multiclass classification using a binary decision tree.

    component = classificationTreeComponent(Name=Value) sets writable Properties using one or more name-value arguments. For example, you can specify the maximum number of decision splits, pruning criterion, and misclassification cost.

    example

    Properties


    Structural Parameters

    The software sets structural parameters when you create the component. You cannot modify structural parameters after creating the component.

    This property is read-only after the component is created.

    Observation weights flag, specified as 0 (false) or 1 (true). If UseWeights is true, the component adds a third input, "Weights", to the Inputs component property and a third input tag, 3, to the InputTags component property.

    Example: c = classificationTreeComponent(UseWeights=1)

    Data Types: logical

    Learn Parameters

    The software sets learn parameters when you create the component. You can modify learn parameters using dot notation any time before you use the learn object function. Any unset learn parameters use the corresponding default values.

    Algorithm for the best split on a categorical predictor with C categories for data and K ≥ 3 classes, specified as one of the following values.

    • "Exact" – Consider all 2^(C–1) – 1 combinations and choose the split that has the lowest impurity.

    • "PullLeft" – Start with all C categories on the right branch. Consider moving each category to the left branch to achieve the minimum impurity for the K classes among the remaining categories. From this sequence, choose the split that has the lowest impurity.

    • "PCA" – Compute a score for each category using the inner product between the first principal component of a weighted covariance matrix (of the centered class probability matrix) and the vector of class probabilities for that category. Sort the scores in ascending order, and consider all C – 1 splits. Choose the split that has the lowest impurity.

    • "OVAbyClass" – Start with all C categories on the right branch. For each class, order the categories based on their probability for that class. For the first class, consider moving each category to the left branch in order, recording the impurity criterion at each move. Repeat for the remaining classes. From this sequence, choose the split that has the lowest impurity.

    By default, the component chooses the optimal subset of algorithms for each split using the known number of classes and levels of a categorical predictor. For binary classification, the component uses "Exact".

    For more information, see Splitting Categorical Predictors in Classification Trees.
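    As a sense of scale for the exact search, a categorical predictor with C = 10 categories yields 2^(C–1) – 1 = 511 candidate splits:

    ```matlab
    % Number of candidate splits the "Exact" algorithm considers
    C = 10;                    % number of categories in the predictor
    numSplits = 2^(C-1) - 1    % returns 511
    ```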

    Example: c = classificationTreeComponent(AlgorithmForCategorical="PCA")

    Example: c.AlgorithmForCategorical = "Exact"

    Data Types: char | string

    Misclassification cost, specified as a square matrix or a structure.

    • If Cost is a square matrix, Cost(i,j) is the cost of classifying a point into class j if its true class is i.

    • If Cost is a structure S, it has two fields: S.ClassificationCosts, which contains the cost matrix; and S.ClassNames, which contains the group names and defines the class order of the rows and columns of the cost matrix.

    The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.
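    For example, because the default cost matrix is 0 on the diagonal and 1 elsewhere, you can construct it directly for three classes:

    ```matlab
    % Default misclassification cost matrix for K = 3 classes:
    % zero cost for correct classification, unit cost otherwise
    K = 3;
    defaultCost = ones(K) - eye(K)
    ```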

    Example: c = classificationTreeComponent(Cost=[0 1; 2 0])

    Example: c.Cost = [0 1; 1 0]

    Data Types: single | double | struct

    Maximum number of category levels, specified as a nonnegative scalar. The component splits a categorical predictor using the exact search algorithm if the predictor has at most MaxNumCategories levels in the split node. Otherwise, the component finds the best categorical split using one of the inexact algorithms.

    Example: c = classificationTreeComponent(MaxNumCategories=8)

    Example: c.MaxNumCategories = 15

    Data Types: single | double

    Maximum number of decision splits (or branch nodes), specified as a nonnegative scalar. The component splits MaxNumSplits or fewer branch nodes.

    The default value is n – 1, where n is the number of observations in the first data argument of learn.

    Example: c = classificationTreeComponent(MaxNumSplits=5)

    Example: c.MaxNumSplits = 10

    Data Types: single | double

    Flag to merge leaves, specified as "on" or "off".

    When MergeLeaves is "on", the component:

    • Merges leaves originating from the same parent node if doing so yields a sum of risk values greater than or equal to the risk associated with the parent node

    • Estimates the optimal sequence of pruned subtrees, but does not prune the classification tree

    Example: c = classificationTreeComponent(MergeLeaves="off")

    Example: c.MergeLeaves = "on"

    Data Types: char | string

    Minimum number of leaf node observations, specified as a positive integer scalar. Each leaf has at least MinLeafSize observations. If you specify both MinParentSize and MinLeafSize, the component uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

    Example: c = classificationTreeComponent(MinLeafSize=3)

    Example: c.MinLeafSize = 1

    Data Types: single | double

    Minimum number of branch node observations, specified as a positive integer scalar. Each branch node has at least MinParentSize observations. If you supply both MinParentSize and MinLeafSize, the component uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).
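    For example, setting MinParentSize to 8 and MinLeafSize to 6 results in an effective minimum parent size of max(8, 2*6) = 12:

    ```matlab
    % Effective minimum parent size when both parameters are specified
    MinParentSize = 8;
    MinLeafSize = 6;
    effectiveMinParentSize = max(MinParentSize, 2*MinLeafSize)   % returns 12
    ```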

    Example: c = classificationTreeComponent(MinParentSize=8)

    Example: c.MinParentSize = 12

    Data Types: single | double

    Number of bins for numeric predictors, specified as [] (empty) or a positive integer scalar.

    • If NumBins is empty ([]), the component does not bin any predictors.

    • If NumBins is a positive integer scalar, the component bins every numeric predictor into at most NumBins equiprobable bins, and then grows trees on the bin indices instead of the original data.
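    Conceptually, equiprobable binning places roughly the same number of observations in each bin. This sketch illustrates the idea using quantile-based bin edges; it is not the component's internal algorithm:

    ```matlab
    % Conceptual sketch of equiprobable binning (illustration only)
    x = randn(100,1);                         % a numeric predictor
    NumBins = 4;
    edges = quantile(x, (0:NumBins)/NumBins); % quantile-based bin edges
    binIdx = discretize(x, edges);            % bin indices replace the raw values
    ```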

    Example: c = classificationTreeComponent(NumBins=50)

    Example: c.NumBins = []

    Data Types: single | double

    Number of predictors to select at random for each split, specified as "all" or a positive integer scalar.

    Example: c = classificationTreeComponent(NumVariablesToSample=3)

    Example: c.NumVariablesToSample = "all"

    Data Types: single | double | char | string

    Algorithm used to select the best split predictor at each node, specified as one of the following values.

    • "allsplits" – Standard CART (Classification and Regression Tree) algorithm, which selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors [1].

    • "curvature" – Curvature test, which selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response [3]. The training speed is similar to that of standard CART.

    • "interaction-curvature" – Interaction test, which chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response, and minimizes the p-value of a chi-square test of independence between each pair of predictors and the response [2]. The training speed can be slower than that of standard CART.

    For "curvature" and "interaction-curvature", if all tests yield p-values greater than 0.05, the component stops splitting nodes.

    Example: c = classificationTreeComponent(PredictorSelection="curvature")

    Example: c.PredictorSelection = "interaction-curvature"

    Data Types: char | string

    Prior probabilities for each class, specified as one of the following values.

    • "empirical" – The class prior probabilities are the class relative frequencies, as determined by the second data argument of learn.

    • "uniform" – All class prior probabilities are equal to 1/K, where K is the number of classes.

    • Numeric vector – One value for each class, where each element is a class prior probability. The component normalizes the elements such that they sum to 1.

    • Structure – A structure S with two fields: S.ClassNames, which contains a list of the class names, and S.ClassProbs, which contains a vector of corresponding prior probabilities. The component normalizes the elements such that they sum to 1.

    If you set UseWeights to true, the component renormalizes the weights to add up to the value of the prior probability in the respective class.

    Example: c = classificationTreeComponent(Prior="uniform")

    Example: c.Prior = "empirical"

    Data Types: single | double | char | string | struct

    Flag to estimate the optimal sequence of pruned subtrees, specified as "on" or "off". If Prune is "on", the component estimates the optimal sequence of pruned subtrees, but grows the classification tree without pruning it. If Prune is "off" and MergeLeaves is also "off", the component grows the classification tree without estimating the optimal sequence of pruned subtrees.

    Example: c = classificationTreeComponent(Prune="off")

    Example: c.Prune = "on"

    Data Types: char | string

    Pruning criterion, specified as "error" or "impurity".

    If PruneCriterion is "error", the component splits nodes of the decision tree based on node error, or the fraction of misclassified classes at a node. If PruneCriterion is "impurity", the component splits nodes of the decision tree based on the impurity measure specified by the SplitCriterion value.

    Example: c = classificationTreeComponent(PruneCriterion="impurity")

    Example: c.PruneCriterion = "error"

    Data Types: char | string

    Split criterion, specified as "gdi" for Gini's diversity index, "twoing" for the twoing rule, or "deviance" for maximum deviance reduction (also known as cross-entropy).

    Gini's diversity index and maximum deviance reduction are both measures of impurity. A value of 0 represents a pure node with just one class. Otherwise, these values are positive.

    The twoing rule is not a purity measure of a node, but is a measure for deciding how to split a node into two. If the expression is large, the split makes each child node purer. If the expression is small, the split does not increase node purity.

    For more information, see Impurity and Node Error.
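    For reference, the Gini diversity index of a node with observed class proportions p(i) is 1 – Σ p(i)^2, which is 0 for a pure node. For example:

    ```matlab
    % Gini diversity index for a node with three classes
    p = [0.5 0.3 0.2];      % observed class proportions at the node
    gini = 1 - sum(p.^2)    % 1 - (0.25 + 0.09 + 0.04) = 0.62
    ```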

    Example: c = classificationTreeComponent(SplitCriterion="deviance")

    Example: c.SplitCriterion = "twoing"

    Data Types: char | string

    Surrogate decision splits, specified as "off", "on", "all", or a positive integer scalar.

    • If Surrogate is "off", the component does not use surrogate splits.

    • If Surrogate is "on", the component finds at most 10 surrogate splits at each branch node.

    • If Surrogate is "all", the component finds all surrogate splits at each branch node, a process that can use considerable time and memory.

    • If Surrogate is a positive integer scalar, the component finds at most the specified number of surrogate splits at each branch node.

    Example: c = classificationTreeComponent(Surrogate="on")

    Example: c.Surrogate = "all"

    Data Types: single | double | char | string

    Run Parameters

    The software sets run parameters when you create the component. You can modify the run parameters at any time. Any unset run parameters use the corresponding default values.

    Loss function, specified as a built-in loss function name or a function handle.

    • "binodeviance" – Binomial deviance

    • "classifcost" – Observed misclassification cost

    • "classiferror" – Misclassified rate in decimal

    • "exponential" – Exponential loss

    • "hinge" – Hinge loss

    • "logit" – Logistic loss

    • "mincost" – Minimal expected misclassification cost (for classification scores that are posterior probabilities)

    • "quadratic" – Quadratic loss

    To specify a custom loss function, use function handle notation. For more information on custom loss functions, see LossFun.
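    As an illustration, a weighted misclassification-rate loss could be written as a function handle. This sketch assumes the (C,S,W,Cost) custom loss signature described in LossFun, where C is the true-class indicator matrix, S is the score matrix, W holds the observation weights, and Cost is the cost matrix:

    ```matlab
    % Hypothetical custom loss: weighted fraction of misclassified observations.
    % Assumes the custom loss signature lossvalue = lossfun(C,S,W,Cost);
    % an observation is misclassified if its true class does not attain
    % the maximum score in its row of S.
    myLoss = @(C,S,W,Cost) sum(W .* ~any(C & (S == max(S,[],2)), 2)) / sum(W);
    c = classificationTreeComponent(LossFun=myLoss);
    ```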

    Example: c = classificationTreeComponent(LossFun="classiferror")

    Example: c.LossFun = "binodeviance"

    Data Types: char | string | function_handle

    Tree size, specified as "se" or "min".

    • If TreeSize is "se", the component returns the best pruning level, which corresponds to the highest pruning level with the loss within one standard deviation of the minimum.

    • If TreeSize is "min", the component returns the best pruning level, which corresponds to the pruning level with the smallest loss.

    Example: c = classificationTreeComponent(TreeSize="min")

    Example: c.TreeSize = "se"

    Data Types: char | string

    Score transformation, specified as a built-in function name or a function handle.

    The available built-in score transform functions are as follows.

    • "doublelogit" – 1/(1 + e^(–2x))

    • "invlogit" – log(x / (1 – x))

    • "ismax" – Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0

    • "logit" – 1/(1 + e^(–x))

    • "none" or "identity" – x (no transformation)

    • "sign" – –1 for x < 0, 0 for x = 0, and 1 for x > 0

    • "symmetric" – 2x – 1

    • "symmetricismax" – Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1

    • "symmetriclogit" – 2/(1 + e^(–x)) – 1

    To specify a custom score transform function, use function handle notation. The function must accept a matrix containing the original scores and return a matrix of the same size containing the transformed scores.
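    For example, a custom transform that clips scores to the range [0, 1] can be written as a function handle (the clipping choice here is purely illustrative):

    ```matlab
    % Illustrative custom score transform: clip scores to [0, 1].
    % The handle accepts a score matrix and returns a matrix of the same size.
    clipTransform = @(S) min(max(S, 0), 1);
    c = classificationTreeComponent(ScoreTransform=clipTransform);
    ```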

    Example: c = classificationTreeComponent(ScoreTransform="logit")

    Example: c.ScoreTransform = "symmetric"

    Data Types: char | string | function_handle

    Component Properties

    The software sets component properties when you create the component. You can modify the component properties (excluding HasLearnables and HasLearned) at any time. You cannot modify the HasLearnables and HasLearned properties directly.

    Component identifier, specified as a character vector or string scalar.

    Example: c = classificationTreeComponent(Name="Tree")

    Example: c.Name = "TreeClassifier"

    Data Types: char | string

    Names of the input ports, specified as a character vector, string array, or cell array of character vectors. If UseWeights is true, the component adds the input port "Weights" to Inputs.

    Example: c = classificationTreeComponent(Inputs=["X","Y"])

    Example: c.Inputs = ["X1","Y1"]

    Data Types: char | string | cell

    Names of the output ports, specified as a character vector, string array, or cell array of character vectors.

    Example: c = classificationTreeComponent(Outputs=["Class","ClassScore","LossVal"])

    Example: c.Outputs = ["X","Y","Z"]

    Data Types: char | string | cell

    Tags that enable the automatic connection of the component inputs with other components or pipelines, specified as a nonnegative integer vector. If you specify InputTags, the number of tags must match the number of inputs in Inputs. If UseWeights is true, the component adds a third input tag to InputTags.

    Example: c = classificationTreeComponent(InputTags=[0 1])

    Example: c.InputTags = [1 0]

    Data Types: single | double

    Tags that enable the automatic connection of the component outputs with other components or pipelines, specified as a nonnegative integer vector. If you specify OutputTags, the number of tags must match the number of outputs in Outputs.

    Example: c = classificationTreeComponent(OutputTags=[1 0 4])

    Example: c.OutputTags = [1 2 0]

    Data Types: single | double

    This property is read-only.

    Indicator for learnables, returned as 1 (true). A value of 1 indicates that the component contains Learnables.

    Data Types: logical

    This property is read-only.

    Indicator showing the learning status of the component, returned as 0 (false) or 1 (true). A value of 1 indicates that the learn object function has been applied to the component and the Learnables are nonempty.

    Data Types: logical

    Learnables

    The software sets learnables when you use the learn object function. You cannot modify learnables directly.

    This property is read-only.

    Trained model, returned as a CompactClassificationTree model object.

    Object Functions

    learn – Initialize and evaluate pipeline or component
    run – Execute pipeline or component for inference after learning
    reset – Reset pipeline or component
    series – Connect components in series to create pipeline
    parallel – Connect components or pipelines in parallel to create pipeline
    view – View diagram of pipeline inputs, outputs, components, and connections

    Examples


    Create a classificationTreeComponent pipeline component.

    component = classificationTreeComponent
    component = 
      classificationTreeComponent with properties:
    
                Name: "ClassificationTree"
              Inputs: ["Predictors"    "Response"]
           InputTags: [1 2]
             Outputs: ["Predictions"    "Scores"    "Loss"]
          OutputTags: [1 0 0]
    
       
    Learnables (HasLearned = false)
        TrainedModel: []
    
       
    Structural Parameters (locked)
          UseWeights: 0
    
    
    Show all parameters
    

    component is a classificationTreeComponent object that contains one learnable, TrainedModel. This property remains empty until you pass data to the component during the learn phase.

    To limit the number of splits in the tree model, set the MaxNumSplits property of the component to 7.

    component.MaxNumSplits = 7;

    Load the ionosphere data set and save the data in two tables.

    load ionosphere
    X = array2table(X);
    Y = array2table(Y);

    Use the learn object function to train the classificationTreeComponent object using the entire data set.

    component = learn(component,X,Y)
    component = 
      classificationTreeComponent with properties:
    
                Name: "ClassificationTree"
              Inputs: ["Predictors"    "Response"]
           InputTags: [1 2]
             Outputs: ["Predictions"    "Scores"    "Loss"]
          OutputTags: [1 0 0]
    
       
    Learnables (HasLearned = true)
        TrainedModel: [1×1 classreg.learning.classif.CompactClassificationTree]
    
       
    Structural Parameters (locked)
          UseWeights: 0
    
       
    Learn Parameters (locked)
        MaxNumSplits: 7
    
    
    Show all parameters
    

    Note that the HasLearned property is set to true, which indicates that the software trained the classification tree model TrainedModel. You can use component to classify new data using the run object function.
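    A call to run on the trained component might look like the following sketch, assuming run accepts the component followed by data inputs matching its Inputs property and returns one value per output port:

    ```matlab
    % Classify observations with the trained component (sketch).
    % The three outputs correspond to the "Predictions", "Scores",
    % and "Loss" output ports shown in the component display.
    [labels, scores, lossValue] = run(component, X, Y);
    ```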

    References

    [1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.

    [2] Loh, W. Y. “Regression Trees with Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, Vol. 12, 2002, pp. 361–386.

    [3] Loh, W. Y., and Y. S. Shih. “Split Selection Methods for Classification Trees.” Statistica Sinica, Vol. 7, 1997, pp. 815–840.

    Version History

    Introduced in R2026a