CompactClassificationEnsemble

Compact classification ensemble

Description

Compact version of a classification ensemble. The compact version does not include the training data, so you cannot use it for tasks that require that data, such as cross-validation. Use a compact classification ensemble to predict labels (classifications) for new data.

Creation

Create a CompactClassificationEnsemble object from a full ClassificationEnsemble or ClassificationBaggedEnsemble model object by using compact.

Properties

`CategoricalPredictors` — Categorical predictor indices
Read-only: vector of positive integers | `[]`

This property is read-only.

Categorical predictor indices, returned as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

`ClassNames` — List of elements in `Y` with duplicates removed
Read-only: categorical array | cell array of character vectors | character array | logical vector | numeric vector

This property is read-only.

List of the elements in Y with duplicates removed, returned as a categorical array, cell array of character vectors, character array, logical vector, or numeric vector. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.)

Data Types: double | logical | char | cell | categorical

`CombineWeights` — Method used to combine weak learner weights
Read-only: `'WeightedAverage'` | `'WeightedSum'`

This property is read-only.

Method used to combine weak learner weights, returned as either 'WeightedAverage' or 'WeightedSum'.

Data Types: char

`Cost` — Misclassification costs
Read-only: square numeric matrix

This property is read-only.

Misclassification costs, returned as a square numeric matrix. Cost has K rows and columns, where K is the number of classes.

Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.

Data Types: double
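
To make the indexing convention concrete, the following sketch computes the expected cost of each possible prediction for a single observation. The cost matrix and the posterior probabilities are hypothetical values chosen for illustration; they do not come from a trained model, and the sketch is not meant to describe how predict assigns labels internally.

```matlab
% Hypothetical cost matrix for classes {'b','g'}
% (rows: true class i, columns: predicted class j).
cost = [0 1;    % true 'b': predicting 'b' costs 0, predicting 'g' costs 1
        5 0];   % true 'g': predicting 'b' costs 5, predicting 'g' costs 0

% Hypothetical posterior probabilities for one observation,
% ordered as in ClassNames.
posterior = [0.7 0.3];

% Expected cost of predicting class j is sum_i posterior(i)*cost(i,j).
expectedCost = posterior*cost;      % 1-by-2 row vector, about [1.5 0.7]

% Index of the prediction with the smallest expected cost.
[~,idx] = min(expectedCost);        % idx is 2, that is, class 'g'
```

In this made-up case, class 'b' is more probable, but predicting 'g' has the lower expected cost because misclassifying a true 'g' observation is five times as expensive.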

`ExpandedPredictorNames` — Expanded predictor names
Read-only: cell array of character vectors

This property is read-only.

Expanded predictor names, returned as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

`NumTrained` — Number of trained weak learners
Read-only: positive integer

This property is read-only.

Number of trained weak learners in the ensemble, returned as a positive integer.

Data Types: double

`PredictorNames` — Predictor names
Read-only: cell array of character vectors

This property is read-only.

Predictor names, returned as a cell array of character vectors. The order of the entries in PredictorNames is the same as the order of the predictors in the training data.

Data Types: cell

`Prior` — Prior probabilities for each class
Read-only: numeric vector

This property is read-only.

Prior probabilities for each class, returned as a K-element numeric vector, where K is the number of unique classes in the response. The order of the elements of Prior corresponds to the order of the classes in ClassNames.

Data Types: double

`ResponseName` — Name of response variable
Read-only: character vector

This property is read-only.

Name of the response variable, returned as a character vector.

Data Types: char

`ScoreTransform` — Function for transforming scores
function handle | name of a built-in transformation function | `"none"`

Function for transforming scores, specified as a function handle or the name of a built-in transformation function. "none" means no transformation; equivalently, "none" means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see ScoreTransform (for trees) or ScoreTransform (for ensembles).

Add or change a ScoreTransform function using dot notation:

Mdl.ScoreTransform = "function"
% or
Mdl.ScoreTransform = @function

Data Types: char | string | function_handle
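
As a minimal sketch of the dot-notation assignment, the following code trains an ensemble on the ionosphere data, compacts it, and replaces the default transformation with a custom function handle. The logistic function used here is only an illustrative choice, and the variable names are assumptions.

```matlab
load ionosphere
Mdl = fitcensemble(X,Y,Method="AdaBoostM1");
CMdl = compact(Mdl);

% Scores with the default setting ("none").
[~,rawScores] = predict(CMdl,X(1:5,:));

% Apply a custom transformation through a function handle. The function
% must accept a matrix of scores and return a matrix of the same size.
CMdl.ScoreTransform = @(s) 1./(1 + exp(-s));
[~,newScores] = predict(CMdl,X(1:5,:));
```

The transformation changes only the reported scores; the underlying trained learners are unchanged.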

`Trained` — Trained weak learners
Read-only: cell vector

This property is read-only.

Trained weak learners, returned as a cell vector. The entries of the cell vector contain the corresponding compact classification models.

Data Types: cell

`TrainedWeights` — Trained weak learner weights
Read-only: numeric vector

This property is read-only.

Trained weak learner weights, returned as a numeric vector. TrainedWeights has NumTrained elements, where NumTrained is the number of weak learners in the ensemble. The ensemble computes the predicted response by aggregating weighted predictions from its learners.

Data Types: double
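
The following sketch illustrates one plausible reading of the two CombineWeights options, using made-up learner scores and weights. It is plain matrix arithmetic for illustration and is not the toolbox's internal implementation.

```matlab
% Hypothetical scores for one observation from three weak learners
% (rows: learners, columns: classes).
learnerScores = [ 0.9 -0.9;
                 -0.2  0.2;
                  0.4 -0.4];
weights = [2; 1; 1];   % hypothetical TrainedWeights values

% 'WeightedSum': add the weighted scores of all learners.
weightedSum = weights' * learnerScores;          % about [2 -2]

% 'WeightedAverage': divide the weighted sum by the total weight.
weightedAverage = weightedSum / sum(weights);    % about [0.5 -0.5]
```

Both combinations rank the classes in the same order; they differ only in the scale of the aggregated scores.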

`UsePredForLearner` — Indicator that learner `j` uses predictor `i`
logical matrix

Indicator that learner j uses predictor i, returned as a logical matrix of size P-by-NumTrained, where P is the number of predictors (columns) in the training data. UsePredForLearner(i,j) is true when learner j uses predictor i, and is false otherwise. For each learner, the predictors have the same order as the columns in the training data.

If the ensemble is not of type Subspace, all entries in UsePredForLearner are true.

Data Types: logical
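
As a hedged example, the following code trains a random subspace ensemble of k-nearest neighbor learners on the ionosphere data and inspects UsePredForLearner. The choice of five predictors per learner is arbitrary.

```matlab
load ionosphere
Mdl = fitcensemble(X,Y,Method="Subspace",Learners="knn",NPredToSample=5);
CMdl = compact(Mdl);

% The matrix has one row per predictor and one column per trained learner.
sz = size(CMdl.UsePredForLearner);

% Number of predictors used by each learner (one value per column).
predsPerLearner = sum(CMdl.UsePredForLearner,1);

% Number of learners that use each predictor (one value per row).
learnersPerPred = sum(CMdl.UsePredForLearner,2);
```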

Object Functions

`compareHoldout`	Compare accuracies of two classification models using new data
`edge`	Classification edge for classification ensemble model
`gather`	Gather properties of Statistics and Machine Learning Toolbox object from GPU
`lime`	Local interpretable model-agnostic explanations (LIME)
`loss`	Classification loss for classification ensemble model
`margin`	Classification margins for classification ensemble model
`partialDependence`	Compute partial dependence
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
`predict`	Predict labels using classification ensemble model
`predictorImportance`	Estimates of predictor importance for classification ensemble of decision trees
`removeLearners`	Remove members of compact classification ensemble
`shapley`	Shapley values
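
The following sketch shows how several of these functions might be combined on held-out data. The 30% holdout fraction, the random seed, and the number of learners removed are arbitrary assumptions for illustration.

```matlab
load ionosphere
rng(1)                                 % for a reproducible partition
cvp = cvpartition(Y,Holdout=0.3);      % hold out 30% of the observations

Mdl = fitcensemble(X(training(cvp),:),Y(training(cvp)),Method="AdaBoostM1");
CMdl = compact(Mdl);

XTest = X(test(cvp),:);
YTest = Y(test(cvp));

labels = predict(CMdl,XTest);          % predicted class labels
L = loss(CMdl,XTest,YTest);            % classification loss on held-out data
m = margin(CMdl,XTest,YTest);          % one margin per observation
e = edge(CMdl,XTest,YTest);            % weighted mean of the margins

% Drop the last 50 weak learners and recompute the held-out loss.
CMdlSmall = removeLearners(CMdl,51:CMdl.NumTrained);
LSmall = loss(CMdlSmall,XTest,YTest);
```

Comparing L and LSmall gives a rough indication of whether the smaller ensemble is accurate enough for the task.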

Examples

Reduce Size of Classification Ensemble

Create a compact classification ensemble for efficiently making predictions on new data.

Load the ionosphere data set.

load ionosphere

Train a boosted ensemble of 100 classification trees using all measurements and the AdaBoostM1 method.

Mdl = fitcensemble(X,Y,Method="AdaBoostM1")

Mdl = 
  ClassificationEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
               NumTrained: 100
                   Method: 'AdaBoostM1'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [100×1 double]
       FitInfoDescription: {2×1 cell}


  Properties, Methods

Mdl is a ClassificationEnsemble model object that contains the training data, among other things.

Create a compact version of Mdl.

CMdl = compact(Mdl)

CMdl = 
  CompactClassificationEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
               NumTrained: 100


  Properties, Methods

CMdl is a CompactClassificationEnsemble model object. CMdl is almost the same as Mdl. One exception is that CMdl does not store the training data.

Compare the amounts of space consumed by Mdl and CMdl.

mdlInfo = whos("Mdl");
cMdlInfo = whos("CMdl");
[mdlInfo.bytes cMdlInfo.bytes]

ans = 1×2

      847455      600166

Mdl consumes more space than CMdl.

CMdl.Trained stores the trained classification trees (CompactClassificationTree model objects) that compose Mdl.

Display a graph of the first tree in the compact ensemble.

view(CMdl.Trained{1},Mode="graph");

[Figure: Classification tree viewer displaying the first tree in the compact ensemble.]

By default, fitcensemble grows shallow trees for boosted ensembles of trees.

Predict the label of the mean of X using the compact ensemble.

predMeanX = predict(CMdl,mean(X))

predMeanX = 1×1 cell array
    {'g'}

Tips

For an ensemble of classification trees, the Trained property of ens stores an ens.NumTrained-by-1 cell vector of compact classification models. For a textual or graphical display of tree t in the cell vector, enter:

- view(ens.Trained{t}.CompactRegressionLearner) for ensembles aggregated using LogitBoost or GentleBoost.
- view(ens.Trained{t}) for all other aggregation methods.
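
A short sketch of both cases, assuming the ionosphere data and the variable names ens and ensLB (which are illustrative only):

```matlab
load ionosphere

% AdaBoostM1: view the stored tree directly.
ens = compact(fitcensemble(X,Y,Method="AdaBoostM1"));
view(ens.Trained{1})                                % text display of tree 1

% LogitBoost: view the tree through the CompactRegressionLearner property.
ensLB = compact(fitcensemble(X,Y,Method="LogitBoost"));
view(ensLB.Trained{1}.CompactRegressionLearner)     % text display of tree 1
```

To open the graphical viewer instead of the text display, add Mode="graph" to the view call.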

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

- The predict function supports code generation.
- To integrate the prediction of an ensemble into Simulink®, you can use the ClassificationEnsemble Predict block in the Statistics and Machine Learning Toolbox™ library or a MATLAB® Function block with the predict function.
- When you train an ensemble by using fitcensemble, the following restrictions apply.
  - The value of the ScoreTransform name-value argument cannot be an anonymous function.
  - Code generation limitations for the weak learners used in the ensemble also apply to the ensemble.
    - For decision tree weak learners, you cannot use surrogate splits; that is, the value of the Surrogate name-value argument must be 'off'.
    - For k-nearest neighbor weak learners, the value of the Distance name-value argument cannot be a custom distance function. The value of the DistanceWeight name-value argument can be a custom distance weight function, but it cannot be an anonymous function.
- For fixed-point code generation, the following additional restrictions apply.
  - When you train an ensemble by using fitcensemble, you must use tree learners, and the ScoreTransform value cannot be 'invlogit'.
  - Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model.
  - Class labels with the categorical data type are not supported. Neither the class label value in the training data (Tbl or Y) nor the value of the ClassNames name-value argument can be an array with the categorical data type.

For more information, see Introduction to Code Generation for Statistics and Machine Learning Functions.
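
As a sketch of a typical workflow, you might first save the compact ensemble with saveLearnerForCoder, for example saveLearnerForCoder(CMdl,"EnsMdl"), and then define an entry-point function that loads the saved model and calls predict. The file name EnsMdl and the function name predictEns are hypothetical choices for this illustration.

```matlab
function label = predictEns(X) %#codegen
% predictEns  Entry-point sketch: classify the rows of X with a saved ensemble.
% Assumes that EnsMdl.mat was created earlier by calling
% saveLearnerForCoder(CMdl,"EnsMdl") on a compact classification ensemble.
Mdl = loadLearnerForCoder('EnsMdl');   % reconstruct the model from the file
label = predict(Mdl,X);                % predict labels for the rows of X
end
```

With MATLAB Coder installed, a command such as codegen predictEns -args {coder.typeof(0,[Inf 34],[1 0])} could then generate code for inputs with a variable number of rows and 34 predictor columns, which matches the ionosphere data used in the example on this page.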

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

The following object functions fully support GPU arrays:
The following object functions offer limited support for GPU arrays:
The object functions execute on a GPU if at least one of the following applies:
- The model was fitted with GPU arrays.
- The predictor data that you pass to the object function is a GPU array.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
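
The second condition can be sketched as follows, assuming Parallel Computing Toolbox and a supported GPU are available. The canUseGPU guard and the CPU fallback are additions for illustration so that the snippet also runs on machines without a GPU.

```matlab
load ionosphere
CMdl = compact(fitcensemble(X,Y,Method="AdaBoostM1"));

if canUseGPU
    XGpu = gpuArray(X);                     % move the predictor data to the GPU
    [labels,scores] = predict(CMdl,XGpu);   % GPU array input, per the condition above
    scores = gather(scores);                % copy the scores back to host memory
else
    [labels,scores] = predict(CMdl,X);      % CPU fallback
end
```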

Version History

Introduced in R2011a

R2022a: `Cost` property stores the user-specified cost matrix

Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the loss function.

Note that model training has not changed and, therefore, the decision boundaries between classes have not changed.

For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities used for training in the Prior property. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities that do not reflect the cost penalties. For more details, see Misclassification Cost Matrix, Prior Probabilities, and Observation Weights.

Some object functions use the Cost and Prior properties:

- The loss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost".
- The loss and edge functions use the prior probabilities stored in the Prior property to normalize the observation weights of the input data.

If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases.

If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
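
The following sketch, which assumes R2022a or later, trains an ensemble with a hypothetical nondefault cost matrix and then inspects the stored properties and the observed misclassification cost.

```matlab
load ionosphere
userCost = [0 1; 2 0];     % hypothetical cost matrix; class order is {'b','g'}
Mdl = fitcensemble(X,Y,Method="AdaBoostM1",Cost=userCost);
CMdl = compact(Mdl);

storedCost = CMdl.Cost;    % the user-specified matrix, stored without modification
storedPrior = CMdl.Prior;  % normalized priors that do not reflect the cost penalties

% Observed misclassification cost, computed with the stored cost matrix.
L = loss(CMdl,X,Y,LossFun="classifcost");
```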
