
templateEnsemble

Ensemble learning template

Description

t = templateEnsemble(Method,NLearn,Learners) returns an ensemble learning template that specifies to use the ensemble aggregation method Method, NLearn learning cycles, and weak learners Learners.

All other options of the template (t) specific to ensemble learning appear empty, but the software uses their corresponding default values during training.


t = templateEnsemble(Method,NLearn,Learners,Name,Value) returns an ensemble template with additional options specified by one or more name-value pair arguments.

For example, you can specify the number of predictors in each random subspace learner, learning rate for shrinkage, or the target classification error for RobustBoost.

If you display t in the Command Window, then all options appear empty ([]), except those options that you specify using name-value pair arguments. During training, the software uses default values for empty options.
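For example, a call with name-value pair arguments might look like the following sketch (the method, number of learning cycles, learner type, and learning rate are illustrative choices, not defaults).

% Illustrative sketch: a GentleBoost template of 500 classification trees
% trained with shrinkage (learning rate 0.1)
t = templateEnsemble('GentleBoost',500,'Tree','LearnRate',0.1);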


Examples


Use templateEnsemble to specify an ensemble learning template. You must specify the ensemble method, the number of learning cycles, and the type of weak learners. For this example, specify the AdaBoostM1 method, 100 learners, and classification tree weak learners.

t = templateEnsemble('AdaBoostM1',100,'tree')
t = 
Fit template for classification AdaBoostM1.

                Type: 'classification'
              Method: 'AdaBoostM1'
    LearnerTemplates: 'Tree'
              NLearn: 100
           LearnRate: []

All properties of the template object are empty except for Method, Type, LearnerTemplates, and NLearn. During training, the software fills in the empty properties with their respective default values. For example, the software fills the LearnRate property with 1.

t is a plan for an ensemble learner, and no computation takes place when you specify it. You can pass t to fitcecoc to specify ensemble binary learners for ECOC multiclass learning.

Create an ensemble template for use in fitcecoc.

Load the arrhythmia data set.

load arrhythmia
tabulate(categorical(Y));
  Value    Count   Percent
      1      245     54.20%
      2       44      9.73%
      3       15      3.32%
      4       15      3.32%
      5       13      2.88%
      6       25      5.53%
      7        3      0.66%
      8        2      0.44%
      9        9      1.99%
     10       50     11.06%
     14        4      0.88%
     15        5      1.11%
     16       22      4.87%
rng(1); % For reproducibility

Some classes have small relative frequencies in the data.

Create a template for an AdaBoostM1 ensemble of classification trees, and specify to use 100 learners and a shrinkage (learning rate) of 0.1. By default, boosting grows stumps (that is, one root node connected to two leaf nodes). Because some classes have small relative frequencies, the trees must be leafy enough to be sensitive to the minority classes. Specify 20 as the minimum number of observations per leaf node.

tTree = templateTree('MinLeafSize',20);
t = templateEnsemble('AdaBoostM1',100,tTree,'LearnRate',0.1);

All properties of the template objects are empty except for Method and Type, and the properties that correspond to the name-value pair arguments specified in the function calls. When you pass t to the training function, the software fills in the empty properties with their respective default values.

Specify t as a binary learner for an ECOC multiclass model. Train using the default one-versus-one coding design.

Mdl = fitcecoc(X,Y,'Learners',t);
  • Mdl is a ClassificationECOC multiclass model.

  • Mdl.BinaryLearners is a 78-by-1 cell array of CompactClassificationEnsemble models.

  • Mdl.BinaryLearners{j}.Trained is a 100-by-1 cell array of CompactClassificationTree models, for j = 1,...,78.

You can verify that one of the binary learners contains a weak learner that isn't a stump by using view.

view(Mdl.BinaryLearners{1}.Trained{1},'Mode','graph')

Figure: The classification tree viewer displays the first trained tree as a graph. The tree has multiple splits, so the weak learner is not a stump.

Display the in-sample (resubstitution) misclassification error.

L = resubLoss(Mdl,'LossFun','classiferror')
L = 
0.0819

Train a one-versus-all ECOC classifier using a GentleBoost ensemble of decision trees with surrogate splits. To speed up training, bin numeric predictors and use parallel computing. Binning is valid only when fitcecoc uses a tree learner. After training, estimate the classification error using 10-fold cross-validation. Note that parallel computing requires Parallel Computing Toolbox™.

Load Sample Data

Load and inspect the arrhythmia data set.

load arrhythmia
[n,p] = size(X)
n = 452
p = 279
isLabels = unique(Y);
nLabels = numel(isLabels)
nLabels = 13
tabulate(categorical(Y))
  Value    Count   Percent
      1      245     54.20%
      2       44      9.73%
      3       15      3.32%
      4       15      3.32%
      5       13      2.88%
      6       25      5.53%
      7        3      0.66%
      8        2      0.44%
      9        9      1.99%
     10       50     11.06%
     14        4      0.88%
     15        5      1.11%
     16       22      4.87%

The data set contains 279 predictors, and the sample size of 452 is relatively small. Of the 16 distinct labels, only 13 are represented in the response (Y). Each label describes various degrees of arrhythmia, and 54.20% of the observations are in class 1.

Train One-Versus-All ECOC Classifier

Create an ensemble template. You must specify at least three arguments: a method, a number of learners, and the type of learner. For this example, specify 'GentleBoost' for the method, 100 for the number of learners, and a decision tree template that uses surrogate splits because there are missing observations.

tTree = templateTree('surrogate','on');
tEnsemble = templateEnsemble('GentleBoost',100,tTree);

tEnsemble is a template object. Most of its properties are empty, but the software fills them with their default values during training.

Train a one-versus-all ECOC classifier using the ensembles of decision trees as binary learners. To speed up training, use binning and parallel computing.

  • Binning ('NumBins',50) — When you have a large training data set, you can speed up training (with a potential decrease in accuracy) by using the 'NumBins' name-value pair argument. This argument is valid only when fitcecoc uses a tree learner. If you specify the 'NumBins' value, then the software bins every numeric predictor into a specified number of equiprobable bins, and then grows trees on the bin indices instead of the original data. You can try 'NumBins',50 first, and then change the 'NumBins' value depending on the accuracy and training speed.

  • Parallel computing ('Options',statset('UseParallel',true)) — With a Parallel Computing Toolbox license, you can speed up the computation by using parallel computing, which sends each binary learner to a worker in the pool. The number of workers depends on your system configuration. When you use decision trees for binary learners, fitcecoc parallelizes training using Intel® Threading Building Blocks (TBB) for dual-core systems and above. Therefore, specifying the 'UseParallel' option is not helpful on a single computer. Use this option on a cluster.

Additionally, specify that the prior probabilities are 1/K, where K = 13 is the number of distinct classes.

options = statset('UseParallel',true);
Mdl = fitcecoc(X,Y,'Coding','onevsall','Learners',tEnsemble,...
                'Prior','uniform','NumBins',50,'Options',options);
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

Mdl is a ClassificationECOC model.

Cross-Validation

Cross-validate the ECOC classifier using 10-fold cross-validation.

CVMdl = crossval(Mdl,'Options',options);
Warning: One or more folds do not contain points from all the groups.

CVMdl is a ClassificationPartitionedECOC model. The warning indicates that some classes are not represented in the training data of at least one fold. Therefore, those folds cannot predict labels for the missing classes. You can inspect the results of a fold using cell indexing and dot notation. For example, access the results of the first fold by entering CVMdl.Trained{1}.

Use the cross-validated ECOC classifier to predict validation-fold labels. You can compute the confusion matrix by using confusionchart. Move and resize the chart by changing the inner position property to ensure that the percentages appear in the row summary.

oofLabel = kfoldPredict(CVMdl,'Options',options);
ConfMat = confusionchart(Y,oofLabel,'RowSummary','total-normalized');
ConfMat.InnerPosition = [0.10 0.12 0.85 0.85];

Reproduce Binned Data

Reproduce binned predictor data by using the BinEdges property of the trained model and the discretize function.

X = Mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = Mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x)
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]);
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

Input Arguments


Ensemble aggregation method, specified as one of the method names in this list.

  • For classification with two classes:

    • 'AdaBoostM1' — Adaptive boosting

    • 'LogitBoost' — Adaptive logistic regression

    • 'GentleBoost' — Gentle adaptive boosting

    • 'RobustBoost' — Robust boosting (requires Optimization Toolbox™)

    • 'LPBoost' — Linear programming boosting (requires Optimization Toolbox)

    • 'TotalBoost' — Totally corrective boosting (requires Optimization Toolbox)

    • 'RUSBoost' — Random undersampling boosting

    • 'Subspace' — Random subspace

    • 'Bag' — Bootstrap aggregation (bagging)

  • For classification with three or more classes:

    • 'AdaBoostM2' — Adaptive boosting

    • 'LPBoost' — Linear programming boosting (requires Optimization Toolbox)

    • 'TotalBoost' — Totally corrective boosting (requires Optimization Toolbox)

    • 'RUSBoost' — Random undersampling boosting

    • 'Subspace' — Random subspace

    • 'Bag' — Bootstrap aggregation (bagging)

  • For regression:

    • 'LSBoost' — Least-squares boosting

    • 'Bag' — Bootstrap aggregation (bagging)

If you specify 'Method','Bag', then specify the problem type using the Type name-value pair argument, because you can specify 'Bag' for classification and regression problems.

For details about ensemble aggregation algorithms and examples, see Ensemble Algorithms and Choose an Applicable Ensemble Aggregation Method.

Number of ensemble learning cycles, specified as a positive integer or 'AllPredictorCombinations'.

  • If you specify a positive integer, then, at every learning cycle, the software trains one weak learner for every template object in Learners. Consequently, the software trains NLearn*numel(Learners) learners.

  • If you specify 'AllPredictorCombinations', then set Method to 'Subspace' and specify one learner only in Learners. With these settings, the software trains learners for all possible combinations of predictors taken NPredToSample at a time. Consequently, the software trains nchoosek(size(X,2),NPredToSample) learners.

For more details, see Tips.

Data Types: single | double | char | string
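For instance, a minimal sketch of the 'AllPredictorCombinations' setting (the learner type and NPredToSample value are illustrative) looks like this.

% Illustrative sketch: train one subspace discriminant learner for every
% combination of 2 predictors, that is, nchoosek(size(X,2),2) learners
t = templateEnsemble('Subspace','AllPredictorCombinations', ...
    templateDiscriminant,'NPredToSample',2);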

Weak learners to use in the ensemble, specified as a weak-learner name, weak-learner template object, or cell array of weak-learner template objects.

Weak Learner             Weak-Learner Name    Template Object Creation Function    Method Settings
Discriminant analysis    'Discriminant'       templateDiscriminant                 Recommended for 'Subspace'
k nearest neighbors      'KNN'                templateKNN                          For 'Subspace' only
Decision tree            'Tree'               templateTree                         All methods except 'Subspace'

For more details, see NLearn and Tips.

Example: For an ensemble composed of two types of classification trees, supply {t1 t2}, where t1 and t2 are classification tree templates.
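A hedged sketch of that example (the MaxNumSplits values are illustrative choices) might look like this.

% Illustrative sketch: boost two kinds of classification trees at every cycle
t1 = templateTree('MaxNumSplits',1);   % stump
t2 = templateTree('MaxNumSplits',7);   % deeper tree
t = templateEnsemble('AdaBoostM1',100,{t1 t2});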

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'LearnRate',0.05,'NPrint',5 specifies to use 0.05 as the learning rate and to display a message to the command line every time the software completes training 5 learners.

General Ensemble Options


Printout frequency, specified as the comma-separated pair consisting of 'NPrint' and a positive integer or 'off'.

To track the number of weak learners or folds that the software trained so far, specify a positive integer. That is, if you specify the positive integer m and:

  • Do not specify any cross-validation option of the fitting function (for example, CrossVal), then the software displays a message to the command line every time it completes training m weak learners.

  • Specify a cross-validation option, then the software displays a message to the command line every time it finishes training m folds.

If you specify 'off', then the software does not display a message when it completes training weak learners.

Tip

When training an ensemble of many weak learners on a large data set, specify a positive integer for NPrint.

Example: 'NPrint',5

Data Types: single | double | char | string

Supervised learning type, specified as the comma-separated pair consisting of 'Type' and 'classification' or 'regression'.

  • If Method is 'bag', then the supervised learning type is ambiguous. Therefore, specify Type when bagging.

  • Otherwise, the value of Method determines the supervised learning type.

Example: 'Type','classification'
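A minimal sketch of a bagging template that states the problem type explicitly (the number of learning cycles is illustrative) looks like this.

% Illustrative sketch: 'Bag' applies to both problem types, so specify Type
t = templateEnsemble('Bag',200,'Tree','Type','classification');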

Sampling Options for Boosting Methods and Bagging


Fraction of the training set to resample for every weak learner, specified as a positive scalar in (0,1]. To use 'FResample', set Resample to 'on'.

Example: 'FResample',0.75

Data Types: single | double

Flag indicating sampling with replacement, specified as the comma-separated pair consisting of 'Replace' and 'off' or 'on'.

  • For 'on', the software samples the training observations with replacement.

  • For 'off', the software samples the training observations without replacement. If you set Resample to 'on', then the software samples training observations assuming uniform weights. If you also specify a boosting method, then the software boosts by reweighting observations.

Unless you set Method to 'bag' or set Resample to 'on', Replace has no effect.

Example: 'Replace','off'

Flag indicating to resample, specified as the comma-separated pair consisting of 'Resample' and 'off' or 'on'.

  • If Method is a boosting method, then:

    • 'Resample','on' specifies to sample training observations using updated weights as the multinomial sampling probabilities.

    • 'Resample','off' (default) specifies to reweight observations at every learning iteration.

  • If Method is 'bag', then 'Resample' must be 'on'. The software resamples a fraction of the training observations (see FResample) with or without replacement (see Replace).

If you specify to resample using Resample, then it is good practice to resample the entire data set. That is, use the default setting of 1 for FResample.
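For example, a hedged sketch of a boosting template that resamples instead of reweighting (the fraction 0.8 is illustrative; as noted above, the default FResample of 1 is generally preferable) looks like this.

% Illustrative sketch: boost by resampling 80% of the training observations
% at each cycle, using the updated weights as multinomial sampling probabilities
t = templateEnsemble('AdaBoostM1',100,'Tree','Resample','on','FResample',0.8);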

AdaBoostM1, AdaBoostM2, LogitBoost, GentleBoost, and LSBoost Method Options


Learning rate for shrinkage, specified as the comma-separated pair consisting of 'LearnRate' and a numeric scalar in the interval (0,1].

To train an ensemble using shrinkage, set LearnRate to a value less than 1; for example, 0.1 is a popular choice. Training an ensemble using shrinkage requires more learning iterations, but often achieves better accuracy.

Example: 'LearnRate',0.1

Data Types: single | double

RUSBoost Method Options


Learning rate for shrinkage, specified as the comma-separated pair consisting of 'LearnRate' and a numeric scalar in the interval (0,1].

To train an ensemble using shrinkage, set LearnRate to a value less than 1; for example, 0.1 is a popular choice. Training an ensemble using shrinkage requires more learning iterations, but often achieves better accuracy.

Example: 'LearnRate',0.1

Data Types: single | double

Sampling proportion with respect to the lowest-represented class, specified as the comma-separated pair consisting of 'RatioToSmallest' and a numeric scalar or numeric vector of positive values with length equal to the number of distinct classes in the training data.

Suppose that there are K classes in the training data and the lowest-represented class has m observations in the training data.

  • If you specify the positive numeric scalar s, then the software samples s*m observations from each class, that is, it uses the same sampling proportion for each class. For more details, see Algorithms.

  • If you specify the numeric vector [s1,s2,...,sK], then the software samples si*m observations from class i, i = 1,...,K. The elements of RatioToSmallest correspond to the order of the class names specified using the ClassNames name-value pair argument of the fitting function (see Tips).

The default value is ones(K,1), which specifies to sample m observations from each class.

Example: 'RatioToSmallest',[2,1]

Data Types: single | double
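For example, a hedged sketch for a two-class problem (the method, cycle count, and ratios are illustrative) looks like this.

% Illustrative sketch: sample 2*m observations from the first class and m
% observations from the second class, where m is the size of the
% lowest-represented class; element order follows ClassNames (see Tips)
t = templateEnsemble('RUSBoost',50,'Tree','RatioToSmallest',[2 1]);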

LPBoost and TotalBoost Method Options


Margin precision to control convergence speed, specified as the comma-separated pair consisting of 'MarginPrecision' and a numeric scalar in the interval [0,1]. MarginPrecision affects the number of boosting iterations required for convergence.

Tip

To train an ensemble using many learners, specify a small value for MarginPrecision. For training using a few learners, specify a large value.

Example: 'MarginPrecision',0.5

Data Types: single | double

RobustBoost Method Options


Target classification error, specified as the comma-separated pair consisting of 'RobustErrorGoal' and a nonnegative numeric scalar. The upper bound on possible values depends on the values of RobustMarginSigma and RobustMaxMargin. However, the upper bound cannot exceed 1.

Tip

For a particular training set, usually there is an optimal range for RobustErrorGoal. If you set it too low or too high, then the software can produce a model with poor classification accuracy. Try cross-validating to search for the appropriate value.

Example: 'RobustErrorGoal',0.05

Data Types: single | double

Classification margin distribution spread over the training data, specified as the comma-separated pair consisting of 'RobustMarginSigma' and a positive numeric scalar. Before specifying RobustMarginSigma, consult the literature on RobustBoost, for example, [1].

Example: 'RobustMarginSigma',0.5

Data Types: single | double

Maximal classification margin in the training data, specified as the comma-separated pair consisting of 'RobustMaxMargin' and a nonnegative numeric scalar. The software minimizes the number of observations in the training data having classification margins below RobustMaxMargin.

Example: 'RobustMaxMargin',1

Data Types: single | double

Random Subspace Method Options


Number of predictors to sample for each random subspace learner, specified as the comma-separated pair consisting of 'NPredToSample' and a positive integer in the interval 1,...,p, where p is the number of predictor variables (size(X,2) or size(Tbl,2)).

Data Types: single | double

Output Arguments


Classification template for ensemble learning, returned as a template object. You can pass t to, for example, fitcecoc to specify how to create the ensemble learning classifier for the ECOC model.

If you display t in the Command Window, then all unspecified options appear empty ([]). However, the software replaces empty options with their corresponding default values during training.

Tips

  • NLearn can vary from a few dozen to a few thousand. Usually, an ensemble with good predictive power requires from a few hundred to a few thousand weak learners. However, you do not have to train an ensemble for that many cycles at once. You can start by growing a few dozen learners, inspect the ensemble performance, and then, if necessary, train more weak learners using resume for classification problems or resume for regression problems (see the sketch after these tips).

  • Ensemble performance depends on the ensemble settings and the settings of the weak learners. That is, if you specify weak learners with default parameters, then the ensemble can perform poorly. Therefore, like the ensemble settings, it is good practice to adjust the parameters of the weak learners using templates, and to choose values that minimize generalization error.

  • If you specify to resample using Resample, then it is good practice to resample the entire data set. That is, use the default setting of 1 for FResample.

  • In classification problems (that is, Type is 'classification'):

    • If the ensemble aggregation method (Method) is 'bag' and:

      • The misclassification cost is highly imbalanced, then, for in-bag samples, the software oversamples unique observations from the class that has a large penalty.

      • The class prior probabilities are highly skewed, then the software oversamples unique observations from the class that has a large prior probability.

      For smaller sample sizes, these combinations can result in a very low relative frequency of out-of-bag observations from the class that has a large penalty or prior probability. Consequently, the estimated out-of-bag error is highly variable and might be difficult to interpret. To avoid large estimated out-of-bag error variances, particularly for small sample sizes, set a more balanced misclassification cost matrix using the Cost name-value pair argument of the fitting function, or a less skewed prior probability vector using the Prior name-value pair argument of the fitting function.

    • Because the order of some input and output arguments corresponds to the distinct classes in the training data, it is good practice to specify the class order using the ClassNames name-value pair argument of the fitting function.

      • To quickly determine the class order, remove all observations from the training data that are unclassified (that is, have a missing label), obtain and display an array of all the distinct classes, and then specify the array for ClassNames. For example, suppose the response variable (Y) is a cell array of labels. This code specifies the class order in the variable classNames.

        Ycat = categorical(Y);
        classNames = categories(Ycat)
        categorical assigns <undefined> to unclassified observations and categories excludes <undefined> from its output. Therefore, if you use this code for cell arrays of labels or similar code for categorical arrays, then you do not have to remove observations with missing labels to obtain a list of the distinct classes.

      • To specify that the class order is from the lowest-represented label to the most-represented, quickly determine the class order (as in the previous bullet), but arrange the classes in the list by frequency before passing the list to ClassNames. Following from the previous example, this code specifies the class order from lowest- to most-represented in classNamesLH.

        Ycat = categorical(Y);
        classNames = categories(Ycat);
        freq = countcats(Ycat);
        [~,idx] = sort(freq);
        classNamesLH = classNames(idx);
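Returning to the first tip, the following is a minimal sketch of growing an ensemble incrementally. It assumes a two-class label vector Y and a predictor matrix X, and trains the ensemble directly with fitcensemble rather than through a template.

% Minimal sketch (assumes X and Y hold two-class training data)
Mdl = fitcensemble(X,Y,'Method','AdaBoostM1','NumLearningCycles',50);
loss50 = resubLoss(Mdl);      % inspect performance after 50 cycles
Mdl = resume(Mdl,50);         % train 50 additional weak learners
loss100 = resubLoss(Mdl);     % performance after 100 cycles in total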

Algorithms

  • For details of ensemble aggregation algorithms, see Ensemble Algorithms.

  • If you specify Method to be a boosting algorithm and Learners to be decision trees, then the software grows stumps by default. A decision stump is one root node connected to two terminal, leaf nodes. You can adjust tree depth by specifying the MaxNumSplits, MinLeafSize, and MinParentSize name-value pair arguments using templateTree.

  • The software generates in-bag samples by oversampling classes with large misclassification costs and undersampling classes with small misclassification costs. Consequently, out-of-bag samples have fewer observations from classes with large misclassification costs and more observations from classes with small misclassification costs. If you train a classification ensemble using a small data set and a highly skewed cost matrix, then the number of out-of-bag observations per class might be very low. Therefore, the estimated out-of-bag error can have a large variance and might be difficult to interpret. The same phenomenon can occur for classes with large prior probabilities.

  • For the RUSBoost ensemble aggregation method (Method), the name-value pair argument RatioToSmallest specifies the sampling proportion for each class with respect to the lowest-represented class. For example, suppose that there are two classes in the training data, A and B, where A has 100 observations and B has 10 observations, so the lowest-represented class has m = 10 observations in the training data.

    • If you set 'RatioToSmallest',2, then s*m = 2*10 = 20. Consequently, the software trains every learner using 20 observations from class A and 20 observations from class B. If you set 'RatioToSmallest',[2 2], then you will obtain the same result.

    • If you set 'RatioToSmallest',[2,1], then s1*m = 2*10 = 20 and s2*m = 1*10 = 10. Consequently, the software trains every learner using 20 observations from class A and 10 observations from class B.

  • For ensembles of decision trees, and for dual-core systems and above, fitcensemble and fitrensemble parallelize training using Intel® Threading Building Blocks (TBB). For details on Intel TBB, see https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html.

References

[1] Freund, Y. “A more robust boosting algorithm.” arXiv:0905.2138v1, 2009.

Version History

Introduced in R2014b