oobError

Out-of-bag error

Syntax

err = oobError(B)
err = oobError(B,'param1',val1,'param2',val2,...)

Description

err = oobError(B) computes the misclassification probability (for classification trees) or mean squared error (for regression trees) for out-of-bag observations in the training data, using the trained bagger B. err is a vector of length NTrees, where NTrees is the number of trees in the ensemble.
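As a minimal sketch of the call above, assuming the Statistics and Machine Learning Toolbox and its bundled fisheriris data set (neither is named in the text above), the following trains a small classification ensemble with out-of-bag prediction enabled and plots the cumulative out-of-bag error. The tree count of 50 and the random seed are arbitrary choices for illustration.

```matlab
% Load Fisher's iris data: meas (150-by-4 predictors), species (class labels).
load fisheriris
rng(1)  % fix the random seed so the bootstrap samples are reproducible

% Out-of-bag information must be stored when the ensemble is trained.
B = TreeBagger(50, meas, species, ...
    'Method', 'classification', 'OOBPrediction', 'on');

% Cumulative out-of-bag misclassification rate, one element per tree.
err = oobError(B);

plot(err)
xlabel('Number of grown trees')
ylabel('Out-of-bag classification error')
```

The curve typically flattens once enough trees have been grown, which is one common way to judge whether the ensemble is large enough.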

err = oobError(B,'param1',val1,'param2',val2,...) specifies optional parameter name/value pairs:

'Mode': Character vector or string scalar indicating how oobError computes errors. If set to 'cumulative' (default), the method computes cumulative errors and err is a vector of length NTrees, where the first element gives error from trees(1), second element gives error from trees(1:2) etc., up to trees(1:NTrees). If set to 'individual', err is a vector of length NTrees, where each element is an error from each tree in the ensemble. If set to 'ensemble', err is a scalar showing the cumulative error for the entire ensemble.
'Trees': Vector of indices indicating what trees to include in this calculation. By default, this argument is set to 'all' and the method uses all trees. If 'Trees' is a numeric vector, the method returns a vector of length NTrees for 'cumulative' and 'individual' modes, where NTrees is the number of elements in the input vector, and a scalar for 'ensemble' mode. For example, in the 'cumulative' mode, the first element gives error from trees(1), the second element gives error from trees(1:2) etc.
'TreeWeights': Vector of tree weights. This vector must have the same length as the 'Trees' vector. oobError uses these weights to combine output from the specified trees by taking a weighted average instead of the simple nonweighted majority vote. You cannot use this argument in the 'individual' mode.
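The name-value pairs described above can be exercised as in the following sketch. It assumes the Statistics and Machine Learning Toolbox and the fisheriris data set; the ensemble size, the tree subset 1:10, and the weight vector w are arbitrary illustrative choices, not recommended values.

```matlab
load fisheriris
rng(1)  % reproducible bootstrap samples
B = TreeBagger(50, meas, species, ...
    'Method', 'classification', 'OOBPrediction', 'on');

errCum = oobError(B);                        % default: 'Mode','cumulative'
errInd = oobError(B, 'Mode', 'individual');  % one error per tree
errEns = oobError(B, 'Mode', 'ensemble');    % scalar for the whole ensemble

% Restrict the calculation to the first 10 trees.
errSub = oobError(B, 'Trees', 1:10);         % vector of length 10

% Weight the same 10 trees (arbitrary weights, one per selected tree).
% 'TreeWeights' is not allowed together with 'Mode','individual'.
w = linspace(1, 2, 10);
errW = oobError(B, 'Trees', 1:10, 'TreeWeights', w, 'Mode', 'ensemble');
```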

Algorithms

oobError estimates the weighted ensemble error for out-of-bag observations. That is, oobError applies the error function to the training data stored in the input TreeBagger model B, and uses only the out-of-bag observations for each tree to compute the ensemble error.

  • B.X and B.Y are the training data predictors and responses, respectively.

  • B.OOBIndices specifies which observations are out-of-bag for each tree in the ensemble.

  • B.W specifies the observation weights.

  • Optionally:

    • Using the 'Mode' name-value pair argument, you can request the weighted error of each individual tree or the weighted error of the entire ensemble. By default, oobError returns the cumulative, weighted ensemble error.

    • Using the 'Trees' name-value pair argument, you can choose which trees to use in the ensemble error calculations.

    • Using the 'TreeWeights' name-value pair argument, you can assign a weight to each tree.
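To see the stored quantities listed above, the following sketch inspects them on a small ensemble. It assumes the Statistics and Machine Learning Toolbox and the fisheriris data set; the variable names oob, oobFractionPerTree, and neverOOB are introduced here only for illustration.

```matlab
load fisheriris
rng(1)  % reproducible bootstrap samples
B = TreeBagger(20, meas, species, ...
    'Method', 'classification', 'OOBPrediction', 'on');

% OOBIndices is a logical matrix: rows are observations, columns are trees.
oob = B.OOBIndices;
disp(size(oob))   % number of observations by number of trees

% Fraction of observations that are out of bag for each tree.
oobFractionPerTree = mean(oob, 1);

% Observations that are in bag for every tree have no out-of-bag prediction.
neverOOB = ~any(oob, 2);
fprintf('%d observations are in bag for all trees\n', nnz(neverOOB));
```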

oobError applies the algorithms described below. For more details, see error and predict.

For regression problems, oobError returns the weighted MSE.

  1. oobError predicts responses for all out-of-bag observations.

  2. The MSE estimate depends on the value of 'Mode'.

    • If you specify 'Mode','Individual', then oobError sets the predicted response of any in-bag observation within a selected tree to the weighted sample average of the observed training data responses. Then, oobError computes the weighted MSE for each selected tree.

    • If you specify 'Mode','Cumulative', then oobError returns a vector of cumulative, weighted MSEs, where MSE_t is the cumulative, weighted MSE through selected tree t. To compute MSE_t, for each observation that is out of bag for at least one tree through tree t, oobError computes the cumulative, weighted mean of the predicted responses through tree t. For observations that are in bag for all selected trees through tree t, oobError sets the predicted response to the weighted sample average of the observed training data responses. Then, oobError computes MSE_t.

    • If you specify 'Mode','Ensemble', then, for each observation that is out of bag for at least one tree, oobError computes the weighted mean of the predicted responses over all selected trees. For observations that are in bag for all selected trees, oobError sets the predicted response to the weighted sample average of the observed training data responses. Then, oobError computes the weighted MSE, which is the same as the final, cumulative, weighted MSE.
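The regression steps above can be sketched in plain MATLAB on a toy example. This is an illustration of the description, not the toolbox implementation: the data and the variable names (yhat, oob, treeW, ybar, and so on) are hypothetical, and the per-tree predictions are supplied by hand rather than produced by trained trees.

```matlab
% Hypothetical toy data: 3 observations, 2 trees.
y     = [1; 2; 3];                 % observed responses
w     = [1; 1; 1];                 % observation weights
yhat  = [1 1; 0 4; 9 9];           % yhat(i,t): prediction of tree t for observation i
oob   = logical([1 0; 0 1; 0 0]);  % oob(i,t): observation i is out of bag for tree t
treeW = [1 1];                     % tree weights

T    = size(yhat, 2);
ybar = sum(w .* y) / sum(w);       % fallback: weighted average of the responses

mseCum = zeros(T, 1);
mseInd = zeros(T, 1);
for t = 1:T
    % 'individual': tree t alone, with the fallback for in-bag observations.
    predInd = yhat(:, t);
    predInd(~oob(:, t)) = ybar;
    mseInd(t) = sum(w .* (y - predInd).^2) / sum(w);

    % 'cumulative': weighted mean over the trees 1..t that hold the
    % observation out of bag.
    mask = double(oob(:, 1:t));
    num  = (yhat(:, 1:t) .* mask) * treeW(1:t)';
    den  = mask * treeW(1:t)';
    predCum = num ./ den;          % NaN where the observation has no OOB tree yet
    predCum(den == 0) = ybar;      % in bag for all trees through t: use fallback
    mseCum(t) = sum(w .* (y - predCum).^2) / sum(w);
end

mseEns = mseCum(end);              % 'ensemble' equals the final cumulative value
```

In this toy case the third observation is in bag for both trees, so its prediction stays at the fallback value 2 throughout, and mseCum works out to 1/3 and then 5/3.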

In classification problems, oobError returns the weighted misclassification rate.

  1. oobError predicts classes for all out-of-bag observations.

  2. The weighted misclassification rate estimate depends on the value of 'Mode'.

    • If you specify 'Mode','Individual', then oobError sets the predicted class of any in-bag observation within a selected tree to the weighted, most popular class over all training responses. If there are multiple most popular classes, oobError treats the one listed first in the ClassNames property of the TreeBagger model as the most popular. Then, oobError computes the weighted misclassification rate for each selected tree.

    • If you specify 'Mode','Cumulative', then oobError returns a vector of cumulative, weighted misclassification rates, where e_t* is the cumulative, weighted misclassification rate through selected tree t. To compute e_t*, for each observation that is out of bag for at least one tree through tree t, oobError finds the predicted, cumulative, weighted most popular class through tree t. For observations that are in bag for all selected trees through tree t, oobError sets the predicted class to the weighted, most popular class over all training responses. If there are multiple most popular classes, oobError treats the one listed first in the ClassNames property of the TreeBagger model as the most popular. Then, oobError computes e_t*.

    • If you specify 'Mode','Ensemble', then, for each observation that is out of bag for at least one tree, oobError computes the weighted, most popular class over all selected trees. For observations that are in bag for all selected trees, oobError sets the predicted class to the weighted, most popular class over all training responses. If there are multiple most popular classes, oobError treats the one listed first in the ClassNames property of the TreeBagger model as the most popular. Then, oobError computes the weighted misclassification rate, which is the same as the final, cumulative, weighted misclassification rate.
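The cumulative classification rule above can be sketched in plain MATLAB in the same way. Again, this illustrates the description and is not the toolbox code: the toy data and the names pred, votes, and defaultClass are hypothetical, classes are encoded as the integers 1 and 2 in the order they would appear in ClassNames, and tree weights are assumed to be positive.

```matlab
% Hypothetical toy data: 4 observations, 3 trees, 2 classes.
y     = [1; 1; 1; 2];                           % true class indices
w     = [1; 1; 1; 1];                           % observation weights
pred  = [1 2 2; 1 1 1; 2 2 1; 2 1 2];           % pred(i,t): class predicted by tree t
oob   = logical([1 1 1; 0 0 0; 0 1 1; 1 0 0]);  % out-of-bag indicator
treeW = [1 1 1];                                % tree weights (assumed positive)

[N, T] = size(pred);
K = 2;                                          % number of classes

% Fallback: weighted most popular training class. max returns the first
% maximum, so ties go to the class listed first.
classWeight = accumarray(y, w, [K 1]);
[~, defaultClass] = max(classWeight);

errCum = zeros(T, 1);
votes  = zeros(N, K);                           % running weighted votes
for t = 1:T
    % Add tree t's weighted vote for each observation it holds out of bag.
    rows = find(oob(:, t));
    idx  = sub2ind([N K], rows, pred(rows, t));
    votes(idx) = votes(idx) + treeW(t);

    [~, label] = max(votes, [], 2);             % ties go to the first class
    label(sum(votes, 2) == 0) = defaultClass;   % no out-of-bag tree so far
    errCum(t) = sum(w .* (label ~= y)) / sum(w);
end

errEns = errCum(end);                           % 'ensemble' equals the final value
```

Here the second observation is never out of bag, so it always receives the fallback class 1, and the first observation shows the tie rule at t = 2, where one vote for each class resolves to class 1.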
