RegressionTree
Regression tree
Description
A decision tree with binary splits for regression. An object of class RegressionTree can predict responses for new data with the predict method. The object contains the data used for training, so it can also compute resubstitution predictions by using resubPredict.
Creation
Create a RegressionTree object by using fitrtree.
Properties
BinEdges
— Bin edges for numeric predictors
cell array of p numeric vectors
This property is read-only.
Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.
The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).
You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.
X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x)
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]);
    Xbinned(:,j) = xbinned;
end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
CategoricalPredictors
— Indices of categorical predictors
vector of positive integers | []
This property is read-only.
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).
Data Types: single | double
CategoricalSplit
— Categorical splits
n-by-2 cell array
This property is read-only.
Categorical splits, returned as an n-by-2 cell array, where n is the number of categorical splits in tree. Each row in CategoricalSplit gives the left and right values for a categorical split. For each branch node with categorical split j based on a categorical predictor variable z, the left child is chosen if z is in CategoricalSplit(j,1) and the right child is chosen if z is in CategoricalSplit(j,2). The splits are in the same order as the nodes of the tree. To find the nodes for these splits, select the 'categorical' entries of CutType from top to bottom.
Data Types: cell
Children
— Numbers of the child nodes for each node
n-by-2 array
This property is read-only.
Numbers of the child nodes for each node in the tree, returned as an n-by-2 array, where n is the number of nodes. Leaf nodes have child node 0.
Data Types: double
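As a minimal sketch of how Children can be read, the following code finds the leaf nodes as the rows whose two entries are both 0. The carsmall data, the choice of predictors, and the 'MinParentSize' value are assumptions made only for illustration.

```matlab
load carsmall
% Illustrative model; the predictors and MinParentSize are arbitrary choices.
tree = fitrtree([Weight, Horsepower],MPG,'MinParentSize',20);

% Leaf nodes are the rows of Children whose entries are both 0.
isLeaf = all(tree.Children == 0,2);
leafNodes = find(isLeaf);

% Node numbers of the left and right children of the root node (node 1).
rootChildren = tree.Children(1,:);
```

The logical vector isLeaf is expected to be the complement of the IsBranchNode property.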
CutCategories
— Categories used at branches
n-by-2 cell array
This property is read-only.
Categories used at branches in tree, returned as an n-by-2 cell array, where n is the number of nodes. For each branch node i based on a categorical predictor variable X, the left child is chosen if X is among the categories listed in CutCategories{i,1}, and the right child is chosen if X is among those listed in CutCategories{i,2}. Both columns of CutCategories are empty for branch nodes based on continuous predictors and for leaf nodes.
CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.
Data Types: cell
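The following sketch shows one way to look up the category sets for the categorical splits of a tree. It assumes the carsmall data with Cylinders treated as a categorical predictor; the variable names isCat and catNodes are hypothetical.

```matlab
load carsmall
% Illustrative model: Cylinders (column 2) is treated as categorical.
tree = fitrtree([Weight, Cylinders],MPG, ...
    'CategoricalPredictors',2,'MinParentSize',20, ...
    'PredictorNames',{'W','C'});

% Nodes whose cut is based on a categorical predictor.
isCat = strcmp(tree.CutType,'categorical');
catNodes = find(isCat);

% Left (column 1) and right (column 2) category sets for those nodes.
disp(tree.CutCategories(catNodes,:))
```

For all other nodes (continuous cuts and leaf nodes), both columns of CutCategories should be empty.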
CutPoint
— Values used as cut points
n-element vector
This property is read-only.
Values used as cut points in tree, returned as an n-element vector, where n is the number of nodes. For each branch node i based on a continuous predictor variable X, the left child is chosen if X<CutPoint(i) and the right child is chosen if X>=CutPoint(i). CutPoint is NaN for branch nodes based on categorical predictors and for leaf nodes.
CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.
Data Types: double
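To illustrate the left/right rule above, the following sketch walks a single observation from the root to a leaf by using CutPoint, CutPredictorIndex, Children, IsBranchNode, and NodeMean. It assumes a tree with continuous predictors only, no missing values in the observation, and the default ResponseTransform; the observation x is a made-up example.

```matlab
load carsmall
% Illustrative model with two continuous predictors.
tree = fitrtree([Weight, Horsepower],MPG,'MinParentSize',20);

x = [3000 130];   % Hypothetical observation: [Weight Horsepower]
node = 1;         % Start at the root node
while tree.IsBranchNode(node)
    j = tree.CutPredictorIndex(node);     % Predictor used at this node
    if x(j) < tree.CutPoint(node)
        node = tree.Children(node,1);     % Left child
    else
        node = tree.Children(node,2);     % Right child
    end
end
yManual = tree.NodeMean(node);   % Mean response of the leaf that x reaches
yPredict = predict(tree,x);      % Prediction from the predict function
```

Under these assumptions, yManual and yPredict are expected to agree. A tree with categorical splits would also need the CutCategories lookup, which this sketch omits.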
CutPredictor
— Names of the variables used for branching in each node
cell array
This property is read-only.
Names of the variables used for branching in each node in tree, returned as an n-element cell array, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, CutPredictor contains an empty character vector.
CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.
Data Types: cell
CutPredictorIndex
— Indices of variables used for branching in each node
n-element array
This property is read-only.
Indices of variables used for branching in each node in tree, returned as an n-element array, where n is the number of nodes. For more information, see CutPredictor.
Data Types: double
CutType
— Type of cut at each node
n-element cell array
This property is read-only.
Type of cut at each node in tree, returned as an n-element cell array, where n is the number of nodes. For each node i, CutType{i} is:
'continuous': The cut is defined in the form X < v for a variable X and cut point v.
'categorical': The cut is defined by whether a variable X takes a value in a set of categories.
'': Node i is a leaf node.
CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.
Data Types: cell
ExpandedPredictorNames
— Expanded predictor names
cell array of character vectors
This property is read-only.
Expanded predictor names, returned as a cell array of character vectors.
If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.
Data Types: cell
HyperparameterOptimizationResults
— Description of cross-validation optimization of hyperparameters
BayesianOptimization object | table of hyperparameters and associated values
This property is read-only.
Description of the cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the OptimizeHyperparameters name-value argument is nonempty at creation. The value depends on the Optimizer setting of the HyperparameterOptimizationOptions name-value argument at creation:
'bayesopt' (default): Object of class BayesianOptimization
'gridsearch' or 'randomsearch': Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
IsBranchNode
— Indicator of branch nodes
logical vector
This property is read-only.
Indicator of branch nodes, returned as an n-element logical vector that is true for each branch node and false for each leaf node of tree.
Data Types: logical
ModelParameters
— Parameters used in training tree
TreeParams object
This property is read-only.
Parameters used in training tree, returned as a TreeParams object. To display all parameter values, enter tree.ModelParameters. To access a particular parameter, use dot notation.
NodeError
— Mean squared error for each node
n-element vector
This property is read-only.
Mean squared error for each node in tree, returned as an n-element vector, where n is the number of nodes in the tree.
Data Types: double
NodeMean
— Mean observation values for each node
n-element vector
This property is read-only.
Mean observation values for each node in tree, returned as an n-element vector, where n is the number of nodes in the tree. Every element in NodeMean is the average of the true Y values over all observations in the node.
Data Types: double
NodeProbability
— Proportion of observations in original data that satisfy the conditions for the node
n-element vector
This property is read-only.
Proportion of observations in original data that satisfy the conditions for each node in tree, returned as an n-element vector, where n is the number of nodes in the tree.
Data Types: double
NodeRisk
— Risk of each node
n-element vector
This property is read-only.
Risk of each node in tree, returned as an n-element vector, where n is the number of nodes in the tree. The risk for each node is the node error weighted by the node probability.
Data Types: double
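The relationship between NodeRisk, NodeError, and NodeProbability can be checked numerically. This sketch assumes that "weighted by" means an elementwise product, which is an interpretation of the description above rather than a documented formula; the model itself is only illustrative.

```matlab
load carsmall
% Illustrative model; the predictors and MinParentSize are arbitrary choices.
tree = fitrtree([Weight, Horsepower],MPG,'MinParentSize',20);

% Assumed relationship: risk = node error .* node probability.
riskManual = tree.NodeError .* tree.NodeProbability;

% Largest absolute difference from the stored NodeRisk values.
maxDiff = max(abs(riskManual - tree.NodeRisk));
```

If the assumption holds, maxDiff is near zero up to floating-point rounding.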
NodeSize
— Size of nodes
n-element vector
This property is read-only.
Size of the nodes in tree, returned as an n-element vector, where n is the number of nodes in the tree. The size of a node is the number of observations from the data used to create the tree that satisfy the conditions for the node.
Data Types: double
NumNodes
— Number of nodes
positive integer
This property is read-only.
Number of nodes in tree, returned as a positive integer.
Data Types: double
NumObservations
— Number of observations in the training data
positive integer
This property is read-only.
Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.
Data Types: double
Parent
— Number of parents of nodes
n-element vector
This property is read-only.
Node number of the parent of each node in tree, returned as an n-element integer vector, where n is the number of nodes in the tree. The parent of the root node is 0.
Data Types: double
PredictorNames
— Predictor names
cell array of character vectors
This property is read-only.
Predictor names, specified as a cell array of character vectors. The order of the entries in PredictorNames is the same as in the training data.
Data Types: cell
PruneAlpha
— Alpha values for pruning the tree
real vector
Alpha values for pruning the tree, returned as a real vector with one element per pruning level. If the pruning level ranges from 0 to M, then PruneAlpha has M + 1 elements sorted in ascending order. PruneAlpha(1) is for pruning level 0 (no pruning), PruneAlpha(2) is for pruning level 1, and so on.
For the meaning of the alpha values, see How Decision Trees Create a Pruning Sequence.
Data Types: double
PruneList
— Pruning levels of each node in tree
integer vector
Pruning levels of each node in the tree, returned as an integer vector with NumNodes elements. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node.
For details, see Pruning.
Data Types: double
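As a rough illustration of how PruneAlpha and PruneList relate to the prune function, the following sketch inspects the pruning sequence of a tree and prunes it by one level. The model and the choice of level 1 are assumptions made for the example, and the sketch assumes that the tree has at least one pruning level above 0.

```matlab
load carsmall
% Illustrative model; fitrtree computes the pruning sequence by default.
tree = fitrtree([Weight, Horsepower],MPG,'MinParentSize',20);

maxLevel = max(tree.PruneList);   % Highest pruning level, M
alphas = tree.PruneAlpha;         % One alpha value per pruning level

% Prune the tree to level 1 and compare the node counts.
prunedTree = prune(tree,'Level',1);
nodeCounts = [tree.NumNodes prunedTree.NumNodes];
```

Pruning to a higher level removes more nodes, so the second entry of nodeCounts should not exceed the first.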
ResponseName
— Name of the response variable
character vector
This property is read-only.
Name of the response variable, returned as a character vector.
Data Types: char
ResponseTransform
— Function for transforming responses
'none' (default) | function handle
Function for transforming the raw response values (mean squared error), specified as a function handle or 'none'. The default 'none' means no transformation; equivalently, 'none' means @(x)x. A function handle must accept a matrix of response values and return a matrix of the same size.
Add or change a ResponseTransform function using dot notation:
tree.ResponseTransform = @function
Data Types: char
| function_handle
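One possible use of ResponseTransform is to map predictions back to the original scale after fitting to a transformed response. The sketch below assumes a tree fitted to log(MPG), which is a setup chosen only for illustration, and uses the built-in exp function as the transform.

```matlab
load carsmall
% Hypothetical setup: fit the tree to log(MPG) instead of MPG.
tree = fitrtree(Weight,log(MPG),'MinParentSize',20);

Xnew = [2500; 4000];           % Made-up car weights
yLog = predict(tree,Xnew);     % Predictions on the log scale

% Apply exp to the raw predictions so that predict returns MPG-scale values.
tree.ResponseTransform = @exp;
yMPG = predict(tree,Xnew);
```

After the assignment, yMPG should equal exp(yLog). Note that the transform is applied to the prediction of each leaf, so it is not the same as averaging MPG directly.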
RowsUsed
— Rows of the original predictor data X
used for fitting
logical vector
This property is read-only.
Rows of the original predictor data X used for fitting, returned as an n-element logical vector, where n is the number of rows of X. If the software uses all rows of X for constructing the object, then RowsUsed is an empty array ([]).
Data Types: logical
SurrogateCutCategories
— Categories used for surrogate splits
n-element cell array
This property is read-only.
Categories used for surrogate splits, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutCategories{k} is a cell array. The length of SurrogateCutCategories{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutCategories{k} is either an empty character vector for a continuous surrogate predictor, or a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists the categories assigned to the left child by this surrogate split, and the second element lists the categories assigned to the right child. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutCategories contains an empty cell.
Data Types: cell
SurrogateCutFlip
— Numeric cut assignments used for surrogate splits
n-element cell array
This property is read-only.
Numeric cut assignments used for surrogate splits in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutFlip{k} is a numeric vector. The length of SurrogateCutFlip{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutFlip{k} is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either -1 or +1. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z<C and the cut assignment for this surrogate split is +1, or if Z≥C and the cut assignment for this surrogate split is -1. Similarly, the right child is chosen if Z≥C and the cut assignment for this surrogate split is +1, or if Z<C and the cut assignment for this surrogate split is -1. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutFlip contains an empty array.
Data Types: cell
SurrogateCutPoint
— Numeric values used for surrogate splits
n-element cell array
This property is read-only.
Numeric values used for surrogate splits in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutPoint{k} is a numeric vector. The length of SurrogateCutPoint{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutPoint{k} is either NaN for a categorical surrogate predictor, or a numeric cut for a continuous surrogate predictor. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z<C and SurrogateCutFlip for this surrogate split is +1, or if Z≥C and SurrogateCutFlip for this surrogate split is -1. Similarly, the right child is chosen if Z≥C and SurrogateCutFlip for this surrogate split is +1, or if Z<C and SurrogateCutFlip for this surrogate split is -1. The order of the surrogate split variables at each node is matched to the order of variables returned by SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutPoint contains an empty cell.
Data Types: cell
SurrogateCutPredictor
— Names of variables used for surrogate splits in each node
n-element cell array
This property is read-only.
Names of the variables used for surrogate splits in each node in tree, returned as an n-element cell array, where n is the number of nodes in tree. Every element of SurrogateCutPredictor is a cell array with the names of the surrogate split variables at this node. The variables are sorted in descending order by the predictive measure of association with the optimal predictor, and only variables with a positive predictive measure are included. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutPredictor contains an empty cell.
Data Types: cell
SurrogateCutType
— Types of surrogate splits at each node
n-element cell array
This property is read-only.
Types of surrogate splits at each node in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutType{k} is a cell array with the types of the surrogate split variables at this node. The variables are sorted in descending order by the predictive measure of association with the optimal predictor, and only variables with a positive predictive measure are included. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutType contains an empty cell. A surrogate split type can be either 'continuous' if the cut is defined in the form Z<V for a variable Z and cut point V, or 'categorical' if the cut is defined by whether Z takes a value in a set of categories.
Data Types: cell
SurrogatePredictorAssociation
— Predictive measures of association for surrogate splits
n-element cell array
This property is read-only.
Predictive measures of association for surrogate splits in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogatePredictorAssociation{k} is a numeric vector. The length of SurrogatePredictorAssociation{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogatePredictorAssociation{k} gives the predictive measure of association between the optimal split and this surrogate split. The order of the surrogate split variables at each node is the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogatePredictorAssociation contains an empty cell.
Data Types: cell
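The surrogate properties are aligned element by element within each node. As a minimal sketch, the following code fits a tree with surrogate splits turned on and collects the surrogate information stored for the root node. The predictors, their short names, and the 'MinParentSize' value are assumptions chosen for illustration.

```matlab
load carsmall
% Illustrative model with surrogate splits enabled.
tree = fitrtree([Weight, Horsepower, Displacement],MPG, ...
    'Surrogate','on','MinParentSize',20, ...
    'PredictorNames',{'W','HP','D'});

k = 1;   % Root node
names  = tree.SurrogateCutPredictor{k};          % Surrogate variable names
types  = tree.SurrogateCutType{k};               % 'continuous' or 'categorical'
points = tree.SurrogateCutPoint{k};              % Cut points (NaN if categorical)
flips  = tree.SurrogateCutFlip{k};               % +1 or -1 (0 if categorical)
assoc  = tree.SurrogatePredictorAssociation{k};  % Measures of association
```

Entry m of each variable describes the same surrogate split, so for example names{m}, points(m), and flips(m) together define the m-th surrogate rule at the root node.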
W
— Scaled weights in tree
numeric vector
This property is read-only.
Scaled weights in tree, returned as a numeric vector. W has length n, the number of rows in the training data.
Data Types: double
X
— Predictor values
real matrix | table
This property is read-only.
Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.
Data Types: double | table
Y
— Response data
numeric column vector
This property is read-only.
Response data, returned as a numeric column vector with the same number of rows as X. Each entry in Y is the response to the data in the corresponding row of X.
Data Types: double
Object Functions
compact | Reduce size of regression tree model |
crossval | Cross-validate machine learning model |
cvloss | Regression error by cross-validation for regression tree model |
gather | Gather properties of Statistics and Machine Learning Toolbox object from GPU |
lime | Local interpretable model-agnostic explanations (LIME) |
loss | Regression error for regression tree model |
nodeVariableRange | Retrieve variable range of decision tree node |
partialDependence | Compute partial dependence |
plotPartialDependence | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
predict | Predict responses using regression tree model |
predictorImportance | Estimates of predictor importance for regression tree |
prune | Produce sequence of regression subtrees by pruning regression tree |
resubLoss | Resubstitution loss for regression tree model |
resubPredict | Predict response of regression tree by resubstitution |
shapley | Shapley values |
surrogateAssociation | Mean predictive measure of association for surrogate splits in regression tree |
view | View regression tree |
Examples
Construct Regression Tree
Load the sample data.
load carsmall
Construct a regression tree using the sample data. The response variable is miles per gallon, MPG.
tree = fitrtree([Weight, Cylinders],MPG,...
    'CategoricalPredictors',2,'MinParentSize',20,...
    'PredictorNames',{'W','C'})
tree = 
  RegressionTree
           PredictorNames: {'W'  'C'}
             ResponseName: 'Y'
    CategoricalPredictors: 2
        ResponseTransform: 'none'
          NumObservations: 94
Predict the mileage of 4,000-pound cars with 4, 6, and 8 cylinders.
MPG4Kpred = predict(tree,[4000 4; 4000 6; 4000 8])
MPG4Kpred = 3×1
19.2778
19.2778
14.3889
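Because the object stores its training data, the same model can also be evaluated by resubstitution. The following sketch refits the tree from this example and calls resubPredict, resubLoss, and view; the variable names yFit and mseResub are hypothetical.

```matlab
load carsmall
tree = fitrtree([Weight, Cylinders],MPG, ...
    'CategoricalPredictors',2,'MinParentSize',20, ...
    'PredictorNames',{'W','C'});

yFit = resubPredict(tree);    % Predictions for the stored training data
mseResub = resubLoss(tree);   % Resubstitution mean squared error
view(tree)                    % Print a text description of the tree
```

Resubstitution error is typically optimistic compared with the error on new data, so functions such as crossval or cvloss are usually a better basis for assessing predictive accuracy.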
References
[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
To integrate the prediction of a regression tree model into Simulink®, you can use the RegressionTree Predict block in the Statistics and Machine Learning Toolbox™ library or a MATLAB® Function block with the predict function.
When you train a regression tree model by using fitrtree, the following restrictions apply.
- The value of the ResponseTransform name-value argument cannot be an anonymous function. For fixed-point code generation, the value must be 'none' (default).
- You cannot use surrogate splits; that is, the value of the Surrogate name-value argument must be 'off'.
- Fixed-point code generation and code generation with a coder configurer do not support categorical predictors (logical, categorical, char, string, or cell). You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model.
For more information, see Introduction to Code Generation.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
The following object functions fully support GPU arrays:
The following object functions offer limited support for GPU arrays:
The object functions execute on a GPU if at least one of the following applies:
The model was fitted with GPU arrays.
The predictor data that you pass to the object function is a GPU array.
The response data that you pass to the object function is a GPU array.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2011a