directforecaster
Description
DirectForecaster is a multistep forecasting model that uses a direct strategy in which a separate regression model is trained for each step of the forecasting horizon. For more information, see Direct Forecasting. Use the directforecaster function to train a DirectForecaster model with regularly sampled time series data.
You can use lagged and leading predictors to train the direct forecasting model. directforecaster creates the appropriate predictors when you specify the following:
Leading exogenous predictors (LeadingPredictors)
Lag values of the leading exogenous predictors (LeadingPredictorLags)
Lag values of the nonleading exogenous predictors (PredictorLags)
Lag values of the response (ResponseLags)
For more information, see Forecasting Data.
After creating a DirectForecaster object, you can see how the model performs on observed test data by using the loss and predict object functions. You can then use the model to forecast at time steps beyond the available data by using the forecast object function.
Creation
Syntax
Description
Mdl = directforecaster(Tbl,ResponseVarName) creates a direct forecasting model Mdl using the regularly sampled data in Tbl and the response in variable ResponseVarName in Tbl. The function treats all variables in Tbl other than ResponseVarName as exogenous predictor variables. By default, the resulting Mdl object contains one regression model, with a time horizon of one step ahead. directforecaster uses a lag value of 1 to create predictors from the exogenous predictors and the response variable.
Mdl = directforecaster(__,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can create a model that forecasts at the first, third, and fifth horizon steps by specifying Horizon=[1 3 5].
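The following is a minimal sketch of the name-value syntax, not an official example. The timetable, its variable names (x1, Y), and the synthetic data are assumptions made only for illustration; the call itself uses the Learner and Horizon arguments described on this page.

```matlab
% Synthetic, regularly sampled daily data (illustrative values only)
rng(0)                                        % for reproducibility
n = 200;
Time = datetime(2024,1,1) + caldays(0:n-1)';  % one observation per day
x1 = randn(n,1);                              % hypothetical exogenous predictor
Y = sin((1:n)'/10) + 0.1*randn(n,1);          % hypothetical response
Tbl = timetable(Time,x1,Y);

% Train one regression tree per requested horizon step
Mdl = directforecaster(Tbl,"Y",Learner="tree",Horizon=[1 3 5]);

Mdl.Horizon            % horizon steps stored in the model
numel(Mdl.Learners)    % number of trained regression models
```

Because the direct strategy trains a separate model per step, the number of elements in Mdl.Learners should match the number of values in Horizon.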
Input Arguments
Tbl
— Training set data
table | timetable
Training set data, specified as a table or timetable. Each row of
Tbl
corresponds to one observation, and each column corresponds
to one variable. Tbl
must contain the response variable
ResponseVarName
.
The software assumes that the observations in Tbl are regularly sampled. Ensure that no time steps are missing or duplicated and that the observations are in ascending order.
By default, the software treats all variables in Tbl other than ResponseVarName as exogenous predictors. To use a subset of the variables in Tbl as exogenous predictors during model training, specify the PredictorNames name-value argument.
ResponseVarName
— Response variable name
name of variable in Tbl
Response variable name, specified as the name of a variable in
Tbl
. The response variable must contain numeric values.
You must specify ResponseVarName
as a character vector or
string scalar. For example, if Tbl
stores the response variable
Response
as Tbl.Response
, then specify it as
"Response"
.
Data Types: char
| string
X
— Training set exogenous predictor data
numeric matrix | table | timetable
Training set exogenous predictor data, specified as a numeric matrix, table, or
timetable. Each row of X
corresponds to one observation, and each
column corresponds to one predictor.
The software assumes that the observations in X are regularly sampled. Ensure that no time steps are missing or duplicated and that the observations are in ascending order.
X and Y must have the same number of observations.
If X is a matrix, you can specify the names of the predictors in the order of their appearance in X by using the PredictorNames name-value argument.
If X is a table or timetable, you can use a subset of the variables in X as exogenous predictors during model training by specifying the PredictorNames name-value argument.
Y
— Training set response data
numeric vector | one-column table | one-column timetable
Training set response data, specified as a numeric vector, one-column table, or
one-column timetable. Each row of Y
corresponds to one observation.
If X is a numeric matrix, then Y must be a numeric vector.
If X is a table, then Y must be a numeric vector or one-column table.
If X is a timetable or it is not specified, then Y must be a numeric vector, one-column table, or one-column timetable.
If you specify both X
and Y
,
then they must have the same number of observations.
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN
, where Name
is
the argument name and Value
is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: directforecaster(Tbl,"Y",Horizon=1:3,LeadingPredictors="all",LeadingPredictorLags=0:1,ResponseLags=1:2)
specifies to forecast at the first, second, and third horizon steps using lagged and
leading predictors. The software treats all exogenous predictors as leading predictors,
and creates one new lagged feature from each exogenous predictor in Tbl
and two new lagged features from the response variable Y
in
Tbl
. The leading predictor lag value of 0
specifies to also use the unshifted exogenous predictors.
Horizon
— Future time steps at which to forecast
1
(default) | positive integer vector
Future time steps at which to forecast, specified as a positive integer vector.
The software uses each specified value in Horizon as an individual horizon step, and trains a regression model that forecasts at that horizon step.
By default, the software trains one regression model that forecasts one step ahead.
Example: Horizon=1:5
Example: Horizon=[2 4 6]
Data Types: single
| double
Learner
— Type of regression model to train at each horizon step
"bag"
(default) | "gam"
| "gp"
| "kernel"
| "linear"
| "lsboost"
| "svm"
| "tree"
| template object
Type of regression model to train at each horizon step, specified as one of the values in this table.
Value | Regression Model Type |
---|---|
"bag" or templateEnsemble template (with the method specified as "Bag" and the weak learners specified as "Tree") | Bagged ensemble of trees |
"gam" or templateGAM template | Generalized additive model (GAM) |
"gp" or templateGP template | Gaussian process regression (GPR) |
"kernel" or templateKernel template | Kernel model |
"linear" or templateLinear template | Linear model |
"lsboost" or templateEnsemble template (with the method specified as "LSBoost" and the weak learners specified as "Tree") | Boosted ensemble of trees |
"svm" or templateSVM template | Support vector machine (SVM) |
"tree" or templateTree template | Decision tree |
Example: Learner="svm"
Example: Learner=templateEnsemble("LSBoost",50,"Tree")
LeadingPredictors
— List of exogenous predictors whose future values are known
[]
(default) | positive integer vector | logical vector | string array | cell array of character vectors | "all"
List of exogenous predictors whose future values are known, specified as one of the values in this table.
Value | Description |
---|---|
Positive integer vector | Each entry in the vector is an index value indicating that the corresponding exogenous predictor is leading. The index values are between 1 and p, where p is the number of exogenous predictors. |
Logical vector | A true entry means that the corresponding exogenous predictor is leading. |
String array or cell array of character vectors | Each element in the array is the name of a leading exogenous predictor variable. The names must match the entries in PredictorNames. |
"all" | All exogenous predictors are leading. |
Note
This name-value argument is valid only when you use exogenous predictors.
Example: LeadingPredictors="all"
Data Types: single
| double
| logical
| string
| cell
LeadingPredictorLags
— Predictor lags for preparing leading exogenous predictors
0
(default) | nonnegative integer vector | cell array of nonnegative integer vectors
Predictor lags for preparing leading exogenous predictors, specified as a nonnegative integer vector or a 1-by-l cell array of nonnegative integer vectors, where l is the number of leading exogenous predictors.
If LeadingPredictorLags is a vector, then the software applies each specified lag value in LeadingPredictorLags to all the leading exogenous predictors. That is, for each element i in the vector, the software shifts the leading exogenous predictors backward in time by i steps, relative to the horizon time step. The software uses the resulting features as predictors.
If LeadingPredictorLags is a cell array, then the numeric values in element i of the cell array indicate the lags for leading exogenous predictor i.
Note
This name-value argument is valid only when you use leading exogenous
predictors by specifying the LeadingPredictors
name-value argument.
Example: LeadingPredictorLags=[0 2 4]
Example: LeadingPredictorLags={0:1,0:2}
Data Types: single
| double
| cell
PredictorLags
— Predictor lags used for preparing nonleading exogenous predictors
1
(default) | positive integer vector | cell array of positive integer vectors
Predictor lags used for preparing nonleading exogenous predictors, specified as a positive integer vector or a 1-by-q cell array of positive integer vectors, where q is the number of nonleading exogenous predictors.
If PredictorLags is a vector, then the software applies each specified lag value in PredictorLags to all the nonleading exogenous predictors. That is, for each element i in the vector, the software shifts the nonleading exogenous predictors backward in time by i steps and uses the resulting feature as a predictor.
If PredictorLags is a cell array, then the numeric values in element i of the cell array indicate the lags for nonleading exogenous predictor i.
Note
This name-value argument is valid only when you use nonleading exogenous predictors.
Example: PredictorLags=1:14
Example: PredictorLags={1:2,1:3,1:2}
Data Types: single
| double
| cell
ResponseLags
— Response lags used for preparing predictors
1
(default) | positive integer vector | []
Response lags used for preparing predictors, specified as a positive integer
vector. The software applies each specified lag value in
ResponseLags
to the response. That is, for each element
i
in the vector, the software shifts the response backward in
time by i
steps and uses the resulting feature as a predictor. To
create no lagged response variables, specify ResponseLags
as
[]
.
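As a rough illustration of what shifting the response backward in time means, the following base MATLAB sketch builds lagged copies of a short series by hand. The variable names are hypothetical, and this is not the internal implementation of directforecaster; it only shows the shape of the features that a value such as ResponseLags=1:2 describes.

```matlab
% Hypothetical response series with six regularly sampled observations
y = (1:6)';
responseLags = 1:2;                  % same form as ResponseLags=1:2

% Column k holds y shifted backward in time by responseLags(k) steps.
% Rows without enough history stay NaN.
n = numel(y);
laggedY = NaN(n,numel(responseLags));
for k = 1:numel(responseLags)
    lag = responseLags(k);
    laggedY(lag+1:end,k) = y(1:end-lag);
end

disp([y laggedY])                    % original series next to its lagged copies
```

Each row pairs an observation with the values observed one and two steps earlier, which is the information a regression model at that row can draw on.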
Example: ResponseLags=1:7
Data Types: single
| double
CategoricalPredictors
— List of categorical exogenous predictors
positive integer vector | logical vector | string array | cell array of character vectors | "all"
List of categorical exogenous predictors, specified as one of the values in this table.
Value | Description |
---|---|
Positive integer vector | Each entry in the vector is an index value indicating that the corresponding exogenous predictor is categorical. The index values are between 1 and p, where p is the number of exogenous predictors. |
Logical vector | A true entry means that the corresponding exogenous predictor is categorical. |
String array or cell array of character vectors | Each element in the array is the name of a categorical exogenous predictor variable. The names must match the entries in PredictorNames. |
"all" | All exogenous predictors are categorical. |
By default, if the exogenous predictors are in a numeric matrix, the software
assumes all the exogenous predictors are continuous. If the exogenous predictors are
in a table or timetable, the software assumes they are categorical if they are
logical vectors, categorical
vectors, character arrays, string
arrays, or cell arrays of character vectors. However, learners that use decision
trees assume that mathematically ordered categorical
vectors are
continuous variables. To identify any other predictors as categorical predictors,
specify them by using the CategoricalPredictors
name-value
argument.
The software creates dummy variables based on the Learner
name-value argument and the underlying fitting function used to create the
regression models in Learners
. For more information on how fitting functions treat
categorical predictors, see Automatic Creation of Dummy Variables.
Note
This name-value argument is valid only when you use exogenous predictors.
Example: CategoricalPredictors="all"
Data Types: single
| double
| logical
| string
| cell
PredictorNames
— Names of exogenous predictor variables
string array | cell array of character vectors
Names of the exogenous predictor variables, specified as a string array or cell array of character vectors.
If you supply exogenous predictor data using a numeric matrix, then you can use PredictorNames to assign names to the exogenous predictor variables.
The order of the names in PredictorNames must correspond to the order of the columns in the matrix.
By default, PredictorNames is {'x1','x2',...}.
If you supply exogenous predictor data using a table or timetable, then you can use PredictorNames to specify which exogenous variables to use as predictors during training.
PredictorNames must be a subset of the variable names in the table or timetable and cannot include the name of the response variable.
By default, PredictorNames contains the names of all variables other than the response variable.
Note
This name-value argument is valid only when you use exogenous predictors.
Example: PredictorNames=["Day","Month","Year"]
Data Types: string
| cell
ResponseName
— Name of response variable
"Y"
(default) | character vector | string scalar
Name of the response variable Y
, specified as a character
vector or a string scalar. ResponseName
cannot be the name of a
variable in X
.
Note
This name-value argument is valid only when you supply Y
as a numeric vector.
Example: ResponseName="Temperature"
Data Types: char
| string
Partition
— Time series data partition for cross-validating model
[]
(default) | tspartition
object
Time series data partition for cross-validating the model, specified as a
tspartition
object. The tspartition
object can use one of the following
validation schemes: expanding window cross-validation, sliding window
cross-validation, or holdout validation.
If you specify the Partition
name-value argument, then
directforecaster
returns a PartitionedDirectForecaster
object. Otherwise, the function returns a
DirectForecaster
object.
Example: Partition=tspartition(size(X,1),"ExpandingWindow",5)
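A hedged sketch of cross-validating at creation time follows. The data and variable names (x1, Temp) are synthetic assumptions. The call to cvloss assumes that PartitionedDirectForecaster provides that object function in your release; it is not described on this page.

```matlab
% Synthetic, regularly sampled daily data (illustrative values only)
rng(0)                                        % for reproducibility
n = 200;
Time = datetime(2024,1,1) + caldays(0:n-1)';
x1 = randn(n,1);                              % hypothetical exogenous predictor
Temp = 50 + 10*sin((1:n)'/15) + randn(n,1);   % hypothetical response
Tbl = timetable(Time,x1,Temp);

% Expanding window cross-validation with 5 windows
part = tspartition(n,"ExpandingWindow",5);

% Passing Partition returns a partitioned (cross-validated) model
CVMdl = directforecaster(Tbl,"Temp",Learner="tree",Partition=part);

% Assumed object function: cross-validated loss at each horizon step
cvError = cvloss(CVMdl)
```

Specifying the partition up front keeps the time ordering of the data intact, which ordinary random cross-validation would not.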
UseParallel
— Flag to run computations in parallel
false
(default) | true
Flag to run computations in parallel, specified as true or false. If you specify UseParallel as true, then the function executes for-loop iterations by using parfor (Parallel Computing Toolbox). The loop runs in parallel when you have Parallel Computing Toolbox™.
Example: UseParallel=true
Data Types: logical
NumBins
— Number of bins for numeric predictors
[]
(default) | positive integer scalar
Number of bins for the numeric predictors, specified as a positive integer scalar.
If the NumBins value is empty (default), then directforecaster does not bin any predictors.
If you specify the NumBins value as a positive integer scalar (numBins), then directforecaster bins every numeric predictor into at most numBins equiprobable bins, and then grows trees on the bin indices instead of the original data.
The number of bins can be less than numBins if a predictor has fewer than numBins unique values.
directforecaster does not bin categorical predictors.
When you use a large training data set, this binning option speeds up training
but might cause a decrease in accuracy. You can try setting the
NumBins
value to 50
first, and then change
the value depending on the accuracy and training speed.
Note
directforecaster
supports the
NumBins
name-value argument for trees and ensembles of
trees only. That is, the Learner
value must be
"tree"
, "bag"
, "gam"
,
"lsboost"
, or a template object created by
templateTree
, templateGAM
, or
templateEnsemble
.
Example: NumBins=50
Data Types: single
| double
Output Arguments
Mdl
— Trained direct forecasting model
DirectForecaster
model object | PartitionedDirectForecaster
model object
Trained direct forecasting model, returned as a DirectForecaster
or PartitionedDirectForecaster
model object.
If you specify the Partition
name-value argument, then directforecaster
returns a
PartitionedDirectForecaster
model object. Otherwise, the function
returns a DirectForecaster
model object.
Properties
Data Properties
CategoricalPredictors
— Indices of categorical exogenous predictors
positive integer vector | []
This property is read-only.
Indices of categorical exogenous predictors, specified as a positive integer vector.
Each index value in CategoricalPredictors
indicates that the
corresponding exogenous predictor listed in PredictorNames
is
categorical. If none of the exogenous predictors are categorical, then this property is
empty ([]
).
Data Types: double
NumObservations
— Number of observations
positive integer scalar
This property is read-only.
Number of observations in the data stored in X
and
Y
, specified as a positive integer scalar.
Data Types: double
PredictorNames
— Names of exogenous predictors
cell array of character vectors
This property is read-only.
Names of the exogenous predictors, specified as a cell array of character vectors. The
order of the elements in PredictorNames
corresponds to the order of
the exogenous predictors in the data argument used to train the model.
Data Types: cell
ResponseName
— Name of response variable
character vector
This property is read-only.
Name of the response variable, specified as a character vector.
Data Types: char
X
— Exogenous predictor data
numeric matrix | table | timetable
This property is read-only.
Exogenous predictor data used to train the model, specified as a numeric matrix,
table, or timetable. Each row of X
corresponds to one
observation, and each column corresponds to one variable.
Y
— Observed response data
numeric vector | one-column table | one-column timetable
This property is read-only.
Observed response data used to train the model, specified as a numeric vector,
one-column table, or one-column timetable. Each row of Y
corresponds to one observation.
Forecasting Properties
Horizon
— Future time steps at which to forecast
positive integer vector
This property is read-only.
Future time steps at which to forecast, specified as a positive integer vector.
Learners
contains a trained regression model for each horizon
step. For example, if the Horizon
value of a direct forecasting
model Mdl
is [1 3]
, then
Mdl.Learners
contains two regression models: one that forecasts
at horizon step 1
, and one that forecasts at horizon step
3
.
Data Types: double
LeadingPredictorLags
— Predictor lags used for preparing leading exogenous predictors
nonnegative integer vector | cell array of nonnegative integer vectors | []
This property is read-only.
Leading predictor lags used for preparing leading exogenous predictors, specified as a nonnegative integer vector or cell array of nonnegative integer vectors.
If LeadingPredictorLags is a vector, then for each element i in the vector, the software shifts the leading exogenous predictors backward in time by i steps, relative to the horizon time step. The software uses the resulting features as predictors. When the LeadingPredictorLags value is 0, the software uses the unshifted leading predictors.
For example, if the Horizon value of a direct forecasting model is 3 and the LeadingPredictorLags value is 0, then the software uses the unshifted leading predictor values at horizon step 3 as predictor values.
If LeadingPredictorLags is a cell array, then the numeric values in element i of the cell array indicate the lags for leading exogenous predictor i.
If no leading predictor lags are used, then this property is empty ([]
).
Data Types: double
| cell
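The arithmetic behind "relative to the horizon time step" can be sketched in base MATLAB. The index t and the variable names are hypothetical, and the sketch only restates the description above: a lag of i points i steps before the time step being forecast.

```matlab
horizon = 3;                 % forecast three steps ahead
leadingLags = [0 1 2];       % same form as LeadingPredictorLags=[0 1 2]
t = 10;                      % hypothetical index of the last observed time step

targetStep = t + horizon;                % time step being forecast
usedSteps = targetStep - leadingLags     % time steps whose leading values are used
```

With a lag of 0, the leading predictor value comes from the forecast time step itself (step 13 here), which matches the Horizon=3 example in the description.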
LeadingPredictors
— Indices of leading exogenous predictors
positive integer vector | []
This property is read-only.
Indices of the leading exogenous predictors, specified as a positive integer vector. Leading predictors are predictors for which future values are known. Each index value in LeadingPredictors
indicates that the corresponding exogenous predictor listed in PredictorNames
is leading. If no exogenous predictors are leading predictors, then this property is empty ([]
).
Data Types: double
Learners
— Compact regression models trained at different horizon steps
cell array of regression model objects
This property is read-only.
Compact regression models trained at different horizon steps, specified as a cell array of regression model objects. That is, for a direct forecasting model Mdl
, the software trains the regression model Mdl.Learners{1}
at horizon step Mdl.Horizon(1)
.
This table lists the possible compact regression models.
Regression Model Type | Model Object |
---|---|
Bagged or boosted ensemble of trees | CompactRegressionEnsemble |
Generalized additive model (GAM) | CompactRegressionGAM |
Gaussian process regression (GPR) | CompactRegressionGP |
Kernel model | RegressionKernel |
Linear model | RegressionLinear |
Support vector machine (SVM) | CompactRegressionSVM |
Decision tree | CompactRegressionTree |
Data Types: cell
LearnerTemplate
— Template for regression models
output of template function
This property is read-only.
Template for the regression models in Learners
, specified as
the output of one of these template functions.
Template Function | Description |
---|---|
templateEnsemble | Ensemble learning template, with the ensemble aggregation method specified as "Bag" or "LSBoost" |
templateGAM | Generalized additive model template |
templateGP | Gaussian process regression model template |
templateKernel | Kernel model template |
templateLinear | Linear learner template |
templateSVM | Support vector machine template |
templateTree | Decision tree template |
MaxLag
— Maximum lag value
nonnegative integer scalar
This property is read-only.
Maximum lag value, specified as a nonnegative integer scalar. The MaxLag
value depends on the values in ResponseLags
,
PredictorLags
, and LeadingPredictorLags
.
Specifically, the software computes the maximum lag as
follows:
MaxLag = max([0,ResponseLags,PredictorLags, ...
LeadingPredictorLags - min(Horizon) + 1])
Data Types: double
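As a worked instance of this formula, the sketch below plugs in the lag settings used in the examples on this page (ResponseLags=1:7, LeadingPredictorLags={0:1,0:1,0:7}, Horizon=1, and no nonleading predictors). Flattening the cell array with [leadingPredictorLags{:}] is an assumption about how to apply the formula to cell input; the variable names are illustrative.

```matlab
responseLags = 1:7;
predictorLags = [];                      % no nonleading exogenous predictors
leadingPredictorLags = {0:1,0:1,0:7};    % one lag vector per leading predictor
horizon = 1;

% Assumption: treat the cell array as the union of all its lag values
allLeadingLags = [leadingPredictorLags{:}];

maxLag = max([0,responseLags,predictorLags, ...
    allLeadingLags - min(horizon) + 1])
```

The result is 7, which agrees with the MaxLag value displayed for the models in the examples on this page.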
PredictorLags
— Predictor lags used for preparing nonleading exogenous predictors
positive integer vector | cell array of positive integer vectors | []
This property is read-only.
Predictor lags used for preparing nonleading exogenous predictors, specified as a positive integer vector or cell array of positive integer vectors.
If PredictorLags is a vector, then for each element i in the vector, the software shifts the nonleading exogenous predictors backward in time by i steps and uses the resulting features as predictors.
If PredictorLags is a cell array, then the numeric values in element i of the cell array indicate the lags for nonleading exogenous predictor i.
If no predictor lags are used, then this property is empty ([]
).
Data Types: double
| cell
ResponseLags
— Response lags used for preparing predictors
positive integer vector | []
This property is read-only.
Response lags used for preparing predictors, specified as a positive integer vector.
Each element in ResponseLags
indicates the number of time steps by
which to shift the response backward in time. The resulting feature is used as a
predictor. If no response lags are used, then this property is empty
([]
).
Data Types: double
Prepared Data Properties
PreparedCategoricalPredictors
— Indices of prepared categorical predictors
positive integer vector | []
This property is read-only.
Indices of the prepared categorical predictors, specified as a positive integer vector. Each index value in PreparedCategoricalPredictors
indicates that the corresponding predictor listed in PreparedPredictorNames
is categorical. If no prepared predictors are categorical predictors, then this property is empty ([]
).
Data Types: double
PreparedPredictorNames
— Names of prepared predictors
cell array of character vectors
This property is read-only.
Names of the prepared predictors, specified as a cell array of character vectors. These
prepared predictors include variables created from both the exogenous predictor
variables and the response variable used to train the direct forecasting model. Not
every predictor is used at every horizon step. To see which predictors are used at a
specific horizon step, consult the PreparedPredictorsPerHorizon
table.
Data Types: cell
PreparedPredictorsPerHorizon
— Prepared predictors at each horizon step
table of logical values
This property is read-only.
Prepared predictors at each horizon step, specified as a table of logical values. Each row of the table corresponds to a horizon step, and each column of the table corresponds to a prepared predictor as listed in PreparedPredictorNames
.
For a direct forecasting model Mdl
, the logical value in row i
and column j
indicates whether the software uses prepared predictor Mdl.PreparedPredictorNames(j)
at horizon step Mdl.Horizon(i)
. If the value is 1
(true
), then the software uses the predictor. If the value is 0
(false
), then the software does not use the predictor.
Data Types: table
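The following sketch shows one way to look up which prepared predictors a given horizon step uses. The data, the variable names (x1, Y), and the choice of horizon step 3 are assumptions for illustration. The sketch also assumes that every table variable is a logical column, as the description above states, so that brace indexing returns a logical row vector.

```matlab
% Synthetic, regularly sampled daily data (illustrative values only)
rng(0)                                        % for reproducibility
n = 150;
Time = datetime(2024,1,1) + caldays(0:n-1)';
x1 = randn(n,1);                              % hypothetical exogenous predictor
Y = cumsum(randn(n,1));                       % hypothetical response
Tbl = timetable(Time,x1,Y);

Mdl = directforecaster(Tbl,"Y",Learner="tree", ...
    Horizon=[1 3],ResponseLags=1:2);

usage = Mdl.PreparedPredictorsPerHorizon;     % rows: horizon steps, columns: prepared predictors
row = find(Mdl.Horizon == 3);                 % table row for horizon step 3
isUsed = usage{row,:};                        % logical flags for that horizon step
usedNames = Mdl.PreparedPredictorNames(isUsed)
```

The same lookup with a different value in place of 3 lists the predictors for any other horizon step stored in Mdl.Horizon.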
PreparedResponseNames
— Names of prepared responses at each horizon step
cell array of character vectors
This property is read-only.
Names of the prepared responses at each horizon step, specified as a cell array of character
vectors. That is, element i
of
PreparedResponseNames
is the name of the response variable at
the horizon step specified by element i
of
Horizon
.
For example, given a direct forecasting model Mdl
, the name of the response
variable at horizon step Mdl.Horizon(1)
,
Mdl.PreparedResponseNames{1}
, matches the response variable name
used in the first regression model in Learners
(Mdl.Learners{1}.ResponseName
).
Data Types: cell
Object Functions
compact | Reduce size of direct forecasting model |
crossval | Cross-validate direct forecasting model |
loss | Loss at each horizon step |
predict | Predict response at time steps in observed test data |
forecast | Forecast response at time steps beyond available data |
preparedPredictors | Obtain prepared data used for training or testing in direct forecasting |
Examples
Calculate Test Set Mean Squared Error of Direct Forecasting Model
Calculate the test set mean squared error (MSE) of a direct forecasting model.
Load the sample file TemperatureData.csv
, which contains average daily temperatures from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table.
temperatures = readtable("TemperatureData.csv");
head(temperatures)
    Year       Month       Day    TemperatureF
    ____    ___________    ___    ____________
    2015    {'January'}     1          23
    2015    {'January'}     2          31
    2015    {'January'}     3          25
    2015    {'January'}     4          39
    2015    {'January'}     5          29
    2015    {'January'}     6          12
    2015    {'January'}     7          10
    2015    {'January'}     8           4
For this example, use a subset of the temperature data that omits the first 100 observations.
Tbl = temperatures(101:end,:);
Create a datetime
variable t
that contains the year, month, and day information for each observation in Tbl
. Then, use t
to convert Tbl
into a timetable.
numericMonth = month(datetime(Tbl.Month, ...
    InputFormat="MMMM",Locale="en_US"));
t = datetime(Tbl.Year,numericMonth,Tbl.Day);
Tbl.Time = t;
Tbl = table2timetable(Tbl);
Plot the temperature values in Tbl
over time.
plot(Tbl.Time,Tbl.TemperatureF)
xlabel("Date")
ylabel("Temperature in Fahrenheit")
Partition the temperature data into training and test sets by using tspartition
. Reserve 20% of the observations for testing.
partition = tspartition(size(Tbl,1),"Holdout",0.20);
trainingTbl = Tbl(training(partition),:);
testTbl = Tbl(test(partition),:);
Create a full direct forecasting model by using the data in trainingTbl
. Train the model using a decision tree learner. All three of the predictors (Year
, Month
, and Day
) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags.
Mdl = directforecaster(trainingTbl,"TemperatureF", ...
    Learner="tree", ...
    LeadingPredictors="all",LeadingPredictorLags={0:1,0:1,0:7}, ...
    ResponseLags=1:7)
Mdl =
  DirectForecaster
                  Horizon: 1
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {[1x1 classreg.learning.regr.CompactRegressionTree]}
                   MaxLag: 7
          NumObservations: 372
Mdl
is a DirectForecaster
model object. By default, the horizon is one step ahead. That is, Mdl
predicts a value that is one step into the future.
Calculate the test set MSE. Smaller MSE values indicate better performance.
testMSE = loss(Mdl,testTbl)
testMSE = 61.0849
Predict Response for Observed Test Data and Forecast Response Beyond Available Data
After creating a DirectForecaster
object, see how the model performs on observed test data by using the predict
object function. Then use the model to forecast at time steps beyond the available data by using the forecast
object function.
Load the sample file TemperatureData.csv
, which contains average daily temperatures from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table.
temperatures = readtable("TemperatureData.csv");
head(temperatures)
    Year       Month       Day    TemperatureF
    ____    ___________    ___    ____________
    2015    {'January'}     1          23
    2015    {'January'}     2          31
    2015    {'January'}     3          25
    2015    {'January'}     4          39
    2015    {'January'}     5          29
    2015    {'January'}     6          12
    2015    {'January'}     7          10
    2015    {'January'}     8           4
For this example, use a subset of the temperature data that omits the first 100 observations.
Tbl = temperatures(101:end,:);
Create a datetime
variable t
that contains the year, month, and day information for each observation in Tbl
. Then, use t
to convert Tbl
into a timetable.
numericMonth = month(datetime(Tbl.Month, ...
    InputFormat="MMMM",Locale="en_US"));
t = datetime(Tbl.Year,numericMonth,Tbl.Day);
Tbl.Time = t;
Tbl = table2timetable(Tbl);
Plot the temperature values in Tbl
over time.
plot(Tbl.Time,Tbl.TemperatureF)
xlabel("Date")
ylabel("Temperature in Fahrenheit")
Partition the temperature data into training and test sets by using tspartition
. Reserve 20% of the observations for testing.
partition = tspartition(size(Tbl,1),"Holdout",0.20);
trainingTbl = Tbl(training(partition),:);
testTbl = Tbl(test(partition),:);
Create a full direct forecasting model by using the data in trainingTbl
. Train the model using a decision tree learner. All three of the predictors (Year
, Month
, and Day
) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags.
Mdl = directforecaster(trainingTbl,"TemperatureF", ...
    Learner="tree", ...
    LeadingPredictors="all",LeadingPredictorLags={0:1,0:1,0:7}, ...
    ResponseLags=1:7)
Mdl =
  DirectForecaster
                  Horizon: 1
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {[1x1 classreg.learning.regr.CompactRegressionTree]}
                   MaxLag: 7
          NumObservations: 372
Mdl
is a DirectForecaster
model object. By default, the horizon is one step ahead. That is, Mdl
predicts a value that is one step into the future.
For each test set observation, predict the temperature value using Mdl
.
predictedY = predict(Mdl,testTbl)
predictedY=93×1 timetable
Time TemperatureF_Step1
___________ __________________
16-Apr-2016 49.398
17-Apr-2016 39.419
18-Apr-2016 39.419
19-Apr-2016 45.333
20-Apr-2016 35.867
21-Apr-2016 34.222
22-Apr-2016 45.333
23-Apr-2016 66.392
24-Apr-2016 44.111
25-Apr-2016 49
26-Apr-2016 49
27-Apr-2016 34.222
28-Apr-2016 43.333
29-Apr-2016 34.222
30-Apr-2016 34.222
01-May-2016 34.222
⋮
Plot the true response values and the predicted response values for the test set observations.
plot(testTbl.Time,testTbl.TemperatureF)
hold on
plot(predictedY.Time,predictedY.TemperatureF_Step1,"--")
hold off
legend("True","Predicted",Location="southeast")
xlabel("Date")
ylabel("Temperature in Fahrenheit")
Overall, the direct forecasting model is able to predict the trend in temperatures.
Retrain the direct forecasting model using the training and test data. To forecast temperatures one week beyond the available data, specify the horizon steps as one to seven steps ahead.
finalMdl = directforecaster(Tbl,"TemperatureF", ...
    Learner="tree", ...
    LeadingPredictors="all",LeadingPredictorLags={0:1,0:1,0:7}, ...
    ResponseLags=1:7,Horizon=1:7)
finalMdl = 
  DirectForecaster

                  Horizon: [1 2 3 4 5 6 7]
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {7x1 cell}
                   MaxLag: 7
          NumObservations: 465
finalMdl is a DirectForecaster model object that consists of seven regression models: finalMdl.Learners{1}, which predicts one step into the future; finalMdl.Learners{2}, which predicts two steps into the future; and so on.
Because finalMdl uses the unshifted values of the leading predictors Year, Month, and Day as predictor values, you must specify these values for the specified horizon steps in the call to forecast. For the week after the last available observation in Tbl, create a timetable forecastData with the year, month, and day values.
forecastTime = Tbl.Time(end,:)+1:Tbl.Time(end,:)+7;
forecastYear = year(forecastTime);
forecastMonth = month(forecastTime,"name");
forecastDay = day(forecastTime);
forecastData = timetable(forecastTime',forecastYear', ...
    forecastMonth',forecastDay',VariableNames=["Year","Month","Day"])
forecastData=7×3 timetable
Time Year Month Day
___________ ____ ________ ___
18-Jul-2016 2016 {'July'} 18
19-Jul-2016 2016 {'July'} 19
20-Jul-2016 2016 {'July'} 20
21-Jul-2016 2016 {'July'} 21
22-Jul-2016 2016 {'July'} 22
23-Jul-2016 2016 {'July'} 23
24-Jul-2016 2016 {'July'} 24
Forecast the temperature at each horizon step using finalMdl.
forecastY = forecast(finalMdl,Tbl,LeadingData=forecastData)
forecastY=7×1 timetable
Time TemperatureF
___________ ____________
18-Jul-2016 62.375
19-Jul-2016 64.5
20-Jul-2016 66.889
21-Jul-2016 66.889
22-Jul-2016 70.5
23-Jul-2016 74.25
24-Jul-2016 74.25
Plot the observed temperatures for the test set data and the forecast temperatures.
plot(testTbl.Time,testTbl.TemperatureF)
hold on
plot([testTbl.Time(end);forecastY.Time], ...
    [testTbl.TemperatureF(end);forecastY.TemperatureF],"--")
hold off
legend("Observed Data","Forecast Data", ...
    Location="southeast")
xlabel("Date")
ylabel("Temperature in Fahrenheit")
Prepared Predictor Data for Forecasting
When you perform direct forecasting using directforecaster, the function creates lagged and leading predictors from the training data before fitting a DirectForecaster model. Similarly, the loss and predict object functions reformat the test data before computing loss and prediction values, respectively.
This example shows how to access the prepared predictor data used by direct forecasting models for training and testing.
Load the sample file TemperatureData.csv, which contains average daily temperatures from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table.
temperatures = readtable("TemperatureData.csv");
head(temperatures)
    Year       Month       Day    TemperatureF
    ____    ___________    ___    ____________
    2015    {'January'}     1          23
    2015    {'January'}     2          31
    2015    {'January'}     3          25
    2015    {'January'}     4          39
    2015    {'January'}     5          29
    2015    {'January'}     6          12
    2015    {'January'}     7          10
    2015    {'January'}     8           4
For this example, use a subset of the temperature data that omits the first 100 observations.
Tbl = temperatures(101:end,:);
Create a datetime variable t that contains the year, month, and day information for each observation in Tbl. Then, use t to convert Tbl into a timetable.
numericMonth = month(datetime(Tbl.Month, ...
    InputFormat="MMMM",Locale="en_US"));
t = datetime(Tbl.Year,numericMonth,Tbl.Day);
Tbl.Time = t;
Tbl = table2timetable(Tbl);
Plot the temperature values in Tbl over time.
plot(Tbl.Time,Tbl.TemperatureF)
xlabel("Date")
ylabel("Temperature in Fahrenheit")
Partition the temperature data into training and test sets by using tspartition. Reserve 20% of the observations for testing.
partition = tspartition(size(Tbl,1),"Holdout",0.20);
trainingTbl = Tbl(training(partition),:);
testTbl = Tbl(test(partition),:);
Create a full direct forecasting model by using the data in trainingTbl. Specify the horizon steps as one to seven steps ahead. Train a model at each horizon step using a boosted ensemble of trees. All three of the predictors (Year, Month, and Day) are leading predictors because their future values are known.
To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags. For this example, use the following as predictor values: the current and previous Year values, the current and previous Month values, the current and previous seven Day values, and the previous seven TemperatureF values.
Mdl = directforecaster(trainingTbl,"TemperatureF", ...
    Horizon=1:7,LeadingPredictors="all", ...
    LeadingPredictorLags={0:1,0:1,0:7}, ...
    ResponseLags=1:7)
Mdl = 
  DirectForecaster

                  Horizon: [1 2 3 4 5 6 7]
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {7x1 cell}
                   MaxLag: 7
          NumObservations: 372
Mdl is a DirectForecaster model object. Mdl consists of seven regression models: Mdl.Learners{1}, which predicts one step into the future; Mdl.Learners{2}, which predicts two steps into the future; and so on.
Compare the first and seventh regression models in Mdl.
Mdl.Learners{1}
ans = 
  CompactRegressionEnsemble
           PredictorNames: {1x19 cell}
             ResponseName: 'TemperatureF_Step1'
    CategoricalPredictors: [10 11]
        ResponseTransform: 'none'
               NumTrained: 100
Mdl.Learners{7}
ans = 
  CompactRegressionEnsemble
           PredictorNames: {1x19 cell}
             ResponseName: 'TemperatureF_Step7'
    CategoricalPredictors: [10 11]
        ResponseTransform: 'none'
               NumTrained: 100
The regression models in Mdl are all CompactRegressionEnsemble objects. Because the models are compact, they do not include the predictor data used to train them.
To see the data used to train the regression models in Mdl, use the preparedPredictors object function.
Observe the prepared predictor data used to train Mdl.Learners{1}. By default, preparedPredictors returns the prepared predictor data used at horizon step Mdl.Horizon(1), which in this case is one step ahead.
prepTrainingTbl1 = preparedPredictors(Mdl,trainingTbl)
prepTrainingTbl1=372×19 timetable
Time TemperatureF_Lag1 TemperatureF_Lag2 TemperatureF_Lag3 TemperatureF_Lag4 TemperatureF_Lag5 TemperatureF_Lag6 TemperatureF_Lag7 Year_Step1 Year_Lag1 Month_Step1 Month_Lag1 Day_Step1 Day_Lag1 Day_Lag2 Day_Lag3 Day_Lag4 Day_Lag5 Day_Lag6 Day_Lag7
___________ _________________ _________________ _________________ _________________ _________________ _________________ _________________ __________ _________ ___________ __________ _________ ________ ________ ________ ________ ________ ________ ________
10-Apr-2015 NaN NaN NaN NaN NaN NaN NaN 2015 NaN {'April'} {0x0 char} 10 NaN NaN NaN NaN NaN NaN NaN
11-Apr-2015 41 NaN NaN NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 11 10 NaN NaN NaN NaN NaN NaN
12-Apr-2015 45 41 NaN NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 12 11 10 NaN NaN NaN NaN NaN
13-Apr-2015 49 45 41 NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 13 12 11 10 NaN NaN NaN NaN
14-Apr-2015 50 49 45 41 NaN NaN NaN 2015 2015 {'April'} {'April' } 14 13 12 11 10 NaN NaN NaN
15-Apr-2015 54 50 49 45 41 NaN NaN 2015 2015 {'April'} {'April' } 15 14 13 12 11 10 NaN NaN
16-Apr-2015 54 54 50 49 45 41 NaN 2015 2015 {'April'} {'April' } 16 15 14 13 12 11 10 NaN
17-Apr-2015 46 54 54 50 49 45 41 2015 2015 {'April'} {'April' } 17 16 15 14 13 12 11 10
18-Apr-2015 51 46 54 54 50 49 45 2015 2015 {'April'} {'April' } 18 17 16 15 14 13 12 11
19-Apr-2015 47 51 46 54 54 50 49 2015 2015 {'April'} {'April' } 19 18 17 16 15 14 13 12
20-Apr-2015 41 47 51 46 54 54 50 2015 2015 {'April'} {'April' } 20 19 18 17 16 15 14 13
21-Apr-2015 41 41 47 51 46 54 54 2015 2015 {'April'} {'April' } 21 20 19 18 17 16 15 14
22-Apr-2015 51 41 41 47 51 46 54 2015 2015 {'April'} {'April' } 22 21 20 19 18 17 16 15
23-Apr-2015 50 51 41 41 47 51 46 2015 2015 {'April'} {'April' } 23 22 21 20 19 18 17 16
24-Apr-2015 40 50 51 41 41 47 51 2015 2015 {'April'} {'April' } 24 23 22 21 20 19 18 17
25-Apr-2015 39 40 50 51 41 41 47 2015 2015 {'April'} {'April' } 25 24 23 22 21 20 19 18
⋮
prepTrainingTbl1 contains lagged predictors (with Lag in their names) and leading predictors (with Step in their names). The table contains missing values due to the creation of these prepared predictors. For example, TemperatureF_Lag1 contains a missing value at time 10-Apr-2015 because the temperature at time 09-Apr-2015 is not known.
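The way a lag introduces missing values can be sketched with plain array indexing. This is only an illustration, not the preparedPredictors implementation. The temperatures are read off the TemperatureF_Lag1 column displayed above (10-Apr-2015 through 17-Apr-2015), and the variable names are hypothetical.

```matlab
% Temperatures for 10-Apr-2015 through 17-Apr-2015, taken from the display above.
y = [41; 45; 49; 50; 54; 54; 46; 51];
lagValue = 1;
% Shift the series backward in time. The first lagValue rows have no
% earlier observation, so they are filled with NaN.
yLag1 = [NaN(lagValue,1); y(1:end-lagValue)];
disp([y yLag1])
```

The first row of yLag1 is NaN, mirroring the missing TemperatureF_Lag1 value at 10-Apr-2015.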
Observe the prepared predictor data used to train Mdl.Learners{7}.
prepTrainingTbl7 = preparedPredictors(Mdl,trainingTbl, ...
HorizonStep=7)
prepTrainingTbl7=372×19 timetable
Time TemperatureF_Lag1 TemperatureF_Lag2 TemperatureF_Lag3 TemperatureF_Lag4 TemperatureF_Lag5 TemperatureF_Lag6 TemperatureF_Lag7 Year_Step7 Year_Step6 Month_Step7 Month_Step6 Day_Step7 Day_Step6 Day_Step5 Day_Step4 Day_Step3 Day_Step2 Day_Step1 Day_Lag1
___________ _________________ _________________ _________________ _________________ _________________ _________________ _________________ __________ __________ ___________ ___________ _________ _________ _________ _________ _________ _________ _________ ________
10-Apr-2015 NaN NaN NaN NaN NaN NaN NaN 2015 NaN {'April'} {0x0 char} 10 NaN NaN NaN NaN NaN NaN NaN
11-Apr-2015 NaN NaN NaN NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 11 10 NaN NaN NaN NaN NaN NaN
12-Apr-2015 NaN NaN NaN NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 12 11 10 NaN NaN NaN NaN NaN
13-Apr-2015 NaN NaN NaN NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 13 12 11 10 NaN NaN NaN NaN
14-Apr-2015 NaN NaN NaN NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 14 13 12 11 10 NaN NaN NaN
15-Apr-2015 NaN NaN NaN NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 15 14 13 12 11 10 NaN NaN
16-Apr-2015 NaN NaN NaN NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 16 15 14 13 12 11 10 NaN
17-Apr-2015 41 NaN NaN NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 17 16 15 14 13 12 11 10
18-Apr-2015 45 41 NaN NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 18 17 16 15 14 13 12 11
19-Apr-2015 49 45 41 NaN NaN NaN NaN 2015 2015 {'April'} {'April' } 19 18 17 16 15 14 13 12
20-Apr-2015 50 49 45 41 NaN NaN NaN 2015 2015 {'April'} {'April' } 20 19 18 17 16 15 14 13
21-Apr-2015 54 50 49 45 41 NaN NaN 2015 2015 {'April'} {'April' } 21 20 19 18 17 16 15 14
22-Apr-2015 54 54 50 49 45 41 NaN 2015 2015 {'April'} {'April' } 22 21 20 19 18 17 16 15
23-Apr-2015 46 54 54 50 49 45 41 2015 2015 {'April'} {'April' } 23 22 21 20 19 18 17 16
24-Apr-2015 51 46 54 54 50 49 45 2015 2015 {'April'} {'April' } 24 23 22 21 20 19 18 17
25-Apr-2015 47 51 46 54 54 50 49 2015 2015 {'April'} {'April' } 25 24 23 22 21 20 19 18
⋮
Because Mdl.Learners{7} predicts seven steps ahead, prepTrainingTbl7 contains different predictors from the predictors in prepTrainingTbl1. For example, prepTrainingTbl7 contains the predictors Year_Step7 and Year_Step6 instead of the predictors Year_Step1 and Year_Lag1 in prepTrainingTbl1. The step numbers indicate the horizon steps (that is, the number of time steps ahead).
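The two displays suggest that, for horizon step h, a response lag k refers to the value h + k - 1 rows before the row time. For example, TemperatureF_Lag1 first becomes nonmissing at 11-Apr-2015 for step 1 but at 17-Apr-2015 for step 7. The following sketch reproduces that pattern under this reading of the displayed tables. The helper shiftBack and the variable names are hypothetical, and the shift rule is inferred from the output rather than taken from the documented implementation.

```matlab
% Temperatures for 10-Apr-2015 through 19-Apr-2015, taken from the displays above.
y = [41; 45; 49; 50; 54; 54; 46; 51; 47; 41];
shiftBack = @(x,k) [NaN(k,1); x(1:end-k)];      % hypothetical helper
lagValue = 1;
lagAtStep1 = shiftBack(y, 1 + lagValue - 1);    % horizon step 1: shift by 1 row
lagAtStep7 = shiftBack(y, 7 + lagValue - 1);    % horizon step 7: shift by 7 rows
% Row 8 corresponds to 17-Apr-2015, the first row with a known value.
firstValidRow = find(~isnan(lagAtStep7),1)
```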
Compute the test set mean squared error at each horizon step.
mse = loss(Mdl,testTbl)
mse = 1×7
32.1256 45.3297 49.8831 49.3660 55.7613 50.4300 53.6758
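To make the shape of this output concrete, the following sketch computes one mean squared error per horizon step from matrices of true and predicted responses. The data values and variable names are invented for illustration. Skipping missing values with "omitnan" is an assumption and might not match how loss treats missing values or observation weights.

```matlab
% Hypothetical true and predicted responses. Rows are observations and
% columns are horizon steps 1 and 2.
Ytrue = [50 52; 48 47; 51 NaN];
Ypred = [49 50; 50 47; 54 53];
% Average the squared errors down each column, skipping missing values,
% to get one MSE value per horizon step.
mseSketch = mean((Ytrue - Ypred).^2, 1, "omitnan")
```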
Obtain the prepared test set predictor data used by Mdl.Learners{1} to compute mse(1). Compare the variables in prepTestTbl1 and prepTrainingTbl1.
prepTestTbl1 = preparedPredictors(Mdl,testTbl);
isequal(prepTrainingTbl1.Properties.VariableNames, ...
prepTestTbl1.Properties.VariableNames)
ans = logical
1
The prepared predictors in prepTestTbl1 and prepTrainingTbl1 are the same.
Similarly, obtain the prepared test set predictor data used by Mdl.Learners{7} to compute mse(7). Compare the variables in prepTestTbl7 and prepTrainingTbl7.
prepTestTbl7 = preparedPredictors(Mdl,testTbl, ...
    HorizonStep=7);
isequal(prepTrainingTbl7.Properties.VariableNames, ...
    prepTestTbl7.Properties.VariableNames)
ans = logical
1
The prepared predictors in prepTestTbl7 and prepTrainingTbl7 are also the same.
More About
Direct Forecasting
Direct forecasting is a forecasting technique that uses separate models to predict the response values at different future time steps (horizon steps). This technique differs from recursive forecasting, where one model is used to predict values at multiple horizon steps.
The software prepares the predictor data for each model and then uses the model to forecast at a particular horizon step.
For more information, see PreparedPredictorsPerHorizon and Horizon.
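The direct strategy can be sketched in a few lines of base MATLAB. The example below fits a separate least-squares line for each horizon step, using only the most recent observed response as a predictor. It is a simplified illustration under stated assumptions, not what directforecaster does internally: the series is synthetic, the variable names are hypothetical, and a linear fit with the backslash operator stands in for the regression learners.

```matlab
% Hypothetical series with a linear trend.
y = 2*(1:20)' + 5;
horizon = [1 2 3];
n = numel(y);
coeffs = zeros(2,numel(horizon));
for i = 1:numel(horizon)
    h = horizon(i);
    X = [ones(n-h,1), y(1:n-h)];   % intercept and the value observed h steps earlier
    target = y(1+h:n);             % response h steps ahead
    coeffs(:,i) = X \ target;      % one independent model per horizon step
end
% Forecast beyond the data. Every model starts from the last observed value.
yForecast = [1, y(n)] * coeffs
```

Each column of coeffs is fit on its own target, so an error at one horizon step does not feed into the next. A recursive strategy would instead reuse a single one-step model on its own predictions.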
Forecasting Data
The directforecaster function accepts data sets with regularly sampled values that include a response variable and, optionally, exogenous predictors. Regularly sampled means that the time steps between consecutive observations are the same. In this context, exogenous predictors are predictors that are not derived from the response variable.
Consider the following data set.
In this example, the row times in MeasurementTime show that the time difference between consecutive observations is one hour. The times 18-Dec-2015 14:00:00 and 18-Dec-2015 15:00:00 are future time steps that exist beyond the available data. They represent the first and second horizon steps. (See Horizon.)
Suppose the Temp variable is the response variable. The Pressure, WindSpeed, and WorkHours variables are exogenous predictors. The WorkHours variable is a leading exogenous predictor because its future values are known. (See LeadingPredictors.)
Before fitting a forecasting model, the software creates time-shifted features from the response and exogenous predictors based on user-specified lag values. In this example, the red rectangles indicate a ResponseLags value of 1, PredictorLags value of [1 2 3], and LeadingPredictorLags value of [0 1] at horizon step 1 (18-Dec-2015 14:00:00).
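The following sketch builds time-shifted features of this kind for horizon step 1 by using plain array shifts. All measurement values are invented, the helper shiftBack is hypothetical, and the feature names only mirror the Lag and Step naming pattern seen in prepared predictor tables. The software might name or arrange the features differently.

```matlab
% Hypothetical hourly measurements (values invented for illustration).
Temp      = [20; 21; 23; 22; 24; 25];
Pressure  = [1010; 1011; 1009; 1008; 1010; 1012];
WindSpeed = [5; 7; 6; 4; 3; 5];
WorkHours = [0; 0; 1; 1; 1; 0];    % leading predictor: future values are known

shiftBack = @(x,k) [NaN(k,1); x(1:end-k)];   % hypothetical helper

% Horizon step 1 with ResponseLags=1, PredictorLags=[1 2 3],
% and LeadingPredictorLags=[0 1].
features = table( ...
    shiftBack(Temp,1), ...
    shiftBack(Pressure,1),shiftBack(Pressure,2),shiftBack(Pressure,3), ...
    shiftBack(WindSpeed,1),shiftBack(WindSpeed,2),shiftBack(WindSpeed,3), ...
    WorkHours,shiftBack(WorkHours,1), ...
    VariableNames=["Temp_Lag1","Pressure_Lag1","Pressure_Lag2","Pressure_Lag3", ...
    "WindSpeed_Lag1","WindSpeed_Lag2","WindSpeed_Lag3", ...
    "WorkHours_Step1","WorkHours_Lag1"]);
```

The lag 0 entry of the leading predictor is the unshifted WorkHours column, which is why its future values must be supplied when forecasting.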
Extended Capabilities
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, set the UseParallel name-value argument to true in the call to this function.
For more general information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).
Version History
Introduced in R2023b