Main Content

Specify Regression Model with SARIMA Errors

This example shows how to specify a regression model with multiplicative seasonal ARIMA errors.

Load the Airline data set from the MATLAB® root folder, and load the recession data set. Plot the monthly passenger totals and log-totals.

load Data_Airline.mat
load Data_Recessions

y = DataTimeTable.PSSG; 
logY = log(y); 

figure
tiledlayout(2,1)
nexttile
plot(DataTimeTable.Time,y)
title('{\bf Monthly Passenger Totals (Jan1949 - Dec1960)}')
nexttile
plot(DataTimeTable.Time,log(y))
title('{\bf Monthly Passenger Log-Totals (Jan1949 - Dec1960)}')

Figure contains 2 axes objects. Axes object 1 with title blank Monthly blank Passenger blank Totals blank (Jan 1949 blank - blank Dec 1960 ) contains an object of type line. Axes object 2 with title blank Monthly blank Passenger blank Log-Totals blank (Jan 1949 blank - blank Dec 1960 ) contains an object of type line.

The log transformation seems to linearize the time series.

Construct this predictor, which is whether the country was in a recession during the sampled period. 0 means the country was not in a recession, and 1 means that it was in a recession.

X = zeros(numel(dates),1); % Preallocation
for j = 1:size(Recessions,1)
    X(dates >= Recessions(j,1) & dates <= Recessions(j,2)) = 1;
end

Fit the simple linear regression model,

yt=c+Xtβ+ut

to the data.

EstMdl = fitlm(X,logY);

Fit is a LinearModel that contains the least squares estimates.

Perform a residual diagnosis by plotting the residuals several ways.

figure
tiledlayout(2,2)
nexttile
plotResiduals(EstMdl,'caseorder','ResidualType','Standardized',...
    'LineStyle','-','MarkerSize',0.5)
nexttile
plotResiduals(EstMdl,'lagged','ResidualType','Standardized')
nexttile
plotResiduals(EstMdl,'probability','ResidualType','Standardized')
nexttile
plotResiduals(EstMdl,'histogram','ResidualType','Standardized') 

Figure contains 4 axes objects. Axes object 1 with title Case order plot of residuals, xlabel Row number, ylabel Residuals contains 2 objects of type line. Axes object 2 with title Plot of residuals vs. lagged residuals, xlabel Residual(t-1), ylabel Residual(t) contains 3 objects of type line. One or more of the lines displays its values using only markers Axes object 3 with title Normal probability plot of residuals, xlabel Residuals, ylabel Probability contains 2 objects of type functionline, line. One or more of the lines displays its values using only markers Axes object 4 with title Histogram of residuals, xlabel Residuals, ylabel Probability density contains an object of type patch.

r = EstMdl.Residuals.Standardized;

figure
tiledlayout(2,1)
nexttile
autocorr(r)
nexttile
parcorr(r)

Figure contains 2 axes objects. Axes object 1 with title Sample Autocorrelation Function, xlabel Lag, ylabel Sample Autocorrelation contains 4 objects of type stem, line, constantline. These objects represent ACF, Confidence Bound. Axes object 2 with title Sample Partial Autocorrelation Function, xlabel Lag, ylabel Sample Partial Autocorrelation contains 4 objects of type stem, line, constantline. These objects represent PACF, Confidence Bound.

The residual plots indicate that the residuals are autocorrelated. The probability plot and histogram seem to indicate that the residuals are Gaussian.

The ACF of the residuals confirms that they are autocorrelated.

Take the 1st difference of the residuals and plot the ACF and PACF of the differenced residuals.

dR = diff(r);

figure
tiledlayout(2,1)
nexttile
autocorr(dR,'NumLags',50)
nexttile
parcorr(dR,'NumLags',50)

Figure contains 2 axes objects. Axes object 1 with title Sample Autocorrelation Function, xlabel Lag, ylabel Sample Autocorrelation contains 4 objects of type stem, line, constantline. These objects represent ACF, Confidence Bound. Axes object 2 with title Sample Partial Autocorrelation Function, xlabel Lag, ylabel Sample Partial Autocorrelation contains 4 objects of type stem, line, constantline. These objects represent PACF, Confidence Bound.

The ACF shows that there are significantly large autocorrelations, particularly at every 12th lag. This indicates that the residuals have 12th degree seasonal integration.

Take the first and 12th differences of the residuals. Plot the differenced residuals, and their ACF and PACF.

DiffPoly = LagOp([1 -1]);
SDiffPoly = LagOp([1 -1],'Lags',[0, 12]);
diffR = filter(DiffPoly*SDiffPoly,r);

figure
tiledlayout("flow")
nexttile([1 2])
plot(diffR)
axis tight
nexttile
autocorr(diffR)
h = gca;
h.FontSize = 7; 
nexttile
parcorr(diffR)
h = gca;
h.FontSize = 7; 

Figure contains 3 axes objects. Axes object 1 contains an object of type line. Axes object 2 with title Sample Autocorrelation Function, xlabel Lag, ylabel Sample Autocorrelation contains 4 objects of type stem, line, constantline. These objects represent ACF, Confidence Bound. Axes object 3 with title Sample Partial Autocorrelation Function, xlabel Lag, ylabel Sample Partial Autocorrelation contains 4 objects of type stem, line, constantline. These objects represent PACF, Confidence Bound.

The residuals resemble white noise (with possible heteroscedasticity). According to Box and Jenkins (1994), Chapter 9, the ACF and PACF indicate that the unconditional disturbances are an IMA(0,1,1)×(0,1,1)12 model.

Specify the regression model with IMA(0,1,1)×(0,1,1)12 errors:

yt=Xtβ+ut(1-L)(1-L12)ut=(1+b1L)(1+B12L12)εt.

Mdl = regARIMA('MALags',1,'D',1,'Seasonality',12,'SMALags',12) 
Mdl = 
  regARIMA with properties:

     Description: "ARIMA(0,1,1) Error Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
       Intercept: NaN
            Beta: [1×0]
               P: 13
               D: 1
               Q: 13
              AR: {}
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {NaN} at lag [12]
     Seasonality: 12
        Variance: NaN

Mdl is a regression model with IMA(0,1,1)×(0,1,1)12 errors. By default, the innovations are Gaussian, and all parameters are NaN. The property:

  • P = p + D + sps + s = 0 + 1 + 0 + 12 = 13.

  • Q = q + sqs = 1 + 12 = 13.

Therefore, the software requires at least 13 presample observations to initialize the model.

Pass Mdl, y, and X into estimate to estimate the model. In order to simulate or forecast responses using simulate or forecast, you need to set values to all parameters.

References:

Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

See Also

| | | |

Related Examples

More About