aicbic

Information criteria

Description

To assess model adequacy, aicbic computes information criteria given loglikelihood values obtained by fitting competing models to data.

aic = aicbic(logL,numParam) returns the Akaike information criteria (AIC) given loglikelihood values logL derived from fitting different models to data, and given the corresponding number of estimated model parameters numParam.

[aic,bic] = aicbic(logL,numParam,numObs) also returns the Bayesian (Schwarz) information criteria (BIC) given corresponding sample sizes used in estimation numObs.

[aic,bic] = aicbic(logL,numParam,numObs,'Normalize',true) normalizes results by dividing all output arguments by the sample sizes numObs. By default, aicbic does not normalize results ('Normalize',false).

[aic,bic,ic] = aicbic(logL,numParam,numObs) also returns the structure ic containing the AIC, BIC, and other information criteria.

[aic,bic,ic] = aicbic(logL,numParam,numObs,'Normalize',true) normalizes all returned information criteria by the sample sizes numObs.

Examples

Compare the in-sample fits of three competing models using the AIC and BIC. Their loglikelihood values logL and corresponding number of estimated parameters numParam are in the following table. Suppose the effective sample size is 1500.

logL = [-681.4724; -663.4615; -632.3158];
numParam = [12; 18; 27];
numObs = 1500;
Tbl = table(logL,numParam,'RowNames',"Model"+string(1:3))
Tbl=3×2 table
               logL      numParam
              _______    ________

    Model1    -681.47       12
    Model2    -663.46       18
    Model3    -632.32       27

Compute AIC

Calculate the AIC of each estimated model.

aic = aicbic(logL,numParam)
aic = 3×1
10^3 ×

1.3869
1.3629
1.3186

The model with the lowest AIC has the best in-sample fit. Identify the model with the lowest AIC.

[~,idxmin] = min(aic);
bestFitAIC = Tbl.Properties.RowNames{idxmin}
bestFitAIC =
'Model3'

The AIC suggests that Model3 has the best, most parsimonious fit, despite being the most complex of the three models.

Compute BIC

Calculate the BIC of each estimated model. Specify the sample size numObs, which is required for computing the BIC.

[~,bic] = aicbic(logL,numParam,numObs)
bic = 3×1
10^3 ×

1.4507
1.4586
1.4621

As is the case with the AIC, the model with the lowest BIC has the best in-sample fit. Identify the model with the lowest BIC.

[~,idxmin] = min(bic);
bestFitBIC = Tbl.Properties.RowNames{idxmin}
bestFitBIC =
'Model1'

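The BIC suggests Model1, the simplest of the three models. The two criteria disagree because they penalize complexity differently: the AIC adds 2 per estimated parameter, whereas the BIC adds log(numObs), which is about 7.31 for numObs = 1500. With a large sample, the BIC therefore imposes a greater penalty on complex models than the AIC.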
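The disagreement between the two criteria can be checked by hand. The following is a minimal sketch, not the aicbic implementation: it applies the AIC and BIC formulas from the Information Criteria section directly to the example data, using only base MATLAB. The variable names fitTerm, aicPenalty, bicPenalty, aicManual, bicManual, and Cmp are introduced here for illustration.

```matlab
% Example data from the table above.
logL     = [-681.4724; -663.4615; -632.3158];
numParam = [12; 18; 27];
numObs   = 1500;

% Both criteria share the same goodness-of-fit term and differ only in the penalty.
fitTerm    = -2*logL;
aicPenalty = 2*numParam;             % 2 per estimated parameter
bicPenalty = log(numObs)*numParam;   % log(1500), about 7.31, per estimated parameter

aicManual = fitTerm + aicPenalty;
bicManual = fitTerm + bicPenalty;

% Side-by-side comparison of the fit term and the two penalties.
Cmp = table(fitTerm,aicPenalty,bicPenalty,aicManual,bicManual, ...
    'RowNames',"Model"+string(1:3))
```

Because the BIC penalty per parameter is log(numObs), it exceeds the AIC penalty of 2 whenever numObs is greater than exp(2), which is about 7.39. In this example the larger BIC penalty outweighs the improvement in fit that Model3 achieves.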

Fit several models to simulated data, and then compare the model fits using all available information criteria.

Simulate a random path of length 100 from the data generating process (DGP)

${y}_{t}=1+0.2{y}_{t-1}-0.4{y}_{t-2}+{\epsilon }_{t},$

where ${\epsilon }_{\mathit{t}}$ is a random Gaussian series with mean 0 and variance 1.

rng(1)  % For reproducibility
T = 100;
DGP = arima('Constant',1,'AR',[0.2,-0.4],'Variance',1);
y = simulate(DGP,T);

Assume that the DGP is unknown, and that the AR(1), AR(2), and AR(3) models are appropriate for describing the DGP.

For each competing model, create an arima model template for estimation.

Mdl(1) = arima(1,0,0);
Mdl(2) = arima(2,0,0);
Mdl(3) = arima(3,0,0);

Fit each model to the simulated data y, compute the loglikelihood, and suppress the estimation display.

numMdl = numel(Mdl);
logL = zeros(numMdl,1);      % Preallocate
numParam = zeros(numMdl,1);

for j = 1:numMdl
    [EstMdl,~,logL(j)] = estimate(Mdl(j),y,'Display','off');
    results = summarize(EstMdl);
    numParam(j) = results.NumEstimatedParameters;
end

For each model, compute all available information criteria.

[~,~,ic] = aicbic(logL,numParam,T)
ic = struct with fields:
aic: [310.9968 285.5082 287.0309]
bic: [318.8123 295.9289 300.0567]
aicc: [311.2468 285.9292 287.6692]
caic: [321.8123 299.9289 305.0567]
hqc: [314.1599 289.7256 292.3027]

ic is a 1-D structure array with a field for each information criterion. Each field contains a vector of measurements; element j corresponds to the model yielding loglikelihood logL(j).

For each criterion, determine the model that yields the minimum value.

[~,minIdx] = structfun(@min,ic);
[Mdl(minIdx).Description]'
ans = 5x1 string
"ARIMA(2,0,0) Model (Gaussian Distribution)"
"ARIMA(2,0,0) Model (Gaussian Distribution)"
"ARIMA(2,0,0) Model (Gaussian Distribution)"
"ARIMA(2,0,0) Model (Gaussian Distribution)"
"ARIMA(2,0,0) Model (Gaussian Distribution)"

The minimum of each criterion corresponds to the AR(2) model, which has the structure of the DGP.
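The structfun step above can be examined in isolation. This sketch rebuilds a structure from the criterion values printed in the example output (rounded to four decimals, so it only approximates the workspace variable) and requests both outputs of min for every field. The names icShown and Summary are introduced here for illustration.

```matlab
% Criterion values copied from the displayed output above (rounded).
icShown.aic  = [310.9968 285.5082 287.0309];
icShown.bic  = [318.8123 295.9289 300.0567];
icShown.aicc = [311.2468 285.9292 287.6692];
icShown.caic = [321.8123 299.9289 305.0567];
icShown.hqc  = [314.1599 289.7256 292.3027];

% structfun applies min to each field; the second output of min is the
% position of the minimum, that is, the index of the preferred model.
[minVal,minIdx] = structfun(@min,icShown);

% One row per criterion: smallest value and the model that attains it.
Summary = table(minVal,minIdx,'RowNames',fieldnames(icShown))
```

Each output of structfun is a 5-by-1 column vector, one element per field, which is why indexing Mdl with minIdx returns five model descriptions.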

Fit several models to simulated data, specify a presample for estimation, and then compare the model fits using normalized AIC.

Simulate a random path of length 50 from the DGP

${y}_{t}=1+0.2{y}_{t-1}-0.4{y}_{t-2}+{\epsilon }_{t},$

where ${\epsilon }_{\mathit{t}}$ is a random Gaussian series with mean 0 and variance 1.

rng(1)  % For reproducibility
T = 50;
DGP = arima('Constant',1,'AR',[0.2,-0.4],'Variance',1);
y = simulate(DGP,T);

Create an arima model template for each competing model.

Mdl(1) = arima(1,0,0);
Mdl(2) = arima(2,0,0);
Mdl(3) = arima(3,0,0);

Fit each model to the simulated data y, and specify the required number of presample observations for each fit. Compute the loglikelihood, and suppress the estimation display.

numMdl = numel(Mdl);
logL = zeros(numMdl,1);      % Preallocate
numParam = zeros(numMdl,1);
numObs = zeros(numMdl,1);

for j = 1:numMdl
    y0 = y(1:Mdl(j).P);             % Presample
    yest = y((Mdl(j).P+1):end);     % Estimation sample
    [EstMdl,~,logL(j)] = estimate(Mdl(j),yest,'Y0',y0,...
        'Display','off');
    results = summarize(EstMdl);
    numParam(j) = results.NumEstimatedParameters;
    numObs(j) = results.SampleSize;
end

For each model, compute the normalized AIC.

aic = aicbic(logL,numParam,numObs,'Normalize',true)
aic = 3×1

3.2972
2.9880
3.0361
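To see what normalization changes, the displayed values can be converted back to the unnormalized scale. This is a rough sketch under two assumptions: the effective sample size of each fit is the path length minus the presample length (50 minus the AR order), and the unnormalized AIC equals the normalized AIC times that sample size. The inputs are the rounded values printed above, so the reconstructed numbers are approximate. The names Tpath, P, numObsEff, aicNorm, and aicRaw are introduced here for illustration.

```matlab
Tpath     = 50;              % length of the simulated path
P         = [1; 2; 3];       % presample observations used by AR(1), AR(2), AR(3)
numObsEff = Tpath - P;       % assumed effective sample sizes: 49, 48, 47

aicNorm = [3.2972; 2.9880; 3.0361];   % normalized AIC, copied from the output above
aicRaw  = aicNorm .* numObsEff;       % approximate unnormalized AIC

[~,idxNorm] = min(aicNorm);   % model preferred on the per-observation scale
[~,idxRaw]  = min(aicRaw);    % model preferred on the unnormalized scale
```

Under these assumptions the two scales do not select the same model: the per-observation AIC is smallest for the AR(2) model, whereas the reconstructed unnormalized AIC is smallest for the AR(3) model, which was fit to the fewest observations. Loglikelihoods computed from different numbers of observations are not directly comparable, which is the motivation for normalizing.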

Determine the model that yields the minimum AIC.

[~,minIdx] = min(aic);
Mdl(minIdx).Description
ans =
"ARIMA(2,0,0) Model (Gaussian Distribution)"

Input Arguments

logL

Loglikelihoods associated with parameter estimates of different models, specified as a numeric vector.

Data Types: double

numParam

Number of estimated parameters in the models, specified as a positive integer applied to all elements of logL, or a vector of positive integers with the same length as logL.

Data Types: double

numObs

Sample sizes used in estimation, specified as a positive integer applied to all elements of logL, or a vector of positive integers with the same length as logL.

aicbic requires numObs for all criteria except the AIC. aicbic also requires numObs if 'Normalize' is true.

Data Types: double

Output Arguments

aic

AIC corresponding to elements of logL, returned as a numeric vector.

bic

BIC corresponding to elements of logL, returned as a numeric vector.

ic

Information criteria, returned as a 1-D structure array containing the fields described in this table. Field values are numeric vectors with elements corresponding to elements of logL.

Field    Description
_____    ___________________________

aic      AIC
bic      BIC
aicc     Corrected AIC (AICc)
caic     Consistent AIC (CAIC)
hqc      Hannan-Quinn criteria (HQC)

ic.aic and ic.bic are the same values returned in aic and bic, respectively.

Information Criteria

Information criteria rank models using measures that balance goodness of fit with parameter parsimony. For a particular criterion, models with lower values are preferred.

This table describes how aicbic computes unnormalized criteria.

Information Criterion    Formula
_____________________    ______________________________________________________________

AIC                      aic = -2*logL + 2*numParam
BIC                      bic = -2*logL + log(numObs)*numParam
AICc                     aicc = aic + [2*numParam*(numParam + 1)]/(numObs - numParam - 1)
CAIC                     caic = -2*logL + (log(numObs) + 1)*numParam
HQC                      hqc = -2*logL + 2*log(log(numObs))*numParam
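The formulas in this table translate directly into element-wise MATLAB expressions. The following sketch is not the aicbic source code; it evaluates the five formulas for the data of the first example using base MATLAB only. The structure icManual and the flag normalize are names introduced here, and dividing every criterion by numObs when normalize is true is an assumption based on the description of 'Normalize'.

```matlab
% Data from the first example.
logL     = [-681.4724; -663.4615; -632.3158];
numParam = [12; 18; 27];
numObs   = 1500;
normalize = false;   % hypothetical flag standing in for 'Normalize'

% Unnormalized criteria, one expression per table row.
aic  = -2*logL + 2*numParam;
bic  = -2*logL + log(numObs).*numParam;
aicc = aic + (2*numParam.*(numParam + 1))./(numObs - numParam - 1);
caic = -2*logL + (log(numObs) + 1).*numParam;
hqc  = -2*logL + 2*log(log(numObs)).*numParam;

icManual = struct('aic',aic,'bic',bic,'aicc',aicc,'caic',caic,'hqc',hqc);

% Optional scaling by the sample size, applied to every field.
if normalize
    icManual = structfun(@(v) v./numObs,icManual,'UniformOutput',false);
end
```

Two relationships follow from the formulas and make a quick sanity check: caic exceeds bic by exactly numParam, and aicc exceeds aic whenever numObs is greater than numParam + 1.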

Misspecification tests, such as the Lagrange multiplier (lmtest), likelihood ratio (lratiotest), and Wald (waldtest) tests, compare the loglikelihoods of two competing nested models. By contrast, information criteria based on loglikelihoods of individual model fits are approximate measures of information loss with respect to the DGP. Information criteria provide relative rankings of any number of competing models, including nonnested models.

Tips

• In small samples, AIC tends to overfit. To address overfitting, AICc adds a size-dependent correction term that increases the penalty on the number of parameters. AICc approaches AIC asymptotically. The analysis in Burnham and Anderson suggests using AICc when numObs/numParam < 40.

• When econometricians compare models with different numbers of autoregressive lags or different orders of differencing, they often scale information criteria by the number of observations (Lütkepohl and Krätzig). To scale information criteria, set numObs to the effective sample size of each estimate, and set 'Normalize' to true.
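The first tip can be illustrated numerically. This sketch uses hypothetical values (a model with 5 estimated parameters and a range of sample sizes) to show how the AICc correction term shrinks as the sample grows, and which sample sizes fall under the numObs/numParam < 40 guideline. The names correction, useAICc, and Guide are introduced here for illustration.

```matlab
numParam = 5;                          % hypothetical number of estimated parameters
numObs   = [20 50 200 1000 10000];     % hypothetical sample sizes

% Difference between AICc and AIC, from the AICc formula.
correction = 2*numParam*(numParam + 1)./(numObs - numParam - 1);

% Guideline from the tip: prefer AICc when there are fewer than
% 40 observations per estimated parameter.
useAICc = numObs./numParam < 40;

Guide = table(numObs',correction',useAICc', ...
    'VariableNames',{'numObs','AICcMinusAIC','RatioBelow40'})
```

The correction decreases monotonically with numObs, which is the sense in which AICc approaches AIC asymptotically.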

References

Akaike, Hirotugu. “Information Theory and an Extension of the Maximum Likelihood Principle.” In Selected Papers of Hirotugu Akaike, edited by Emanuel Parzen, Kunio Tanabe, and Genshiro Kitagawa, 199–213. New York: Springer, 1998. https://doi.org/10.1007/978-1-4612-1694-0_15.

Akaike, Hirotugu. “A New Look at the Statistical Model Identification.” IEEE Transactions on Automatic Control 19, no. 6 (December 1974): 716–23. https://doi.org/10.1109/TAC.1974.1100705.

Burnham, Kenneth P., and David R. Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer, 2002.

Hannan, Edward J., and Barry G. Quinn. “The Determination of the Order of an Autoregression.” Journal of the Royal Statistical Society: Series B (Methodological) 41, no. 2 (January 1979): 190–95. https://doi.org/10.1111/j.2517-6161.1979.tb01072.x.

Lütkepohl, Helmut, and Markus Krätzig, editors. Applied Time Series Econometrics. 1st ed. Cambridge University Press, 2004. https://doi.org/10.1017/CBO9780511606885.

Schwarz, Gideon. “Estimating the Dimension of a Model.” The Annals of Statistics 6, no. 2 (March 1978): 461–64. https://doi.org/10.1214/aos/1176344136.