Implement Bayesian Linear Regression
Econometrics Toolbox™ includes a self-contained framework that allows you to implement Bayesian linear regression. The framework contains two groups of prior models for the regression coefficients β and the disturbance variance σ²:
Standard Bayesian linear regression prior models — The five prior model objects in this group range from the simple conjugate normal-inverse-gamma prior model through flexible prior models specified by draws from the prior distributions or a custom function. Although standard prior models can serve several purposes, they are best suited for posterior estimation, simulation (from the joint or, for most models, conditional posterior), and forecasting from the posterior predictive distribution.
Bayesian prior models for predictor variable selection — The models in this group can perform Bayesian lasso regression or stochastic search variable selection (SSVS). They are best suited for posterior estimation, during which the predictor selection algorithm occurs. The resulting posterior model is represented by draws from the Gibbs sampler (an empirical model object), and the estimation summary contains the results from the predictor selection algorithm. The estimation procedure does not remove insignificant or redundant variables for you, but, in a well-tuned model, the coefficients of such variables are close to zero. Therefore, as with standard models, you can simulate draws from the posterior distribution or forecast from the posterior predictive distribution without having to remove any variables.
The general workflow for estimating features of the posterior distributions, and then forecasting responses given new predictor data, is:
1. Use bayeslm to create a prior model object that represents your assumptions about the prior distribution, or a prior model object that is suited for predictor selection.
2. Pass the prior model object and the predictor and response data to the estimate function. By default, estimate returns a model object that represents the posterior distribution.
3. Pass the model object representing the posterior distribution to forecast.
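For example, the following is a minimal sketch of this workflow; it assumes that the predictor data X, the response data y, and the new predictor data XNew exist in the workspace:

p = size(X,2);                                  % Number of predictors
PriorMdl = bayeslm(p,'ModelType','conjugate');  % Prior model object
PosteriorMdl = estimate(PriorMdl,X,y);          % Posterior model object
yF = forecast(PosteriorMdl,XNew);               % Forecasts from the posterior predictive distribution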
For details on the standard Bayesian linear regression workflow, see Workflow for Standard Bayesian Linear Regression Models, and for Bayesian predictor selection, see Workflow for Bayesian Predictor Selection.
Workflow for Standard Bayesian Linear Regression Models
The following procedure describes the workflow for estimating features of the posterior distributions, and then forecasting responses given predictor data.
1. Choose a joint prior distribution for (β,σ²). Then, using bayeslm, create the Bayesian linear regression model object that completely specifies your beliefs about the joint prior distribution. The available prior model objects are:

conjugateblm
Joint prior distribution of (β,σ²): π(β|σ²) is Gaussian with mean Mu and covariance σ²V. π(σ²) is inverse gamma with shape A and scale B.
When to create: Create this model object when all of the following are true:
- You are fairly confident that the parameters have the corresponding joint prior, and that β depends on σ².
- You want to incorporate your prior knowledge of the prior mean and covariance of β and the shape and scale of σ².
- You want analytical forms for the marginal and conditional posteriors, which are the normal-inverse-gamma conjugate distributions for both types.

semiconjugateblm
Joint prior distribution of (β,σ²): π(β) is Gaussian with mean Mu and covariance V. π(σ²) is inverse gamma with shape A and scale B. β and σ² are independent.
When to create: Create this model object when all of the following are true:
- You are fairly confident that the parameters have the corresponding joint prior, and that β and σ² are independent.
- You want to incorporate your prior knowledge of the prior mean and covariance of β and the shape and scale of σ².
- You want analytical forms for the conditional posteriors, which are the normal-inverse-gamma conjugate conditional distributions.

diffuseblm
Joint prior distribution of (β,σ²): The joint prior density is inversely proportional to σ² (Jeffreys noninformative prior [2]).
When to create: Create this model object when all of the following are true:
- You want the posterior to be influenced more by the information in the data than by the prior.
- You want analytical forms for the marginal and conditional posteriors, which are the normal-inverse-gamma conjugate distributions for both types.

customblm
Joint prior distribution of (β,σ²): A function handle of a custom function that computes the log of the joint prior density. This specification allows for maximum flexibility.
When to create: Create this model object when you want to specify the log of the joint prior distribution yourself.
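For example, the following is a hedged sketch of specifying a conjugate prior for a model with three predictors; the hyperparameter values are illustrative only:

p = 3;
PriorMdl = bayeslm(p,'ModelType','conjugate','Mu',zeros(p + 1,1), ...
    'V',100*eye(p + 1),'A',3,'B',1);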
2. Given the data, estimate features of the posterior distributions. The function to use in this step depends on your analysis goals.

Use estimate to:
- Obtain a posterior model object for forecasting. Posterior model objects include:
  - Estimates of the mean and covariance matrix of the marginal posterior π(β|y,x), and the mean and variance of π(σ²|y,x).
  - Marginal posteriors and their parameter values. Analytical solutions are available for conjugateblm and diffuseblm prior models. For all other prior models, estimate must use Monte Carlo sampling.
  - 95% equitailed credible intervals. For nonanalytical posteriors, the 95% equitailed credible intervals are the 0.025-quantile and the 0.975-quantile of the retained Monte Carlo sample.
- Estimate the mean and covariance of the conditional distribution π(β|σ²,y,x), that is, implement linear regression with σ² held fixed.
- Update an existing posterior distribution based on new data.

Use simulate to:
- Approximate the expected value of a function of the parameters with respect to the joint posterior π(β,σ²|y,x). That is, draw multiple samples of (β,σ²) from their joint posterior, apply a function to each draw, and then compute the average of the transformed draws.
- Draw from the conditional posterior distributions π(β|σ²,y,x) and π(σ²|β,y,x). This selection is convenient for running a Markov chain Monte Carlo (MCMC) sampler, such as a Gibbs sampler.
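For example, the following sketch approximates the posterior probability that the first slope coefficient is positive; it assumes that PriorMdl, X, and y exist in the workspace:

[BetaDraws,Sigma2Draws] = simulate(PriorMdl,X,y,'NumDraws',1e4);
probPositive = mean(BetaDraws(2,:) > 0);   % Row 1 is the intercept; row 2 is the first slope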
3. If you have a custom prior model (customblm object), then choose a Markov chain Monte Carlo (MCMC) sampler when you call estimate or simulate. The supported MCMC samplers follow. After choosing a sampler, try the default tuning parameter values first.

Hamiltonian Monte Carlo (HMC), specified by 'Sampler',"hmc":
- Because the HMC sampler tunes itself, and the resulting samples mix well and converge to their stationary distribution more quickly, try this sampler first.
- To increase sampling speed, supply the gradient of the log PDF for all or some of the parameters.

Random walk Metropolis, specified by 'Sampler',"metropolis":
- If the sample size is reasonably large and the prior does not dominate the likelihood, then try this sampler.
- Supported proposal distributions are the multivariate normal and multivariate t distributions.
- Tuning parameters include the proposal distribution, its scale matrix, and its degrees of freedom.

Slice, specified by 'Sampler',"slice" (the default):
- To achieve adequate mixing and convergence, carefully tune the typical sampling-interval width. Appropriate values are application dependent.

After estimating a nonanalytical posterior by using an MCMC sampler, inspect the posterior or conditional posterior draws for adequate mixing. For more details, see Posterior Estimation and Simulation Diagnostics.
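For example, the following is a hedged sketch of estimating a custom prior model by using the HMC sampler; the log prior is illustrative only, and X and y are assumed to exist in the workspace:

A = 3; B = 1;                                         % Illustrative inverse-gamma hyperparameters
logJointPrior = @(params) sum(log(normpdf(params(1:end-1),0,10))) ...
    - (A + 1)*log(params(end)) - 1/(B*params(end));   % params = [beta; sigma2]
PriorMdl = bayeslm(size(X,2),'ModelType','custom','LogPDF',logJointPrior);
PosteriorMdl = estimate(PriorMdl,X,y,'Sampler',"hmc");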
4. If the quality of the samples is not satisfactory, then create a sampler options structure by using sampleroptions, which allows you to specify tuning parameter values that are appropriate for the sampler. For example, to specify a random walk Metropolis sampler that uses a multivariate t proposal distribution with 5 degrees of freedom, enter:

options = sampleroptions('Sampler',"metropolis",'Distribution',"mvt",...
    'DegreeOfFreedom',5)

After you create the sampler options structure, specify it when you call estimate or simulate by using the 'Options' name-value pair argument.
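For example, assuming PriorMdl, X, and y exist in the workspace:

PosteriorMdl = estimate(PriorMdl,X,y,'Options',options);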
5. Forecast responses given the new predictor data by using forecast. The forecast function constructs forecasts from the posterior predictive distribution. Analytical posterior predictive distributions are available for conjugateblm and diffuseblm prior models. For all other prior models, forecast resorts to Monte Carlo sampling. As with estimation and simulation, you can choose an MCMC sampler for customblm models. If forecast uses an MCMC sampler, then inspect the posterior or conditional posterior draws for adequate mixing.
Workflow for Bayesian Predictor Selection
The following procedure describes the workflow for performing Bayesian predictor selection for linear regression models, and then forecasting responses given predictor data.
1. Econometrics Toolbox supports two Bayesian predictor selection algorithms: Bayesian lasso regression and SSVS. Choose a predictor selection algorithm, which implies a joint prior distribution for (β,σ²). Then, using bayeslm, create the Bayesian linear regression prior model object that performs the selected predictor selection algorithm, and optionally specify the tuning parameter value. The available prior model objects for predictor selection follow. For details on the forms of the prior distributions, see Posterior Estimation and Inference.

mixconjugateblm
Predictor selection algorithm: SSVS [1].
Tuning parameter: Gaussian mixture variances, specified by the 'V' name-value pair argument as a two-column positive matrix.
- The first column contains the variances for the variable-inclusion component. Specify relatively large values, which attribute higher probability to coefficients farther from 0.
- The second column contains the variances for the variable-exclusion component. Specify relatively small values, which attribute higher probability to coefficients close to 0.
When to create: Create this model object when all of the following are true:
- The prior variance of β is a function of σ².
- You want posterior estimates of the probability that a variable is included in the model.

mixsemiconjugateblm
Predictor selection algorithm: SSVS.
Tuning parameter: Same as mixconjugateblm.
When to create: Create this model object when all of the following are true:
- β and σ² are independent, a priori.
- You want posterior estimates of the probability that a variable is included in the model.

lassoblm
Predictor selection algorithm: Bayesian lasso regression [3].
Tuning parameter: The shrinkage λ, specified by the 'Lambda' name-value pair argument as a positive scalar or vector.
- Larger values indicate more regularization, which implies that the prior of β is dense around zero.
- If the predictor variables differ in scale by orders of magnitude, then supply a vector of shrinkages, where elements correspond to coefficients.
When to create: Create this model object when you want to force insignificant and redundant predictor variables to have coefficients with a posterior mode of zero and fairly tight 95% posterior credible intervals around that mode.
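For example, the following is a hedged sketch of creating SSVS and Bayesian lasso prior model objects for three predictors; the tuning parameter values are illustrative only:

p = 3;
PriorSSVS = bayeslm(p,'ModelType','mixconjugate', ...
    'V',repmat([10 0.01],p + 1,1));      % Inclusion and exclusion variances per coefficient
PriorLasso = bayeslm(p,'ModelType','lasso', ...
    'Lambda',[0.01; ones(p,1)]);         % Shrinkage per coefficient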
2. Because predictor selection algorithms can be sensitive to differing scales of the predictor variables (Bayesian lasso regression, in particular), determine the scale of the predictors by passing the data to boxplot, or by estimating their means and standard deviations by using mean and std, respectively.
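For example, assuming the predictor data is in the numeric matrix X:

boxplot(X)            % Visually compare predictor scales
predMeans = mean(X)   % Sample mean of each predictor
predStds = std(X)     % Sample standard deviation of each predictor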
, respectively.For SSVS prior models, plot the prior distributions of the coefficients by using
plot
. The plots give you an idea about how the densities of the two Gaussian components balance. You can adjust the variances of a coefficient by using dot notation. For example, to attribute prior variances of 1000 and 0.1 to the variable-inclusion and variable-exclusion components, respectively, for coefficient
, enter:j
For more details on specifyingPriorMdl.V(j,:) = [1000 0.1];
V
, see [1].For lasso prior models, determine a regularization path, that is, a series of values for λ to iterate through during posterior estimation. Values are data dependent.
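For example, the following hedged sketch assumes the PriorSSVS object from the earlier sketch:

plot(PriorSSVS)                  % Inspect how the two Gaussian components balance
PriorSSVS.V(2,:) = [1000 0.1];   % Adjust the components for the first predictor's coefficient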
For lasso prior models, determine a regularization path, that is, a series of values for λ to iterate through during posterior estimation. Appropriate values are data dependent.

If the predictor variables have similar scales, specify a left boundary that forces most variables into the model (that is, attributes almost no penalty), specify a right boundary that forces almost all coefficients to 0, and specify enough values between the boundaries to search the space adequately. By default, the software attributes a shrinkage of 0.01 to the intercept and 1 to all coefficients.

If the scales of the predictor variables differ by orders of magnitude, then you can rescale or standardize the data. However, these actions make interpreting the coefficients difficult. Instead, specify a regularization path and use it to create a matrix of repeated rows. For those variables whose scales are small or large, multiply the corresponding rows by the appropriate order of magnitude to magnify the penalty for small coefficients or reduce the penalty for large coefficients.

For example, suppose a regression model has three predictors. The first two predictors have similar scales, but the third predictor has a scale that is 3 orders of magnitude larger. Suppose the 100-element regularization path is in the 1-by-100 vector Lambda. To create a matrix of shrinkage values, enter the following code:

p = 3;                                    % Number of predictors
numcoeffs = p + 1;                        % Number of coefficients: intercept and predictors
LambdaMat = repmat(Lambda,numcoeffs,1);   % One row of shrinkages per coefficient
LambdaMat(numcoeffs,:) = 1e3*LambdaMat(numcoeffs,:);   % Magnify the penalty for the third predictor
For more on specifying λ, see [3].
4. In a for loop, pass the predictor and response data to estimate to estimate features of the posterior distributions for each value of the tuning parameter, and store the posterior model object and estimation summary table of each iteration in a cell array. estimate uses the Gibbs sampler to sample successively from the full conditionals (see Analytically Tractable Posteriors). You can change aspects of the Gibbs sampler, such as the thinning factor, by using the available options.

Posterior models (empiricalblm model objects) store the draws from the full conditionals, among other things. Estimation summary tables are MATLAB® tables that include:
- Estimates of the mean (Mean) and covariance matrix (Covariances) of the marginal posterior π(β|y,x), and the mean and variance of π(σ²|y,x).
- 95% equitailed credible intervals (CI95), which are the 0.025-quantile and the 0.975-quantile of the retained Monte Carlo sample.
- Posterior variable-inclusion probabilities (Regime), for SSVS prior models only. Values below a threshold that you determine indicate that the corresponding predictor is insignificant or redundant.
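For example, the following hedged sketch iterates over the columns of the LambdaMat matrix from the earlier sketch; X and y are assumed to exist in the workspace:

numLambda = size(LambdaMat,2);
PosteriorMdls = cell(numLambda,1);   % Posterior model objects
Summaries = cell(numLambda,1);       % Estimation summary tables
for k = 1:numLambda
    PriorMdl = bayeslm(3,'ModelType','lasso','Lambda',LambdaMat(:,k));
    [PosteriorMdls{k},Summaries{k}] = estimate(PriorMdl,X,y,'Display',false);
end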
Although you estimate multiple posterior models, it is good practice to inspect the posterior draws of each for adequate mixing, particularly for models estimated using tuning parameter values at the boundaries of the regularization path. For more details, see Posterior Estimation and Simulation Diagnostics.
5. Determine the best-performing posterior model. Two example criteria are:
- Choose the simplest model that minimizes the estimated MSE (the posterior mean of Sigma2) among all posterior models. This method is simple, but the resulting models might not generalize well.
- Choose the simplest model that minimizes the forecast MSE:
  1. Partition the data into estimation and forecast samples.
  2. For all selected tuning parameter values, estimate the posterior distribution by using the estimation sample data.
  3. Compute the forecast MSE by using the forecast sample and the responses forecasted from the posterior predictive distribution.
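For example, the following hedged sketch computes the forecast MSE over the regularization path; it assumes an estimation sample (X0,y0), a forecast sample (X1,y1), and LambdaMat and numLambda from the earlier sketches:

fMSE = zeros(numLambda,1);
for k = 1:numLambda
    PriorMdl = bayeslm(3,'ModelType','lasso','Lambda',LambdaMat(:,k));
    PosteriorMdl = estimate(PriorMdl,X0,y0,'Display',false);
    yHat = forecast(PosteriorMdl,X1);      % Forecasts from the posterior predictive distribution
    fMSE(k) = mean((y1 - yHat).^2);
end
[~,bestIdx] = min(fMSE);                   % Index of the best-performing shrinkage value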
6. Forecast responses given new predictor data by using forecast. The forecast function constructs forecasts from the posterior predictive distribution by implementing a Gibbs sampler. Inspect the draws for adequate mixing.
References
[1] George, E. I., and R. E. McCulloch. "Variable Selection Via Gibbs Sampling." Journal of the American Statistical Association. Vol. 88, No. 423, 1993, pp. 881–889.
[2] Marin, J. M., and C. P. Robert. Bayesian Core: A Practical Approach to Computational Bayesian Statistics. New York: Springer Science+Business Media, LLC, 2007.
[3] Park, T., and G. Casella. "The Bayesian Lasso." Journal of the American Statistical Association. Vol. 103, No. 482, 2008, pp. 681–686.
See Also
Functions
bayeslm | estimate | simulate | forecast | sampleroptions
Objects
conjugateblm | semiconjugateblm | diffuseblm | customblm | empiricalblm | mixconjugateblm | mixsemiconjugateblm | lassoblm