## Estimation of Multivariate Regression Models

### Least Squares Estimation

#### Ordinary Least Squares

When you fit multivariate linear regression models using `mvregress`, you can use the optional name-value pair `'algorithm','cwls'` to choose least squares estimation. In this case, by default, `mvregress` returns ordinary least squares (OLS) estimates using $\Sigma ={I}_{d}$. Alternatively, if you specify a covariance matrix for weighting, you can return covariance-weighted least squares (CWLS) estimates. If you combine OLS and CWLS, you can get feasible generalized least squares (FGLS) estimates.

The OLS estimate for the coefficient vector is the vector $b$ that minimizes

`$\sum _{i=1}^{n}{\left({y}_{i}-{X}_{i}b\right)}^{\prime }\left({y}_{i}-{X}_{i}b\right).$`

Let $y$ denote the nd-by-1 vector of stacked d-dimensional responses, and $X$ denote the nd-by-K matrix of stacked design matrices. The K-by-1 vector of OLS regression coefficient estimates is

`${b}_{OLS}={\left({X}^{\prime }\text{\hspace{0.17em}}X\right)}^{-1}{X}^{\prime }\text{\hspace{0.17em}}y.$`

This is the first `mvregress` output.

Given $\Sigma ={I}_{d}$ (the `mvregress` OLS default), the variance-covariance matrix of the OLS estimates is

`$V\left({b}_{OLS}\right)={\left({X}^{\prime }\text{\hspace{0.17em}}X\right)}^{-1}.$`

This is the fourth `mvregress` output. The standard errors of the OLS regression coefficients are the square root of the diagonal of this variance-covariance matrix.

If your data is not scaled such that $\Sigma ={\sigma }^{2}{I}_{d}$, then you can multiply the `mvregress` variance-covariance matrix by the mean squared error (MSE), an unbiased estimate of ${\sigma }^{2}$. To compute the MSE, return the n-by-d matrix of residuals, $E$ (the third `mvregress` output). Then,

`$\text{MSE}=\frac{\sum _{i=1}^{n}{e}_{i}{e}_{i}{}^{\prime }}{n-K},$`

where ${e}_{i}=\left({y}_{i}-{X}_{i}\beta {\right)}^{\prime }$ is the ith row of $E$.

#### Covariance-Weighted Least Squares

For most multivariate problems, an identity error covariance matrix is insufficient, and leads to inefficient or biased standard error estimates. You can specify a matrix for CWLS estimation using the optional name-value pair argument `covar0`, for example, an invertible d-by-d matrix named ${C}_{0}$. Usually, ${C}_{0}$ is a diagonal matrix such that the inverse matrix ${C}_{0}^{-1}$ contains weights for each dimension to model heteroscedasticity. However, ${C}_{0}$ can also be a nondiagonal matrix that models correlation.

Given ${C}_{0}$, the CWLS solution is the vector $b$ that minimizes

`$\sum _{i=1}^{n}{\left({y}_{i}-{X}_{i}b\right)}^{\prime }{C}_{0}\left({y}_{i}-{X}_{i}b\right).$`

In this case, the K-by-1 vector of CWLS regression coefficient estimates is

`${b}_{CWLS}={\left({X}^{\prime }\text{\hspace{0.17em}}{\left({I}_{n}\otimes {C}_{0}\right)}^{-1}X\right)}^{-1}{X}^{\prime }\text{\hspace{0.17em}}{\left({I}_{n}\otimes {C}_{0}\right)}^{-1}y.$`

This is the first `mvregress` output.

If $\Sigma ={C}_{0}$, this is the generalized least squares (GLS) solution. The corresponding variance-covariance matrix of the CWLS estimates is

`$V\left({b}_{CWLS}\right)={\left(X\text{'}{\left({I}_{n}\otimes {C}_{0}\right)}^{-1}X\right)}^{-1}.$`

This is the fourth `mvregress` output. The standard errors of the CWLS regression coefficients are the square root of the diagonal of this variance-covariance matrix.

If you only know the error covariance matrix up to a proportion, that is, $\Sigma ={\sigma }^{2}{C}_{0}$, you can multiply the `mvregress` variance-covariance matrix by the MSE, as described in Ordinary Least Squares.

#### Error Covariance Estimation

Regardless of which least squares method you use, the estimate for the error variance-covariance matrix is

`$\stackrel{^}{\Sigma }=\left(\begin{array}{cccc}{\stackrel{^}{\sigma }}_{1}^{2}& {\stackrel{^}{\sigma }}_{12}& \cdots & {\stackrel{^}{\sigma }}_{1d}\\ {\stackrel{^}{\sigma }}_{12}& {\stackrel{^}{\sigma }}_{2}^{2}& \cdots & {\stackrel{^}{\sigma }}_{2d}\\ ⋮& ⋮& \ddots & ⋮\\ {\stackrel{^}{\sigma }}_{1d}& {\stackrel{^}{\sigma }}_{2d}& \cdots & {\stackrel{^}{\sigma }}_{d}^{2}\end{array}\right)=\frac{{E}^{\prime }E}{n},$`

where $E$ is the n-by-d matrix of residuals. The ith row of $E$ is ${e}_{i}={\left({y}_{i}-{X}_{i}b\right)}^{\prime }.$

The error covariance estimate, $\stackrel{^}{\Sigma }$, is the second `mvregress` output, and the matrix of residuals, $E$, is the third output. If you specify the optional name-value pair `'covtype','diagonal'`, then `mvregress` returns $\stackrel{^}{\Sigma }$ with zeros in the off-diagonal entries,

`$\stackrel{^}{\Sigma }=\left(\begin{array}{ccc}{\stackrel{^}{\sigma }}_{1}^{2}& & 0\\ & \ddots & \\ 0& & {\stackrel{^}{\sigma }}_{d}^{2}\end{array}\right).$`

#### Feasible Generalized Least Squares

The generalized least squares estimate is the CWLS estimate with a known covariance matrix. That is, given $\Sigma$ is known, the GLS solution is

`${b}_{GLS}={\left({X}^{\prime }{\left({I}_{n}\otimes \Sigma \right)}^{-1}X\right)}^{-1}{X}^{\prime }{\left({I}_{n}\otimes \Sigma \right)}^{-1}y,$`

with variance-covariance matrix

`$V\left({b}_{GLS}\right)={\left({X}^{\prime }{\left({I}_{n}\otimes \Sigma \right)}^{-1}X\right)}^{-1}.$`

In most cases, the error covariance is unknown. The feasible generalized least squares (FGLS) estimate uses $\stackrel{^}{\Sigma }$ in place of $\Sigma$. You can obtain two-step FGLS estimates as follows:

1. Perform OLS regression, and return an estimate $\stackrel{^}{\Sigma }$.

2. Perform CWLS regression, using ${C}_{0}=\stackrel{^}{\Sigma }$.

You can also iterate between these two steps until convergence is reached.

For some data, the OLS estimate $\stackrel{^}{\Sigma }$ is positive semidefinite, and has no unique inverse. In this case, you cannot get the FGLS estimate using `mvregress`. As an alternative, you can use `lscov`, which uses a generalized inverse to return weighted least squares solutions for positive semidefinite covariance matrices.

#### Panel Corrected Standard Errors

An alternative to FGLS is to use OLS coefficient estimates (which are consistent) and make a standard error correction to improve efficiency. One such standard error adjustment—which does not require inversion of the covariance matrix—is panel corrected standard errors (PCSE) . The panel corrected variance-covariance matrix for OLS estimates is

`${V}_{pcse}\left({b}_{OLS}\right)={\left({X}^{\prime }\text{\hspace{0.17em}}X\right)}^{-1}{X}^{\prime }\left({I}_{n}\otimes \Sigma \right)X{\left({X}^{\prime }\text{\hspace{0.17em}}X\right)}^{-1}.$`

The PCSE are the square root of the diagonal of this variance-covariance matrix. Fixed Effects Panel Model with Concurrent Correlation illustrates PCSE computation.

### Maximum Likelihood Estimation

#### Maximum Likelihood Estimates

The default estimation algorithm used by `mvregress` is maximum likelihood estimation (MLE). The loglikelihood function for the multivariate linear regression model is

`$\begin{array}{l}\mathrm{log}L\left(\beta ,\Sigma |y,X\right)=\frac{1}{2}nd\mathrm{log}\left(2\pi \right)+\frac{1}{2}n\mathrm{log}\left(\mathrm{det}\left(\Sigma \right)\right)\\ \text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}+\frac{1}{2}\sum _{i=1}^{n}{\left({y}_{i}-{X}_{i}\beta \right)}^{\prime }{\Sigma }^{-1}\left({y}_{i}-{X}_{i}\beta \right).\\ \text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\text{​}\end{array}$`

The MLEs for $\beta$ and $\Sigma$ are the values that maximize the loglikelihood objective function.

`mvregress` finds the MLEs using an iterative two-stage algorithm. At iteration m + 1, the estimates are

`${b}_{MLE}^{\left(m+1\right)}={\left({X}^{\prime }{\left({I}_{n}\otimes {\Sigma }^{\left(m\right)}\right)}^{-1}X\right)}^{-1}{X}^{\prime }{\left({I}_{n}\otimes {\Sigma }^{\left(m\right)}\right)}^{-1}y$`

and

`${\stackrel{^}{\Sigma }}^{\left(m+1\right)}=\frac{1}{n}\sum _{i=1}^{n}\left({y}_{i}-{X}_{i}{b}_{MLE}^{\left(m+1\right)}\right){\left({y}_{i}-{X}_{i}{b}_{MLE}^{\left(m+1\right)}\right)}^{\prime }.$`

The algorithm terminates when the changes in the coefficient estimates and loglikelihood objective function are less than a specified tolerance, or when the specified maximum number of iterations is reached. The optional name-value pair arguments for changing these convergence criteria are `tolbeta`, `tolobj`, and `maxiter`, respectively.

#### Standard Errors

The variance-covariance matrix of the MLEs is an optional `mvregress` output. By default, `mvregress` returns the variance-covariance matrix for only the regression coefficients, but you can also get the variance-covariance matrix of $\stackrel{^}{\Sigma }$ using the optional name-value pair `'vartype','full'`. In this case, `mvregress` returns the variance-covariance matrix for all K regression coefficients, and d or d(d + 1)/2 covariance terms (depending on whether the error covariance is diagonal or full).

By default, the variance-covariance matrix is the inverse of the observed Fisher information matrix (the `'hessian'` option). You can request the expected Fisher information matrix using the optional name-value pair `'vartype','fisher'`. Provided there is no missing response data, the observed and expected Fisher information matrices are the same. If response data is missing, the observed Fisher information accounts for the added uncertainty due to the missing values, whereas the expected Fisher information matrix does not.

The variance-covariance matrix for the regression coefficient MLEs is

`$V\left({b}_{MLE}\right)={\left({X}^{\prime }\text{\hspace{0.17em}}{\left({I}_{n}\otimes \stackrel{^}{\Sigma }\right)}^{-1}X\right)}^{-1},$`

evaluated at the MLE of the error covariance matrix. This is the fourth `mvregress` output. The standard errors of the MLEs are the square root of the diagonal of this variance-covariance matrix.

For $\stackrel{^}{\Sigma }$, let $\theta$ denote the vector of parameters in the estimated error variance-covariance matrix. For example, if d = 2, then:

• If the estimated covariance matrix is diagonal, then $\theta =\left({\stackrel{^}{\sigma }}_{1}^{2},{\stackrel{^}{\sigma }}_{2}^{2}\right)$.

• If the estimated covariance matrix is full, then $\theta =\left({\stackrel{^}{\sigma }}_{1}^{2},{\stackrel{^}{\sigma }}_{12},{\stackrel{^}{\sigma }}_{2}^{2}\right)$.

The Fisher information matrix for $\theta$, $I\left(\theta \right)$, has elements

`$I{\left(\theta \right)}_{u,v}=\frac{1}{2}tr\left({\stackrel{^}{\Sigma }}^{-1}\text{​}\frac{\partial \stackrel{^}{\Sigma }}{\partial {\theta }_{u}}{\stackrel{^}{\Sigma }}^{-1}\text{​}\frac{\partial \stackrel{^}{\Sigma }}{\partial {\theta }_{v}}\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}u,v=1,\dots ,{n}_{\theta },$`

where ${n}_{\theta }$ is the length of $\theta$ (either d or d(d + 1)/2). The resulting variance-covariance matrix is

`$V\left(\theta \right)=I{\left(\theta \right)}^{-1}.$`

When you request the full variance-covariance matrix, `mvregress` returns (as the fourth output) the block diagonal matrix

`$\left(\begin{array}{cc}V\left({b}_{MLE}\right)& 0\\ 0& V\left(\theta \right)\end{array}\right).$`

### Missing Response Data

#### Expectation/Conditional Maximization

If any response values are missing, indicated by `NaN`, `mvregress` uses an expectation/conditional maximization (ECM) algorithm for estimation (if enough data is available). In this case, the algorithm is iterative for both least squares and maximum likelihood estimation. During each iteration, `mvregress` imputes missing response values using their conditional expectation.

Consider organizing the data so that the joint distribution of the missing and observed responses, denoted $\stackrel{˜}{y}$ and $y$ respectively, can be written as

`$\left(\begin{array}{l}\stackrel{˜}{y}\\ y\end{array}\right)\sim MVN\left\{\left(\begin{array}{l}\stackrel{˜}{X}\beta \\ X\beta \end{array}\right),\left(\begin{array}{cc}{\Sigma }_{\stackrel{˜}{y}}& {\Sigma }_{\stackrel{˜}{y}}{}_{y}\\ {\Sigma }_{y\stackrel{˜}{y}}& {\Sigma }_{y}\end{array}\right)\right\}.$`

Using properties of the multivariate normal distribution, the conditional expectation of the missing responses given the observed responses is

`$\text{E}\left(\stackrel{˜}{y}|y\right)=\stackrel{˜}{X}\beta +{\Sigma }_{\stackrel{˜}{y}y}{\Sigma }_{y}^{-1}\left(y-X\beta \right).$`

Also, the variance-covariance matrix of the conditional distribution is

`$\text{COV}\left(\stackrel{˜}{y}|y\right)={\Sigma }_{\stackrel{˜}{y}}-{\Sigma }_{\stackrel{˜}{y}y}{\Sigma }_{y}^{-1}{\Sigma }_{y\stackrel{˜}{y}}.$`

At each iteration of the ECM algorithm, `mvregress` uses the parameter values from the previous iteration to:

• Update the regression coefficients using the combined vector of observed responses and conditional expectations of missing responses.

• Update the variance-covariance matrix, adjusting for missing responses using the variance-covariance matrix of the conditional distribution.

Finally, the residuals that `mvregress` returns for missing responses are the difference between the conditional expectation and the fitted value, both evaluated at the final parameter estimates.

If you prefer to ignore any observations that have missing response values, use the name-value pair `'algorithm','mvn'`. Note that `mvregress` always ignores observations that have missing predictor values.

#### Observed Information Matrix

By default, `mvregress` uses the observed Fisher information matrix (the `'hessian'` option) to compute the variance-covariance matrix of the regression parameters. This accounts for the additional uncertainty due to missing response values.

The observed information matrix includes contributions from only the observed responses. That is, the observed Fisher information matrix for the parameters in the error variance-covariance matrix has elements

`$I{\left(\theta \right)}_{u,v}=\frac{1}{2}\sum _{i=1}^{n}tr\left({\stackrel{^}{\Sigma }}_{{}^{i}\text{​}}^{-1}\frac{\partial {\stackrel{^}{\Sigma }}_{i}}{\partial {\theta }_{u}}{\stackrel{^}{\Sigma }}_{{}^{i}\text{​}}^{-1}\text{​}\frac{\partial {\stackrel{^}{\Sigma }}_{i}}{\partial {\theta }_{v}}\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}u,v=1,\dots ,{n}_{\theta },$`

where ${\stackrel{^}{\Sigma }}_{i}$ is the subset of $\stackrel{^}{\Sigma }$ corresponding to the observed responses in ${y}_{i}.$

For example, if d = 3, but ${y}_{i2}$ is missing, then

`${\stackrel{^}{\Sigma }}_{i}=\left(\begin{array}{cc}{\stackrel{^}{\sigma }}_{1}^{2}& {\stackrel{^}{\sigma }}_{13}\\ {\stackrel{^}{\sigma }}_{13}& {\stackrel{^}{\sigma }}_{3}^{2}\end{array}\right).$`

The observed Fisher information for the regression coefficients has similar contributions from the design and covariance matrices.

 Beck, N. and J. N. Katz. "What to Do (and Not to Do) with Time-Series-Cross-Section Data in Comparative Politics." American Political Science Review, Vol. 89, No. 3, pp. 634–647, 1995.