Generalized Linear Mixed-Effects Models
What Are Generalized Linear Mixed-Effects Models?
Generalized linear mixed-effects (GLME) models describe the relationship between a response variable and independent variables using coefficients that can vary with respect to one or more grouping variables, for data with a response variable distribution other than normal. You can think of GLME models as extensions of generalized linear models (GLM) for data that are collected and summarized in groups. Alternatively, you can think of GLME models as a generalization of linear mixed-effects models (LME) for data where the response variable is not normally distributed.
A mixed-effects model consists of fixed-effects and random-effects terms. Fixed-effects terms are usually the conventional linear regression part of the model. Random-effects terms are associated with individual experimental units drawn at random from a population, and account for variations between groups that might affect the response. The random effects have prior distributions, whereas the fixed effects do not.
GLME Model Equations
The standard form of a generalized linear mixed-effects model is
where
y is an n-by-1 response vector, and yi is its ith element.
b is the random-effects vector.
Distr is a specified conditional distribution of y given b.
μ is the conditional mean of y given b, and μi is its ith element.
σ2 is the dispersion parameter.
w is the effective observation weight vector, and wi is the weight for observation i.
For a binomial distribution, the effective observation weight is equal to the prior weight specified using the
'Weights'
name-value pair argument infitglme
, multiplied by the binomial size specified using the'BinomialSize'
name-value pair argument.For all other distributions, the effective observation weight is equal to the prior weight specified using the
'Weights'
name-value pair argument infitglme
.
g(μ) is a link function that defines the relationship between the mean response μ and the linear combination of the predictors.
X is an n-by-p fixed-effects design matrix.
β is a p-by-1 fixed-effects vector.
Z is an n-by-q random-effects design matrix.
b is a q-by-1 random-effects vector.
δ is a model offset vector.
The model for the mean response μ is
where g-1 is inverse of the link function g(μ), and is the linear predictor of the fixed and random effects of the generalized linear mixed-effects model
A GLME model is parameterized by β, θ, and σ2.
The assumptions for generalized linear mixed-effects models are:
The random effects vector b has the prior distribution:
where σ2 is the dispersion parameter, and D is a symmetric and positive semidefinite matrix parameterized by an unconstrained parameter vector θ.
The observations yi are conditionally independent given b.
Prepare Data for Model Fitting
To fit a GLME model to your data, use fitglme
.
Format your input data using the table
data type.
Each row of the table represents one observation, and each column
represents one predictor variable. For more information on creating
and using table
, see Create Tables and Assign Data to Them.
Input data can include continuous and grouping variables. fitglme
assumes
that predictors using the following data types are categorical:
Logical
Categorical
Character vector or character array
String array
Cell array of character vectors
If the input data table contains any NaN
values,
then fitglme
excludes that entire row of data
from the fit. To exclude additional rows of data, you can use the 'Exclude'
name-value
pair argument of fitglme
when fitting the model.
Choose a Distribution Type for the Model
GLME models are used when the response data does not follow
a normal distribution. Therefore, when fitting a model using fitglme
, you must specify the response
distribution type using the 'Distribution'
name-value
pair argument. Often, the type of response data suggests the appropriate
distribution type for the model.
Type of Response Data | Suggested Response Distribution Type |
---|---|
Any real number | 'Normal' |
Any positive number | 'Gamma' or 'InverseGaussian' |
Any nonnegative integer | 'Poisson' |
Integer from 0 to n, where n is a fixed positive value | 'Binomial' |
Choose a Link Function for the Model
GLME models use a link function, g, to map
the relationship between the mean response and the linear combination
of the predictors. By default, fitglme
uses
a predefined, commonly accepted link function based on the specified
distribution of the response data, as shown in the following table.
However, you can specify a different link function from the list of
predefined functions, or define your own, using the 'Link'
name-value
pair argument of fitglme
.
Value | Description |
---|---|
'comploglog' | g(mu) = log(-log(1-mu)) |
'identity' |
Canonical link for the normal distribution. |
'log' |
Canonical link for the Poisson distribution. |
'logit' |
Canonical link for the binomial distribution. |
'loglog' | g(mu) = log(-log(mu)) |
'probit' | g(mu) = norminv(mu) |
'reciprocal' | g(mu) = mu.^(-1) |
Scalar value P | g(mu) = mu.^P |
Structure S | A structure containing four fields whose values are function handles:
If |
When fitting a model to data, fitglme
uses
the canonical link function by default.
Distribution | Default Link Function |
---|---|
'Normal' | 'identity' |
'Binomial' | 'logit' |
'Poisson' | 'log' |
'Gamma' | -1 |
'InverseGaussian' | -2 |
The link functions 'comploglog'
, 'loglog'
,
and 'probit'
are mainly useful for binomial models.
Specify the Model Formula
Model specification for fitglme
uses Wilkinson notation, which
is a character vector or string scalar of the form 'y ~ terms'
,
where y
is the response variable name, and
terms
is written in the following notation.
Wilkinson Notation | Factors in Standard Notation |
---|---|
1 | Constant (intercept) term |
X^k , where k is a positive
integer | X , X2 ,
..., Xk |
X1 + X2 | X1 , X2 |
X1*X2 | X1 , X2 , X1.*X2
(element-wise multiplication of X1 and X2) |
X1:X2 | X1.*X2 only |
- X2 | Do not include X2 |
X1*X2 + X3 | X1 , X2 , X3 , X1*X2 |
X1 + X2 + X3 + X1:X2 | X1 , X2 , X3 , X1*X2 |
X1*X2*X3 - X1:X2:X3 | X1 , X2 , X3 , X1*X2 , X1*X3 , X2*X3 |
X1*(X2 + X3) | X1 , X2 , X3 , X1*X2 , X1*X3 |
Formulas include a constant (intercept) term by default. To
exclude a constant term from the model, include –1
in
the formula.
For generalized linear mixed-effects models, the formula specification
is of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'
,
where fixed
and random
contain
the fixed-effects and the random-effects terms, respectively.
Suppose the input data table contains the following:
A response variable,
y
Predictor variables,
X1
,X2
, ...,XJ
, where J is the total number of predictor variables (including continuous and grouping variables).Grouping variables,
g1
,g2
, ...,gR
, where R is the number of grouping variables.
The grouping variables in XJ
and gR
can be categorical,
logical, character arrays, string arrays, or cell arrays of character
vectors.
Then, in a formula of the form 'y ~ fixed + (random1|g1)
+ ... + (randomR|gR)'
, the term fixed
corresponds
to a specification of the fixed-effects design matrix X
, random1
is
a specification of the random-effects design matrix Z1
corresponding
to grouping variable g1
, and similarly randomR
is
a specification of the random-effects design matrix ZR
corresponding
to grouping variable gR
. You can express the fixed
and random
terms
using Wilkinson notation as follows.
Formula | Description |
---|---|
'y ~ X1 + X2' | Fixed effects for the intercept, X1 , and X2 .
This formula is equivalent to 'y ~ 1 + X1 + X2' . |
'y ~ -1 + X1 + X2' | No intercept, with fixed effects for X1 and X2 .
The implicit intercept term is suppressed by including -1 . |
'y ~ 1 + (1 | g1)' | A fixed effect for the intercept, plus a random effect for
the intercept for each level of the grouping variable g1 . |
'y ~ X1 + (1 | g1)' | Random intercept model with a fixed slope. |
'y ~ X1 + (X1 | g1)' | Random intercept and slope, with possible correlation between
them. This formula is equivalent to 'y ~ 1 + X1 + (1 + X1|g1)' . |
'y ~ X1 + (1 | g1) + (-1 + X1 | g1)' | Independent random-effects terms for intercept and slope. |
'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)' | Random intercept model with independent main effects for g1 and g2 ,
plus an independent interaction effect. |
For example, the sample data mfr
contains
simulated data from a manufacturing company that operates 50 factories
across the world. Each factory runs a batch process to create a finished
product. The company wants to decrease the number of defects in each
batch, so it developed a new manufacturing process. To test the effectiveness
of the new process, the company selected 20 of its factories at random
to participate in an experiment: Ten factories implemented the new
process, while the other ten continued to run the old process. In
each of the 20 factories, the company ran five batches (for a total
of 100 batches), and recorded data on processing time (time_dev
),
temperature (temp_dev
), number of defects (defects
),
and a categorical variable indicating the raw materials supplier (supplier
)
for each batch.
To determine whether the new process (represented by the predictor
variable newprocess
) significantly reduces the
number of defects, fit a GLME model using newprocess
, time_dev
, temp_dev
,
and supplier
as fixed-effects predictors. Include
a random-effects intercept grouped by factory
,
to account for quality differences that might exist due to factory-specific
variations. The response variable defects
has a
Poisson distribution.
The number of defects can be modeled using a Poisson distribution
This corresponds to the generalized linear mixed-effects model
where
defectsij is the number of defects observed in the batch produced by factory i (where i = 1, 2, ..., 20) during batch j (where j = 1, 2, ..., 5).
μij is the mean number of defects corresponding to factory i during batch j.
supplier_Cij and supplier_Bij are dummy variables that indicate whether company
C
orB
, respectively, supplied the process chemicals for the batch produced by factory i during batch j.bi ~ N(0,σb2) is a random-effects intercept for each factory i that accounts for factory-specific variation in quality.
Using Wilkinson notation, specify this model as:
'defects ~ 1 + newprocess + time_dev + temp_dev + supplier
+ (1|factory)'
To account for the Poisson distribution of the response variable,
when fitting the model using fitglme
, specify
the 'Distribution'
name-value pair argument as 'Poisson'
.
By default, fitglme
uses a log link function
for response variables with a Poisson distribution.
Display the Model
The output of the fitting function fitglme
provides
information about generalized linear mixed-effects model.
Using the mfr
manufacturing experiment data,
fit a model using newprocess
, time_dev
, temp_dev
,
and supplier
as fixed-effects predictors. Specify
the response distribution as Poisson, the link function as log, and
the fit method as Laplace.
load mfr glme = fitglme(mfr,... 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)',... 'Distribution','Poisson','Link','log','FitMethod','Laplace',... 'DummyVarCoding','effects')
glme = Generalized linear mixed-effects model fit by ML Model information: Number of observations 100 Fixed effects coefficients 6 Random effects coefficients 20 Covariance parameters 1 Distribution Poisson Link Log FitMethod Laplace Formula: defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory) Model fit statistics: AIC BIC LogLikelihood Deviance 416.35 434.58 -201.17 402.35 Fixed effects coefficients (95% CIs): Name Estimate SE tStat DF pValue '(Intercept)' 1.4689 0.15988 9.1875 94 9.8194e-15 'newprocess' -0.36766 0.17755 -2.0708 94 0.041122 'time_dev' -0.094521 0.82849 -0.11409 94 0.90941 'temp_dev' -0.28317 0.9617 -0.29444 94 0.76907 'supplier_C' -0.071868 0.078024 -0.9211 94 0.35936 'supplier_B' 0.071072 0.07739 0.91836 94 0.36078 Lower Upper 1.1515 1.7864 -0.72019 -0.015134 -1.7395 1.5505 -2.1926 1.6263 -0.22679 0.083051 -0.082588 0.22473 Random effects covariance parameters: Group: factory (20 Levels) Name1 Name2 Type Estimate '(Intercept)' '(Intercept)' 'std' 0.31381 Group: Error Name Estimate 'sqrt(Dispersion)' 1
The Model information
table displays the
total number of observations in the sample data (100), the number
of fixed- and random-effects coefficients (6 and 20, respectively),
and the number of covariance parameters (1). It also indicates that
the response variable has a Poisson
distribution,
the link function is Log
, and the fit method is Laplace
.
Formula
indicates the model specification
using Wilkinson’s notation.
The Model fit statistics
table displays statistics
used to assess the goodness of fit of the model. This includes the
Akaike information criterion (AIC
), Bayesian information
criterion (BIC
) values, log likelihood (LogLikelihood
),
and deviance (Deviance
) values.
The Fixed effects coefficients
table indicates
that fitglme
returned 95% confidence intervals.
It contains one row for each fixed-effects predictor, and each column
contains statistics corresponding to that predictor. Column 1 (Name
)
contains the name of each fixed-effects coefficient, column 2 (Estimate
)
contains its estimated value, and column 3 (SE
)
contains the standard error of the coefficient. Column 4 (tStat
)
contains the t-statistic for a hypothesis test
that the coefficient is equal to 0. Column 5 (DF
)
and column 6 (pValue
) contain the degrees of freedom
and p-value that correspond to the t-statistic,
respectively. The last two columns (Lower
and Upper
)
display the lower and upper limits, respectively, of the 95% confidence
interval for each fixed-effects coefficient.
Random effects covariance parameters
displays
a table for each grouping variable (here, only factory
),
including its total number of levels (20), and the type and estimate
of the covariance parameter. Here, std
indicates
that fitglme
returns the standard deviation of
the random effect associated with the factory predictor, which has
an estimated value of 0.31381. It also displays a table containing
the error parameter type (here, the square root of the dispersion
parameter), and its estimated value of 1.
The standard display generated by fitglme
does
not provide confidence intervals for the random-effects parameters.
To compute and display these values, use covarianceParameters
.
Work with the Model
After you create a GLME model using fitglme
,
you can use additional functions to work with the model.
Inspect and Test Coefficients and Confidence Intervals
To extract estimates of the fixed- and random-effects coefficients, covariance parameters, design matrices, and related statistics:
fixedEffects
extracts estimated fixed-effects coefficients and related statistics from a fitted model. Related statistics include the standard error; the t-statistic, degrees of freedom, and p-value for a hypothesis test of whether each parameter is equal to 0; and the confidence intervals.randomEffects
extracts estimated random-effects coefficients and related statistics from a fitted GLME model. Related statistics include the estimated empirical Bayes predictor (EBP) of each random effect, the square root of the conditional mean squared error of prediction (CMSEP) given the covariance parameters and the response; the t-statistic, estimated degrees of freedom, and p-value for a hypothesis test of whether each random effect is equal to 0; and the confidence intervals.covarianceParameters
extracts estimated covariance parameters and related statistics from a fitted GLME model. Related statistics include estimate of the covariance parameter, and the confidence intervals.designMatrix
extracts the fixed- and random-effects design matrices, or a specified subset thereof, from the fitted GLME model.
To conduct customized hypothesis tests for the significance of fixed- and random-effects coefficients, and to compute custom confidence intervals:
anova
performs a marginal F-test (hypothesis test) on fixed-effects terms, to determine if all coefficients representing the fixed-effects terms are equal to 0. You can useanova
to test the combined significance of the coefficients of categorical predictors.coefCI
computes confidence intervals for fixed- and random-effects parameters from a fitted GLME model. By default,fitglme
computes 95% confidence intervals. UsecoefCI
to compute the boundaries at a different confidence level.coefTest
performs custom hypothesis tests on fixed-effects or random-effects vectors of a fitted generalized linear mixed-effects model. For example, you can specify contrast matrices.
Generate New Response Values and Refit Model
To generate new response values, including fitted, predicted, and random responses, based on the fitted GLME model:
fitted
computes fitted response values using the original predictor values, and the estimated coefficient and parameter values from the fitted model.predict
computes the predicted conditional or marginal mean of the response using either the original predictor values or new predictor values, and the estimated coefficient and parameter values from the fitted model.random
generates random responses from a fitted model.refit
creates a new fitted GLME model, based on the original model and a new response vector.
Inspect and Visualize Residuals
To extract and visualize residuals from the fitted GLME model:
residuals
extracts the raw or Pearson residuals from the fitted model. You can also specify whether to compute the conditional or marginal residuals.plotResiduals
creates plots using the raw or Pearson residuals from the fitted model, including:A histogram of the residuals
A scatterplot of the residuals versus fitted values
A scatterplot of residuals versus lagged residuals
See Also
fitglme
| GeneralizedLinearMixedModel