GeneralizedLinearMixedModel Class

Generalized linear mixed-effects model class

Description

A GeneralizedLinearMixedModel object represents a regression model of a response variable that contains both fixed and random effects. The object comprises data, a model description, fitted coefficients, covariance parameters, design matrices, residuals, residual plots, and other diagnostic information for a generalized linear mixed-effects (GLME) model. You can predict model responses with the predict function and generate random data at new design points using the random function.

Construction

You can fit a generalized linear mixed-effects (GLME) model to sample data using fitglme(tbl,formula). For more information, see fitglme.

Input Arguments

Input data, which includes the response variable, predictor variables, and grouping variables, specified as a table or dataset array. The predictor variables can be continuous or grouping variables (see Grouping Variables). You must specify the model for the variables using formula.

Data Types: table

Formula for model specification, specified as a character vector or string scalar of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'. For a full description, see Formula.

Example: 'y ~ treatment +(1|block)'

Properties

Estimates of fixed-effects coefficients and related statistics, stored as a dataset array that has one row for each coefficient and the following columns:

  • Name — Name of the coefficient

  • Estimate — Estimated coefficient value

  • SE — Standard error of the estimate

  • tStat — t-statistic for a test that the coefficient is equal to 0

  • DF — Degrees of freedom associated with the t statistic

  • pValue — p-value for the t-statistic

  • Lower — Lower confidence limit

  • Upper — Upper confidence limit

To obtain any of these columns as a vector, index into the property using dot notation.

Use the coefTest method to perform other tests on the coefficients.
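
The dot-notation indexing and the coefTest call described above can be sketched as follows. This is a minimal illustration on the mfr sample data used later on this page. The property name Coefficients does not appear in the text above and is assumed from the class's property list.

```matlab
% Fit the Poisson GLME used in the Examples section of this page.
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

% Index into the coefficient table with dot notation to get column vectors.
names = glme.Coefficients.Name;       % cell array of coefficient names
est   = glme.Coefficients.Estimate;   % estimated coefficient values
se    = glme.Coefficients.SE;         % standard errors of the estimates

% With no further arguments, coefTest runs an F-test that all fixed-effects
% coefficients except the intercept are equal to 0.
pVal = coefTest(glme);
```

Each extracted vector has one element per fixed-effects coefficient, in the same order as the rows of the table.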

Covariance of the estimated fixed-effects coefficients, stored as a matrix.

Data Types: single | double

Names of fixed-effects coefficients, stored as a cell array of character vectors. The label for the coefficient of the constant term is (Intercept). The labels for other coefficients indicate the terms that they multiply. When the term includes a categorical predictor, the label also indicates the level of that predictor.

Data Types: cell

Degrees of freedom for error, stored as a positive integer value. DFE is the number of observations minus the number of estimated coefficients.

DFE contains the degrees of freedom corresponding to the 'Residual' method of calculating denominator degrees of freedom for hypothesis tests on fixed-effects coefficients. If n is the number of observations and p is the number of fixed-effects coefficients, then DFE is equal to n – p.

Data Types: double
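
As a quick check of the n – p relationship, the following sketch recomputes the residual degrees of freedom from two other properties. The property names NumObservations and NumCoefficients are assumed from the class's property list. For the mfr model shown in the Examples section, n is 100 and p is 6, which agrees with the DF column of 94 in that display.

```matlab
% Fit the Poisson GLME used in the Examples section of this page.
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

n = glme.NumObservations;    % number of observations used in the fit
p = glme.NumCoefficients;    % number of fixed-effects coefficients
dfeManual = n - p;           % residual degrees of freedom, n - p

fprintf('n - p = %d, DFE = %d\n', dfeManual, glme.DFE);
```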

Model dispersion parameter, stored as a scalar value. The dispersion parameter defines the conditional variance of the response.

For observation $i$, the conditional variance of the response $y_i$, given the conditional mean $\mu_i$ and the dispersion parameter $\sigma^2$, in a generalized linear mixed-effects model is

$$\mathrm{var}(y_i \mid \mu_i, \sigma^2) = \frac{\sigma^2}{w_i}\, v(\mu_i),$$

where $w_i$ is the ith observation weight and $v$ is the variance function for the specified conditional distribution of the response. The Dispersion property contains an estimate of $\sigma^2$ for the specified GLME model. The value of Dispersion depends on the specified conditional distribution of the response. For binomial and Poisson distributions, the theoretical value of the dispersion parameter is $\sigma^2 = 1.0$.

  • If FitMethod is MPL or REMPL and the 'DispersionFlag' name-value pair argument in fitglme is true, then a dispersion parameter is estimated from data for all distributions, including binomial and Poisson distributions.

  • If FitMethod is ApproximateLaplace or Laplace, then the 'DispersionFlag' name-value pair argument in fitglme does not apply, and the dispersion parameter is fixed at 1.0 for binomial and Poisson distributions. For all other distributions, Dispersion is estimated from data.

Data Types: double

Flag indicating estimated dispersion parameter, stored as a logical value.

  • If FitMethod is ApproximateLaplace or Laplace, then the dispersion parameter is fixed at its theoretical value of 1.0 for binomial and Poisson distributions, and DispersionEstimated is false. For other distributions, the dispersion parameter is estimated from the data, and DispersionEstimated is true.

  • If FitMethod is MPL or REMPL, and the 'DispersionFlag' name-value pair argument in fitglme is specified as true, then the dispersion parameter is estimated for all distributions, including binomial and Poisson distributions, and DispersionEstimated is true.

  • If FitMethod is MPL or REMPL, and the 'DispersionFlag' name-value pair argument in fitglme is specified as false, then the dispersion parameter is fixed at its theoretical value for binomial and Poisson distributions, and DispersionEstimated is false. For distributions other than binomial and Poisson, the dispersion parameter is estimated from the data, and DispersionEstimated is true.

Data Types: logical
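
The interaction between FitMethod and 'DispersionFlag' described above can be sketched by fitting the same Poisson model twice. This is an illustration on the mfr sample data, not a recommendation of either fit method. The variable names glmeLaplace and glmeMPL are arbitrary.

```matlab
load mfr
formula = 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)';

% Laplace fit: 'DispersionFlag' does not apply, so the Poisson dispersion
% stays fixed at its theoretical value of 1.
glmeLaplace = fitglme(mfr,formula,'Distribution','Poisson', ...
    'FitMethod','Laplace');

% MPL fit with 'DispersionFlag' set to true: the dispersion parameter is
% estimated from the data even though the distribution is Poisson.
glmeMPL = fitglme(mfr,formula,'Distribution','Poisson', ...
    'FitMethod','MPL','DispersionFlag',true);

% Compare the dispersion values and the flags that report how they were set.
disp([glmeLaplace.Dispersion, glmeMPL.Dispersion])
disp([glmeLaplace.DispersionEstimated, glmeMPL.DispersionEstimated])
```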

Response distribution name, stored as one of the following:

  • 'Normal' — Normal distribution

  • 'Binomial' — Binomial distribution

  • 'Poisson' — Poisson distribution

  • 'Gamma' — Gamma distribution

  • 'InverseGaussian' — Inverse Gaussian distribution

Method used to fit the model, stored as one of the following.

  • 'MPL' — Maximum pseudo likelihood

  • 'REMPL' — Restricted maximum pseudo likelihood

  • 'ApproximateLaplace' — Maximum likelihood using the approximate Laplace method, with fixed effects profiled out

  • 'Laplace' — Maximum likelihood using the Laplace method

Model specification formula, stored as an object. The formula uses Wilkinson’s notation to describe the relationship between the fixed-effects terms, random-effects terms, and grouping variables in the GLME model. For more information, see Formula.

Log of likelihood function evaluated at the estimated coefficient values, stored as a scalar value. LogLikelihood depends on the method used to fit the model.

  • If you use 'Laplace' or 'ApproximateLaplace', then LogLikelihood is the maximized log likelihood.

  • If you use 'MPL', then LogLikelihood is the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration.

  • If you use 'REMPL', then LogLikelihood is the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration.

Data Types: double

Model criterion to compare fitted generalized linear mixed-effects models, stored as a table with the following fields.

  • AIC — Akaike information criterion

  • BIC — Bayesian information criterion

  • LogLikelihood — For a model fit using 'Laplace' or 'ApproximateLaplace', LogLikelihood is the maximized log likelihood. For a model fit using 'MPL', LogLikelihood is the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration. For a model fit using 'REMPL', LogLikelihood is the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration.

  • Deviance — -2 times LogLikelihood
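
The relationship between these fields can be checked with a short sketch. The property name ModelCriterion is assumed from the class's property list. The parameter count k = 7 (6 fixed-effects coefficients plus 1 covariance parameter) is also an assumption; it reproduces the AIC and BIC values shown for the mfr model in the Examples section, but the penalty can be counted differently for other models or fit methods.

```matlab
% Fit the Poisson GLME used in the Examples section of this page.
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

mc = glme.ModelCriterion;

% Deviance is -2 times the log likelihood.
devManual = -2*mc.LogLikelihood;

% Assumed parameter count for this model: 6 fixed effects + 1 covariance parameter.
k = 7;
aicManual = devManual + 2*k;
bicManual = devManual + k*log(glme.NumObservations);

% Left column: stored values. Right column: values recomputed above.
disp([mc.Deviance devManual; mc.AIC aicManual; mc.BIC bicManual])
```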

Number of fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.

Data Types: double

Number of estimated fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.

Data Types: double

Number of observations used in the fit, stored as a positive integer value. NumObservations is the number of rows in the table or dataset array tbl, minus rows excluded using the 'Exclude' name-value pair of fitglme or rows containing NaN values.

Data Types: double

Number of variables used as predictors in the generalized linear mixed-effects model, stored as a positive integer value.

Data Types: double

Total number of variables, including the response and predictors, stored as a positive integer value. If the sample data is in a table or dataset array tbl, then NumVariables is the total number of variables in tbl, including the response variable. NumVariables includes variables, if any, that are not used as predictors or as the response.

Data Types: double

Information about the observations used in the fit, stored as a table.

ObservationInfo has one row for each observation and the following columns.

  • Weights — The weight value for the observation. The default value is 1.

  • Excluded — If the observation was excluded from the fit using the 'Exclude' name-value pair argument in fitglme, then Excluded is true, or 1. Otherwise, Excluded is false, or 0.

  • Missing — If the observation was excluded from the fit because any response or predictor value is missing, then Missing is true. Otherwise, Missing is false. Missing values include NaN for numeric variables, empty cells for cell arrays, blank rows for character arrays, and the <undefined> value for categorical arrays.

  • Subset — If the observation was used in the fit, then Subset is true. If the observation was not used in the fit because it is missing or excluded, then Subset is false.

  • BinomSize — Binomial size for each observation. This column applies only when fitting a binomial distribution.

Data Types: table
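
As a small illustration of how these columns relate, the sketch below counts the observations that were used in the fit and those that were dropped. It assumes the mfr sample data and the model from the Examples section. The variable names obs, nUsed, and nDropped are arbitrary.

```matlab
% Fit the Poisson GLME used in the Examples section of this page.
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

obs = glme.ObservationInfo;

% Subset is true for rows that entered the fit.
nUsed = nnz(obs.Subset);

% A row is dropped if it was explicitly excluded or has a missing value.
nDropped = nnz(obs.Excluded | obs.Missing);

fprintf('%d rows used, %d rows dropped\n', nUsed, nDropped);
```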

Names of observations used in the fit, stored as a cell array of character vectors.

  • If the data is in a table or dataset array tbl that contains observation names, then ObservationNames uses those names.

  • If the data is provided in matrices, or in a table or dataset array without observation names, then ObservationNames is an empty cell array.

Data Types: cell

Names of the variables used as predictors in the fit, stored as a cell array of character vectors with NumPredictors elements.

Data Types: cell

Name of the variable used as the response variable in the fit, stored as a character vector.

Data Types: char

Proportion of variability in the response explained by the fitted model, stored as a structure. Rsquared contains the R-squared value of the fitted model, also known as the coefficient of determination. Rsquared contains the following fields.

  • Ordinary — R-squared value, stored as a scalar value in a structure.
    Rsquared.Ordinary = 1 – SSE./SST

  • Adjusted — R-squared value adjusted for the number of fixed-effects coefficients, stored as a scalar value in a structure.
    Rsquared.Adjusted = 1 – (SSE./SST)*(DFT./DFE),
    where DFE = n – p, DFT = n – 1, n is the total number of observations, and p is the number of fixed-effects coefficients.

Data Types: struct
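
The two formulas above can be evaluated directly from the sum-of-squares properties. The sketch below is an illustration on the mfr sample data. The property names SSE, SST, NumObservations, and NumCoefficients are those of this class, and the local variable names are arbitrary.

```matlab
% Fit the Poisson GLME used in the Examples section of this page.
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

n   = glme.NumObservations;   % total number of observations
p   = glme.NumCoefficients;   % number of fixed-effects coefficients
DFE = n - p;                  % error degrees of freedom
DFT = n - 1;                  % total degrees of freedom

r2Ordinary = 1 - glme.SSE/glme.SST;
r2Adjusted = 1 - (glme.SSE/glme.SST)*(DFT/DFE);

% Left column: stored values. Right column: values recomputed above.
disp([glme.Rsquared.Ordinary r2Ordinary; glme.Rsquared.Adjusted r2Adjusted])
```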

Sum of squared errors, stored as a positive scalar. SSE is the weighted sum of the squared conditional residuals, and is calculated as

$$SSE = \sum_{i=1}^{N} w_i^{\mathrm{eff}} \left(y_i - f_i\right)^2,$$

where $N$ is the number of observations, $w_i^{\mathrm{eff}}$ is the ith effective weight, $y_i$ is the ith response, and $f_i$ is the ith fitted value.

The ith effective weight is calculated as

$$w_i^{\mathrm{eff}} = \frac{w_i}{v_i\left(f_i(\hat{\beta},\hat{b})\right)},$$

where $w_i$ is the ith observation weight, $v_i$ is the variance term for the ith observation, and $\hat{\beta}$ and $\hat{b}$ are estimated values of $\beta$ and $b$, respectively.

The ith fitted value is calculated as

$$f_i = g^{-1}\left(x_i^T\hat{\beta} + z_i^T\hat{b} + \delta_i\right),$$

where $g$ is the link function, $x_i^T$ is the ith row of the fixed-effects design matrix $X$, $z_i^T$ is the ith row of the random-effects design matrix $Z$, and $\delta_i$ is the ith offset value.

Data Types: double
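
To make the SSE formula concrete, the sketch below evaluates it by hand for a Poisson model, where the variance function is v(μ) = μ. It assumes the mfr sample data and that the stored SSE follows the formula above exactly, so the two printed values are expected to agree only under that assumption. The local variable names are arbitrary.

```matlab
% Fit the Poisson GLME used in the Examples section of this page.
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

y = response(glme);                 % observed responses y_i
f = fitted(glme);                   % conditional fitted values f_i
w = glme.ObservationInfo.Weights;   % observation weights w_i (1 by default)

% Effective weights for a Poisson response: w_i / v(f_i) with v(mu) = mu.
wEff = w ./ f;

% Weighted sum of squared conditional residuals.
sseManual = sum(wEff .* (y - f).^2);

fprintf('manual SSE = %g, stored SSE = %g\n', sseManual, glme.SSE);
```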

Regression sum of squares, stored as a positive scalar. SSR is the sum of squares explained by the generalized linear mixed-effects regression, and is equal to the sum of the squared deviations between the fitted values and the mean of the response. SSR is calculated as

$$SSR = \sum_{i=1}^{N} w_i^{\mathrm{eff}} \left(f_i - \bar{y}\right)^2,$$

where $N$ is the number of observations, $w_i^{\mathrm{eff}}$ is the ith effective weight, $f_i$ is the ith fitted value, and $\bar{y}$ is the weighted average of the response.

The ith effective weight is calculated as

$$w_i^{\mathrm{eff}} = \frac{w_i}{v_i\left(f_i(\hat{\beta},\hat{b})\right)},$$

where $w_i$ is the ith observation weight, $v_i$ is the variance term for the ith observation, and $\hat{\beta}$ and $\hat{b}$ are estimated values of $\beta$ and $b$, respectively.

The ith fitted value is calculated as

$$f_i = g^{-1}\left(x_i^T\hat{\beta} + z_i^T\hat{b} + \delta_i\right),$$

where $g$ is the link function, $x_i^T$ is the ith row of the fixed-effects design matrix $X$, $z_i^T$ is the ith row of the random-effects design matrix $Z$, and $\delta_i$ is the ith offset value.

Data Types: double

Total sum of squares, stored as a positive scalar.

For a GLME model with an intercept, SST is calculated as

SST = SSE + SSR,

where SST is the total sum of squares, SSE is the error sum of squares, and SSR is the regression sum of squares.

For a GLME model without an intercept, SST is calculated as

$$SST = \sum_{i=1}^{N} w_i^{\mathrm{eff}} \left(y_i - \bar{y}\right)^2,$$

where $N$ is the number of observations, $w_i^{\mathrm{eff}}$ is the ith effective weight, $y_i$ is the ith response value, and $\bar{y}$ is the weighted average of the response.

Data Types: double
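
For a model that includes an intercept, the decomposition above can be checked in a few lines. This sketch uses the mfr sample data and the model from the Examples section, which has an intercept term.

```matlab
% Fit the Poisson GLME used in the Examples section of this page.
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

% With an intercept in the model, SST is the sum of SSE and SSR.
sstManual = glme.SSE + glme.SSR;

fprintf('SSE + SSR = %g, SST = %g\n', sstManual, glme.SST);
```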

Information about the variables used in the fit, stored as a table. VariableInfo has one row for each variable and contains the following columns.

  • Class — Class of the variable ('double', 'cell', 'nominal', and so on).

  • Range — Value range of the variable. For a numerical variable, Range is a two-element vector of the form [min,max]. For a cell or categorical variable, Range is a cell or categorical array containing all unique values of the variable.

  • InModel — If the variable is a predictor in the fitted model, InModel is true. If the variable is not in the fitted model, InModel is false.

  • IsCategorical — If the variable type is treated as a categorical predictor (such as cell, logical, or categorical), then IsCategorical is true. If the variable is a continuous predictor, then IsCategorical is false.

Data Types: table
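
The logical columns described above can be used to filter the variable list. The following sketch is one way to do that on the mfr sample data. The names info, inModel, and catVars are arbitrary.

```matlab
% Fit the Poisson GLME used in the Examples section of this page.
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

info = glme.VariableInfo;

% Keep only the rows for variables flagged as predictors in the model.
inModel = info(info.InModel,:);

% Keep only the rows for variables that are treated as categorical.
catVars = info(info.IsCategorical,:);

disp(inModel)
disp(catVars)
```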

Names of all the variables contained in the table or dataset array tbl, stored as a cell array of character vectors.

Data Types: cell

Variables, stored as a table. If the fit is based on a table or dataset array tbl, then Variables is identical to tbl.

Data Types: table

Object Functions

  • anova — Analysis of variance for generalized linear mixed-effects model

  • coefCI — Confidence intervals for coefficients of generalized linear mixed-effects model

  • coefTest — Hypothesis test on fixed and random effects of generalized linear mixed-effects model

  • compare — Compare generalized linear mixed-effects models

  • covarianceParameters — Extract covariance parameters of generalized linear mixed-effects model

  • designMatrix — Fixed- and random-effects design matrices

  • fitted — Fitted responses from generalized linear mixed-effects model

  • fixedEffects — Estimates of fixed effects and related statistics

  • partialDependence — Compute partial dependence

  • plotPartialDependence — Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots

  • plotResiduals — Plot residuals of generalized linear mixed-effects model

  • predict — Predict response of generalized linear mixed-effects model

  • random — Generate random responses from fitted generalized linear mixed-effects model

  • randomEffects — Estimates of random effects and related statistics

  • refit — Refit generalized linear mixed-effects model

  • residuals — Residuals of fitted generalized linear mixed-effects model

  • response — Response vector of generalized linear mixed-effects model
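
A brief tour of several of these functions is sketched below, using the mfr sample data. Each call uses only default options; the choice of functions and the five-row subset passed to predict are purely illustrative.

```matlab
% Fit the Poisson GLME used in the Examples section of this page.
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

yhat = fitted(glme);                 % fitted responses for the training data
res  = residuals(glme);              % residuals for the training data
[beta,betaNames] = fixedEffects(glme);   % fixed-effects estimates and names
[b,bNames] = randomEffects(glme);        % random-effects estimates and names
ci = coefCI(glme);                   % confidence intervals, one row per coefficient

% Predict the response for a few rows supplied as a new table.
ypred = predict(glme,mfr(1:5,:));
```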

Examples

Load the sample data.

load mfr

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

  • Flag to indicate whether the batch used the new process (newprocess)

  • Processing time for each batch, in hours (time)

  • Temperature of the batch, in degrees Celsius (temp)

  • Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier)

  • Number of defects in the batch (defects)

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution:

$$\text{defects}_{ij} \sim \text{Poisson}(\mu_{ij}).$$

This corresponds to the generalized linear mixed-effects model

$$\log(\mu_{ij}) = \beta_0 + \beta_1\,\text{newprocess}_{ij} + \beta_2\,\text{time\_dev}_{ij} + \beta_3\,\text{temp\_dev}_{ij} + \beta_4\,\text{supplier\_C}_{ij} + \beta_5\,\text{supplier\_B}_{ij} + b_i,$$

where

  • $\text{defects}_{ij}$ is the number of defects observed in the batch produced by factory $i$ during batch $j$.

  • $\mu_{ij}$ is the mean number of defects corresponding to factory $i$ (where $i = 1, 2, \ldots, 20$) during batch $j$ (where $j = 1, 2, \ldots, 5$).

  • $\text{newprocess}_{ij}$, $\text{time\_dev}_{ij}$, and $\text{temp\_dev}_{ij}$ are the measurements for each variable that correspond to factory $i$ during batch $j$. For example, $\text{newprocess}_{ij}$ indicates whether the batch produced by factory $i$ during batch $j$ used the new process.

  • $\text{supplier\_C}_{ij}$ and $\text{supplier\_B}_{ij}$ are dummy variables that use effects (sum-to-zero) coding to indicate whether supplier C or B, respectively, supplied the process chemicals for the batch produced by factory $i$ during batch $j$.

  • $b_i \sim N(0, \sigma_b^2)$ is a random-effects intercept for each factory $i$ that accounts for factory-specific variation in quality.

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

Display the model.

disp(glme)
Generalized linear mixed-effects model fit by ML

Model information:
    Number of observations             100
    Fixed effects coefficients           6
    Random effects coefficients         20
    Covariance parameters                1
    Distribution                    Poisson
    Link                            Log   
    FitMethod                       Laplace

Formula:
    defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    416.35    434.58    -201.17          402.35  

Fixed effects coefficients (95% CIs):
    Name                   Estimate     SE          tStat       DF    pValue        Lower        Upper    
    {'(Intercept)'}           1.4689     0.15988      9.1875    94    9.8194e-15       1.1515       1.7864
    {'newprocess' }         -0.36766     0.17755     -2.0708    94      0.041122     -0.72019    -0.015134
    {'time_dev'   }        -0.094521     0.82849    -0.11409    94       0.90941      -1.7395       1.5505
    {'temp_dev'   }         -0.28317      0.9617    -0.29444    94       0.76907      -2.1926       1.6263
    {'supplier_C' }        -0.071868    0.078024     -0.9211    94       0.35936     -0.22679     0.083051
    {'supplier_B' }         0.071072     0.07739     0.91836    94       0.36078    -0.082588      0.22473

Random effects covariance parameters:
Group: factory (20 Levels)
    Name1                  Name2                  Type           Estimate
    {'(Intercept)'}        {'(Intercept)'}        {'std'}        0.31381 

Group: Error
    Name                        Estimate
    {'sqrt(Dispersion)'}        1       

The Model information table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a Poisson distribution, the link function is Log, and the fit method is Laplace.

Formula indicates the model specification using Wilkinson’s notation.

The Model fit statistics table displays statistics used to assess the goodness of fit of the model. This includes the Akaike information criterion (AIC), Bayesian information criterion (BIC) values, log likelihood (LogLikelihood), and deviance (Deviance) values.

The Fixed effects coefficients table indicates that fitglme returned 95% confidence intervals. It contains one row for each fixed-effects coefficient, and each column contains a statistic for that coefficient. Column 1 (Name) contains the name of each fixed-effects coefficient, column 2 (Estimate) contains its estimated value, and column 3 (SE) contains the standard error of the coefficient. Column 4 (tStat) contains the t-statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (DF) and column 6 (pValue) contain the degrees of freedom and p-value that correspond to the t-statistic, respectively. The last two columns (Lower and Upper) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient.

Random effects covariance parameters displays a table for each grouping variable (here, only factory), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, std indicates that fitglme returns the standard deviation of the random effect associated with the grouping variable factory, which has an estimated value of 0.31381. It also displays a table containing the error parameter type (here, the square root of the dispersion parameter), and its estimated value of 1.

The standard display generated by fitglme does not provide confidence intervals for the random-effects parameters. To compute and display these values, use covarianceParameters.
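
A minimal sketch of that covarianceParameters call follows. It refits the same model so that the block runs on its own, and requests the third output, which holds the estimates together with their confidence limits. The output names psi, dispersion, and stats are arbitrary.

```matlab
% Refit the model from this example so the block is self-contained.
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

% stats is a cell array with one entry per grouping variable, followed by
% one entry for the error (dispersion) parameter.
[psi,dispersion,stats] = covarianceParameters(glme);

disp(stats{1})   % factory random-effects parameter with confidence limits
disp(stats{2})   % error parameter
```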

Load the carbig sample data set.

load carbig

The variables Acceleration, Model_Year, and Cylinders contain data for car acceleration, year of manufacture, and number of engine cylinders, respectively. The data was collected from cars built between 1970 and 1982.

Create a variable named CylinderCats that indicates whether a car has more than four cylinders. Use the table function to create a table from the data in Acceleration, Model_Year, and CylinderCats.

CylinderCats = Cylinders>4;
tbl = table(Acceleration,Model_Year,CylinderCats);

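Fit a generalized linear mixed-effects model to the data, using CylinderCats as the response variable, Acceleration as a fixed effect, and Model_Year as the grouping variable for a random intercept and a random Acceleration slope. Specify the response data distribution as binomial.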

glme = fitglme(tbl,"CylinderCats~Acceleration+(Acceleration|Model_Year)",Distribution="Binomial");

glme is a GeneralizedLinearMixedModel object that contains information about the fitted model.

Inspect the statistics for the fixed effect Acceleration by using the fixedEffects object function with the default 95% confidence level.

[~,~,statsFixed] = fixedEffects(glme)
statsFixed = 
    FIXED EFFECT COEFFICIENTS: DFMETHOD = 'RESIDUAL', ALPHA = 0.05

    Name                    Estimate    SE          tStat      DF     pValue        Lower       Upper  
    {'(Intercept)' }          4.3838      1.2374     3.5428    404    0.00044213      1.9513     6.8163
    {'Acceleration'}        -0.29673    0.077896    -3.8093    404    0.00016104    -0.44986    -0.1436

The small p-value for the Acceleration term indicates that car acceleration has a statistically significant effect on whether a car has more than four cylinders.

Inspect the statistics for the random effect Model_Year by using the randomEffects object function with the default 95% confidence level.

[~,~,statsRandom] = randomEffects(glme)
statsRandom = 
    RANDOM EFFECT COEFFICIENTS: DFMETHOD = 'RESIDUAL', ALPHA = 0.05

    Group                 Level         Name                    Estimate    SEPred     tStat       DF     pValue      Lower        Upper   
    {'Model_Year'}        {'70'}        {'(Intercept)' }           3.041     2.1322      1.4262    404     0.15457      -1.1506      7.2326
    {'Model_Year'}        {'70'}        {'Acceleration'}        -0.16836    0.13906     -1.2107    404     0.22672     -0.44173     0.10501
    {'Model_Year'}        {'71'}        {'(Intercept)' }          3.4715     2.3452      1.4802    404     0.13959      -1.1389      8.0818
    {'Model_Year'}        {'71'}        {'Acceleration'}        -0.21721    0.15106     -1.4378    404     0.15125     -0.51418    0.079764
    {'Model_Year'}        {'72'}        {'(Intercept)' }          4.2634     2.4382      1.7486    404    0.081124     -0.52977      9.0566
    {'Model_Year'}        {'72'}        {'Acceleration'}        -0.28827    0.15892     -1.8139    404    0.070435      -0.6007    0.024149
    {'Model_Year'}        {'73'}        {'(Intercept)' }          3.7951     2.1976      1.7269    404    0.084949     -0.52512      8.1153
    {'Model_Year'}        {'73'}        {'Acceleration'}        -0.21079    0.14182     -1.4864    404     0.13796     -0.48958    0.067996
    {'Model_Year'}        {'74'}        {'(Intercept)' }        -0.77693     2.6678    -0.29123    404     0.77103      -6.0214      4.4675
    {'Model_Year'}        {'74'}        {'Acceleration'}        0.056863    0.16571     0.34314    404     0.73167      -0.2689     0.38263
    {'Model_Year'}        {'75'}        {'(Intercept)' }         -3.2681     2.1531     -1.5178    404     0.12984      -7.5008     0.96463
    {'Model_Year'}        {'75'}        {'Acceleration'}         0.24151    0.13346      1.8096    404    0.071093    -0.020847     0.50387
    {'Model_Year'}        {'76'}        {'(Intercept)' }        -0.28228     2.0922    -0.13492    404     0.89274      -4.3952      3.8306
    {'Model_Year'}        {'76'}        {'Acceleration'}        0.045966    0.13069     0.35171    404     0.72524     -0.21096     0.30289
    {'Model_Year'}        {'77'}        {'(Intercept)' }        -0.78239     2.2806    -0.34305    404     0.73174      -5.2658       3.701
    {'Model_Year'}        {'77'}        {'Acceleration'}        0.052519    0.14498     0.36226    404     0.71735     -0.23249     0.33752
    {'Model_Year'}        {'78'}        {'(Intercept)' }        -0.46307     2.2693    -0.20406    404     0.83841      -4.9242      3.9981
    {'Model_Year'}        {'78'}        {'Acceleration'}        0.050014    0.14243     0.35114    404     0.72567     -0.22999     0.33002
    {'Model_Year'}        {'79'}        {'(Intercept)' }         -2.5181     2.0134     -1.2507    404     0.21178      -6.4762        1.44
    {'Model_Year'}        {'79'}        {'Acceleration'}         0.19051     0.1257      1.5156    404      0.1304    -0.056591     0.43761
    {'Model_Year'}        {'80'}        {'(Intercept)' }         -2.6168     2.4053     -1.0879    404     0.27728      -7.3452      2.1117
    {'Model_Year'}        {'80'}        {'Acceleration'}         0.10117    0.14903     0.67883    404     0.49763     -0.19181     0.39414
    {'Model_Year'}        {'81'}        {'(Intercept)' }         -1.8396     2.4268    -0.75801    404     0.44888      -6.6103      2.9312
    {'Model_Year'}        {'81'}        {'Acceleration'}         0.08723    0.15145     0.57596    404     0.56497      -0.2105     0.38496
    {'Model_Year'}        {'82'}        {'(Intercept)' }         -2.0238     2.5531    -0.79267    404     0.42843      -7.0428      2.9953
    {'Model_Year'}        {'82'}        {'Acceleration'}        0.058853    0.15948     0.36903    404      0.7123     -0.25467     0.37237

The large p-values in the table output indicate that not enough evidence exists to conclude that any of the random effect terms have a statistically significant effect on whether a car has more than four cylinders.
