fitlm

Fit linear regression model using design points

Since R2024b

Syntax

mdl = fitlm(dobj,Y)

mdl = fitlm(dobj,Y,modelspec)

mdl = fitlm(___,Name=Value)

Description

mdl = fitlm(dobj,Y) returns a linear regression model fit to the design points in dobj and the response data in Y.

mdl = fitlm(dobj,Y,modelspec) also defines the model specification.

mdl = fitlm(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify robust fitting options and observations to exclude from the fit.

Examples

Fit Linear Regression Model

Generate a D-optimal design and some response data for the design points.

dopt = optimalDOE(5,20);

pts = dopt.Design;
h = height(pts);
response = 2*pts.Factor1+3*pts.Factor2+pts.Factor3+0.01*randn(h,1);

dopt is an optimalDOE object that contains information about the generated D-optimal design. response is a vector of response data.

Fit a linear model using the design points in dopt as the predictor data and response as the response data.

mdl = fitlm(dopt,response)

mdl = 
Linear regression model:
    y ~ 1 + Factor1 + Factor2 + Factor3 + Factor4 + Factor5

Estimated Coefficients:
                    Estimate         SE         tStat        pValue  
                   ___________    _________    ________    __________

    (Intercept)    -0.00045125     0.001709    -0.26404        0.7956
    Factor1             2.0012     0.001709        1171    2.4234e-36
    Factor2             2.9954     0.001709      1752.7    8.5516e-39
    Factor3            0.99767    0.0017443      571.97    5.5036e-32
    Factor4         -0.0020275    0.0017443     -1.1624       0.26452
    Factor5          0.0016909     0.001709      0.9894       0.33926


Number of observations: 20, Error degrees of freedom: 14
Root Mean Squared Error: 0.00764
R-squared: 1,  Adjusted R-Squared: 1
F-statistic vs. constant model: 9.57e+05, p-value = 3.27e-38

mdl is a LinearModel object that contains the results of fitting a linear model to the data. The model display includes the model formula, estimated coefficients, and model summary statistics.

Specify Model for Linear Regression

Generate a mixture design and create some response data for the design points.

dmix = mixtureDOE(3);

pts = dmix.Design;
h = height(pts);
Y = 2*pts.Factor1+pts.Factor2.*pts.Factor3+5+0.001*randn(h,1);

dmix is a mixtureDOE object that contains information about the generated mixture design. The vector Y contains response data.

Fit a linear model using the design points in dmix as the predictor data and Y as the response data. Specify the experiment model to fit.

mdl = fitlm(dmix,Y,"y~Factor1+Factor2:Factor3")

mdl = 
Linear regression model:
    y ~ 1 + Factor1 + Factor2:Factor3

Estimated Coefficients:
                       Estimate        SE        tStat       pValue  
                       ________    __________    ______    __________

    (Intercept)         4.9999     0.00096562    5177.9    8.3471e-15
    Factor1             2.0009      0.0017544    1140.5    3.5465e-12
    Factor2:Factor3    0.99501      0.0067547    147.31    1.2739e-08


Number of observations: 7, Error degrees of freedom: 4
Root Mean Squared Error: 0.00148
R-squared: 1,  Adjusted R-Squared: 1
F-statistic vs. constant model: 7e+05, p-value = 8.16e-12

mdl is a LinearModel object that contains the results of fitting the experiment model to the data. The values in the pValue column suggest that each term in the model has a statistically significant effect on the response.

Fit Linear Regression Model Without Intercept

Generate a full factorial design and create some response data for the design points.

dff = fullFactorialDOE(3);

pts = dff.Design;
h = height(pts);
Y = 2*pts.Factor1+3*pts.Factor2+pts.Factor3+0.01*randn(h,1);

dff is a fullFactorialDOE object that contains information about the generated full factorial design. The vector Y contains response data.

Fit a linear model using the design points in dff as the predictor data and Y as the response data.

mdl1 = fitlm(dff,Y)

mdl1 = 
Linear regression model:
    y ~ 1 + Factor1 + Factor2 + Factor3

Estimated Coefficients:
                    Estimate         SE          tStat        pValue  
                   ___________    _________    _________    __________

    (Intercept)    -0.00013126    0.0051434    -0.025521       0.98086
    Factor1             1.9974    0.0051434       388.34    2.6379e-10
    Factor2             2.9964    0.0051434       582.57     5.209e-11
    Factor3             1.0045    0.0051434       195.29    4.1244e-09


Number of observations: 8, Error degrees of freedom: 4
Root Mean Squared Error: 0.0145
R-squared: 1,  Adjusted R-Squared: 1
F-statistic vs. constant model: 1.76e+05, p-value = 1.07e-10

mdl1 is a LinearModel object that contains the results of fitting a linear model to the data. The model display includes the model formula, estimated coefficients, and model summary statistics. The large p-value for the intercept term indicates that the intercept does not have a statistically significant effect on the response.

Fit a linear model without an intercept term to the data.

mdl2 = fitlm(dff,Y,Intercept=0)

mdl2 = 
Linear regression model:
    y ~ Factor1 + Factor2 + Factor3

Estimated Coefficients:
               Estimate       SE        tStat       pValue  
               ________    _________    ______    __________

    Factor1     1.9974     0.0046008    434.15    1.2305e-12
    Factor2     2.9964     0.0046008    651.28    1.6198e-13
    Factor3     1.0045     0.0046008    218.32    3.8258e-11


Number of observations: 8, Error degrees of freedom: 5
Root Mean Squared Error: 0.013

mdl2 contains the results of fitting a linear model without an intercept term.

Inspect the loglikelihood of each model.

loglikelihoods = [mdl1.LogLikelihood,mdl2.LogLikelihood]

loglikelihoods = 1×2

   25.2636   25.2629

The output shows that mdl1 has a slightly larger loglikelihood than mdl2. The difference is very small, which indicates that removing the intercept term has little effect on how well the model fits the data.

Input Arguments

`dobj` — Design
`fullFactorialDOE` object | `mixtureDOE` object | `optimalDOE` object

Design, specified as a fullFactorialDOE, mixtureDOE, or optimalDOE object. fitlm fits the linear regression model using the design points in dobj.Design as predictors.

`Y` — Response variable
numeric vector

Response variable, specified as a p-by-1 numeric vector, where p is the number of design points in dobj. Each entry in Y is the response for the corresponding row of dobj.Design.

Data Types: single | double

`modelspec` — Experiment model
string scalar | character vector | terms matrix

Experiment model, specified as one of the following values.

A character vector or string scalar with the model name.

Value	Model Description
`"linear"`	The model contains an intercept and a linear term for each factor.
`"constant"`	The model contains only a constant (intercept) term.
`"interactions"`	The model contains an intercept, linear term for each factor, and all products of pairs of distinct factors (no squared terms).
`"purequadratic"`	The model contains an intercept term, and linear and squared terms for each factor.
`"quadratic"`	The model contains an intercept term, linear and squared terms for each factor, and all products of pairs of distinct factors.
`"scheffe-linear"`	The model contains a linear term for each factor and does not include an intercept term.
`"scheffe-quad"`	The model is given by the formula: $\sum_{i=1}^{n} b_{i} x_{i} + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} b_{ij} x_{i} x_{j}$
`"scheffe-special-cubic"`	The model is given by the formula: $\sum_{i=1}^{n} b_{i} x_{i} + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} b_{ij} x_{i} x_{j} + \sum_{i=1}^{n-2} \sum_{j=i+1}^{n-1} \sum_{k=j+1}^{n} b_{ijk} x_{i} x_{j} x_{k}$
`"polyijk"`	The model is a polynomial with all terms up to degree `i` in the first factor, degree `j` in the second factor, and so on. Specify the maximum degree for each factor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, `"poly13"` has an intercept and x₁, x₂, x₂², x₂³, x₁x₂, and x₁x₂² terms, where x₁ and x₂ are the first and second factors, respectively.

In the above table, each x_i corresponds to the ith factor in the design, n is the number of factors, and b_i, b_ij, and b_ijk are coefficients for the model terms.

A character vector or string scalar formula in Wilkinson Notation. The factor names in the formula must match the factor names in dobj, such as the default names Factor1, Factor2, and so on, or the names you specify by using the FactorNames name-value argument when you create dobj.
A t-by-n terms matrix, where t is the number of terms and n is the number of factors in the design. A terms matrix is convenient when the number of factors is large and you want to generate the terms programmatically. For more information about terms matrices, see Terms Matrix.

The default value for modelspec is dobj.ModelSpecification.

Example: "quadratic"

Example: "x1 + x2^2 + x1:x2"

Data Types: single | double | char | string
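Because a terms matrix is convenient for generating terms programmatically, the following base-MATLAB sketch builds the terms of the `"poly13"` model as a terms matrix. It assumes the `"polyijk"` rule as stated in the table above: each exponent is at most that factor's maximum degree, and the total degree of each term is at most the largest specified degree. The variable names are illustrative only.

```matlab
% Maximum degree for each factor in "poly13": 1 for x1, 3 for x2
maxDeg = [1 3];

% Every combination of exponents within the per-factor limits
[e1,e2] = ndgrid(0:maxDeg(1),0:maxDeg(2));
T = [e1(:) e2(:)];

% Keep only terms whose total degree does not exceed the largest limit
T = T(sum(T,2) <= max(maxDeg),:)
```

The seven rows of T correspond to the intercept and the x₁, x₂, x₁x₂, x₂², x₁x₂², and x₂³ terms listed for `"poly13"`. A matrix built this way can serve as modelspec only if the design has enough runs and factor levels to estimate every term.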

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: fitlm(dobj,Y,Intercept=false,ResponseVar="OxygenLevel") fits a linear model without an intercept to the design points in dobj and the response data in Y, and names the response variable "OxygenLevel".

`Exclude` — Observations to exclude
numeric vector | logical vector

Observations to exclude from the fit, specified as a numeric or logical vector. The elements of the vector indicate which rows in dobj.Design to exclude from the fit.

Example: Exclude=[2,3]

Example: Exclude=logical([0 1 1 0 0 0])

Data Types: single | double | logical
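As a sketch, the following code fits the same data three ways to show that a numeric index vector and a logical mask passed to Exclude remove the same rows of dobj.Design. It assumes, as in the examples above, that fullFactorialDOE(3) produces an 8-run design with factors named Factor1, Factor2, and Factor3; the response coefficients and noise level are made up.

```matlab
rng(0)                                           % reproducible noise
dff = fullFactorialDOE(3);                       % assumed 8-run design
pts = dff.Design;
Y = 2*pts.Factor1 + 3*pts.Factor2 + pts.Factor3 + 0.01*randn(height(pts),1);

mdlAll = fitlm(dff,Y);                                       % all 8 runs
mdlNum = fitlm(dff,Y,Exclude=[2 3]);                         % exclude rows 2 and 3 by index
mdlLog = fitlm(dff,Y,Exclude=logical([0 1 1 0 0 0 0 0]));    % same rows, by logical mask

% Number of observations actually used in each fit
[mdlAll.NumObservations mdlNum.NumObservations mdlLog.NumObservations]
```

The logical mask must have one element per design point, whereas the numeric vector lists only the rows to drop.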

`Intercept` — Indicator for constant term
`true` or `1` | `false` or `0`

Indicator for the constant term (intercept) in the fit, specified as a numeric or logical 1 (true) to include the constant term, or a numeric or logical 0 (false) to remove the constant term from the model. The default value for Intercept is true if dobj.ModelSpecification contains an intercept. Otherwise, the default value is false.

You can specify Intercept only when modelspec is a model name.

Example: Intercept=false

Data Types: logical
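Because Intercept applies only when modelspec is a model name, a typical use pairs it with one of the names from the modelspec table. This sketch, using made-up response data on an assumed 8-run full factorial design, fits the "interactions" model without its constant term.

```matlab
rng(0)                                           % reproducible noise
dff = fullFactorialDOE(3);                       % assumed 8-run design
pts = dff.Design;
Y = 2*pts.Factor1 + 3*pts.Factor2 + pts.Factor1.*pts.Factor3 + 0.01*randn(height(pts),1);

% Linear terms plus all two-factor products, with the constant term removed
mdl = fitlm(dff,Y,"interactions",Intercept=false);
mdl.CoefficientNames
```

For three factors, the resulting model has three linear terms and three two-factor interaction terms, and no (Intercept) coefficient.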

`ResponseVar` — Response variable name
string | character vector

Response variable name, specified as a string or a character vector. The default value for ResponseVar is "y".

Example: ResponseVar="yield"

Data Types: char | string
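This short sketch, with made-up response data on an assumed 8-run full factorial design, sets the response name and reads it back from the ResponseName property of the fitted LinearModel object.

```matlab
rng(0)                                           % reproducible noise
dff = fullFactorialDOE(3);                       % assumed 8-run design
pts = dff.Design;
Y = 2*pts.Factor1 + 3*pts.Factor2 + pts.Factor3 + 0.01*randn(height(pts),1);

mdl = fitlm(dff,Y,ResponseVar="yield");
mdl.ResponseName            % name used on the left side of the model formula
```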

`RobustOpts` — Type of robust fitting
`"off"` (default) | `"on"` | character vector | string scalar | structure

Type of robust fitting to use, specified as one of these values:

"off" — No robust fitting. fitlm uses ordinary least squares.
"on" — Robust fitting using the "bisquare" weight function with the default tuning constant.
Character vector or string scalar — Name of a robust fitting weight function from the following table. fitlm uses the corresponding default tuning constant.
Structure with the two fields RobustWgtFun and Tune.
- The RobustWgtFun field contains the name of a robust fitting weight function from the following table, or the function handle of a custom weight function.
- The Tune field contains a tuning constant. If you do not set the Tune field, fitlm uses the corresponding default tuning constant.

Weight Function	Description	Default Tuning Constant
`"andrews"`	`w = (abs(r)<pi) .* sin(r) ./ r`	1.339
`"bisquare"`	`w = (abs(r)<1) .* (1 - r.^2).^2` (also called biweight)	4.685
`"cauchy"`	`w = 1 ./ (1 + r.^2)`	2.385
`"fair"`	`w = 1 ./ (1 + abs(r))`	1.400
`"huber"`	`w = 1 ./ max(1, abs(r))`	1.345
`"logistic"`	`w = tanh(r) ./ r`	1.205
`"ols"`	Ordinary least squares (no weight function)	None
`"talwar"`	`w = 1 * (abs(r)<1)`	2.795
`"welsch"`	`w = exp(-(r.^2))`	2.985
function handle	Custom weight function that accepts a vector `r` of scaled residuals, and returns a vector of weights the same size as `r`	1

The default tuning constants of built-in weight functions give coefficient estimates that are approximately 95% as statistically efficient as the ordinary least-squares estimates, provided that the response has a normal distribution with no outliers. Decreasing the tuning constant increases the downweight assigned to large residuals. Increasing the tuning constant decreases the downweight assigned to large residuals.
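The weight functions in the table are elementwise expressions in the scaled residual r, so you can evaluate them directly. This base-MATLAB sketch, which does not call fitlm, evaluates four of them at a few illustrative values of r, then shows how a smaller tuning constant increases the downweighting of a large residual. The residual value u is made up.

```matlab
% Weight functions from the table, as anonymous functions of the scaled residual r
bisquare = @(r) (abs(r)<1) .* (1 - r.^2).^2;
huber    = @(r) 1 ./ max(1, abs(r));
cauchy   = @(r) 1 ./ (1 + r.^2);
talwar   = @(r) 1 * (abs(r)<1);

% Illustrative scaled residuals
r = [0 0.5 1 2];
W = [bisquare(r); huber(r); cauchy(r); talwar(r)]

% Effect of the tuning constant: u is a residual already divided by s*sqrt(1-h)
u = 3;
wDefault   = bisquare(u/4.685)   % default bisquare tuning constant
wSmallTune = bisquare(u/2)       % smaller constant, so heavier downweighting
```

The rows of W are [1 0.5625 0 0] for bisquare, [1 1 1 0.5] for Huber, [1 0.8 0.5 0.2] for Cauchy, and [1 1 0 0] for Talwar. With the default tuning constant, the residual u keeps a bisquare weight of about 0.35; with the smaller constant, its weight drops to 0.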

The value r in the weight functions is determined by

r = resid/(tune*s*sqrt(1-h)),

where resid is the vector of residuals from the previous iteration, tune is the tuning constant, and h is the vector of leverage values from a least-squares fit. s is an estimate of the standard deviation of the error term given by

s = MAD/0.6745.

MAD is the median absolute deviation of the residuals from their median. The constant 0.6745 makes the estimate unbiased for the normal distribution. If the model's design matrix X has k columns (one for each coefficient), the software excludes the smallest k absolute deviations when computing the median.

For robust fitting, fitlm uses M-estimation to formulate estimating equations and solves them by using iteratively reweighted least squares (IRLS).
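To make the steps concrete, here is a minimal IRLS sketch in base MATLAB that follows the scale estimate, scaled residual, and bisquare weight formulas in this section on made-up straight-line data with one injected outlier. It is not fitlm's internal implementation; the data, the iteration limit, and the convergence test are assumptions chosen for illustration.

```matlab
rng(0)                                  % reproducible noise
x = (1:20)';
X = [ones(20,1) x];                     % design matrix: intercept + slope
y = 2 + 3*x + 0.1*randn(20,1);          % made-up data, true coefficients [2 3]
y(5) = y(5) + 30;                       % inject one gross outlier

tune = 4.685;                           % default bisquare tuning constant
k = size(X,2);                          % number of design-matrix columns
h = sum((X/(X'*X)).*X,2);               % leverages: diag(X*inv(X'*X)*X')
b = X\y;                                % start from ordinary least squares

for iter = 1:100
    resid = y - X*b;
    % Robust scale: MAD about the median, dropping the k smallest deviations
    d = sort(abs(resid - median(resid)));
    s = median(d(k+1:end))/0.6745;
    r = resid./(tune*s*sqrt(1 - h));    % scaled residuals
    w = (abs(r) < 1).*(1 - r.^2).^2;    % bisquare weights
    bOld = b;
    sw = sqrt(w);
    b = (X.*sw)\(y.*sw);                % weighted least-squares step
    if norm(b - bOld) <= 1e-10*max(norm(b),1)
        break                           % coefficients stopped changing
    end
end
disp(b.')
```

Because the outlier's scaled residual ends up far beyond 1, its bisquare weight is 0, and the estimates land near the true intercept 2 and slope 3, whereas the starting least-squares fit is pulled toward the outlier.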

Example: RobustOpts="andrews"

Data Types: char | string | struct
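The following sketch, on made-up data with one artificially inflated response, shows the structure form of RobustOpts with a built-in weight function and an explicit tuning constant, and with a custom function handle (here the "fair" formula, which relies on the default tuning constant of 1 for function handles). It assumes an 8-run fullFactorialDOE(3) design, and that the Robust property of the fitted model is empty for a least-squares fit and a structure, including a Weights field, for a robust fit.

```matlab
rng(0)                                           % reproducible noise
dff = fullFactorialDOE(3);                       % assumed 8-run design
pts = dff.Design;
Y = 2*pts.Factor1 + 3*pts.Factor2 + pts.Factor3 + 0.01*randn(height(pts),1);
Y(4) = Y(4) + 5;                                 % hypothetical outlier

% Ordinary least squares (RobustOpts defaults to "off")
mdlOLS = fitlm(dff,Y);

% Built-in weight function with an explicit tuning constant
opts = struct('RobustWgtFun','huber','Tune',1);
mdlHuber = fitlm(dff,Y,RobustOpts=opts);

% Custom weight function handle; Tune is omitted, so it defaults to 1
fairWgt = @(r) 1./(1 + abs(r));
mdlCustom = fitlm(dff,Y,RobustOpts=struct('RobustWgtFun',fairWgt));

% Final robust weight of each observation
mdlHuber.Robust.Weights'
```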

`Weights` — Observation weights
`ones(p,1)` (default) | p-by-1 vector of nonnegative scalar values

Observation weights, specified as a p-by-1 vector of nonnegative scalar values, where p is the number of design points.

Data Types: single | double
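As a sketch, the following code gives the last four runs of an assumed 8-run full factorial design half the weight of the first four, and reads the weights back from the ObservationInfo table of the fitted model. The weights and response data are hypothetical.

```matlab
rng(0)                                           % reproducible noise
dff = fullFactorialDOE(3);                       % assumed 8-run design
pts = dff.Design;
Y = 2*pts.Factor1 + 3*pts.Factor2 + pts.Factor3 + 0.01*randn(height(pts),1);

% Hypothetical weights: last four runs count half as much as the first four
w = [ones(4,1); 0.5*ones(4,1)];
mdl = fitlm(dff,Y,Weights=w);

% Weights used in the fit, one per observation
mdl.ObservationInfo.Weights'
```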

Output Arguments

`mdl` — Fitted model
`LinearModel` object

Fitted model, returned as a LinearModel object.

If you do not set the RobustOpts name-value argument, or you set it to "off" or "ols", the model is an ordinary least-squares fit. Otherwise, fitlm fits the model by using the robust weight function specified by RobustOpts.

More About

Terms Matrix

A terms matrix T is a t-by-n matrix specifying the terms in a model, where t is the number of terms, and n is the number of factors in the design. The value of T(i,j) is the exponent of variable j in term i.

For example, suppose that a design includes three factors x1, x2, and x3. Each row of T represents one term:

[0 0 0] — Constant term or intercept
[0 1 0] — x2; equivalently, x1^0 * x2^1 * x3^0
[1 0 1] — x1*x3
[2 0 0] — x1^2
[0 1 2] — x2*(x3^2)
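Because T(i,j) is the exponent of factor j in term i, the value of each term at a design point is a product of powers. This base-MATLAB sketch evaluates the example terms matrix above at one made-up point (x1 = 2, x2 = 3, x3 = 5).

```matlab
% Terms matrix from the example above (three factors x1, x2, x3)
T = [0 0 0     % constant term
     0 1 0     % x2
     1 0 1     % x1*x3
     2 0 0     % x1^2
     0 1 2];   % x2*(x3^2)

x = [2 3 5];                 % one hypothetical design point

% Row i of x.^T holds x(j)^T(i,j); the product across columns gives term i
termValues = prod(x.^T,2)
```

The result is [1; 3; 10; 4; 75], one value per row of T.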

Wilkinson Notation

Wilkinson notation describes the terms in a model. The notation relates to the terms included in the model, not to the multipliers (coefficients) of those terms.

Wilkinson notation uses these symbols:

+ means include the next variable.
- means do not include the next variable.
: defines an interaction, which is a product of the terms.
* defines an interaction and all lower order terms.
^ raises the predictor to a power, exactly as in * repeated, so ^ includes lower order terms as well.
() groups the terms.

This table shows typical examples of Wilkinson notation.

Wilkinson Notation	Terms in Standard Notation
`1`	Constant (intercept) term
`x1^k`, where `k` is a positive integer	`x1`, `x1²`, ..., `x1^k`
`x1 + x2`	`x1`, `x2`
`x1*x2`	`x1`, `x2`, `x1*x2`
`x1:x2`	`x1*x2` only
`-x2`	Do not include `x2`
`x1*x2 + x3`	`x1`, `x2`, `x3`, `x1*x2`
`x1 + x2 + x3 + x1:x2`	`x1`, `x2`, `x3`, `x1*x2`
`x1*x2*x3 - x1:x2:x3`	`x1`, `x2`, `x3`, `x1*x2`, `x1*x3`, `x2*x3`
`x1*(x2 + x3)`	`x1`, `x2`, `x3`, `x1*x2`, `x1*x3`

For more details, see Wilkinson Notation.
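To check equivalences like those in the table, this sketch fits made-up data on an assumed 8-run full factorial design with two formulas that should describe the same terms, plus one formula that uses the - operator. It compares coefficient names and estimates rather than relying on the display.

```matlab
rng(0)                                           % reproducible noise
dff = fullFactorialDOE(3);                       % assumed 8-run design
pts = dff.Design;
Y = 2*pts.Factor1 + 3*pts.Factor2 + pts.Factor1.*pts.Factor2 + 0.01*randn(height(pts),1);

% Factor1*Factor2 should expand to Factor1 + Factor2 + Factor1:Factor2
mdlA = fitlm(dff,Y,"y ~ Factor1*Factor2");
mdlB = fitlm(dff,Y,"y ~ Factor1 + Factor2 + Factor1:Factor2");
isequal(mdlA.CoefficientNames,mdlB.CoefficientNames)

% Full three-way expansion minus the three-way interaction
mdlC = fitlm(dff,Y,"y ~ Factor1*Factor2*Factor3 - Factor1:Factor2:Factor3");
mdlC.NumCoefficients      % intercept, 3 linear terms, 3 two-factor interactions
```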

Version History

Introduced in R2024b
