fitlm

Fit linear regression model using design points

Since R2024b

    Description

    mdl = fitlm(dobj,Y) returns a linear regression model fit to the design points in dobj and the response data in Y.

    mdl = fitlm(dobj,Y,modelspec) also defines the model specification.

    mdl = fitlm(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify robust fitting options and observations to exclude from the fit.

    Examples

    Generate a D-optimal design and some response data for the design points.

    dopt = optimalDOE(5,20);
    
    pts = dopt.Design;
    h = height(pts);
    response = 2*pts.Factor1+3*pts.Factor2+pts.Factor3+0.01*randn(h,1);

    dopt is an optimalDOE object that contains information about the generated D-optimal design. response is a vector of response data.

    Fit a linear model using the design points in dopt as the predictor data and response as the response data.

    mdl = fitlm(dopt,response)
    mdl = 
    Linear regression model:
        y ~ 1 + Factor1 + Factor2 + Factor3 + Factor4 + Factor5
    
    Estimated Coefficients:
                        Estimate         SE         tStat        pValue  
                       ___________    _________    ________    __________
    
        (Intercept)    -0.00045125     0.001709    -0.26404        0.7956
        Factor1             2.0012     0.001709        1171    2.4234e-36
        Factor2             2.9954     0.001709      1752.7    8.5516e-39
        Factor3            0.99767    0.0017443      571.97    5.5036e-32
        Factor4         -0.0020275    0.0017443     -1.1624       0.26452
        Factor5          0.0016909     0.001709      0.9894       0.33926
    
    
    Number of observations: 20, Error degrees of freedom: 14
    Root Mean Squared Error: 0.00764
    R-squared: 1,  Adjusted R-Squared: 1
    F-statistic vs. constant model: 9.57e+05, p-value = 3.27e-38
    

    mdl is a LinearModel object that contains the results of fitting a linear model to the data. The model display includes the model formula, estimated coefficients, and model summary statistics.

    Generate a mixture design and create some response data for the design points.

    dmix = mixtureDOE(3);
    
    pts = dmix.Design;
    h = height(pts);
    Y = 2*pts.Factor1+pts.Factor2.*pts.Factor3+5+0.001*randn(h,1);

    dmix is a mixtureDOE object that contains information about the generated mixture design. The vector Y contains response data.

    Fit a linear model using the design points in dmix as the predictor data and Y as the response data. Specify the experiment model to fit.

    mdl = fitlm(dmix,Y,"y~Factor1+Factor2:Factor3")
    mdl = 
    Linear regression model:
        y ~ 1 + Factor1 + Factor2:Factor3
    
    Estimated Coefficients:
                           Estimate        SE        tStat       pValue  
                           ________    __________    ______    __________
    
        (Intercept)         4.9999     0.00096562    5177.9    8.3471e-15
        Factor1             2.0009      0.0017544    1140.5    3.5465e-12
        Factor2:Factor3    0.99501      0.0067547    147.31    1.2739e-08
    
    
    Number of observations: 7, Error degrees of freedom: 4
    Root Mean Squared Error: 0.00148
    R-squared: 1,  Adjusted R-Squared: 1
    F-statistic vs. constant model: 7e+05, p-value = 8.16e-12
    

    mdl is a LinearModel object that contains the results of fitting the experiment model to the data. The values in the pValue column suggest that each term in the model has a statistically significant effect on the response.

    Generate a full factorial design and create some response data for the design points.

    dff = fullFactorialDOE(3);
    
    pts = dff.Design;
    h = height(pts);
    Y = 2*pts.Factor1+3*pts.Factor2+pts.Factor3+0.01*randn(h,1);

    dff is a fullFactorialDOE object that contains information about the generated full factorial design. The vector Y contains response data.

    Fit a linear model using the design points in dff as the predictor data and Y as the response data.

    mdl1 = fitlm(dff,Y)
    mdl1 = 
    Linear regression model:
        y ~ 1 + Factor1 + Factor2 + Factor3
    
    Estimated Coefficients:
                        Estimate         SE          tStat        pValue  
                       ___________    _________    _________    __________
    
        (Intercept)    -0.00013126    0.0051434    -0.025521       0.98086
        Factor1             1.9974    0.0051434       388.34    2.6379e-10
        Factor2             2.9964    0.0051434       582.57     5.209e-11
        Factor3             1.0045    0.0051434       195.29    4.1244e-09
    
    
    Number of observations: 8, Error degrees of freedom: 4
    Root Mean Squared Error: 0.0145
    R-squared: 1,  Adjusted R-Squared: 1
    F-statistic vs. constant model: 1.76e+05, p-value = 1.07e-10
    

    mdl1 is a LinearModel object that contains the results of fitting a linear model to the data. The model display includes the model formula, estimated coefficients, and model summary statistics. The large p-value for the intercept term indicates it does not have a statistically significant effect on the response.

    Fit a linear model without an intercept term to the data.

    mdl2 = fitlm(dff,Y,Intercept=0)
    mdl2 = 
    Linear regression model:
        y ~ Factor1 + Factor2 + Factor3
    
    Estimated Coefficients:
                   Estimate       SE        tStat       pValue  
                   ________    _________    ______    __________
    
        Factor1     1.9974     0.0046008    434.15    1.2305e-12
        Factor2     2.9964     0.0046008    651.28    1.6198e-13
        Factor3     1.0045     0.0046008    218.32    3.8258e-11
    
    
    Number of observations: 8, Error degrees of freedom: 5
    Root Mean Squared Error: 0.013
    

    mdl2 contains the results of fitting a linear model without an intercept term.

    Inspect the loglikelihood of each model.

    loglikelihoods = [mdl1.LogLikelihood,mdl2.LogLikelihood]
    loglikelihoods = 1×2
    
       25.2636   25.2629
    
    

    The output shows that the loglikelihood of mdl1 exceeds that of mdl2 by only about 0.0007. This small difference suggests that removing the intercept term has little effect on how well the model fits the data.
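    Information criteria give another view of the same comparison, because they penalize the extra coefficient in mdl1. The following is a minimal sketch, assuming the design and response are generated as in the example above; the seed passed to rng is arbitrary, and ModelCriterion is the standard LinearModel property that stores AIC and related criteria.

```matlab
rng(0)                                  % arbitrary seed, for reproducibility only
dff = fullFactorialDOE(3);              % same design as in the example above
pts = dff.Design;
Y = 2*pts.Factor1 + 3*pts.Factor2 + pts.Factor3 + 0.01*randn(height(pts),1);

mdl1 = fitlm(dff,Y);                    % model with an intercept
mdl2 = fitlm(dff,Y,Intercept=0);        % model without an intercept

% Lower AIC indicates the preferred model under this criterion
aic = [mdl1.ModelCriterion.AIC, mdl2.ModelCriterion.AIC]
```

    Because the two models are nested, the loglikelihood of mdl1 cannot be smaller than that of mdl2; the criterion shows whether the gain justifies the additional term.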

    Input Arguments

    dobj: Design, specified as a fullFactorialDOE, mixtureDOE, or optimalDOE object. fitlm fits the linear regression model using the design points in dobj.Design as predictors.

    Y: Response variable, specified as a p-by-1 numeric vector, where p is the number of design points in dobj. Each entry in Y is the response for the corresponding row of dobj.Design.

    Data Types: single | double

    modelspec: Experiment model, specified as one of the following values.

    • A character vector or string scalar with the model name.

      Value | Model Description
      "linear" | The model contains an intercept and a linear term for each factor.
      "constant" | The model contains only a constant (intercept) term.
      "interactions" | The model contains an intercept, a linear term for each factor, and all products of pairs of distinct factors (no squared terms).
      "purequadratic" | The model contains an intercept term, and linear and squared terms for each factor.
      "quadratic" | The model contains an intercept term, linear and squared terms for each factor, and all products of pairs of distinct factors.
      "scheffe-linear" | The model contains a linear term for each factor and does not include an intercept term.
      "scheffe-quad" | The model is given by the formula $\sum_{i=1}^{n} b_i x_i + \sum_{i=1}^{n} \sum_{j<i}^{n-1} b_{ij} x_i x_j$.
      "scheffe-special-cubic" | The model is given by the formula $\sum_{i=1}^{n} b_i x_i + \sum_{i=1}^{n} \sum_{j<i}^{n-1} b_{ij} x_i x_j + \sum_{i=1}^{n} \sum_{j<i}^{n-1} \sum_{k<j}^{n-2} b_{ijk} x_i x_j x_k$.
      "polyijk" | The model is a polynomial with all terms up to degree i in the first factor, degree j in the second factor, and so on. Specify the maximum degree for each factor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, "poly13" has an intercept and x1, x2, x2^2, x2^3, x1*x2, and x1*x2^2 terms, where x1 and x2 are the first and second factors, respectively.

      In the above table, each $x_i$ corresponds to the ith factor in the design, and $b_i$, $b_{ij}$, $b_{ijk}$, and $d_{ij}$ are coefficients for the model terms.

    • A character vector or string scalar formula in Wilkinson Notation. The factor names in the formula must be factor names specified by the FactorNames name-value argument when you create dobj.

    • A t-by-n terms matrix, where t is the number of terms and n is the number of factors in the design. A terms matrix is convenient when the number of factors is large and you want to generate the terms programmatically. For more information about terms matrices, see Terms Matrix.

    The default value for modelspec is dobj.ModelSpecification.

    Example: "quadratic"

    Example: "x1 + x2^2 + x1:x2"

    Data Types: single | double | char | string
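    As a rough illustration of passing a model name, the following sketch fits the "interactions" model to the full factorial design used in the examples on this page. It assumes that fullFactorialDOE(3) returns the 8-point, three-factor design shown there; the seed passed to rng is arbitrary.

```matlab
rng(0)                                  % arbitrary seed, for reproducibility only
dff = fullFactorialDOE(3);              % three-factor design, as in the examples
pts = dff.Design;
Y = 2*pts.Factor1 + 3*pts.Factor2 + pts.Factor3 + 0.01*randn(height(pts),1);

% "interactions" requests an intercept, three linear terms,
% and the three products of pairs of distinct factors
mdl = fitlm(dff,Y,"interactions");
disp(mdl.CoefficientNames')
```

    With three factors, the fitted model has seven coefficients, so an 8-point design leaves a single error degree of freedom.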

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: fitlm(dobj,Y,Intercept=false,ResponseVar="OxygenLevel") fits a linear model without an intercept to the design points in dobj and the response data in Y, and names the response variable "OxygenLevel".

    Exclude: Observations to exclude from the fit, specified as a numeric or logical vector. The elements of the vector indicate which rows in dobj.Design to exclude from the fit.

    Example: Exclude=[2,3]

    Example: Exclude=logical([0 1 1 0 0 0])

    Data Types: single | double | logical
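    The two forms of Exclude can be compared side by side. This sketch assumes the 8-point design returned by fullFactorialDOE(3) in the examples on this page; the variable names mdlIdx, mdlMask, and mask are illustrative only.

```matlab
rng(0)                                  % arbitrary seed, for reproducibility only
dff = fullFactorialDOE(3);
pts = dff.Design;
Y = 2*pts.Factor1 + 3*pts.Factor2 + pts.Factor3 + 0.01*randn(height(pts),1);

% Exclude design points 2 and 3 by row index
mdlIdx = fitlm(dff,Y,Exclude=[2,3]);

% Equivalent logical mask with one element per design point
mask = false(height(pts),1);
mask([2 3]) = true;
mdlMask = fitlm(dff,Y,Exclude=mask);

% Both fits use the same remaining observations
[mdlIdx.NumObservations, mdlMask.NumObservations]
```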

    Intercept: Indicator for the constant term (intercept) in the fit, specified as a numeric or logical 1 (true) to include the constant term, or 0 (false) to remove the constant term from the model. The default value for Intercept is true if dobj.ModelSpecification contains an intercept. Otherwise, the default value is false.

    You can specify Intercept only when modelspec is a model name.

    Example: Intercept=false

    Data Types: logical

    ResponseVar: Response variable name, specified as a string scalar or a character vector. The default value for ResponseVar is "y".

    Example: ResponseVar="yield"

    Data Types: char | string

    RobustOpts: Type of robust fitting to use, specified as one of these values:

    • "off" — No robust fitting. fitlm uses ordinary least squares.

    • "on" — Robust fitting using the "bisquare" weight function with the default tuning constant.

    • Character vector or string scalar — Name of a robust fitting weight function from the following table. fitlm uses the corresponding default tuning constant.

    • Structure with the two fields RobustWgtFun and Tune.

      • The RobustWgtFun field contains the name of a robust fitting weight function from the following table, or the function handle of a custom weight function.

      • The Tune field contains a tuning constant. If you do not set the Tune field, fitlm uses the corresponding default tuning constant.

    Weight Function | Description | Default Tuning Constant
    "andrews" | w = (abs(r)<pi) .* sin(r) ./ r | 1.339
    "bisquare" | w = (abs(r)<1) .* (1 - r.^2).^2 (also called biweight) | 4.685
    "cauchy" | w = 1 ./ (1 + r.^2) | 2.385
    "fair" | w = 1 ./ (1 + abs(r)) | 1.400
    "huber" | w = 1 ./ max(1, abs(r)) | 1.345
    "logistic" | w = tanh(r) ./ r | 1.205
    "ols" | Ordinary least squares (no weight function) | None
    "talwar" | w = 1 * (abs(r)<1) | 2.795
    "welsch" | w = exp(-(r.^2)) | 2.985
    function handle | Custom weight function that accepts a vector r of scaled residuals, and returns a vector of weights the same size as r | 1

    The default tuning constants of built-in weight functions give coefficient estimates that are approximately 95% as statistically efficient as the ordinary least-squares estimates, provided that the response has a normal distribution with no outliers. Decreasing the tuning constant increases the downweight assigned to large residuals. Increasing the tuning constant decreases the downweight assigned to large residuals.

    The value r in the weight functions is determined by

    r = resid/(tune*s*sqrt(1-h)),

    where resid is the vector of residuals from the previous iteration, tune is the tuning constant, and h is the vector of leverage values from a least-squares fit. s is an estimate of the standard deviation of the error term given by

    s = MAD/0.6745.

    MAD is the median absolute deviation of the residuals from their median. The constant 0.6745 makes the estimate unbiased for the normal distribution. If the predictor data matrix X has p columns, the software excludes the smallest p absolute deviations when computing the median. (In this sentence, p counts the columns of X, not the design points.)

    For robust fitting, fitlm uses M-estimation to formulate estimating equations, and solves them using iteratively reweighted least squares (IRLS).
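    The scaling described above can be traced by hand for a single reweighting step. The sketch below follows the text literally on hypothetical data (a straight line with bounded noise and one outlier) and uses only base MATLAB functions; it is not the internal implementation of fitlm, which may differ in detail.

```matlab
% Hypothetical data: a straight line with bounded noise and one outlier
x = (1:20)';
y = 2 + 3*x + sin(x);
y(10) = y(10) + 10;                 % outlier at observation 10

X = [ones(size(x)) x];              % predictor matrix with an intercept column
p = size(X,2);                      % number of columns of X

% An ordinary least-squares fit supplies the residuals and leverages
b = X\y;
resid = y - X*b;
[Q,~] = qr(X,0);
h = sum(Q.^2,2);                    % leverage values (diagonal of the hat matrix)

% s = MAD/0.6745, dropping the smallest p absolute deviations
absdev = sort(abs(resid - median(resid)));
s = median(absdev(p+1:end))/0.6745;

% Scaled residuals and bisquare weights with the default tuning constant
tune = 4.685;
r = resid./(tune*s*sqrt(1-h));
w = (abs(r)<1).*(1 - r.^2).^2;
```

    For this data set, the outlier at observation 10 has abs(r) greater than 1 and therefore receives zero weight, while the remaining observations keep positive weights. IRLS would repeat the fit with these weights until the coefficients converge.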

    Example: RobustOpts="andrews"

    Data Types: char | string | struct
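    The structure form of RobustOpts can be sketched as follows. The example assumes the full factorial design from the examples on this page and adds one artificial outlier to the response; the variable names opts and mdlRobust, the size of the outlier, and the seed are illustrative choices.

```matlab
rng(1)                                  % arbitrary seed, for reproducibility only
dff = fullFactorialDOE(3);
pts = dff.Design;
Y = 2*pts.Factor1 + 3*pts.Factor2 + pts.Factor3 + 0.01*randn(height(pts),1);
Y(1) = Y(1) + 5;                        % inject one hypothetical outlier

% Huber weight function with its default tuning constant stated explicitly
opts = struct("RobustWgtFun","huber","Tune",1.345);
mdlRobust = fitlm(dff,Y,RobustOpts=opts);

% The Robust property of the fitted model records the robust fit information
mdlRobust.Robust
```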

    Weights: Observation weights, specified as a p-by-1 vector of nonnegative scalar values, where p is the number of design points.

    Data Types: single | double

    Output Arguments

    mdl: Fitted model, returned as a LinearModel object.

    If you do not set the RobustOpts name-value argument, or specify it as "ols", the model is a least-squares fit. Otherwise, fitlm fits the model using the robust fitting function specified by RobustOpts.

    More About

    Terms Matrix

    A terms matrix T is a t-by-n matrix specifying the terms in a model, where t is the number of terms, and n is the number of factors in the design. The value of T(i,j) is the exponent of variable j in term i.

    For example, suppose that a design includes three factors x1, x2, and x3. Each row of T represents one term:

    • [0 0 0]: Constant term or intercept

    • [0 1 0]: x2; equivalently, x1^0 * x2^1 * x3^0

    • [1 0 1]: x1*x3

    • [2 0 0]: x1^2

    • [0 1 2]: x2*(x3^2)
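    The meaning of each row can be checked by evaluating the terms directly. This sketch uses base MATLAB only; the two design points in X are hypothetical, and D is an illustrative name for the resulting matrix of term values.

```matlab
% Terms matrix from the list above: one row per term, one column per factor
T = [0 0 0
     0 1 0
     1 0 1
     2 0 0
     0 1 2];

% Two hypothetical design points with factors x1, x2, x3
X = [1 2 3
     4 5 6];

% Term i evaluated at each point is the product of X(:,j).^T(i,j) over j
D = zeros(size(X,1),size(T,1));
for i = 1:size(T,1)
    D(:,i) = prod(X.^T(i,:),2);
end
D
```

    For the first point, the columns of D are 1, x2 = 2, x1*x3 = 3, x1^2 = 1, and x2*x3^2 = 18.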

    Wilkinson Notation

    Wilkinson notation describes the terms in a model. The notation relates to the terms included in the model, not to the multipliers (coefficients) of those terms.

    Wilkinson notation uses these symbols:

    • + means include the next variable.

    • - means do not include the next variable.

    • : defines an interaction, which is a product of the terms.

    • * defines an interaction and all lower order terms.

    • ^ raises the predictor to a power, exactly as in * repeated, so ^ includes lower order terms as well.

    • () groups the terms.

    This table shows typical examples of Wilkinson notation.

    Wilkinson Notation | Terms in Standard Notation
    1 | Constant (intercept) term
    x1^k, where k is a positive integer | x1, x1^2, ..., x1^k
    x1 + x2 | x1, x2
    x1*x2 | x1, x2, x1*x2
    x1:x2 | x1*x2 only
    -x2 | Do not include x2
    x1*x2 + x3 | x1, x2, x3, x1*x2
    x1 + x2 + x3 + x1:x2 | x1, x2, x3, x1*x2
    x1*x2*x3 - x1:x2:x3 | x1, x2, x3, x1*x2, x1*x3, x2*x3
    x1*(x2 + x3) | x1, x2, x3, x1*x2, x1*x3

    For more details, see Wilkinson Notation.
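    The expansion of the * operator can be verified by comparing coefficient names. For simplicity, this sketch uses the general table-based fitlm syntax rather than a design object, on hypothetical data; it assumes that formulas are parsed the same way in both syntaxes.

```matlab
% Hypothetical data on a 3-by-3 grid of two factors
[x1,x2] = ndgrid(-1:1,-1:1);
x1 = x1(:);
x2 = x2(:);
y = 1 + 2*x1 - x2 + 0.5*x1.*x2 + 0.01*sin((1:9)');
tbl = table(x1,x2,y);

% x1*x2 expands to x1 + x2 + x1:x2 (the intercept is included by default)
mdlStar = fitlm(tbl,"y ~ x1*x2");
mdlLong = fitlm(tbl,"y ~ x1 + x2 + x1:x2");

% Both formulas produce the same set of terms
isequal(mdlStar.CoefficientNames,mdlLong.CoefficientNames)
```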

    Version History

    Introduced in R2024b