
stats

Analysis of variance (ANOVA) table

Since R2022b

    Description

    s = stats(aov) returns a component ANOVA table for the anova object aov. The component ANOVA table contains statistics for the model terms, error, and total. For more information, see s.

    s = stats(aov,type) specifies whether to return a component or summary ANOVA table. The summary ANOVA table includes summary statistics for the linear and nonlinear model terms, regression, error, and total. For more information, see s.


    s = stats(aov,Component=sstype) specifies the sum of squares type used to create the component table.

    [s,ems] = stats(___) also returns a table of information about the expected mean squares ems for each term and the error. If you specify the sstype in the call to stats, then the software creates the ems table with the specified sum of squares type.


    Examples


    Load popcorn yield data.

    load popcorn.mat

    The columns of the 6-by-3 matrix popcorn contain popcorn yield observations in cups for the brands Gourmet, National, and Generic. The first three rows of popcorn correspond to popcorn that was popped using an air popper and the last three rows correspond to popcorn popped in oil.

    Create string arrays of factor values for the brand and type of popper using the repmat function.

    brand = [repmat("Gourmet",6,1); repmat("National",6,1); repmat("Generic",6,1)];
    popperType = repmat(["Air";"Air";"Air";"Oil";"Oil";"Oil"], [3, 1]);
    factors = {brand,popperType};

    Perform a two-way ANOVA to test the null hypothesis that the mean popcorn yield is not affected by the brand of popcorn or the popper type.

    aov = anova(factors,popcorn(:),FactorNames=["Brand","PopperType"],ModelSpecification="interactions")
    aov = 
    2-way anova, constrained (Type III) sums of squares.
    
    Y ~ 1 + Brand*PopperType
    
                            SumOfSquares    DF    MeanSquares     F        pValue  
                            ____________    __    ___________    ____    __________
    
        Brand                    15.75       2        7.875      56.7     7.679e-07
        PopperType                 4.5       1          4.5      32.4    0.00010037
        Brand:PopperType      0.083333       2     0.041667       0.3       0.74622
        Error                   1.6667      12      0.13889                        
        Total                       22      17                                     
    
    
    
    
    

    By default, anova displays a component ANOVA table.

    Generate a summary ANOVA table.

    s = stats(aov,"summary")
    s=5×5 table
                      SumOfSquares    DF    MeanSquares      F        pValue  
                      ____________    __    ___________    _____    __________
    
        Linear             20.25       3         6.75       48.6    5.4835e-07
        NonLinear       0.083333       2     0.041667        0.3       0.74622
        Regression        20.333       5       4.0667      29.28    2.5065e-06
        Error             1.6667      12      0.13889                         
        Total                 22      17       1.2941                         
    
    

    The row Linear corresponds to the terms Brand and PopperType in the ANOVA model. The small p-value in the Linear row indicates that Brand and PopperType have a statistically significant combined effect on the popcorn yield. The row NonLinear corresponds to the term Brand:PopperType. The large p-value in the NonLinear row indicates that the interaction term does not have a statistically significant effect on the popcorn yield. The small p-value in the row Regression indicates that the ANOVA model is a better predictor of the response data than the mean of the data.
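    The Linear row can be checked by hand from the tables above. The following is a minimal sketch, assuming the rounded values displayed in the two tables and relying on the fact that this design is balanced, so the Brand and PopperType sums of squares add up to the Linear sum of squares. The variable names are illustrative only.

```matlab
% Values copied from the component and summary tables shown above.
ssBrand  = 15.75;   dfBrand  = 2;
ssPopper = 4.5;     dfPopper = 1;
ssError  = 1.6667;  dfError  = 12;

% Pool the two main effects into the Linear row.
ssLinear = ssBrand + ssPopper;    % sum of squares for the linear terms
dfLinear = dfBrand + dfPopper;    % degrees of freedom for the linear terms
msLinear = ssLinear/dfLinear;     % mean squares

% The F-statistic uses the mean squared error of the full model.
mse = ssError/dfError;
fLinear = msLinear/mse

% Upper-tail probability of the F-distribution (fcdf requires
% Statistics and Machine Learning Toolbox).
pLinear = fcdf(fLinear,dfLinear,dfError,"upper")
```

    The results can be compared with the F and pValue entries in the Linear row. Small differences are expected because the inputs are rounded.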

    Load the sample car data.

    load carsmall

    Data for the country of origin, model year, and mileage is stored in the variables Origin, Model_Year, and MPG, respectively.

    Perform a two-way ANOVA to test the null hypothesis that mean mileage is not affected by the country of origin or model year.

    aov = anova({Origin, Model_Year},MPG,RandomFactors=[1 2],FactorNames=["Origin" "Year"])
    aov = 
    2-way anova, constrained (Type III) sums of squares.
    
    Y ~ 1 + Origin + Year
    
                  SumOfSquares    DF    MeanSquares      F         pValue  
                  ____________    __    ___________    ______    __________
    
        Origin       1078.1        5      215.62       10.675    5.3303e-08
        Year         2638.4        2      1319.2       65.312    5.5975e-18
        Error          1737       86      20.198                           
        Total        6005.3       93                                       
    
    
    
    
    

    Display an expected mean squares table for the ANOVA.

    [~,ems] = stats(aov)
    ems=3×5 table
                    Type         ExpectedMeanSquares        MeanSquaresDenominator    DFDenominator    FDenominator
                  ________    __________________________    ______________________    _____________    ____________
    
        Origin    "random"    "9.159*V(Origin)+V(Error)"            20.198                  86          MS(Error)  
        Year      "random"    "29.5014*V(Year)+V(Error)"            20.198                  86          MS(Error)  
        Error     "random"    "V(Error)"                                                                           
    
    

    The formulas for the expected mean squares of the random factors Origin and Year contain terms for their respective variance components. You can use the expected mean squares formulas to compare how much of the expected mean squares is due to the variance in the error and how much is due to the variance components of the random terms.
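    One way to make this comparison concrete is to equate each observed mean square with its expected mean squares formula and solve for the variance component. The following sketch assumes this method-of-moments approach, which is an illustration and not necessarily how the software estimates variance components. The numbers are the rounded values displayed in the tables above.

```matlab
% Mean squares from the ANOVA table and coefficients from the ems table.
msOrigin = 215.62;   cOrigin = 9.159;     % 9.159*V(Origin)+V(Error)
msYear   = 1319.2;   cYear   = 29.5014;   % 29.5014*V(Year)+V(Error)
msError  = 20.198;                        % V(Error)

% Solve MS = c*V(Term) + V(Error) for V(Term).
vError  = msError;
vOrigin = (msOrigin - msError)/cOrigin
vYear   = (msYear - msError)/cYear

% Fraction of the total variance attributed to each source.
share = [vOrigin vYear vError]/(vOrigin + vYear + vError)
```

    Under these assumptions, the elements of share indicate how much of the variability in MPG is attributed to Origin, Year, and the error, in that order.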

    Input Arguments


    aov: Analysis of variance results, specified as an anova object. The properties of aov contain the factors and response data used by stats to compute the statistics in the ANOVA table.

    type: Type of ANOVA table, specified as "component" or "summary".

    Example: "summary"

    Data Types: char | string

    sstype: Type of the sum of squares used to perform the ANOVA, specified as "three", "two", "one", or "hierarchical". The stats function ignores sstype unless the ANOVA type is "component". For a model containing main effects but no interactions, the value of sstype influences the computations only when the data is unbalanced.

    The sum of squares of a term (SSTerm) is defined as the reduction in the sum of squares error (SSE) obtained by adding the term to a model that excludes it. The formula for the sum of squares of a term Term has the form

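    $$SS_{Term} = \underbrace{\sum_{i=1}^{n}\left(y_i - f_{excl}(g_1,\ldots,g_N)\right)^2}_{SSE_{f_{excl}}} - \underbrace{\sum_{i=1}^{n}\left(y_i - f_{incl}(g_1,\ldots,g_N)\right)^2}_{SSE_{f_{incl}}}$$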

    where n is the number of observations, yi are the response data, g1,...,gN are the factors used to perform the ANOVA, fexcl is a model that excludes Term, and fincl is a model that includes Term. Both fexcl and fincl are determined by sstype. The quantities SSEfexcl and SSEfincl are the sums of squares errors for fexcl and fincl, respectively. You can select fexcl and fincl using one of the options for sstype described in the following table.
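    The definition can be illustrated by fitting the two models explicitly and subtracting their error sums of squares. The following sketch uses fitlm as a stand-in for the fexcl and fincl fits, which is an assumption made for illustration and not a description of how stats computes the table. The table tbl and its variable names are hypothetical. For the main-effects model Y ~ 1 + Origin + Year on the carsmall data, the result should be close to the sum of squares reported for Year in the example above (about 2638.4).

```matlab
% Sketch: reproduce the sum of squares for Year as a difference of SSEs.
load carsmall

% Build a table with categorical factors (variable names are illustrative).
tbl = table(categorical(cellstr(Origin)),categorical(Model_Year),MPG, ...
    VariableNames=["Origin","Year","MPG"]);

% fexcl omits Year; fincl adds Year to the terms in fexcl.
mdlExcl = fitlm(tbl,"MPG ~ Origin");
mdlIncl = fitlm(tbl,"MPG ~ Origin + Year");

% Reduction in SSE obtained by adding Year to the model.
ssYear = mdlExcl.SSE - mdlIncl.SSE
```

    Both fits exclude the same rows with missing MPG values, so the two SSE values are computed on the same observations.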

    Option | Type of Sum of Squares
    "three" (default)

    fincl is the full ANOVA model specified in the property Formula. fexcl is a model composed of all terms in fincl except Term. The model fexcl has the same sigma-restricted coding as fincl. This type of sum of squares is known as Type III.

    "two"

    fexcl is a model composed of all terms in the ANOVA model specified in the property Formula that do not contain Term. If Term is a continuous term, then powers of Term are treated as separate terms that do not contain Term. fincl is a model composed of Term and all the terms in fexcl. This type of sum of squares is known as Type II.

    "one"

    fexcl is a model composed of all the terms that precede Term in the ANOVA model specified in the property Formula. fincl is a model composed of Term and all the terms in fexcl. This type of sum of squares is known as Type I.

    "hierarchical"

    fexcl and fincl are defined as in Type II, except powers of Term are treated as terms that contain Term.
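    As a worked illustration of the table, consider a hypothetical two-factor model Y ~ 1 + A + B + A:B, where the factor names A and B are assumptions chosen for this example. Writing SSE(M) for the sum of squares error of a model M, the definitions above give the following sums of squares for the main effect A:

```latex
\begin{aligned}
\text{Type I:}   \quad SS_A &= SSE(1) - SSE(1 + A) \\
\text{Type II:}  \quad SS_A &= SSE(1 + B) - SSE(1 + A + B) \\
\text{Type III:} \quad SS_A &= SSE(1 + B + A{:}B) - SSE(1 + A + B + A{:}B)
\end{aligned}
```

    For the interaction term A:B, all of these options lead to the same expression, SSE(1 + A + B) - SSE(1 + A + B + A:B), because A:B is the last term and no other term contains it. The "hierarchical" option matches Type II in this example because the model has no powers of continuous terms.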

    Example: Component="hierarchical"

    Data Types: char | string
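    The effect of sstype is easiest to see on unbalanced data. The following sketch reuses the carsmall data from the examples and the Component=sstype syntax listed in the description. The comparison table and its variable names are illustrative, and the code assumes that the tables returned by stats use the model terms as row names, as the displayed output suggests.

```matlab
load carsmall
aov = anova({Origin,Model_Year},MPG,FactorNames=["Origin","Year"]);

% Component tables under Type I and Type III sums of squares.
sOne   = stats(aov,Component="one");
sThree = stats(aov,Component="three");

% Compare the sums of squares side by side.
comparison = table(sOne.SumOfSquares,sThree.SumOfSquares, ...
    VariableNames=["TypeI","TypeIII"], ...
    RowNames=sOne.Properties.RowNames)
```

    In a main-effects model, the last term (Year here) has the same sum of squares under both options, because both definitions compare the full model with the model that omits only that term. Any difference between the options shows up in the earlier term, Origin.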

    Output Arguments


    s: ANOVA statistics, returned as a table.

    The contents of s depend on the ANOVA type specified in type.

    • If type is "component", then s contains ANOVA statistics for each variable in the model except the constant (intercept) term. The table includes these columns for each variable:

      Column | Description
      SumOfSquares

      Sum of squares explained by the term and calculated depending on sstype.

      DF

      Degrees of freedom

      • DF of a numeric variable is 1.

      • DF of a categorical variable is the number of dummy variables created for the category (number of categories – 1).

      • DF of an error term is the difference between the DF of the total and the sum of the DF for the model terms.

      • DF of the total is aov.NumObservations–1.

      MeanSquares

      Mean squares, defined by MeanSquares = SumOfSquares/DF.

      MeanSquares for the error term is the mean squared error (MSE).

      F

      F-statistic value to test the null hypothesis that the corresponding coefficient is zero; computed by F = MeanSquares/MSE.

      When the null hypothesis is true, the F-statistic follows the F-distribution.

      pValue

      p-value of the F-statistic value

    • If type is "summary", then s contains summary statistics of grouped terms for each row. The summary statistics are calculated using Type I sum of squares. The table includes the same columns as "component" and these rows:

      Row | Description
      Total

      Total statistics

      • SumOfSquares — Total sum of squares, which is the sum of the squared deviations of the response around its mean

      • DF — Sum of degrees of freedom of Regression and Error

      Regression

      Statistics for the model as a whole

      • SumOfSquares — Model sum of squares, which is the sum of the squared deviations of the fitted value around the response mean.

      • F and pValue — These values provide a test of whether the model as a whole fits significantly better than a degenerate model consisting of only a constant term.

      Linear

      Statistics for linear terms

      • SumOfSquares — Sum of squares for linear terms, which is the difference between the model sum of squares and the sum of squares for nonlinear terms.

      • F and pValue — These values provide a test of whether the model with only linear terms fits better than a degenerate model consisting of only a constant term. stats uses the mean squared error that is based on the full model to compute this F-value, so the F-value obtained by dropping the nonlinear terms and repeating the test is not the same as the value in this row.

      NonLinear

      Statistics for nonlinear terms

      • SumOfSquares — Sum of squares for nonlinear (higher-order or interaction) terms, which is the increase in the residual sum of squares obtained by keeping only the linear terms and dropping all nonlinear terms.

      • F and pValue — These values provide a test of whether the full model fits significantly better than a smaller model consisting of only the linear terms.

      Error

      Statistics for error

      • SumOfSquares — Residual sum of squares, which is the sum of the squared residual values

      • MeanSquares — Mean squared error, used to compute the F-statistic values for Regression, Linear, and NonLinear

      If the data contains replications (multiple observations sharing the same factor values), s also contains rows for LackOfFit and PureError. LackOfFit and PureError break down Error further.

      LackOfFit

      Lack-of-fit statistics

      • SumOfSquares — Sum of squares due to lack of fit, which is the difference between the residual sum of squares and the replication sum of squares.

      • F and pValue — The F-statistic value is the ratio of lack-of-fit MeanSquares to pure error MeanSquares. The ratio provides a test of bias by measuring whether the variation of the residuals is larger than the variation of the replications. A low p-value implies that adding additional terms to the model can improve the fit.

      PureError

      Statistics for pure error

      • SumOfSquares — Replication sum of squares, obtained by finding the sets of points with identical predictor values, computing the sum of squared deviations around the mean within each set, and pooling the computed values

      • MeanSquares — Model-free pure error variance estimate of the response
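    The replication sum of squares described for PureError can be sketched directly with findgroups and splitapply. The example below is a minimal illustration on the popcorn data from the examples, not the internal implementation of stats. Because the interactions model in that example fits a separate mean for every combination of brand and popper type, the pooled within-group sum of squares should match the Error sum of squares shown there (about 1.6667).

```matlab
load popcorn.mat
y = popcorn(:);
brand = [repmat("Gourmet",6,1); repmat("National",6,1); repmat("Generic",6,1)];
popperType = repmat(["Air";"Air";"Air";"Oil";"Oil";"Oil"],[3,1]);

% Identify the sets of observations that share the same factor values.
g = findgroups(brand,popperType);

% Sum of squared deviations around the mean within each set.
ssWithin = splitapply(@(v) sum((v - mean(v)).^2),y,g);

% Pool the within-set values to get the replication sum of squares.
ssPureError = sum(ssWithin)

% Degrees of freedom: observations minus the number of distinct sets.
dfPureError = numel(y) - max(g)
```

    Dividing ssPureError by dfPureError gives the model-free variance estimate described in the MeanSquares entry for PureError.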

    ems: Expected mean squares information, returned as a table. The table contains a row for each term and a row for the error, and it has the following variables.

    • Type — An indicator of whether the term is fixed or random.

    • ExpectedMeanSquares — A formula of the expected mean squares.

    • MeanSquaresDenominator — The value of the denominator in the calculation of the F-statistic.

    • DFDenominator — The value of the degrees of freedom in the calculation of the F-statistic denominator.

    • FDenominator — A formula for the denominator in the calculation of the F-statistic. The denominator changes depending on whether aov.Formula has random interaction terms.

    You can use the ems table to determine whether the variance of a random term has a large effect on the expected mean squares.

    Data Types: table

    Version History

    Introduced in R2022b