stats
Description
Examples
Display Summary Table for Two-Way ANOVA
Load popcorn yield data.
load popcorn.mat
The columns of the 6-by-3 matrix popcorn contain popcorn yield observations in cups for the brands Gourmet, National, and Generic. The first three rows of popcorn correspond to popcorn that was popped using an air popper, and the last three rows correspond to popcorn popped in oil.
Create string arrays of factor values for the brand and type of popper using the repmat function.
brand = [repmat("Gourmet",6,1); repmat("National",6,1); repmat("Generic",6,1)];
popperType = repmat(["Air";"Air";"Air";"Oil";"Oil";"Oil"], [3, 1]);
factors = {brand,popperType};
Perform a two-way ANOVA to test the null hypothesis that the mean popcorn yield is not affected by the brand of popcorn and popper type.
aov = anova(factors,popcorn(:),FactorNames=["Brand","PopperType"],ModelSpecification="interactions")
aov =
2-way anova, constrained (Type III) sums of squares.
Y ~ 1 + Brand*PopperType
                        SumOfSquares    DF    MeanSquares     F        pValue
                        ____________    __    ___________    ____    __________
    Brand                      15.75     2          7.875    56.7     7.679e-07
    PopperType                   4.5     1            4.5    32.4    0.00010037
    Brand:PopperType        0.083333     2       0.041667     0.3       0.74622
    Error                     1.6667    12        0.13889
    Total                         22    17
By default, anova displays a component ANOVA table.
Generate a summary ANOVA table.
s = stats(aov,"summary")
s=5×5 table
                  SumOfSquares    DF    MeanSquares      F        pValue
                  ____________    __    ___________    _____    __________
    Linear               20.25     3           6.75     48.6    5.4835e-07
    NonLinear         0.083333     2       0.041667      0.3       0.74622
    Regression          20.333     5         4.0667    29.28    2.5065e-06
    Error               1.6667    12        0.13889
    Total                   22    17         1.2941
The row Linear corresponds to the terms Brand and PopperType in the ANOVA model. The small p-value in the Linear row indicates that Brand and PopperType have a statistically significant combined effect on the popcorn yield. The row NonLinear corresponds to the term Brand:PopperType. The large p-value in the NonLinear row indicates that the interaction term does not have a statistically significant effect on the popcorn yield. The small p-value in the row Regression indicates that the ANOVA model is a better predictor of the response data than the mean of the data.
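The grouped rows can be cross-checked by hand against the component table. The following is a minimal sketch, not the computation that stats performs internally: it pools the component sums of squares, which is valid here only because the popcorn design is balanced (so the Type III values in the component table coincide with the Type I values used for the summary table). The variable names are illustrative, and the numbers are copied (rounded) from the tables above.

```matlab
% Values copied from the component ANOVA table for the popcorn data.
ssBrand = 15.75;           dfBrand = 2;
ssPopper = 4.5;            dfPopper = 1;
ssInteraction = 0.083333;  dfInteraction = 2;
ssError = 1.6667;          dfError = 12;

mse = ssError/dfError;     % mean squared error of the full model

% Linear row: the two main effects pooled together.
ssLinear = ssBrand + ssPopper;
dfLinear = dfBrand + dfPopper;
fLinear = (ssLinear/dfLinear)/mse;

% Regression row: all model terms pooled together.
ssRegression = ssLinear + ssInteraction;
dfRegression = dfLinear + dfInteraction;
fRegression = (ssRegression/dfRegression)/mse;

% Upper-tail F probabilities (fcdf requires Statistics and Machine Learning Toolbox).
pLinear = fcdf(fLinear,dfLinear,dfError,'upper');
pRegression = fcdf(fRegression,dfRegression,dfError,'upper');
```

Up to rounding of the copied inputs, fLinear and fRegression should agree with the F column of the Linear and Regression rows.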
Display Expected Mean Squares Table for Two-Way ANOVA
Load the sample car data.
load carsmall
Data for the country of origin, model year, and mileage is stored in the variables Origin, Model_Year, and MPG, respectively.
Perform a two-way ANOVA to test the null hypothesis that mean mileage is not affected by the country of origin or model year.
aov = anova({Origin, Model_Year},MPG,RandomFactors=[1 2],FactorNames=["Origin" "Year"])
aov =
2-way anova, constrained (Type III) sums of squares.
Y ~ 1 + Origin + Year
              SumOfSquares    DF    MeanSquares      F         pValue
              ____________    __    ___________    ______    __________
    Origin          1078.1     5         215.62    10.675    5.3303e-08
    Year            2638.4     2         1319.2    65.312    5.5975e-18
    Error             1737    86         20.198
    Total           6005.3    93
Display an expected mean squares table for the ANOVA.
[~,ems] = stats(aov)
ems=3×5 table
                Type         ExpectedMeanSquares         MeanSquaresDenominator    DFDenominator    FDenominator
              ________    __________________________    ______________________    _____________    ____________
    Origin    "random"    "9.159*V(Origin)+V(Error)"                    20.198               86    MS(Error)
    Year      "random"    "29.5014*V(Year)+V(Error)"                    20.198               86    MS(Error)
    Error     "random"    "V(Error)"
The formulas for the expected mean squares of the random factors Origin and Year contain terms for their respective variance components. You can use the expected mean squares formulas to compare how much of the expected mean squares is due to the variance in the error and how much is due to the variance components of the random terms.
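One way to make this comparison concrete is to equate each observed mean square to its expected mean squares formula and solve for the variance component. The sketch below is a simple method-of-moments calculation under that assumption. It is not necessarily the estimator used by varianceComponent, and the variable names are illustrative. The numbers are copied (rounded) from the two tables above.

```matlab
% Mean squares copied from the ANOVA table for the carsmall data.
msOrigin = 215.62;
msYear = 1319.2;
msError = 20.198;

% Coefficients copied from the ExpectedMeanSquares formulas.
cOrigin = 9.159;       % E[MS(Origin)] = 9.159*V(Origin) + V(Error)
cYear = 29.5014;       % E[MS(Year)]   = 29.5014*V(Year) + V(Error)

% Equate observed and expected mean squares, then solve for each component.
vError = msError;
vOrigin = (msOrigin - vError)/cOrigin;
vYear = (msYear - vError)/cYear;

% Share of each mean square that the error variance accounts for.
errorShareOrigin = vError/msOrigin;
errorShareYear = vError/msYear;
```

Under these assumptions, the error variance accounts for roughly 9% of the Origin mean square and under 2% of the Year mean square, so most of each mean square is attributed to the variance component of the random factor.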
Input Arguments
aov - Analysis of variance results
anova object
Analysis of variance results, specified as an anova object. The properties of aov contain the factors and response data used by stats to compute the statistics in the ANOVA table.
type - Type of ANOVA table
"component" (default) | "summary"
Type of ANOVA table, specified as "component" or "summary".
Example: "summary"
Data Types: char | string
sstype - Type of sum of squares
"three" (default) | "two" | "one" | "hierarchical"
Type of the sum of squares used to perform the ANOVA, specified as "three", "two", "one", or "hierarchical". The stats function ignores sstype unless the ANOVA type is "component". For a model containing main effects but no interactions, the value of sstype affects the computations only when the data is unbalanced.
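The sum of squares of a term ($SS_{Term}$) is defined as the reduction in the sum of squares error (SSE) obtained by adding the term to a model that excludes it. The formula for the sum of squares of a term Term has the form
$$SS_{Term} = \sum_{i=1}^{n}\left(y_i - f_{excl}(g_1,\ldots,g_N)\right)^2 - \sum_{i=1}^{n}\left(y_i - f_{incl}(g_1,\ldots,g_N)\right)^2 = SSE_{f_{excl}} - SSE_{f_{incl}}$$
where n is the number of observations, $y_i$ are the response data, $g_1,\ldots,g_N$ are the factors used to perform the ANOVA, $f_{excl}$ is a model that excludes Term, and $f_{incl}$ is a model that includes Term. Both $f_{excl}$ and $f_{incl}$ are specified by sstype. The variables $SSE_{f_{excl}}$ and $SSE_{f_{incl}}$ are the sum of squares errors for $f_{excl}$ and $f_{incl}$, respectively. You can specify $f_{excl}$ and $f_{incl}$ using one of the options for sstype described in the following table.
Option | Type of Sum of Squares |
---|---|
"three" (default) | $f_{incl}$ is the full ANOVA model specified in aov, and $f_{excl}$ is that model with Term removed. This type of sum of squares is known as Type III. |
"two" | $f_{excl}$ is a model composed of all terms in the ANOVA model specified in aov that do not contain Term, and $f_{incl}$ is composed of Term and all the terms in $f_{excl}$. This type of sum of squares is known as Type II. |
"one" | $f_{excl}$ is a model composed of all the terms that precede Term in the ANOVA model specified in aov, and $f_{incl}$ is composed of Term and all the terms in $f_{excl}$. This type of sum of squares is known as Type I. |
"hierarchical" | $f_{excl}$ and $f_{incl}$ are defined as in Type II, except powers of Term are treated as terms that contain Term. |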
Example: "hierarchical"
Data Types: char | string
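The "reduction in SSE" definition can be illustrated without any toolbox function. The sketch below uses a small hypothetical unbalanced data set with two 0/1 coded factors (the values of y, a, and b are made up for illustration) and plain least squares to compare Type I and Type II sums of squares for a main-effects model. It is meant to show why the choice matters for unbalanced data, not how stats computes its results.

```matlab
% Hypothetical unbalanced data: two 0/1 coded factors and a response.
y = [10; 12; 11; 15; 18; 20; 21];
a = [0; 0; 0; 0; 1; 1; 1];
b = [0; 0; 1; 1; 0; 1; 1];
c = ones(size(y));                 % constant term

% SSE of a least-squares fit with design matrix X.
sse = @(X) sum((y - X*(X\y)).^2);

% Type I for the term order 1 + A + B: each term is added after those preceding it.
ssA_one = sse(c) - sse([c a]);
ssB_one = sse([c a]) - sse([c a b]);

% Type II: each main effect is adjusted for the other main effect.
% With no interaction terms, Type III gives the same values.
ssA_two = sse([c b]) - sse([c a b]);
ssB_two = sse([c a]) - sse([c a b]);
```

Because the factors are not orthogonal in this data, ssA_one and ssA_two differ, while the last term in the model order (B) has the same sum of squares under both types.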
Output Arguments
s - ANOVA statistics
table
ANOVA statistics, returned as a table. The contents of s depend on the ANOVA type specified in type.
If type is "component", then s contains ANOVA statistics for each variable in the model except the constant (intercept) term. The table includes these columns for each variable:
Column | Description |
---|---|
SumOfSquares | Sum of squares explained by the term and calculated depending on sstype. |
DF | Degrees of freedom. DF of a numeric variable is 1. DF of a categorical variable is the number of dummy variables created for the category (number of categories - 1). DF of an error term is the difference between the DF of the total and the sum of the DF for the model terms. DF of the total is aov.NumObservations - 1. |
MeanSquares | Mean squares, defined by MeanSquares = SumOfSquares/DF. MeanSquares for the error term is the mean squared error (MSE). |
F | F-statistic value to test the null hypothesis that the corresponding coefficient is zero, computed by F = MeanSquares/MSE. When the null hypothesis is true, the F-statistic follows the F-distribution. |
pValue | p-value of the F-statistic value. |
If type is "summary", then s contains summary statistics of grouped terms for each row. The summary statistics are calculated using Type I sum of squares. The table includes the same columns as "component" and these rows:
Row | Description |
---|---|
Total | Total statistics. SumOfSquares: Total sum of squares, which is the sum of the squared deviations of the response around its mean. DF: Sum of the degrees of freedom of Regression and Error. |
Regression | Statistics for the model as a whole. SumOfSquares: Model sum of squares, which is the sum of the squared deviations of the fitted value around the response mean. F and pValue: These values provide a test of whether the model as a whole fits significantly better than a degenerate model consisting of only a constant term. |
Linear | Statistics for linear terms. SumOfSquares: Sum of squares for linear terms, which is the difference between the model sum of squares and the sum of squares for nonlinear terms. F and pValue: These values provide a test of whether the model with only linear terms fits better than a degenerate model consisting of only a constant term. stats uses the mean squared error that is based on the full model to compute this F-value, so the F-value obtained by dropping the nonlinear terms and repeating the test is not the same as the value in this row. |
NonLinear | Statistics for nonlinear terms. SumOfSquares: Sum of squares for nonlinear (higher-order or interaction) terms, which is the increase in the residual sum of squares obtained by keeping only the linear terms and dropping all nonlinear terms. F and pValue: These values provide a test of whether the full model fits significantly better than a smaller model consisting of only the linear terms. |
Error | Statistics for error. SumOfSquares: Residual sum of squares, which is the sum of the squared residual values. MeanSquares: Mean squared error, used to compute the F-statistic values for Regression, Linear, and NonLinear. |
If the data contains replications (multiple observations sharing the same factor values), s also contains rows for LackOfFit and PureError. LackOfFit and PureError break down Error further.
Row | Description |
---|---|
LackOfFit | Lack-of-fit statistics. SumOfSquares: Sum of squares due to lack of fit, which is the difference between the residual sum of squares and the replication sum of squares. F and pValue: The F-statistic value is the ratio of lack-of-fit MeanSquares to pure error MeanSquares. The ratio provides a test of bias by measuring whether the variation of the residuals is larger than the variation of the replications. A low p-value implies that adding additional terms to the model can improve the fit. |
PureError | Statistics for pure error. SumOfSquares: Replication sum of squares, obtained by finding the sets of points with identical predictor values, computing the sum of squared deviations around the mean within each set, and pooling the computed values. MeanSquares: Model-free pure error variance estimate of the response. |
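The replication sum of squares described for PureError can be sketched directly from its definition. The example below assumes the popcorn data from the first example and uses hypothetical variable names. It groups observations that share the same brand and popper type, then pools the squared deviations around each group mean. It illustrates the definition only and does not call stats.

```matlab
load popcorn.mat                        % 6-by-3 matrix of yields
y = popcorn(:);                         % stack columns: brand 1, then 2, then 3

% Numeric codes for the factor levels of each stacked observation.
brand = repelem((1:3)',6);              % column index identifies the brand
popper = repmat([1;1;1;2;2;2],3,1);     % rows 1-3 air (1), rows 4-6 oil (2)

% One group per unique combination of factor values (replication set).
cellId = findgroups(brand,popper);
cellMean = splitapply(@mean,y,cellId);

% Pool the squared deviations around each group mean.
ssPureError = sum((y - cellMean(cellId)).^2);
dfPureError = numel(y) - max(cellId);   % observations minus number of groups
```

For this data, the model Brand*PopperType fits one mean per group, so the pooled value coincides with the Error sum of squares (about 1.6667 on 12 degrees of freedom) shown in the first example.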
ems - Expected mean squares information
table
Expected mean squares information, returned as a table. The table ems contains a row for each term and a row for the error, and has the following variables.
- Type: An indicator of whether the term is fixed or random.
- ExpectedMeanSquares: A formula for the expected mean squares.
- MeanSquaresDenominator: The value of the denominator in the calculation of the F-statistic.
- DFDenominator: The degrees of freedom of the denominator in the calculation of the F-statistic.
- FDenominator: A formula for the denominator in the calculation of the F-statistic. The denominator changes depending on whether aov.Formula has random interaction terms.
You can use the ems table to determine whether the variance of a random term has a large effect on the expected mean squares.
Data Types: table
Version History
Introduced in R2022b
See Also
anova | varianceComponent | N-Way ANOVA | One-Way ANOVA | Two-Way ANOVA