Nonparametric Methods
Introduction to Nonparametric Methods
Statistics and Machine Learning Toolbox™ functions include nonparametric versions of one-way and two-way analysis of variance. Unlike classical tests, nonparametric tests make only mild assumptions about the data, and are appropriate when the distribution of the data is non-normal. On the other hand, they are less powerful than classical methods for normally distributed data.
Both of the nonparametric functions described here return a stats structure that you can use as an input to the multcompare function for multiple comparisons.
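For example, a minimal sketch of that workflow, using the hogg data from the Kruskal-Wallis example below:

load hogg
[p, tbl, stats] = kruskalwallis(hogg);  % third output is the stats structure
multcompare(stats)                      % pairwise comparisons of the groups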
Kruskal-Wallis Test
The example Perform One-Way ANOVA uses one-way analysis of variance to determine if the bacteria counts of milk varied from shipment to shipment. The one-way analysis rests on the assumption that the measurements are independent, and that each has a normal distribution with a common variance and a mean that is constant within each column. That example concluded that the column means were not all the same. The following example repeats the analysis using a nonparametric procedure.
The Kruskal-Wallis test is a nonparametric version of one-way analysis of variance. The assumption behind this test is that the measurements come from a continuous distribution, but not necessarily a normal distribution. The test is based on an analysis of variance using the ranks of the data values, not the data values themselves. Output includes a table similar to an ANOVA table, and a box plot.
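To see the rank-transformation idea concretely, you can rank the pooled data yourself and run a classical one-way ANOVA on the ranks. This is only an illustration, not the exact Kruskal-Wallis computation (kruskalwallis uses a chi-square approximation for the rank statistic and corrects for ties):

load hogg
ranks = reshape(tiedrank(hogg(:)), size(hogg));  % rank all observations together, ties averaged
p = anova1(ranks)                                % classical ANOVA applied to the ranks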
You can run this test as follows:
load hogg
p = kruskalwallis(hogg)

p =
    0.0020
The low p value means the Kruskal-Wallis test results agree with the one-way analysis of variance results.
Friedman's Test
Perform Two-Way ANOVA uses two-way analysis of variance to study the effect of car model and factory on car mileage. The example tests whether either of these factors has a significant effect on mileage, and whether there is an interaction between these factors. The conclusion of the example is that there is no interaction, but that each individual factor has a significant effect. The next example examines whether a nonparametric analysis leads to the same conclusion.
Friedman's test is a nonparametric test for data having a two-way layout (data grouped by two categorical factors). Unlike two-way analysis of variance, Friedman's test does not treat the two factors symmetrically and it does not test for an interaction between them. Instead, it is a test for whether the columns are different after adjusting for possible row differences. The test is based on an analysis of variance using the ranks of the data across categories of the row factor. Output includes a table similar to an ANOVA table.
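As a sketch of that computation for a layout with one observation per cell (a hypothetical matrix x with n rows and k columns), you can form within-row ranks and the usual chi-square statistic by hand. This ignores the tie correction and the handling of replicates that friedman provides:

[n, k] = size(x);
r = zeros(n, k);
for i = 1:n
    r(i,:) = tiedrank(x(i,:));               % rank each row across the columns
end
R = sum(r, 1);                               % column rank sums
Q = 12/(n*k*(k+1)) * sum(R.^2) - 3*n*(k+1);  % Friedman chi-square statistic
p = 1 - chi2cdf(Q, k-1)                      % p value from the chi-square approximation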
You can run Friedman's test as follows.
load mileage
p = friedman(mileage,3)

p =
   7.4659e-004
Recall that the classical analysis of variance gave p values to test column effects, row effects, and interaction effects. The p value from Friedman's test is for column effects only. Using either this p value or the p value from ANOVA (p < 0.0001), you conclude that there are significant column effects.
To test for row effects, you need to rearrange the data to swap the roles of the rows and columns. For a data matrix x with no replications, you could simply transpose the data and type
p = friedman(x')
With replicated data, the rearrangement is slightly more complicated. A simple way is to reshape the matrix into a three-dimensional array with the first dimension representing the replicates, swap the other two dimensions, and restore the two-dimensional shape.
x = reshape(mileage, [3 2 3]);
x = permute(x,[1 3 2]);
x = reshape(x,[9 2])

x =
   33.3000   32.6000
   33.4000   32.5000
   32.9000   33.0000
   34.5000   33.4000
   34.8000   33.7000
   33.8000   33.9000
   37.4000   36.6000
   36.8000   37.0000
   37.6000   36.7000

friedman(x,3)

ans =
    0.0082
Again, the conclusion is similar to that of the classical analysis of variance. Both this p value and the one from ANOVA (p = 0.0039) lead you to conclude that there are significant row effects.
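The same rearrangement can be packaged as a general helper. The following sketch (swapFactors is a hypothetical name, not a toolbox function) swaps the two factors of any two-way layout stored with r replicates per cell in consecutive rows:

function y = swapFactors(x, r)
% Swap the row and column factors of a two-way layout x that has
% r replicates per cell stacked in consecutive rows.
[nr, nc] = size(x);               % nr = (levels of row factor) * r
y = reshape(x, [r, nr/r, nc]);    % replicates-by-rowFactor-by-colFactor
y = permute(y, [1 3 2]);          % swap the two factor dimensions
y = reshape(y, [r*nc, nr/r]);     % back to a two-way layout
end

For the mileage data, swapFactors(mileage,3) reproduces the 9-by-2 matrix shown above.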
You cannot use Friedman's test to test for interactions between the row and column factors.