Kolmogorov Smirnov test help?

arthurk 2020-1-20
Commented: Adam Danz 2020-5-3
I have the test data below. The kstest(x) function compares the distribution of the data against a standard normal distribution (mean of 0 and std of 1). Is it better to simply call the function as kstest(x), or to first correct the data so that its mean and standard deviation are 0 and 1, respectively?
Also, when doing so, do you get a probability (p-value) of 0.1267 for the uncorrected data and 0.6506 for the corrected data?
It's just that I got significantly different values earlier.
Another question: are the probabilities realistic? When I plot the values in Excel, the graphs look more or less normally distributed; however, they don't pass the significance level of 5%.
Thanks
1.481336
-0.15023
2.253639
-3.44891
-2.06993
-0.54504
3.077467
-0.49623
-0.23977
0.098674
0.237035
-5.38399
1.753639
-1.65023
0.644677
1.407635
0.077467
-0.66607
1.981336
2.644677
-0.12763
4.035716
-1.18049
-1.04504
0.614422
1.345996
1.224973
-3.49454
-4.23659
0.223383
0.907635
0.724973

Answers (1)

Adam Danz 2020-1-20
Edited: Adam Danz 2020-1-20
"Is it better to simply call the function as kstest(x) or correct the data so that its standard deviation and mean is 1 and 0 respectively"
Called as kstest(x), the one-sample Kolmogorov-Smirnov test tests the null hypothesis that the data come from a standard normal distribution (mean 0, std 1). If you correct your data so that it does have a mean of 0 and std of 1, what's the point of testing it?
If you want a more general test that your data come from a normal distribution with any mean or std, use the Anderson-Darling test or the Lilliefors test.
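As a minimal sketch of the calls discussed above, the snippet below runs all of them on the 32 values posted in the question and prints each decision and p-value. The variable names are illustrative only, and no output values are quoted here because they have not been verified. One caveat is worth stating: when the mean and std are estimated from the same sample, the p-value that kstest reports for the standardized data is no longer valid, which is the situation the Lilliefors test is designed to handle.

```matlab
% Data copied from the question (32 observations)
x = [ 1.481336 -0.15023   2.253639 -3.44891  -2.06993  -0.54504  3.077467 -0.49623 ...
     -0.23977   0.098674  0.237035 -5.38399   1.753639 -1.65023  0.644677  1.407635 ...
      0.077467 -0.66607   1.981336  2.644677 -0.12763   4.035716 -1.18049 -1.04504 ...
      0.614422  1.345996  1.224973 -3.49454  -4.23659   0.223383  0.907635  0.724973]';

% One-sample KS test against the standard normal, on the raw data
[hKS, pKS] = kstest(x);

% Same test after standardizing with the sample mean and std
% (p-value is not valid here, because the parameters were estimated from x)
z = (x - mean(x)) / std(x);
[hZ, pZ] = kstest(z);

% Tests of normality with unspecified mean and std
[hL,  pL ] = lillietest(x);
[hAD, pAD] = adtest(x);

fprintf('kstest(x):     h = %d, p = %.4f\n', hKS, pKS);
fprintf('kstest(z):     h = %d, p = %.4f\n', hZ,  pZ);
fprintf('lillietest(x): h = %d, p = %.4f\n', hL,  pL);
fprintf('adtest(x):     h = %d, p = %.4f\n', hAD, pAD);
```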
Null hypotheses (from the documentation)
One-sample Kolmogorov-Smirnov test: the data in vector x comes from a standard normal distribution (mean 0, std 1).
Lilliefors test: the data in vector x comes from a distribution in the normal family.
Anderson-Darling test: the data in vector x is from a population with a normal distribution.
If the null hypothesis is rejected (an outcome of 1 for all three tests), the test concludes, at the 5% significance level, that the data do not come from those distributions.
Note that a failure to reject the null hypothesis (an outcome of 0 for all three tests) does not indicate that the data do come from those distributions. This is a common misunderstanding when interpreting hypothesis tests.
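To make the two outcomes concrete, here is a small sketch that requests both the decision h and the p-value from kstest and prints the matching interpretation. The sample x is hypothetical (random numbers with a fixed seed), and alpha is set explicitly only to make the 5% level visible.

```matlab
rng(0)                       % fixed seed so the sketch is repeatable
x = randn(1,50);             % hypothetical sample, not the data from the question
alpha = 0.05;                % significance level (also the default)

[h, p] = kstest(x, 'Alpha', alpha);

if h
    % h = 1: the test rejects the null hypothesis at level alpha
    fprintf('p = %.4f: reject the null hypothesis at the %.0f%% level.\n', p, 100*alpha);
else
    % h = 0: failure to reject, which is NOT evidence that the null is true
    fprintf('p = %.4f: fail to reject; this does not prove the data are standard normal.\n', p);
end
```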
Here's a demonstration showing the difference between kstest and the two other tests.
% Create data from a normal distribution with
% mean 0 and std 1.
x0 = randn(1,10000);
% Use that same exact data to create a normal distribution
% with mean 5 and std 2
x1 = x0*2 + 5;
% Plot both distributions
clf()
histogram(x0)
hold on
histogram(x1)
Notice how this creates two normal distributions. The blue distribution has a mean of 0 and std of 1, while the reddish distribution has a mean of 5 and std of 2 (approximately).
% Look at the results of the ks-tests
ks0 = kstest(x0) % 0 in most runs (fail to reject)
ks1 = kstest(x1) % 1 (reject the null hypothesis)
% Look at the results of the Lilliefors test
lt0 = lillietest(x0) % 0 in most runs (fail to reject)
lt1 = lillietest(x1) % 0 in most runs (fail to reject)
% Look at the results from the Anderson-Darling test
ad0 = adtest(x0) % 0 in most runs (fail to reject)
ad1 = adtest(x1) % 0 in most runs (fail to reject)
As you can see, the blue distribution is consistent with a standard normal distribution (kstest fails to reject), and rightfully so since it has a mean of 0 and std of 1 (approximately), while the other distribution is rejected. However, both distributions are consistent with a normal distribution according to both lillietest() and adtest().
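As a side note, kstest can also test against a normal distribution other than the standard one, provided its parameters are fixed in advance rather than estimated from the same data. The sketch below assumes the hypothesized values mean 5 and std 2 are known beforehand. It passes a distribution object through the 'CDF' option and compares the result with standardizing by those same known values. Variable names are illustrative.

```matlab
rng(0)                                     % fixed seed, for repeatability only
x0 = randn(1,10000);                       % standard normal sample
x1 = x0*2 + 5;                             % normal sample with mean 5, std 2

% Hypothesized distribution with parameters specified a priori
pd = makedist('Normal', 'mu', 5, 'sigma', 2);

% KS test of x1 against N(5,2)
[hA, pA] = kstest(x1, 'CDF', pd);

% Equivalent: rescale with the same known parameters, then test against N(0,1)
[hB, pB] = kstest((x1 - 5)/2);

fprintf('kstest vs N(5,2):           h = %d, p = %.4f\n', hA, pA);
fprintf('kstest on (x1-5)/2 vs N(0,1): h = %d, p = %.4f\n', hB, pB);
```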
  6 Comments
John TS 2020-5-3
Adam Danz, thanks for the clarification on the null hypothesis and normality tests, esp. in Matlab.
The question now is: then what? Suppose I have a small sample, e.g. 10 observations, and kstest() rejects that they are normally distributed, but the other two tests, lillietest() and adtest(), do not reject. Is the data then normally distributed, and can it be analyzed further with ANOVA etc., which require normality as a prerequisite?
Adam Danz 2020-5-3
Sounds like the data could come from a normal distribution that isn't a standard normal distribution. Normal distributions are described by a mean and standard deviation (SD). A standard normal distribution is the special case of a normal distribution where the mean is 0 and SD is 1.
10 observations aren't much data. When you plot the distributions (using histogram, for example), do the test results make sense?
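A minimal sketch of that kind of visual check is shown below. It uses a hypothetical sample of 10 observations (random numbers with a fixed seed, not the commenter's data) and draws a histogram next to a normal probability plot from normplot, where points lying close to the reference line are consistent with a normal distribution.

```matlab
rng(1)                        % fixed seed so the sketch is repeatable
x = 3 + 2*randn(10,1);        % hypothetical small sample: 10 observations

figure
subplot(1,2,1)
histogram(x)                  % coarse with only 10 points, but shows gross skew or outliers
title('Histogram')

subplot(1,2,2)
normplot(x)                   % points near the straight line suggest normality
```

With so few observations, neither the plots nor the tests can say much, so they are best read together.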
