quantify difference between discrete distributions

Hello,
I am trying to quantify the difference between two discrete distributions. I have been reading online, and there seem to be a few different approaches, such as the Kolmogorov-Smirnov test and the chi-squared test.
My first question is which of these is the correct method for comparing the distributions below?
The distributions are discrete distributions with 24 bins.
My second question: it is pretty obvious from looking at the distributions that they will be statistically significantly different, but is there a method to quantify how different they are? A percentage or a distance, perhaps?
I appreciate any help and comments
Kind Regards

Accepted Answer

Use the Two-sample Kolmogorov-Smirnov test from the Statistics Toolbox.

8 Comments

doc kstest2
A measure of how different they are will be the p-value. Note that you would need the sample data, not the histograms you mentioned here.
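For readers without the Statistics Toolbox at hand, the same two-sample test is available in SciPy. A minimal sketch, where the arrays `a` and `b` are made-up stand-ins for the actual raw samples (discrete values in 24 bins), not the poster's data:

```python
import numpy as np
from scipy.stats import ks_2samp

# Made-up stand-ins for the two raw samples: discrete values in 24 bins.
rng = np.random.default_rng(0)
a = rng.integers(0, 24, size=5000)
b = np.clip(rng.integers(0, 24, size=5000) + 2, 0, 23)  # shifted copy of a

# Two-sample Kolmogorov-Smirnov test, analogous to MATLAB's kstest2(a, b).
result = ks_2samp(a, b)
print(result.statistic, result.pvalue)  # k and p, respectively
```

As in the MATLAB call, the test takes the raw samples, not histogram counts.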
Thanks. I meant to say above that the raw data itself is discrete, so "binning" the data as I described is probably the wrong way to put it. Is the Kolmogorov-Smirnov test still suitable?
Thanks
Hello Jose,
Sorry for bothering you again, but can I ask you one further question regarding kstest2.
I performed the test using the raw data behind the histograms above.
This is the output that I got
[h,p,k] = kstest2(a,b)
h =
1
p =
4.9903e-113
k =
0.2948
Because h = 1, the test rejects the null hypothesis at the 5% significance level, so they're not from the same distribution.
I'm just wondering, how would you interpret p?
Thank you for your help
Kind Regards
The null hypothesis (the two samples come from the same distribution) could not be rejected if p >= k. In this case, p is much smaller than k, meaning that the null hypothesis can be rejected. The difference between p and k is an indicator of how far off your results are. In any case, I would recommend reading any basic statistics book, or going to Wikipedia for more details on p-values.
In the example in the documentation, doc kstest2, p is less than k and the hypothesis is not rejected. You say "could not be rejected if p >= k"
In your previous post you said "A measure of how different they are will be the p-value", but in my case p is very small?
The smaller the p, the larger the difference between the distributions. It's consistent with what I said before, methinks. It is a comparative measure, and you need to know what you are comparing against (in this case k). How to define k is a different story; for that you need to read the paper of those who came up with the statistic.
Also, you should be careful how you phrase the results from a hypothesis test. In this case, I should have said: "if p >= k, then the hypothesis that both samples come from the same distribution cannot be rejected".
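On the original question of a distance, note that the k returned by kstest2 is itself one: the largest absolute gap between the two empirical CDFs, ranging from 0 (identical samples) to 1 (completely separated). A sketch computing it directly on made-up discrete samples and checking it against the library value:

```python
import numpy as np
from scipy.stats import ks_2samp

# Made-up discrete samples on the shared support 0..23.
rng = np.random.default_rng(1)
a = rng.integers(0, 24, size=2000)
b = np.clip(rng.integers(0, 24, size=2000) + 2, 0, 23)

# Empirical CDFs evaluated at every point of the discrete support.
support = np.arange(24)
cdf_a = np.searchsorted(np.sort(a), support, side="right") / a.size
cdf_b = np.searchsorted(np.sort(b), support, side="right") / b.size

# The KS statistic is the largest vertical gap between the two ECDFs.
k = np.max(np.abs(cdf_a - cdf_b))
print(k)
```

So the k = 0.2948 reported above can be read directly as a distance between the two distributions, independently of the p-value.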
