statistics reported by ranksum are wrong

This is less a question and more of a bug report.
The ranksum U statistic reported by the ranksum function is much too large. Here's a simple example:
    a1 = 1 : 100;
    a2 = a1 + 0.01;
    [ p, h, stats ] = ranksum( a2, a1 )

    p =
        0.9037
    h =
        0
    stats =
           zval: 0.1209
        ranksum: 10100
The correct ranksum, working from the formal definition of Wilcoxon ranksum, is 5050. I have verified this with an online calculator for the U statistic.
After some experimentation, I believe the value being reported for U is actually U + ( n1 * n2 ) / 2, where n1 and n2 are the number of instances in the two samples.
The reported p and h values agree reasonably well with what I get from other calculators.

Answers (1)

the cyclist 2013-4-16
Jeff,
Here is an excerpt from the notes to the equivalent function in R:
"The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value subtracted or not: R subtracts and S-PLUS does not, giving a value which is larger by m(m+1)/2 for a first sample of size m. (It seems Wilcoxon's original paper used the unadjusted sum of the ranks but subsequent tables subtracted the minimum.)"
It seems you are seeing this lack of convention.
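As a rough check of how the two conventions relate for the example in the question, the sketch below recomputes both numbers by hand. It assumes there are no tied values and uses only base MATLAB. The names W and U are illustrative labels of mine, not fields returned by ranksum.

```matlab
% Recompute the rank sum for the example in the question by hand.
% Assumption: no tied values, so ranks can be read off a plain sort.
a1 = 1 : 100;
a2 = a1 + 0.01;

n1 = numel( a2 );                 % size of the first sample passed to ranksum
pooled = [ a2, a1 ];              % first sample first, then the second

[ ~, order ] = sort( pooled );    % order(k) is the index of the k-th smallest value
ranks = zeros( size( pooled ) );
ranks( order ) = 1 : numel( pooled );

W = sum( ranks( 1 : n1 ) );       % unadjusted sum of the ranks of the first sample
U = W - n1 * ( n1 + 1 ) / 2;      % same sum with its minimum possible value subtracted

fprintf( 'W = %d, U = %d\n', W, U );
```

For this data W comes out as 10100, matching the stats.ranksum value shown in the question, and U comes out as 5050, matching the value you expected. The gap between them is n1*(n1+1)/2 = 5050, the smallest rank sum a first sample of 100 values can have, which is the m(m+1)/2 offset described in the R notes.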
