randperm non uniformly distributed
I want to sample from the integers 1 through 56 without replacement. Neither randperm nor datasample with 'Replace',false gives a uniformly distributed set when I iterate many times. Why is the last bin in the histogram double the size of the rest?
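perms=zeros(10000,6);
samps=zeros(10000,6);
% Each row holds 6 distinct integers drawn from 1..56 with randperm
rp=size(perms,1);
for p=1:rp
    perms(p,:)=randperm(56,6);
end
% Same experiment with datasample, sampling without replacement
rs=size(samps,1);
for s=1:rs
    samps(s,:)=datasample(1:56,6,'Replace',false);
end
% Separate figures so the second histogram does not overwrite the first
figure; histogram(perms(1:end))
figure; histogram(samps(1:end))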
Accepted Answer
John D'Errico
2019-8-15
Sigh. This is NOT a question of non-uniformity. It is a question of how to recognize non-uniformity and, in part, of how to read a histogram.
If you create a histogram with too few bins, SOME bins will collect more than one distinct value.
It turns out that histogram decided to use bin edges of 1:56 here, so the last bin got used for twice as many samples.
Note the difference between these three calls to histogram:
histogram(perms(1:end))
histogram(perms(1:end),1:56)
histogram(perms(1:end),1:57)
The first two produce the same results. So it appears the default for the bin edges was 1:56. However, when I gave it another bin up to 57, all things appear normal.
So what happens when the bin edges are 1:56? There are integer events at 55 and at 56, so the last bin holds all events that are either 55 OR 56, whereas bin number 1 holds only the events that are exactly 1. When I gave histogram one more bin edge to use, things were fine.
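The same point can be checked numerically rather than by eye. The sketch below is a minimal illustration, not part of the original answer: it assumes that histcounts follows the same edge rule as histogram, and the variable names (draws, countsShort, countsFull) and the fixed seed are chosen here only for the example.

```matlab
% Sketch: compare counts under bin edges 1:56 and 1:57.
rng(0);                                  % fixed seed, only for repeatability
nDraws = 10000;
draws = zeros(nDraws, 6);
for k = 1:nDraws
    draws(k, :) = randperm(56, 6);       % 6 distinct integers from 1..56
end
x = draws(:);

countsShort = histcounts(x, 1:56);       % 55 bins; the last bin is [55, 56]
countsFull  = histcounts(x, 1:57);       % 56 bins; the last bin is [56, 57]

% The oversized last bin is the 55s and the 56s added together.
lastBinIsSum = countsShort(end) == countsFull(55) + countsFull(56);
disp(lastBinIsSum)
```

With edges 1:57 every integer gets its own bin, so the 56 counts can be compared directly against the expected value of 60000/56 per bin.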
So before you claim non-uniformity, think about whether the test you are using that asserts non-uniformity might be flawed.
3 Comments
Steven Lord
2019-8-15
John is correct. As stated in the histogram documentation page, "Each bin includes the left edge, but does not include the right edge, except for the last bin which includes both edges."
Before John added that last bin edge at 57, the last bin was [55, 56] and the next-to-last bin was [54, 55). So the last bin counted two distinct values from the data.
After John added that last bin edge at 57, the last bin is [56, 57] and the next-to-last bin is [55, 56). Each of the last two bins now counts only one distinct value from the data.
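The edge rule quoted from the documentation is easy to see on a toy vector. The following is a small sketch with made-up data; the names c1 through c4 are illustrative, and the 'BinMethod','integers' call is included as one alternative that, as far as I know, places a unit-width bin around each integer.

```matlab
% Sketch: how the choice of edges changes which values share a bin.
x = [1 2 3 3 4 4];                       % toy data: one 1, one 2, two 3s, two 4s

c1 = histcounts(x, 1:4);                 % bins [1,2) [2,3) [3,4]: 3s and 4s share the last bin
c2 = histcounts(x, 1:5);                 % bins [1,2) [2,3) [3,4) [4,5]: one value per bin
c3 = histcounts(x, 0.5:1:4.5);           % unit-width bins centered on each integer

% Alternative that avoids writing the edges by hand
c4 = histcounts(x, 'BinMethod', 'integers');

disp(c1)
disp(c2)
disp(c3)
```

Here c1 is [1 1 4], while c2 and c3 are both [1 1 2 2]. Half-integer edges such as 0.5:1:56.5 sidestep the closed-last-bin rule entirely, because no data value ever falls on an edge.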