Random triplets not asymmetrically distributed
3 次查看(过去 30 天)
显示 更早的评论
I would like to generate many triplets whose values are drawn from {0.1, 0.2, ... 0.9}. I also want each triplet to sum to 1.0. I use the following code to generate them. However, when I individually histogram the 1st, 2nd, and 3rd element across all triplets, they are nonuniform and asymmetric.
Can anyone explain why the distribution is nonuniform? I can sort of guess that it is due to the constraint that they sum to 1.0.
However, it seems odd that the distribution is skewed. There should be no difference in the upward direction toward 1.0 or the downward direction toward 0.0, as the problem is completely symmetric.
The only possible cause for asymmetry is that rounding is slightly asymmetric, but just by smidgeon.
nRows=1e7; % Number of triplets
w = rand(nRows,3); % Generate row triplets
w = w./sum(w,2); % Normalize each triplet to sum to 1.0
w = round(w,1); % Round to 1 decimal place
w = w( sum(w,2) == 1.0 , : ); % Retain triplets that sum to 1.0
w = w( all(w,2) , : ); % Discard triplets containing zero
% Individually histogram 1st, 2nd, and 3rd element across all
% triplets
for iw = 1:3
subplot(3,1,iw)
histogram( w(:,iw), 0.05:0.1:.95 )
end % for iw
mean(w)
2 个评论
the cyclist
2022-4-12
编辑:the cyclist
2022-4-12
I ran your code in the interface here.
What is puzzling to me is not the skewness of the distributions themselves, but rather the fact that the distribution of the third column is different from the 1st and 2nd columns. Note that I appended a calculation of the mean value of each column, and they are not equal. (It's not just sampling error.)
Maybe I'm overlooking something obvious, but I see no point in the code in which the columns of w are not treated identically from a coding perspective. Superficially, I feel that this algorithm should work.
The line that causes the problem is the one with the sum:
w = w( sum(w,2) == 1.0 , : ); % Retain triplets that sum to 1.0
If you shuffle the values within each row after that point, you will get the expected result:
nRows=1e6; % Number of triplets
w = rand(nRows,3); % Generate row triplets
w = w./sum(w,2); % Normalize each triplet to sum to 1.0
w = round(w,1); % Round to 1 decimal place
w = w( sum(w,2) == 1.0 , : ); % Retain triplets that sum to 1.0
for nr = 1:size(w,1)
w(nr,:) = w(nr,randsample(3,3));
end
w = w( all(w,2) , : ); % Discard triplets containing zero
% Individually histogram 1st, 2nd, and 3rd element across all
% triplets
figure
for iw = 1:3
subplot(3,1,iw)
histogram( w(:,iw), -0.05:0.1:1.05 )
end % for iw
mean(w)
I speculate that MATLAB is doing something clever when evaluating
sum(w,2) == 1.0
that leads to w(:,3) being the "odd man out".
采纳的回答
Walter Roberson
2022-4-12
Please look in the File Exchange to get randfixsum() by Roger Stafford https://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum
4 个评论
Bruno Luong
2022-4-12
This answer is not suitable for discrete case, which is entirely different beast if uniformity is required.
更多回答(1 个)
John D'Errico
2022-4-12
编辑:John D'Errico
2022-4-12
If you wish to sample from the set of triplets that sum to exactly 1, where each elemnt is discrete, then you do NOT want to use randfized sum. In fact, you don't want to do it by rejection either!!!!!!!!
The set of triples that sum to 1 is trivially generated. I will just generate the entire set by brute force. Simplest is like this (Note that I am using integers initially. At the end, I'll divide by 10.) Next, you should see that the MAXIMUM element must ALWAYS be 0.8, NOT 0.9. There is no way to add THREE numbers from the set (1:9)/10, and have one of them be 0.9, with the sum as 1. That should be obvious.
V = 1:8;
[V1,V2,V3] = ndgrid(V,V,V);
V123 = [V1(:),V2(:),V3(:)];
V123(sum(V123,2) ~= 10,:) = [];
size(V123,1)
So there are EXACTLY 36 possible ways to form that sum. No more, no less.
V123
At the end, divide by 10.
V123 = V123/10;
Those are the ONLY ways to form the sum you want.
Now if you want to generate random triples, just generate a random integer from 1 through 36. Use that to sample from the rows of V123.
Nsets = 10000;
ind = randi([1,36],[Nsets,1]);
triples = V123(ind,:);
histogram(triples(:,1),100)
histogram(triples(:,2),100)
histogram(triples(:,3),100)
Now, I suppose that you MIGHT decide this does not make sense, that we should expect to have equally as many samples with .1 as we have .6, .7, or .8. But of course that is silly. In fact, we should expect a non-uniform distribution as we see, for each channel. Think about it.
For example, how many ways are there to get a sum that includes one element that is 0.8? EXACTLY 3 ways, that is... {[0.1 0.1 0.8], [0.1 0.8 0.1], [0.8 0.1 0.1]}.
sum(V123 == 0.8,1)
But now, how many ways are there for the sum to have 0.1 in it?
sum(V123 == 0.1,1)
So there are 8 times as many sums that include the number 0.1 in each member of the triple, than there are ways to find the number 0.8.
3 个评论
John D'Errico
2022-4-12
(Um, you can change which answer you have accepted. But i really don't care that much about being accepted. Reputation is worth nothing to me. When I die, I hope my obituary has something better to say about me than quote my reputation on a web site.)
People misunderstand what uniform would mean in a case like this.
The constraint that the numbers sum to 1 implies all sets of points MUST lie in a plane. And since they are all positive, the constraint region becomes a triangle.
V = 1:8;
[V1,V2,V3] = ndgrid(V,V,V);
V123 = [V1(:),V2(:),V3(:)];
V123(sum(V123,2) ~= 10,:) = [];
plot3(V123(:,1),V123(:,2),V123(:,3),'o')
axis equal
box on
grid on
Do you see that all points lie in a planar triangle? How about if I rotate the plot?
plot3(V123(:,1),V123(:,2),V123(:,3),'o')
axis equal
box on
grid on
view(-24,53)
The points all lie in a triangle. They are uniformly scattered in the triangle. But that means if you look at marginals, so looking at one variable only, the histogram you see will NOT be uniform.
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!