Generating dispersed (non-integer) random matrix/array that sums to a particular value

Question

J AI 2020-6-28


https://ww2.mathworks.cn/matlabcentral/answers/555907-generating-dispersed-non-integer-random-matrix-array-that-sums-to-a-particular-value

Edited: J AI 2020-6-28

One of the most frequently suggested methods (in fact, the only one I have found) for generating random numbers (<1) that sum to 1 is Random Vectors with Fixed Sum by Roger Stafford. However, I noticed that the generated data is not well dispersed. For example:

P = randfixedsum(10,10000,1,0.05,0.9); % a 10-by-10000 matrix; each column of P sums to 1 and each element is between 0.05 and 0.9
find(any(P>0.5))
ans =
  1×0 empty double row vector

Every time I have tried this, the result is an empty vector: the values always stay below 0.5. Is there a way to generate more dispersed data that would include values across the whole range from 0.05 to 0.9 (for the example above)?

Thanks in advance for your kind help.

FYI: I have also tried the following (with help from another MATLAB Answers post):

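function P = rand_fixed_sum_2(p,n) % returns an n-by-p matrix P; each column of P sums to 1
    P = zeros(n,p);                   % preallocate the output
    for j = 1:p
        n1 = 10^(n-1);                % size of the integer grid used for the cut points
        m = 1:n1;
        a = m(sort(randperm(n1,n)));  % n distinct grid points in increasing order
        b = diff(a);                  % n-1 gaps between consecutive points
        b(end+1) = n1 - sum(b);       % last element takes the remainder of the grid length
        P(:,j) = (b/sum(b))';         % normalize so the column sums to 1
    end
end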

But the value of n1 is obviously not feasible in higher dimensions (n > 5). For lower dimensions, however, by tweaking n1 I could get much more dispersed data.
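A possible way around the n1 limit (a sketch added here, not taken from the thread): the loop above is a discretized version of a closely related continuous construction that cuts the interval [0,1] at random points and uses the gaps between them. Drawing the cut points with rand instead of randperm removes the 10^(n-1) grid entirely. The sizes n and p are illustrative. Note that this only enforces non-negativity and the unit sum, not the 0.05/0.9 bounds from the question; the columns are uniformly distributed over all non-negative vectors that sum to 1.

```matlab
% Continuous version of the cut-point idea (a sketch, not the poster's code):
% n-1 sorted uniform cut points in (0,1) split [0,1] into n gaps.
n = 10;        % number of rows (elements per column)
p = 10000;     % number of columns
cuts = sort(rand(n-1, p), 1);                     % cut points, sorted within each column
P = diff([zeros(1, p); cuts; ones(1, p)], 1, 1);  % n gaps per column; each column sums to 1
% Columns of P are uniformly distributed on {x >= 0, sum(x) = 1}.
% Without a lower bound such as 0.05, large elements can occur:
fprintf('largest element: %.4f\n', max(P(:)));
```

Because there is no 0.05 lower bound here, a column contains an element larger than 0.5 a little under 2% of the time for n = 10.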


Answer 1

John D'Errico 2020-6-28


https://ww2.mathworks.cn/matlabcentral/answers/555907-generating-dispersed-non-integer-random-matrix-array-that-sums-to-a-particular-value#answer_458233

Edited: John D'Errico 2020-6-28


I think you do not understand what you are asking.

randfixedsum does indeed produce results that are uniformly distributed within the subset in question. That is, any point in the 10-dimensional space that satisfies the requirements (the fixed sum and the bounds) is equally likely to arise.

However, that does not mean it is at all probable that you will find a point that satisfies your goal of "dispersion".

For example, suppose you choose one element that is greater than 0.5. Then the probability that the other 9 elements are ALL small enough for the sum to be 1 is pretty low. In the 9-dimensional space that remains, that event would actually be very uncommon.

Recall that you want to generate 10 numbers, all of which lie between 0.05 and 0.9, such that the sum is 1.

Suppose, just suppose, that one of the numbers was, say, 0.6. Now what are the odds that you can find 9 other numbers that make the total sum exactly 1, none of which is less than 0.05? SURPRISE! It can never be done: those 9 numbers would have to sum to 0.4, but 9 numbers that are each at least 0.05 sum to at least 0.45.

In fact, if any single element is larger than 0.55 in this example, your goal can never be met. If one element is as large as even 0.55+eps, it is mathematically impossible to find 9 numbers, all of which are between 0.05 and 0.9, whose sum is 0.45-eps.
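The 0.55 bound follows from a one-line argument, written out here with n, s, a and b as labels chosen for the dimension (10), the sum (1) and the lower and upper bounds (0.05 and 0.9) used in the calls in this thread. The same substitution also shows what the feasible set looks like.

```latex
\begin{aligned}
x_k &= s - \sum_{i \ne k} x_i \;\le\; s - (n-1)\,a = 1 - 9(0.05) = 0.55, \\
y_i &= \frac{x_i - a}{s - n\,a} = \frac{x_i - 0.05}{0.5}
  \quad\Longrightarrow\quad y_i \ge 0, \quad \sum_{i=1}^{n} y_i = 1 .
\end{aligned}
```

So every feasible vector is 0.05 + 0.5·y for some y on the unit simplex, and since no element can exceed 0.55 < b = 0.9, the upper bound never comes into play.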

Next, suppose one element was even as large as 0.5? Just one element that large?

Now the other 9 elements must all be very close to 0.05, since together they can sum to only 0.5. What is the probability of that event? Not surprisingly, it is pretty darn small. I can compute the actual probability of such an event if you need it. Being too lazy to think at this time of day...

X = randfixedsum(10,10000000,1,0,0.9);
sum(max(X) >= 0.5)
ans =
      195844

So 1.96e5 such events in 1e7. A little under 2% of the time. As expected, a rare event, and that is EXACTLY as it should be.
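For reference, the probability can also be computed in closed form (this derivation is an addition, not part of the original answer). If y is uniform on the unit simplex in n dimensions, each element satisfies P(y_k > t) = (1-t)^(n-1), and for t >= 0.5 at most one element can exceed t, so P(max(y) > t) = n(1-t)^(n-1). The simulation above uses a lower bound of 0, so (ignoring the 0.9 upper bound, which only removes a region of probability 10·0.1^9 = 1e-8) it is this simplex case with t = 0.5. With the question's lower bound of 0.05, x = 0.05 + 0.5·y, so an element above 0.5 requires y_k > 0.9. The helper name pmax is hypothetical.

```matlab
% Closed-form P(max element > t) for a column uniform on the unit simplex
% (valid for t >= 0.5, where at most one element can be that large).
n = 10;
pmax = @(t) n * (1 - t).^(n - 1);   % hypothetical helper

p_relaxed  = pmax(0.5);             % lower bound 0, as in the simulation above
p_observed = 195844 / 1e7;          % fraction reported by the randfixedsum simulation
% With lower bound 0.05: x = 0.05 + 0.5*y, so x > 0.5 needs y > 0.9
p_original    = pmax(0.9);
expected_hits = 10000 * p_original; % expected columns with an element > 0.5, out of 10000

fprintf('relaxed:  exact %.6f, simulated %.6f\n', p_relaxed, p_observed);
fprintf('original: %.1e per column, %.1e expected hits in 10000 columns\n', ...
        p_original, expected_hits);
```

The exact value 10/512 ≈ 0.01953 agrees with the simulated 0.01958, and with the 0.05 lower bound an element above 0.5 is expected about once in 100 million columns, which is why the question's find(any(P>0.5)) keeps returning an empty vector.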

You ask for dispersion. But you don't seem to understand what dispersion means or what it implies in this context.

If I look at the distribution of the maximum of the 10 elements in each column, I get something that is actually pretty reasonable.

X = randfixedsum(10,10000,1,0.05,0.9);
   Min     0.1207
    1%     0.1342
    5%     0.1445
   10%     0.1524
   25%     0.1674
   50%     0.1884
   75%     0.2167
   90%     0.2503
   95%     0.2738
   99%     0.3143
   Max     0.4039

Most of the time, we get a maximum value that is pretty small in context. That is because the sample truly is uniformly distributed over the constraint space: any point in that space is as likely to arise as any other point. But that does NOT mean the maximum will ever be larger than 0.55. In fact, that would be an impossible event.
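Because the feasible set for these particular bounds is just the unit simplex scaled by 0.5 and shifted by 0.05, a uniform sample over it can be drawn without randfixedsum. The sketch below is a stand-in under that assumption (it relies on the 0.9 upper bound being unreachable, which holds only because 1 - 9(0.05) = 0.55 < 0.9). It checks that no element exceeds 0.55 and prints the minimum, the 1, 5, 10, 25, 50, 75, 90, 95 and 99 percent nearest-rank percentiles, and the maximum of the column maxima. The numbers change from run to run.

```matlab
% Stand-in for randfixedsum(10,10000,1,0.05,0.9) (an assumption, valid only
% because the 0.9 upper bound can never be reached when the lower bound is 0.05).
n = 10; m = 10000; s = 1; a = 0.05;
cuts = sort(rand(n-1, m), 1);
Y = diff([zeros(1, m); cuts; ones(1, m)], 1, 1);  % uniform on the unit simplex
X = a + (s - n*a) * Y;                            % uniform on {x >= a, sum(x) = s}

colmax = sort(max(X, [], 1));                     % largest element of each column, sorted
pct = [1 5 10 25 50 75 90 95 99];
idx = ceil(pct / 100 * m);                        % nearest-rank percentile positions
fprintf('   Min  %.4f\n', colmax(1));
fprintf('%5d%%  %.4f\n', [pct; colmax(idx)]);
fprintf('   Max  %.4f\n', colmax(end));
fprintf('largest element overall: %.4f (bound: %.2f)\n', colmax(end), s - (n-1)*a);
```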

Suppose instead that we change the way things are generated. Rather than requiring the minimum to be 0.05, just make it 0. How do the statistics change?

X = randfixedsum(10,10000,1,0,0.9);
   Min     0.1395
    1%     0.1681
    5%     0.1902
   10%     0.2050
   25%     0.2353
   50%     0.2784
   75%     0.3359
   90%     0.4010
   95%     0.4492
   99%     0.5479
   Max     0.8123

As you can see, the maximum element is now considerably larger. In a sample of the same size, I got one as large as 0.8123. There is now much more room for those "dispersed" events to arise.
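The two tables are related in a simple way (an observation added here, not made in the original answer): with the 0.05 lower bound every column is 0.05 + 0.5 times a point on the unit simplex, so its maximum is 0.05 + 0.5 times a simplex maximum. Applying that map to the lower-bound-0 results should roughly reproduce the lower-bound-0.05 results. The snippet uses the values copied from the two tables (Min, the nine intermediate rows, Max); the variable names are illustrative.

```matlab
% Column-maximum statistics copied from the two tables in the answer
% (Min, nine intermediate rows, Max).
lb0  = [0.1395 0.1681 0.1902 0.2050 0.2353 0.2784 0.3359 0.4010 0.4492 0.5479 0.8123];
lb05 = [0.1207 0.1342 0.1445 0.1524 0.1674 0.1884 0.2167 0.2503 0.2738 0.3143 0.4039];

predicted = 0.05 + 0.5 * lb0;   % map the lower-bound-0 results to the 0.05 case
gap = abs(predicted - lb05);    % disagreement between prediction and reported table
disp([lb05; predicted; gap]');  % columns: reported, predicted, gap
```

The nine intermediate rows agree to within 0.01 (most to within about 0.001), while the two Max rows differ by about 0.05, which is expected for a single extreme value in each sample.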


J AI 2020-6-28

Edited: J AI 2020-6-28

Oh wow. I really appreciate your detailed, painstaking explanation. I can see how I got the whole thing messed up with my requirements. Thank you so much for clearing it up with such clarity.
