Equally distributed multidimensional random values with boundaries - how to generate?

Question

Karol P. 2023-2-13

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/1911895-equally-distributed-multidimensional-random-values-with-boundaries-how-to-generate

Commented: William Rose 2023-2-14

I have to generate a matrix with 100 columns. Each row represents a value that can vary within a defined range. For example, I can describe the ranges with the array:

A=[1 5; 3 7; 1 10]

Here 1 to 5 is the range of the first row, 3 to 7 is the range of the second, and 1 to 10 is the range of the last. If I want to generate random values covering the range for just one row, I can do it as follows:

data = lb + rand(1,100) .* ( ub - lb );

where ub and lb are the upper and lower bounds. Now I can repeat this for every row in a simple for loop:

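data2 = zeros(size(A,1), 100);   % preallocate: one row per range, 100 columns
for i = 1:size(A,1)
    lb = A(i,1);                 % lower bound of row i
    ub = A(i,2);                 % upper bound of row i
    data2(i,:) = lb + rand(1,100) .* (ub - lb);
end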

But in this case every row is generated separately, so I have no guarantee that the distribution will be even in terms of combinations between rows, because every row varies independently. For example, I could end up with no combination where Row 1 is close to 1 and Row 2 is close to 7, just because of the RNG. Is there any way to solve this and ensure an evenly distributed multidimensional random sample?
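For reference, the row-by-row generation described above can also be written without a loop. This is only a minimal sketch, assuming MATLAB R2016b or later (for implicit expansion); nCols is a name introduced here for illustration. It draws the same kind of independent uniform rows as the loop, so it changes the form of the code and not the statistics:

```matlab
A = [1 5; 3 7; 1 10];        % one row per variable: [lower upper]
nCols = 100;                 % number of samples (columns); illustrative name

lb = A(:,1);                 % column vector of lower bounds
ub = A(:,2);                 % column vector of upper bounds

% rand gives a size(A,1)-by-nCols matrix of values in (0,1); implicit
% expansion scales and shifts each row by its own range.
data2 = lb + rand(size(A,1), nCols) .* (ub - lb);

disp(size(data2))            % 3 rows, 100 columns
```

On releases older than R2016b, the same scaling would need bsxfun or repmat instead of implicit expansion.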

7 Comments

John D'Errico 2023-2-13

A uniformly distributed independent random sample does not require that two successive values have any relation to each other. I'll use rand to make an example.

rand()
ans = 0.9470

So the first point I got lies above 0.5. Now I'll sample a second point. There is NO reason to expect that the next point will lie below 1/2, even though the first sample I generated was greater than 1/2.

rand()
ans = 0.5811

Do you understand that? Likewise, flipping a fair coin twice in a row does not mean that if the first toss was a head, then the second toss MUST be a tail, or even that a tail is any more likely on the second coin toss.

But that is exactly what you are asking to have happen. You seem to think a uniformly distributed set of numbers has some sort of memory, so that future samples will in some way depend on the previous samples. I'm sorry, but that is not how independent random variables work. Each successive random sample (from rand) is independent of the previous ones, as far as that is possible for a pseudo-random number generator. And rand was designed to use a very good pseudo-random number scheme, with very good statistical properties.

Yes, the laws of probability and statistics do apply. Over a long term, the sample mean of a distribution will tend to the population mean. So that eventually things will balance out.
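Both points can be checked numerically. The sketch below is only an illustration: the seed, the sample sizes, and the variable names are arbitrary choices made here. It prints how far the sample mean of rand is from the population mean 0.5 as n grows, and then estimates how often a second draw falls below 1/2 given that the first draw was above 1/2:

```matlab
rng(2);                                  % arbitrary seed, for repeatability only

% Sample mean of rand for growing sample sizes; the population mean is 0.5.
nValues = [1e2 1e4 1e6];
errs = zeros(size(nValues));
for k = 1:numel(nValues)
    u = rand(nValues(k), 1);
    errs(k) = abs(mean(u) - 0.5);
    fprintf('n = %7d: sample mean = %.4f, |error| = %.4f\n', ...
            nValues(k), mean(u), errs(k));
end

% Independence of successive draws: among pairs whose first value is
% above 1/2, count how often the second value is below 1/2.
pairs = rand(1e6, 2);
firstHigh = pairs(:,1) > 0.5;
condFrac = mean(pairs(firstHigh, 2) < 0.5);
fprintf('Fraction of second draws below 1/2, given first above 1/2: %.4f\n', condFrac);
```

If the draws had any "memory", the last fraction would drift away from 0.5. For independent draws it should stay close to 0.5 no matter what the first value was.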

John D'Errico 2023-2-13

Edited: John D'Errico 2023-2-13

@Karol P.

I'm sorry, but I think you still misunderstand random numbers, what a uniform distribution means, and, apparently the entire point of my comment.

That you have columns with different ranges is completely irrelevant. Each column will be filled with sets of numbers that are uniformly distributed. And they are independent of other columns, or of previous samples.

For example:

n = 25000;
X = [rand(n,1),rand(n,1)*2 + 1];

So the first column of X (thus X(:,1)) is uniformly distributed, on the open interval (0,1).

X(:,2) is uniformly distributed on the open interval (1,3).

These points, if taken as points in the two dimensional box (0,1)x(1,3), will fill that space uniformly. Of course, if the sample is large enough, the box will be filled in very well.

plot(X(:,1),X(:,2),'.')

If I choose n a bit larger, then the figure turns completely blue, with no white showing through at all. And if you count the number of points in any local region of the box, so essentially a 2-dimensional histogram, then you would find that the number of points in that region is, on average, proportional to the area of the region you looked at.

For example, histcounts2 produces that 2-d histogram.

[N,XEDGES,YEDGES] = histcounts2(X(:,1),X(:,2))
N = 10×10
   225   253   249   264   235   204   250   255   267   265
   250   287   247   221   251   244   277   244   228   256
   279   251   236   249   224   258   262   267   263   296
   257   243   227   230   266   228   273   232   223   250
   267   274   248   252   236   241   268   236   261   255
   257   263   264   242   258   255   248   270   220   242
   239   272   243   249   253   227   222   248   244   265
   240   262   277   253   270   252   259   239   239   262
   255   264   257   235   289   258   200   235   231   264
   238   245   232   252   251   234   233   254   254   261
XEDGES = 1×11
         0    0.1000    0.2000    0.3000    0.4000    0.5000    0.6000    0.7000    0.8000    0.9000    1.0000
YEDGES = 1×11
    1.0000    1.2000    1.4000    1.6000    1.8000    2.0000    2.2000    2.4000    2.6000    2.8000    3.0000

With a 10x10 grid of bins on that domain, we would expect to see on average 1% of the samples falling in each bin, which is 250 of the 25000 points. Indeed, that is what happens. If the sample size were larger, the fraction of samples in each bin would more closely approach 1%. We expect to see some degree of variability in those bin counts of course, but as I have said, relative to the counts themselves it will decrease with sample size.
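To put rough numbers on that variability, here is a sketch under a few assumptions: the seed is arbitrary, the bin edges are set explicitly with linspace rather than chosen automatically, and each bin count is treated as approximately binomial with probability p = 1/100:

```matlab
rng(1);                                   % arbitrary seed, for repeatability only
n = 25000;
X = [rand(n,1), rand(n,1)*2 + 1];         % same construction as in the example

nBins = 10;                               % bins per dimension (explicit here)
N = histcounts2(X(:,1), X(:,2), ...
                linspace(0, 1, nBins+1), linspace(1, 3, nBins+1));

p = 1/nBins^2;                            % chance of landing in any one bin
expectedCount = n*p;                      % 25000 * 0.01 = 250
expectedStd = sqrt(n*p*(1 - p));          % binomial standard deviation, about 15.7

fprintf('expected count per bin : %.1f\n', expectedCount);
fprintf('expected std per bin   : %.2f\n', expectedStd);
fprintf('observed mean and std  : %.1f, %.2f\n', mean(N(:)), std(N(:)));
fprintf('expected relative std  : %.3f\n', expectedStd/expectedCount);
```

The absolute spread grows like sqrt(n), but the spread relative to the expected count shrinks like 1/sqrt(n*p), which is the sense in which the bins even out for larger samples.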

surf(N)

That the different sets of variables live in different intervals is completely irrelevant. (Sorry, I forgot to scale the x and y axes in the 2-d histogram plot.)
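One possible way to get the axes scaled is to pass the bin centers to surf. This is just a sketch; the explicit linspace edges, the seed, and the names xc and yc are assumptions made here, not part of the original example:

```matlab
rng(0);                                   % arbitrary seed, for repeatability only
n = 25000;
X = [rand(n,1), rand(n,1)*2 + 1];

[N, xEdges, yEdges] = histcounts2(X(:,1), X(:,2), ...
                                  linspace(0, 1, 11), linspace(1, 3, 11));

% Bin centers, so the surface is drawn over the real data ranges.
xc = (xEdges(1:end-1) + xEdges(2:end)) / 2;
yc = (yEdges(1:end-1) + yEdges(2:end)) / 2;

% N(i,j) counts x-bin i and y-bin j, while surf expects rows to follow y,
% hence the transpose.
surf(xc, yc, N.');
xlabel('X(:,1)'); ylabel('X(:,2)'); zlabel('count');
```

With this, the x axis runs over (0,1) and the y axis over (1,3) instead of over the bin indices 1 to 10.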

Karol P. 2023-2-13

So do I understand correctly that, as long as the sample size is large enough, generating every row independently will not lead to any imbalance in the distribution? I mean the case where, for example, I would have a statistically significant surplus of columns in which the value in the first row is close to lb while the value in the second row is close to ub. It is pure RNG, so I expected that, without further constraints, this case is at least possible.
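That particular case can be checked directly. The sketch below is only an illustration with assumed values: a large number of columns (1e5 instead of 100), an arbitrary seed, and "close to" taken to mean within 10% of the row's range. It counts the columns where row 1 is near its lower bound while row 2 is near its upper bound, and also computes the correlations between rows:

```matlab
rng(3);                                        % arbitrary seed, for repeatability only
A = [1 5; 3 7; 1 10];
n = 1e5;                                       % assumed large sample, not the 100 from the question
lb = A(:,1);
ub = A(:,2);
data = lb + rand(size(A,1), n) .* (ub - lb);   % independent rows (R2016b+ implicit expansion)

U = (data - lb) ./ (ub - lb);                  % rescale every row to (0,1)
nearLow1  = U(1,:) < 0.1;                      % row 1 within 10% of its lower bound
nearHigh2 = U(2,:) > 0.9;                      % row 2 within 10% of its upper bound
jointFrac = mean(nearLow1 & nearHigh2);        % independence predicts about 0.1*0.1 = 0.01

R = corrcoef(data.');                          % 3x3 correlation matrix between the rows

fprintf('fraction of such columns: %.4f (independence predicts 0.0100)\n', jointFrac);
disp(R)
```

With only 100 columns the same fraction fluctuates a lot more (about one such column is expected, so zero is quite possible), which is the sampling variability discussed above and not a bias.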


Answer 1

William Rose 2023-2-13

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/1911895-equally-distributed-multidimensional-random-values-with-boundaries-how-to-generate#answer_1170770

@Karol, my new understanding is that you want to find a uniformly distributed random point in a 3D rectangle. The bounds of the rectangle are chosen at random from a discrete set of possibilities. A is 3x2. Column 1 of A has the 3 allowed lower bounds for the edges. Column 2 has the 3 allowed upper values. Am I understanding you correctly? If so you will need two discrete random choices (one each for lower and upper bounds) followed by a 3d uniform random choice.

8 Comments

Karol P. 2023-2-14

Edited: Karol P. 2023-2-14

OK, thank you once again. I think we can consider the question answered.

William Rose 2023-2-14

OK, you're welcome, @Karol P. Good luck with your work.
