how to generate random numbers with constraints?
39 次查看(过去 30 天)
显示 更早的评论
Hi, I want to generate 24 random numbers whose sum is limited between [3550 3650]. in addition each number must be between [5 800].
here is my try:
ub=800*ones(1,24);
lb=5*ones(1,24);
a=lb+rand(1,24).*(ub-lb);
while sum(a)>3650 || sum(a)<3550
a=lb+rand(1,24).*(ub-lb);
end
but it stucks in the loop forever. please help me...
6 个评论
John D'Errico
2019-4-29
The data that I generated IS uniform. It is uniform over the domain of the triangular region where the data lives. It is uniform over the set that satifies the sum constraint.
The marginal distribution has a triangular PDF. It is NOT a gamma. There is no way that you can have data that is BOTH uniform over the constraint domain, as well as being uniform in any of the individual variables.
If you are looking for data that is truly uniform in each of the variables, yet has a sum that falls within some bounds? You can't have that.
Consider the example I gave. I chose two variables, x and y, uniform over the interval [0,1]. Now, look at the sum x+y. Can the variables be truly independently uniform over the domain, yet have a sum that is less than 1? So I'm not sure what you are asking to produce anymore. What is the final goal here?
采纳的回答
John D'Errico
2019-4-29
编辑:John D'Errico
2019-4-29
This is not simple, if you want to find a truly uniform set. In 24 dimensions, things get fairly complicated. And the domain the numbers can live in is actually pretty wide, thus the interval [5,800].
So, if you have all numbers at or near their mininum,
5*24
ans =
120
>> 800*24
ans =
19200
>> (5 + 800)/2*24
ans =
9660
The point being that it will be a quite rare event that 24 numbers, chosen randomly from that interval, will have a sum in the interval [3550,3650].
The law of large numbers tells us that the sum of that set will be effectively normally distributed. And, in fact, this gives us a solution to the problem!
Lets see. Each member of that set will be uniform, on the interval [5,800]. That means the mean of x_i will be
a = 5;
b = 800;
(a + b)/2
ans =
402.5
And the variance (of each number in the set) will be
(b-a)^2/12
ans =
52668.75
So now the sum of 24 of them will be fairly accurately normally distributed, with mean, variance, and standard deviation:
sumMean = (a+b)/2*24
xmean =
9660
sumVar = (b-a)^2/12*24
xvar =
1264050
sumStd = sqrt(sumVar)
xstd =
1124.29978208661
So, now what are the odds that you will find a sum in the range from [3550,3650]? That would be found from a cumulative normal.
sumLimits = [3550,3650];
normcdf(sumLimits,sumMean,sumStd)
ans =
2.74761306010529e-08 4.50716095152589e-08
diff(ans)
ans =
1.7595478914206e-08
Hmm. So you will probably need to generate roughly 1e8 sets of 24 uniform numbers before you find a single set that lives in the interval [3550,3650].
But! As I said, this actually gives you a near trivial solution! How? We break the problem down into two simpler problems. Hard problems are hard. Easy problems are way easier. So lets create two simpler problems.
First, can we sample from a truncated normal distributiuon? That part is easy.
norminv(sumNormProbLimits(1) + rand()*diff(sumNormProbLimits),sumMean,sumStd)
ans =
3637.00633112408
As you can see, that is a number that has atuncated normal distribution, with the desired parameters. So now we can just use randfixedsum. I'll put it all together here:
a = 5;
b = 800;
ndim = 24;
sumMean = (a+b)/2*ndim;
sumVar = (b-a)^2/12*ndim;
sumStd = sqrt(sumVar);
sumLimits = [3550,3650];
sumNormProbLimits = normcdf(sumLimits,sumMean,sumStd);
S = norminv(sumNormProbLimits(1) + rand()*diff(sumNormProbLimits),sumMean,sumStd);
randfixedsum(24,1,S,a,b)
ans =
32.6666351919248
226.171574450247
26.4486244673284
59.5273902733376
105.140425145937
198.085619814426
313.588691531256
125.527412419838
222.108869160278
78.6355991625287
116.832965448386
25.2315920517267
10.7762636314948
295.321051677379
339.225232904641
91.7004913938697
29.5858150543207
184.827862803465
37.7525635267701
115.866539562388
512.657889835195
372.544944227602
13.6087483202661
58.868132270981
So, a set of 24 uniformly distributed random numbers that have their sum in the desired interval. The numbers really are uniformly distributed to lie in the interval [5,800], but the condition that the sum be so relatively small, makes it unlikely that any of them are near the maximum. On this random set, the max was as large as 512. That seems reasonable to me, under the conditional sampling that was required.
sum(ans)
ans =
3592.70093432559
It was really pretty easy to do. I did need randfixedsum, which you can get from the File Exchange for free download.
2 个评论
John D'Errico
2019-4-29
Ok, then you are a little better off using my solution, though the difference is not that huge.
For example, in my solution, you should expect to see more samples with a sum near 3650 than near 3550. The differential (based on the PDF at those points) should be not quite twice as often. In Rik's solution, the two ends of that interval on the sum will be equally likely. Is this s huge difference? Perhaps not. It very much depends on what you are doing with the results. Since you are actually testing things to look at the distributions, you will prefer the more technically correct solution.
A very important thing to note however, is that a sample size of only 100 such sets is tiny. TINY. MINISCULE. In fact, when you are looking at things like this, I would be looking at millions of such samples at the least.
更多回答(1 个)
Rik
2019-4-29
编辑:Rik
2019-4-29
The reason for your infinite loop is that you need an average random value of 150, which is far from the expected value of 402.5. That means you need quite an extreme situation to land on your required vector.
You can use randfixedsum from the FEX. This function takes a more mathematical approach, which means that it doesn't need to reject vectors that don't satisfy the constraints.
%output matrix size
n=24;m=1;
%generate a random sum between [3550 3650]
s_bounds=[3550 3650];s=rand*diff(s_bounds)+s_bounds(1);
%bounds of values themselves
a=5;b=800;
%generate the values
x = randfixedsum(n,m,s,a,b);
%transpose to move from 24x1 to 1x24
x=x';
6 个评论
Rik
2019-4-29
Thank you for clarifying. Statistics aren't my strong suit, so questions like this reach my limit.
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!