Draw samples using a Non-Gaussian distribution
Hi!
Let's say I have a vector X of 100 values. How can I draw a sample from this vector using a Non-Gaussian distribution?
Consider the following example in which I'm trying to draw 50 values from the original vector:
x = randn(100,1);
x(random('poisson',1:50))
Here, the Poisson distribution is just an example. This does not work because the values I get from the random distribution are not always positive integers or logical values.
Any advice?
Thanks!
5 Comments
Torsten
2023-9-18
Taking your example, do you want to define a distribution on the first 100 integers and draw numbers from x according to this distribution?
user20912
2023-9-18
No, it doesn't clarify the question.
As far as I understand your question, you want to draw indices between 1 and 100 according to a given distribution and sample the respective elements of the vector x. This is what you would do if the vector random('poisson',1:50) was a vector of 50 integer random numbers between 1 and 100. If this interpretation is not correct, you will have to elaborate on your task.
user20912
2023-9-19
Paul
2023-9-19
So you have a discrete random variable N. Its probability distribution function is P(N = n) = p(n) for n = 1:100, and 0 otherwise. As must be the case, sum(p) = 1.
Do you know the distribution function, i.e., the values of p(n), n = 1:100?
Once you define the distribution function, you want to generate 50 samples of N. Are those samples to be generated with or without replacement? The former is straightforward based on the distribution function. The latter would involve reconditioning p(n) after each selection, but also doesn't sound too bad.
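As a rough sketch of the two cases just described (not code from the thread), the following uses only base MATLAB. The weights p are an assumed example; any nonnegative vector that sums to 1 would do. The with-replacement draw inverts the cumulative distribution, and the without-replacement draw zeroes out each chosen index and renormalizes before the next selection.

```matlab
rng(1);                          % fixed seed, only for reproducibility
x = randn(100,1);                % data vector from the question
N = numel(x);
k = 50;                          % number of samples to draw

% Assumed example weights (hypothetical choice, not from the thread)
p = (1:N).^2;
p = p / sum(p);

% With replacement: invert the cumulative distribution
edges = cumsum([0, p]);
edges = edges / edges(end);      % force the last edge to be exactly 1
idxWith = discretize(rand(1,k), edges);

% Without replacement: recondition the weights after each selection
w = p;
idxWithout = zeros(1,k);
for j = 1:k
    c = cumsum(w);
    c = c / c(end);              % renormalize the remaining weights
    idxWithout(j) = find(rand <= c, 1, 'first');
    w(idxWithout(j)) = 0;        % this index cannot be drawn again
end

yWith    = x(idxWith);           % may contain repeated elements of x
yWithout = x(idxWithout);        % 50 distinct elements of x
```

With these weights, large indices are drawn much more often than small ones, and idxWithout never contains a repeated index.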
Accepted Answer
More Answers (3)
x = randn(100,1);
x(randperm(numel(x),50))
10 Comments
Torsten
2023-9-18
I think "randperm" doesn't correspond to a distribution on 1:100.
Torsten
2023-9-18
I think it is a distribution on (1:100)^100 with p(a1,...,a100) = 1/100! if all ai are different, 0 else.
I didn't follow that. randperm(N,k) will select k elements from 1:N without replacement according to a uniform distribution. This seems to meet the requirement of the OP, which is that the distribution be non-Gaussian.
Of course, any distribution on vectors of integer subsets would be non-Gaussian, so it wasn't really a high bar.
user20912
2023-9-18
@user20912 it does not draw according to a Gaussian distribution. It draws according to a uniform distribution, which is not Gaussian.
However, if you think it is even possible to draw from x with a Gaussian distribution, you are confused about something. There is no way a Gaussian distribution could ever be defined on the subsets of the indices 1:100. The Gaussian distribution is a continuous distribution, while the subsets of 1:100 are finite/discrete.
Torsten
2023-9-18
I'm not sure how to define p(X=i) for 1 <= i <= 100 if X is distributed according to a "randperm distribution" because the draws from 1:100 are draws without replacement, thus depend on each other.
"Randsample" with replacement comes closer to what I'd call a distribution on {1,2,...,100}.
"I'm not sure how to define p(X=i) for 1 <= i <= 100"
You wouldn't define p(X=i), because drawing individual indices i is not the task. The sample space we're drawing from randomly is the set of nchoosek(100,50) different length-50 subsets of the integers 1:100. Each of these subsets has a uniform probability of 1/nchoosek(100,50) when drawn using randperm.
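The uniform-over-subsets claim can be checked empirically on a case small enough to enumerate. The sketch below is only an illustration with assumed sizes (N = 4, k = 2, so 6 subsets), not a proof: it counts how often randperm produces each subset.

```matlab
rng(2);                              % fixed seed, only for reproducibility
N = 4;                               % small stand-in for 100
k = 2;                               % small stand-in for 50
nTrials = 60000;

subsets = nchoosek(1:N, k);          % all subsets, one sorted row each
counts  = zeros(size(subsets,1), 1);
for t = 1:nTrials
    s = sort(randperm(N, k));        % sort so that draw order is ignored
    [~, row] = ismember(s, subsets, 'rows');
    counts(row) = counts(row) + 1;
end
freq = counts / nTrials;             % compare with 1/nchoosek(N,k)
disp([subsets, freq])
```

Each observed frequency should land close to 1/nchoosek(4,2) = 1/6, up to sampling noise.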
user20912
2023-9-19
I think you don't understand random numbers, AND you don't understand indexing in MATLAB. What does this do?
random('poisson',1:50)
It generates a list of 50 Poisson random numbers, with 50 different rate parameters, the numbers 1:50. I have absolutely no idea why you would want to do that.
And worse, some of those Poisson samples will be ZERO. Some might even be larger than 100.
Then you are trying to index into a vector of length 100 using those Poisson samples. The result is? Garbage, since you will often see an indexing error.
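To see the problem concretely, here is a small diagnostic sketch (not a fix) that reuses the call from the question and counts how many of the draws could legally be used as indices into x. It assumes the Statistics and Machine Learning Toolbox, which the original call already requires.

```matlab
rng(3);                                % fixed seed, only for reproducibility
x = randn(100,1);
idx = random('poisson', 1:50);         % 50 draws, rate parameters 1 to 50

% A valid index into x is an integer between 1 and numel(x)
isValid = idx >= 1 & idx <= numel(x);
fprintf('%d of %d draws are valid indices\n', nnz(isValid), numel(idx));
fprintf('zeros: %d, larger than %d: %d\n', ...
        nnz(idx == 0), numel(x), nnz(idx > numel(x)));
```

Any draw that fails the check makes x(idx) throw an indexing error.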
Finally, the resulting sample? It will have the same underlying distribution as your original vector x, since you are just sampling from the original vector.
What do you mean by a non-Gaussian sample? Using a Poisson distribution there is meaningless.
I think you just want to sample from the vector x. For example, suppose you have a vector of prime numbers. They certainly are not Gaussian, or even uniform.
x = primes(100)
Now we can sample from that set.
nx = numel(x);
This will be a sample WITH replacement. So some elements may be replicated.
x(randi(nx,[1,10]))
Sampling without replacement is just as easy.
x(randperm(nx,10))
Be careful: sampling without replacement is not possible if you want to sample more than 25 elements from a vector of length 25. (Why do I feel this should be both unnecessary to say, as well as totally necessary on this forum?)
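One way to guard against that case is to check the requested sample size first. This is a minimal sketch, and falling back to sampling with replacement is only an assumed policy; raising an error may suit some uses better.

```matlab
x  = primes(100);                 % 25 primes
nx = numel(x);
k  = 30;                          % deliberately more than nx

if k <= nx
    y = x(randperm(nx, k));       % without replacement is possible
else
    % randperm(nx,k) would error here, so sample with replacement instead
    y = x(randi(nx, [1, k]));
end
```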
3 Comments
Torsten
2023-9-18
"Finally, the resulting sample? It will have the same underlying distribution as your original vector x, since you are just sampling from the original vector."
I think this is wrong.
John D'Errico
2023-9-18
Edited: John D'Errico
2023-9-18
@torsten: I'm sorry, but it is NOT wrong. It does matter how you sample, of course. So I might give you credit for misunderstanding my statement.
If the index operation has no connection to the samples themselves, then a random sampling from that set does not change the distribution of the sample. That is EXACTLY what was done by the OP.
For example, suppose I have a very simple set:
x = rand(1,5);
The vector x is statistically uniformly distributed on the interval [0,1]. Now, choose 3 elements from that set, also randomly chosen.
y = x(randperm(5,3));
The vector y WILL be just as uniformly distributed on the interval [0,1], as was the vector x. Can you do things that are not what I was talking about, but are a bit silly? Yes.
z = x(ones(1,1000));
Then z is just a random number, with uniform distribution on the interval [0,1], but then repeated 1000 times. We could split hairs about the distribution of z, but that seems meaningless to me, and was not in the spirit of what I was saying.
Or, if you create y, by selectively sampling more often from the elements of x that are less than 1/2, or something equally silly, then of course it will change things.
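A quick numerical check of both cases, as a sketch under assumed sizes and an assumed weighting (elements below 1/2 get three times the weight of the rest): indices chosen independently of the values leave the mean near 0.5, while the value-dependent draw shifts it.

```matlab
rng(4);                                 % fixed seed, only for reproducibility
x = rand(1, 1e5);                       % uniform on [0,1]
n = 2e4;                                % number of draws

% Case 1: indices chosen independently of the values in x
yIndep = x(randperm(numel(x), n));

% Case 2: value-dependent weights (assumed example)
w = 1 + 2*(x < 0.5);                    % weight 3 below 1/2, weight 1 above
edges = cumsum([0, w]);
edges = edges / edges(end);
yBiased = x(discretize(rand(1,n), edges));

fprintf('mean(x) = %.3f, mean(yIndep) = %.3f, mean(yBiased) = %.3f\n', ...
        mean(x), mean(yIndep), mean(yBiased));
```

With these weights, three quarters of the biased draws come from below 1/2, so the expected mean of yBiased is 0.75*0.25 + 0.25*0.75 = 0.375 rather than 0.5.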
z = x(ones(1,1000));
z is dirac_x(1) distributed.
As far as I understood, that's an extreme case for what the OP aims at: x(distribution(1:50)) where "distribution" can be any distribution defined on the set {1,2,...,100}. So indices (unlike for randperm) will most probably repeat.
x = rand(1,10000); % whatever, randn(1,100) in your case
n = 5000; % number of draws, 50 in your case
% Set up wanted drawing probability
ix = 1:numel(x);
lambda = 60;
p = poisspdf(ix,lambda);
% generate index according to the above discrete pdf
c = cumsum([0 p]);
c = c/c(end);
rix = discretize(rand(1,n), c);
% Check graphically what the drawing pdf looks like
histogram(rix,'Normalization','pdf')
current_xlim = xlim;
% Compare with the prescribed pdf, it looks OK
hold on
plot(ix, p, '-+', 'Linewidth', 2);
xlim(current_xlim)
title('This is truncated poisson pdf')
% draw randomly from x (with replacement) with the above pdf
rx = x(rix)
% what it looks like: still random
% histogram(x)
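If the Statistics and Machine Learning Toolbox is available (poisspdf above already needs it), the index-generation step can also be written with randsample, which accepts a weight vector when sampling with replacement. This is a sketch of an alternative, not a replacement for the code above.

```matlab
rng(5);                                   % fixed seed, only for reproducibility
x = rand(1,10000);
n = 5000;                                 % number of draws
ix = 1:numel(x);
lambda = 60;
p = poisspdf(ix, lambda);                 % weights over the indices 1..numel(x)

% Weighted draw of n indices with replacement
rix = randsample(numel(x), n, true, p);
rx  = x(rix);
```

The mean of rix should sit close to lambda, since the mass that the truncation at 1 removes is negligible for lambda = 60.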
