How to estimate probabilities of an arbitrary range, based on the probability distribution of a given data set of numbers?

Hello,
Given a series of values x, I want to estimate the probabilities of a range of numbers U using the probability distribution of the given series x. My code works for one value, but I need the probabilities for a whole range. Can somebody give me some feedback, please?
Thank you in advance.
This is the code:
%%Generate some data/series
x=randi([-2 50],25,1);
%Values/ranges of interest
U=[-100:100];
%define histogram and probability distribution of x
h = histogram(x);
h.Normalization = 'probability'; % convert counts to probabilities
h.Values(U); %finding probabilities of range U
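The last line fails because `h.Values` is indexed by bin number (1, 2, ...), not by data value, so negative or out-of-range entries of U are not valid indices. A minimal sketch of a value-based lookup, assuming integer-valued data (hence `'BinMethod','integers'`, one unit-width bin per integer) and a MATLAB release that has `histcounts` and `discretize`; the names `pU` and `inside` are illustrative:

```matlab
% Sample data, as in the question
x = randi([-2 50], 25, 1);
U = -100:100;

% One unit-width bin per integer, normalized so the bin values sum to 1
[P, edges] = histcounts(x, 'BinMethod', 'integers', 'Normalization', 'probability');

% Map each value in U to its bin number; values outside the edges give NaN
idx = discretize(U, edges);

% Values outside the observed range get probability 0
pU = zeros(size(U));
inside = ~isnan(idx);
pU(inside) = P(idx(inside));
```

Because the lookup goes through `discretize`, values of U outside the data range are handled explicitly instead of being used as invalid indices.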

Accepted Answer

Bruno Luong 2018-10-22
Edited: Bruno Luong 2018-10-22
Use HISTCOUNTS then:
N = histcounts(x, [-Inf, U, Inf]);
P = N(2:end) / sum(N)
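With the edges `[-Inf, U, Inf]`, `histcounts` puts each observation in a left-closed bin `[e(k), e(k+1))`, except the last bin, which also includes its right edge. So `P(j)` is the fraction of data in `[U(j), U(j+1))`, `P(end)` collects everything at or above `U(end)`, and the discarded `N(1)` is the left tail below `U(1)`. A small check on a hypothetical sample that has values in both tails:

```matlab
x = [-3 -1 0 0 2 3 3 7]';   % hypothetical sample
U = -2:2;

N = histcounts(x, [-Inf, U, Inf]);
P = N(2:end) / sum(N);      % [0 0.125 0.25 0 0.5]

% The same probabilities from explicit interval tests
Pcheck = zeros(size(U));
for j = 1:numel(U) - 1
    Pcheck(j) = sum(x >= U(j) & x < U(j+1)) / numel(x);
end
Pcheck(end) = sum(x >= U(end)) / numel(x);   % last bin absorbs the right tail
```

Here `sum(P)` is 7/8, because the observation -3 falls in the discarded left tail.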
Clarisha Nijman 2018-10-23
x = randi([-3 3], 10, 1); U = -5:5;
N = histcounts(x, [-Inf, U, Inf]); prob = N(2:end) / sum(N)
% alternative code:
f = hist(x, U); prob = f / sum(f);
Now I fully understand your answer; with this small example it is clear. The tails add 2 extra intervals. A value of U, say 2, is associated with the interval [2,3), so each of the eleven values in U gets its own interval (the last one, [5, Inf], also absorbs the right tail). The left tail does not belong to U, so it is excluded, and that is why the code uses (2:end). Thanks a lot!
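The two variants in this comment agree only when every observation lies within the range of U. `hist(x, U)` treats U as bin centers and stretches its first and last bins out to -Inf and Inf, so values below `U(1)` are counted at `U(1)`; the `histcounts` version drops that left tail instead. A sketch with a hypothetical sample that has one value below U:

```matlab
x = [-3 -1 0 0 2 3 3 7]';   % hypothetical sample; -3 lies below U
U = -2:2;

% histcounts version: the left tail below U(1) is dropped
N = histcounts(x, [-Inf, U, Inf]);
probCounts = N(2:end) / sum(N);   % [0 0.125 0.25 0 0.5]

% hist version: U are bin centers, outer bins extend to -Inf and Inf
f = hist(x, U);
probHist = f(:).' / sum(f);       % [0.125 0.125 0.25 0 0.5]
```

Both versions put the right tail (3, 3, 7) into the bin for `U(end)`.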


More Answers (2)

Torsten 2018-10-22
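%% Generate some data/series
X = randi([-2 50], 25, 1);
% Values/ranges of interest
U = -100:100;
% Histogram with the default 10 equal-width bins; binsX are the bin centers
[countsX, binsX] = hist(X);
countsX = countsX(:).'; binsX = binsX(:).';   % work with row vectors
w = binsX(2) - binsX(1);                      % common bin width
% Cumulative counts belong at the right bin edges, so build the edges
% and start the piecewise-linear CDF (ogive) at 0 on the left edge
edgesX = [binsX(1) - w/2, binsX + w/2];
cdfX = [0, cumsum(countsX)] / sum(countsX);
% The CDF is 0 left of the data and 1 right of it, so clamp the queries
cdfAt = @(q) interp1(edgesX, cdfX, min(max(q, edgesX(1)), edgesX(end)));
p_U_left = cdfAt(min(U))
p_U_right = cdfAt(max(U))
p_U = p_U_right - p_U_left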
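The histogram-based CDF interpolation above is an approximation. The probability of the whole range can also be read directly from the empirical CDF, as the fraction of observations between `min(U)` and `max(U)`. A minimal sketch, using as sample data the 20 shop-profit values quoted later in this thread and a hypothetical range from 0 to 10:

```matlab
% Shop-profit observations quoted in this thread
X = [2 5 7 2 20 25 35 15 6 -2 15 27 2 20 15 5 7 2 20 25]';
U = 0:10;   % hypothetical range of interest

% Empirical probability that an observation falls in [min(U), max(U)]
p_U = nnz(X >= min(U) & X <= max(U)) / numel(X)   % 9 of 20 values -> 0.45
```

No binning is involved, so the result does not depend on the number or placement of histogram bins.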
Clarisha Nijman 2018-10-22
If you want to use the data, you cannot do that: it would exclude situations that might possibly occur. That is why the frequency polygon is a continuous line, so that values in between can be estimated.
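A minimal sketch of the frequency-polygon idea for a hypothetical integer sample: join the relative frequencies of the observed values with straight lines, read the line at every integer of interest, and renormalize so the estimates sum to 1. This linear smoothing is an assumption, not something derived from a model of the data, and it gives 0 outside the observed range:

```matlab
x = [1 2 4 5 6]';    % hypothetical sample; 3 is never observed
U = 1:6;

% Raw relative frequencies: unobserved values get 0
pRaw = arrayfun(@(u) nnz(x == u), U) / numel(x);   % [0.2 0.2 0 0.2 0.2 0.2]

% Frequency polygon through the observed values only
vals = unique(x).';
freq = arrayfun(@(v) nnz(x == v), vals) / numel(x);
pPoly = interp1(vals, freq, U, 'linear', 0);   % 0 outside the observed range
pPoly = pPoly / sum(pPoly);                    % renormalize to sum to 1
```

The polygon gives the unobserved value 3 the same estimate as its neighbours (1/6 each after renormalizing), but it still assigns nothing to the tails; estimating those needs an assumption about the underlying distribution.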
Torsten 2018-10-22
Edited: Torsten 2018-10-22
If you get discrete values from a random variable, say [1 2 4 5 6], how could you tell p({3})? (Hint: it's impossible.)
In my opinion, the most reasonable estimate would be p=0 since it does not appear in the list.
If you know which type of distribution the values come from, you can get a Maximum Likelihood Estimate (MLE) of the parameters describing it. With these parameters, you can estimate the probabilities of any elements you choose.
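A minimal sketch of that route, assuming (purely for illustration) that the shop-profit values quoted in this thread come from a normal distribution. For a normal model the MLEs are the sample mean and the standard deviation with divisor N. Each integer u in U is then given the probability of the interval [u-0.5, u+0.5], a continuity correction that is also an assumption. Only `erfc` is used, so no toolbox is required:

```matlab
% Shop-profit observations quoted in this thread
x = [2 5 7 2 20 25 35 15 6 -2 15 27 2 20 15 5 7 2 20 25]';
U = -5:40;

% Maximum likelihood estimates for a normal distribution (divisor N, not N-1)
mu = mean(x);
sigma = sqrt(mean((x - mu).^2));

% Standard normal CDF written with erfc to avoid a toolbox dependency
Phi = @(z) 0.5 * erfc(-z / sqrt(2));

% Probability mass for each integer u, using the interval [u-0.5, u+0.5]
p = Phi((U + 0.5 - mu) / sigma) - Phi((U - 0.5 - mu) / sigma);

% Probability of the whole range U under the fitted model
pRange = sum(p);
```

Unlike the empirical estimates, every value in U, including unobserved ones and those in the tails, gets a nonzero probability; how trustworthy these are depends entirely on the normality assumption.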



Bruno Luong 2018-10-22
Edited: Bruno Luong 2018-10-22
Not sure; is this what you want?
x=randi([-2 50],10000,1);
U=[-100:100];
h = histogram(x, U);
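If the probabilities are needed as numbers rather than as a plot, `histcounts` accepts the same edges together with `'Normalization','probability'`. With U used as edges, bin k covers `[U(k), U(k+1))` and the last bin also includes 100, so for integer data each bin holds a single value except the last, which holds both 99 and 100. A minimal sketch:

```matlab
x = randi([-2 50], 10000, 1);
U = -100:100;

% Same edges as the plot, but returning the bin probabilities
P = histcounts(x, U, 'Normalization', 'probability');

% For integer data, the probability of value u (u <= 98) is the bin starting at u
p20 = P(U(1:end-1) == 20);
```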
Clarisha Nijman 2018-10-22
Let's say x is the profit of a shop observed 20 times, with the values: 2, 5, 7, 2, 20, 25, 35, 15, 6, -2, 15, 27, 2, 20, 15, 5, 7, 2, 20, 25
These can be associated with a probability distribution, and you can plot it.
Now the task is to estimate the probabilities of the values in between, and also in the tails: U = [-5 -4 -3 -2 -1 0 1 2 ... 40]
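Applying the accepted answer's `histcounts` edges to these 20 values gives the empirical probability of each integer in U (written here as `-5:40`). Because the data are integers, `P(j)` is the relative frequency of the value `U(j)`, and values that were never observed, such as 3, get probability 0. A minimal sketch:

```matlab
% Shop-profit observations from this comment
x = [2 5 7 2 20 25 35 15 6 -2 15 27 2 20 15 5 7 2 20 25]';
U = -5:40;

% Edges as in the accepted answer; the first bin (below -5) is discarded
N = histcounts(x, [-Inf, U, Inf]);
P = N(2:end) / sum(N);

% Look up individual values
pTwo   = P(U == 2);    % 4 of 20 observations -> 0.2
pThree = P(U == 3);    % never observed -> 0
```

Only the 10 distinct observed values receive nonzero probability; any estimate for the values in between or in the tails has to come from an added assumption, such as smoothing or a fitted distribution.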
