Calculating the probability of a data point in a histogram

Hello:
The image below is a histogram of a large data set (a 90*1 double, in blue) and a single data point (in red). I would like to compute the probability of the red data point relative to the blue data. I could add up the counts to the left of the red bar and divide by the total count (90), but I want MATLAB code that does this faster and more efficiently, preferably without using the histogram at all. Thank you.

Accepted Answer

Steven Lord 2018-5-8
Change the Normalization property of the histogram object then get the appropriate element of the Values property of that object.
rng default
x = randn(10000,1);
h = histogram(x)
h.Values(10)
Since the default Normalization method is 'count', this will tell you that there are 133 elements of x that fall into bin 10. [Since I used rng default, you should get the exact same random numbers in x as I did and so generate the exact same histogram.]
h.Normalization = 'probability';
h.Values(10)
Now h.Values(10) is 0.0133 which makes sense: 133 / 10000 (the total number of points) = 0.0133.
If you wanted to get the same information without actually bringing up the plot, the histcounts function also lets you specify a 'Normalization' method.
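As a minimal sketch of that figure-free route, the snippet below calls histcounts twice on the same data: once for raw counts and once with 'Normalization' set to 'probability'. The variable names counts, edges, p, and maxDiff are illustrative. Since histcounts and histogram share the same default binning, the values are expected to match h.Values from the plot above, though that is not checked here.

```matlab
rng default
x = randn(10000,1);

% Raw counts and the automatically chosen bin edges; no figure is created.
[counts, edges] = histcounts(x);

% Reuse the same edges, but normalize so the bin values sum to 1.
p = histcounts(x, edges, 'Normalization', 'probability');

% Each p(k) should equal counts(k)/numel(x); maxDiff measures any discrepancy.
maxDiff = max(abs(p - counts/numel(x)));
```

Passing edges to the second call keeps both results on identical bins, so p(k) and counts(k) can be compared element by element.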
And I'd guess that the histogram you showed was created with something more like 900 data points than 90. According to the Y limits, each of the 5 central bars contains more than 90 elements, assuming you're using the default 'count' Normalization. Still not Big Data, but bigger.

More Answers (1)

Image Analyst 2018-5-7
You need to know the edges of the bin, e1 and e2. Then you can simply do
percentageInBin = sum(data>=e1 & data < e2) / numel(data);
No histogram needed if you just need it for that one red bin.
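A short sketch of both readings of the question, under assumed values: data is a random stand-in for the 90-element set, x0 is a made-up value for the red point, and e1 and e2 are made-up edges of the bin that contains it. The first quantity is the single-bin fraction from the line above; the second is the fraction of the data at or below the red point, which is closer to counting everything to the left of the red bar.

```matlab
rng default
data = randn(90,1);   % assumption: stand-in for the 90-by-1 data set
x0 = 0.5;             % assumption: value of the red data point
e1 = 0;               % assumption: left edge of the bin containing x0
e2 = 1;               % assumption: right edge of that bin

% Fraction (not percent) of the data that falls in the bin [e1, e2).
fractionInBin = sum(data >= e1 & data < e2) / numel(data);

% Fraction of the data at or below x0 (an empirical cumulative probability).
fractionAtOrBelow = mean(data <= x0);
```

Both results are fractions between 0 and 1; multiply by 100 if a percentage is wanted.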
By the way, it made me snicker when you described 90 elements as large. It literally would have to be around a million times that big before anyone might start considering it large.
  3 Comments
Curious Mind 2018-5-8
Also, if I have, say, a 20*1 double matrix dataM, can I get the probabilities of all the rows in dataM at once against the data with 90 elements?
Image Analyst 2018-5-8
Just the bar in red.
To do it without explicitly computing a histogram array, you'd have to do it one bin at a time. Much better to simply get the histogram and divide the counts array by the total counts. Why can't you compute the histogram?
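One possible way to handle all 20 rows of dataM at once, sketched under assumptions: data and dataM here are random stand-ins, and "probability" is taken to mean the normalized count of the bin each point lands in. histcounts supplies the counts and edges, and discretize maps each query point to its bin index. The last line also computes the cumulative alternative (the fraction of data at or below each point) and relies on implicit expansion, which needs R2016b or later.

```matlab
rng default
data  = randn(90,1);   % assumption: stand-in for the 90-by-1 data set
dataM = randn(20,1);   % assumption: stand-in for the 20-by-1 query points

% Histogram the reference data once, then divide counts by the total count.
[counts, edges] = histcounts(data);
probPerBin = counts / numel(data);

% Bin index of every query point; NaN marks points outside all bins.
binIdx = discretize(dataM, edges);

% Look up each point's bin probability; out-of-range points keep 0.
probOfPoint = zeros(size(dataM));
inRange = ~isnan(binIdx);
probOfPoint(inRange) = probPerBin(binIdx(inRange));

% Alternative reading: fraction of data at or below each query point.
fractionAtOrBelow = mean(data <= dataM.', 1).';
```

Here probOfPoint(k) is the normalized height of the bar containing dataM(k), while fractionAtOrBelow(k) corresponds to summing the bars to its left; which one is appropriate depends on what "probability" should mean for the red point.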
