Calculating the probability of a data point in a histogram
Hello:
The image below is a histogram of a large data set (a 90x1 double, in blue) and a single data point (in red). I would like to compute the probability of the red data point relative to the blue data. I could count the entries to the left of the red bar and divide by the total count (90), but I want MATLAB code that does this faster and more efficiently, probably without even using the histogram. Thank you.
Accepted Answer
Steven Lord
2018-5-8
Change the Normalization property of the histogram object then get the appropriate element of the Values property of that object.
rng default
x = randn(10000,1);
h = histogram(x)
h.Values(10)
Since the default Normalization method is 'count', this will tell you that there are 133 elements of x that fall into bin 10. [Since I used rng default, you should get the exact same random numbers in x as I did and so generate the exact same histogram.]
h.Normalization = 'probability';
h.Values(10)
Now h.Values(10) is 0.0133 which makes sense: 133 / 10000 (the total number of points) = 0.0133.
If you wanted to get the same information without actually bringing up the plot, the histcounts function also lets you specify a 'Normalization' method.
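A minimal sketch of that histcounts route, reusing the same x as above. The variable names p, edges, counts, and maxDiff are only illustrative, and the bin edges are whatever histcounts picks automatically.

```matlab
rng default
x = randn(10000,1);

% Per-bin fractions, computed without creating a plot
[p, edges] = histcounts(x, 'Normalization', 'probability');

% Raw counts over the same edges, for comparison
counts = histcounts(x, edges);

% Each probability should equal the bin count divided by numel(x)
maxDiff = max(abs(p - counts/numel(x)));
```

Here p(k) is the fraction of x that falls between edges(k) and edges(k+1), so the entries of p sum to 1.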
And I'd guess that the histogram you showed was created with something more like 900 data points than 90. According to the Y limits, each of the 5 central bars contains more than 90 elements, assuming you're using the default 'count' Normalization. Still not Big Data, but bigger.
More Answers (1)
Image Analyst
2018-5-7
You need to know the edges of the bin, e1 and e2. Then you can simply do
percentageInBin = sum(data>=e1 & data < e2) / numel(data);
No histogram needed if you just need it for that one red bin.
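As a rough sketch of how that might look end to end: the data, the red value x0, and the edges e1 and e2 below are made-up stand-ins, not the values from the question. The second quantity is the "count everything to the left of the red bar and divide by the total" idea from the question, written without any histogram.

```matlab
rng default
data = randn(90,1);     % stand-in for the 90x1 data set (assumption)
x0   = 0.5;             % hypothetical value of the red data point
e1   = 0.4;             % hypothetical left edge of the red bin
e2   = 0.6;             % hypothetical right edge of the red bin

% Fraction of the data inside the bin [e1, e2)
fractionInBin = sum(data >= e1 & data < e2) / numel(data);

% Fraction of the data strictly to the left of the red point
fractionBelow = mean(data < x0);
```

Both results are fractions between 0 and 1; multiply by 100 if a percentage is wanted.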
By the way, it made me snicker when you described 90 elements as large. It literally would have to be around a million times that big before anyone might start considering it large.
3 Comments
Image Analyst
2018-5-8
Just the bar in red.
To do it without explicitly computing a histogram array, you'd have to do it one bin at a time. Much better to simply get the histogram and divide the counts array by the total counts. Why can't you compute the histogram?
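For what it's worth, a sketch of the "get the histogram and divide" route that also looks up which bin a given point lands in. The data and x0 are again made-up stand-ins, and using discretize for the lookup is just one option.

```matlab
rng default
data = randn(90,1);     % stand-in for the 90x1 data set (assumption)
x0   = 0.5;             % hypothetical value of the red data point

% Counts per bin, using automatically chosen edges
[counts, edges] = histcounts(data);

% Divide the counts array by the total count
fractions = counts / sum(counts);

% Index of the bin containing x0 (NaN if x0 lies outside all bins)
k = discretize(x0, edges);
if isnan(k)
    fractionAtX0 = 0;
else
    fractionAtX0 = fractions(k);
end
```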