Create a histogram of data that is already "bincounts"

Dear Matlab community,
I have a set of counts, X_counts, that is supposed to represent the projection of counted particles onto one dimension (X).
I would like to use the histogram function because I intend to then fit a Poisson distribution to this data set with histfit.
But X_counts contains mainly "empty" counts (zeros), so my distribution and my histogram peak at 0, as one can see by plotting:
histogram(X_counts)
I tried defining the edges such as:
ctrX = edgesX(1:end-1)+ diff(edgesX)/2;
figure();plot(ctrX, histcounts(X_counts',edgesX));
In the end, I think I should just create another vector that repeats each edgesX value X_counts times (e.g., 400 copies of the value 31, 0 copies of the value 27, etc.), but I was wondering whether there is something more straightforward that I am missing and could do with the current data and the histogram function. Meanwhile, I will try fitting this data with fittype and using "hold on" to overlay the fit on the bar chart I created (attached image).
Thank you for your help!
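The "repeat each position by its count" idea from the question can be sketched in a few lines. This is only an illustration: the values of edgesX and X_counts below are made up, and it assumes the counts are non-negative integers so that repelem accepts them.

```matlab
% Hypothetical bin edges and per-bin counts (not the data from the question)
edgesX   = 0:1:10;
X_counts = [0 0 3 10 25 40 22 8 2 0];

% Bin centers, as in the question
ctrX = edgesX(1:end-1) + diff(edgesX)/2;

% Repeat each bin center as many times as its count (zero counts drop out)
pseudoX = repelem(ctrX, X_counts);

% Binning the pseudo-data with the same edges recovers the original counts
recovered = histcounts(pseudoX, edgesX);

% The pseudo-data mean equals the count-weighted mean of the bin centers
weightedMean = sum(ctrX .* X_counts) / sum(X_counts);
```

Every pseudo-sample sits exactly on a bin center, so anything fitted to pseudoX can only be as fine as the bin width.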
4 Comments
TADA 2022-10-11
histfit uses fitdist to fit distribution functions.
fitdist uses the actual data and not the histogram counts (like you have).
I'm afraid you're going to have to use curve fitting on the histogram you have (you'll probably have to scale the histogram, though), which is probably a bit more accurate than "simulating" the original data like you suggested and using histfit.
Your best bet is if you can get a hold of the original measurements (not the summary), then use histfit like you planned originally.
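One detail that may be relevant here: the fitdist documentation lists a 'Frequency' name-value argument, which weights each value by a count so the raw data need not be expanded. A minimal sketch, assuming the Statistics and Machine Learning Toolbox is available; the bin centers and counts are made up, and the normal distribution is chosen only for illustration, not because the thread establishes it as the right model.

```matlab
% Hypothetical bin centers and counts (not the data from the question)
ctrX     = (0.5:1:9.5)';                  % fitdist expects column vectors
X_counts = [0 0 3 10 25 40 22 8 2 0]';

% Fit a normal distribution, weighting each bin center by its count
pd = fitdist(ctrX, 'Normal', 'Frequency', X_counts);

% For a normal fit, pd.mu is the count-weighted mean of the bin centers
weightedMean = sum(ctrX .* X_counts) / sum(X_counts);
```

This treats every hit as if it landed on its bin center, so it carries the same bin-width limitation as expanding the counts by hand.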
Au. 2022-10-12
@Matt J, @Star Strider sorry if that wasn't clear. The bar chart I used is the expected outcome. I thought using the histogram and histfit functions would be a better/simpler way to proceed, because I need to extract the most probable position of the hits. As @TADA said, this will work for the original data set, a collection of coordinates, where I can use the histogram function and then probably histfit, but not on the counts (this is what I wasn't sure about).
@Star Strider I understand that the b1 coefficient is the 'lambda', so the most probable value of the distribution. Using counts*bins is surely the most correct way to extract it, but the extracted value then cannot be used as the most probable position.
Thank you for taking the time to answer.


Answers (2)

Steven Lord 2022-10-11
Addressing just the question of plotting a histogram given bin counts and bin edges rather than the raw data, you can do this by specifying the BinCounts and BinEdges name-value arguments in your histogram call. I'm going to use histcounts to bin the data but if you have another way to bin the data you could use that instead.
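% Example data: 1e5 standard-normal samples
x = randn(1, 1e5);
% Bin the data once and keep the counts and edges
[counts, edges] = histcounts(x);
% Plot 1: histogram bins the raw data itself
figure
histogram(x)
title('Let histogram bin the data')
% Plot 2: histogram draws the precomputed bins without needing x
figure
histogram('BinCounts', counts, 'BinEdges', edges)
title('Use the counts and edges from histcounts')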
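Applied to pre-binned data like that in the question, the same call might look as follows. The edges and counts here are hypothetical; the one constraint to keep in mind is that BinEdges needs one more element than BinCounts. The last lines pick out the fullest bin, which is one simple reading of "most probable position".

```matlab
% Hypothetical pre-binned data: 10 bins need 11 edges
edgesX   = 0:1:10;
X_counts = [0 0 3 10 25 40 22 8 2 0];

figure
h = histogram('BinCounts', X_counts, 'BinEdges', edgesX);
title('Pre-binned counts drawn with histogram')

% Center of the bin with the highest count
ctrX = edgesX(1:end-1) + diff(edgesX)/2;
[~, iMax] = max(X_counts);
modeX = ctrX(iMax);
```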

Image Analyst 2022-10-11
Several of us (Matt and especially @Star Strider) have expressed doubts. So do I. I question the whole premise. I think this is a good example of a scenario like this:
You ask "how do I do X?"
We show you how to do X.
You say "but that doesn't solve my problem."
We say, "what is your real problem."
You say "Well I want to do Y."
We say "Well if you want to do Y, you should not do X, you should do Z."
So you have multidimensional data. Let's say 2 dimensions, like an image of particles, and you have dimensions X and Y. So you take a projection over X; for example, you get the vertical profile going down the image by summing, for each row, the pixels across all columns. You have a column vector of "rows" by 1 values, where each value is the sum horizontally across all columns. So maybe row 1 had a sum of 1000, row 2 had a sum of 500, row 3 had a sum of 1400, etc. Now you're saying that you consider this a histogram, which is not an accurate description, and then you say you want to fit this to a Poisson distribution, which I think is the second problem (the wrong thing to do). Why do you think this profile should follow a Poisson distribution? I see no justification for that assumption. Secondly, what are your values? You know that for lambdas of more than 5 or 10, the Poisson is pretty much the same shape as a Gaussian. If your values are in the hundreds, your distribution would pretty much be Gaussian. If it were skewed, the distribution would probably be log-normal, and there is a good theoretical basis for that in particle sizing theory. However, I still think that you don't want to get a probability distribution from your profile/projection of your data along one dimension.
So let's say that you were able to turn your projection into a probability distribution (either by using the data itself or by taking a histogram of the projected/summed values), then what? What will you do with that information?
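The remark that a Poisson with a large lambda looks like a Gaussian can be checked numerically. The sketch below uses base MATLAB only; lambda = 100 is an illustrative value, not one taken from the thread, and the Gaussian is given the same mean and variance as the Poisson.

```matlab
% Compare a Poisson pmf with a Gaussian of the same mean and variance
lambda = 100;                          % illustrative value
k = 0:200;

% Poisson pmf computed in log space to avoid overflow in lambda.^k
poissonPmf = exp(k .* log(lambda) - lambda - gammaln(k + 1));

% Gaussian density with mean lambda and standard deviation sqrt(lambda)
sigma    = sqrt(lambda);
gaussPdf = exp(-(k - lambda).^2 ./ (2 * sigma^2)) ./ (sigma * sqrt(2*pi));

% Largest pointwise gap, relative to the Poisson peak height
relGap = max(abs(poissonPmf - gaussPdf)) / max(poissonPmf);

figure
bar(k, poissonPmf); hold on
plot(k, gaussPdf, 'LineWidth', 1.5); hold off
legend('Poisson pmf', 'Gaussian pdf')
```

Rerunning with a small lambda, such as 2, makes the skew of the Poisson visible and relGap noticeably larger.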
3 Comments
Image Analyst 2022-10-12
Not sure if you have an image or not, but the image itself is a "map" of the probability of hits. So wouldn't the "most probable position" simply be the weighted centroid of the "hits" (pixel gray levels)?
mask = true(size(grayImage));
props = regionprops(mask, grayImage, 'WeightedCentroid');
xCentroid = props.WeightedCentroid(1);
yCentroid = props.WeightedCentroid(2);
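For readers without the Image Processing Toolbox, the same quantity can be sketched in base MATLAB as an intensity-weighted mean of the pixel coordinates. The tiny grayImage below is hypothetical, and the sketch assumes a single region covering the whole image, as the all-true mask above does.

```matlab
% Small hypothetical intensity image (rows x columns)
grayImage = [0 0 0 0;
             0 1 3 0;
             0 1 3 0];

% Column (x) and row (y) index of every pixel
[cols, rows] = meshgrid(1:size(grayImage, 2), 1:size(grayImage, 1));

% Intensity-weighted mean position
total     = sum(grayImage(:));
xCentroid = sum(cols(:) .* grayImage(:)) / total;
yCentroid = sum(rows(:) .* grayImage(:)) / total;
```

Here x runs along the columns and y along the rows, matching the [x y] order that WeightedCentroid returns.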
Yes, Poisson is used for counts, and if your counts are low, like less than 10 or 20, then I think the histogram of the whole image (taking each pixel value as a count), or whole data set, should give a Poisson distribution. If the counts are in the hundreds, then it will still be Poisson but look pretty much Gaussian, which might be more convenient mathematically for dealing with it.
However, if you project (sum) along one dimension, I don't know whether that is still Poisson. For example, the sum of the squares of samples drawn from a standard Gaussian distribution is no longer Gaussian; it follows a chi-square distribution. But maybe it is. I'm not a Ph.D. statistician. Maybe it's a compound Poisson distribution.
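On whether a projection stays Poisson: it is a standard result that the sum of independent Poisson variables is again Poisson, with a mean equal to the sum of the individual means. The sketch below checks this numerically in base MATLAB by convolving two pmfs. It assumes the summed pixel counts are independent, which the thread does not establish, and the two lambda values are illustrative.

```matlab
% Poisson pmf on 0..K, computed in log space
K = 60;
k = 0:K;
poissonPmf = @(lambda) exp(k .* log(lambda) - lambda - gammaln(k + 1));

% Two independent pixel counts with illustrative means
lambda1 = 3;
lambda2 = 5;

% The pmf of a sum of independent variables is the convolution of their pmfs.
% The first K+1 entries of the convolution are unaffected by truncating at K.
sumPmf = conv(poissonPmf(lambda1), poissonPmf(lambda2));
sumPmf = sumPmf(1:K+1);

% Compare with a single Poisson pmf whose mean is lambda1 + lambda2
maxGap = max(abs(sumPmf - poissonPmf(lambda1 + lambda2)));
```

maxGap comes out at the level of floating-point rounding, so under the independence assumption a row or column sum of Poisson pixels is itself Poisson.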
Au. 2022-10-14
Yes the weighted centroid is appropriate! I will use this :)
Thank you for the interesting input about the distributions - in my case, the final distribution depends on the initial parameters of the simulation (circular, conic, gaussian, etc. initial distributions).


Release

R2017a
