How to find the peak/most prevalent values in a set of data (max x-value of histogram)?

Question

Naveen 2015-7-14

https://ww2.mathworks.cn/matlabcentral/answers/229843-how-to-find-the-peak-most-prevalent-values-in-a-set-of-data-max-x-value-of-histogram


I am analyzing acoustic waveforms and have used MATLAB to build a table for each test that lists several parameters (amplitude, peak frequency, duration, etc.) for each waveform. I want to compare these signals across tests, so I want to summarize each parameter for each test. For example, I think it would be useful to compare the average duration of all 83 waveforms in Test 1 with the average duration of all 102 waveforms in Test 2. I can do this, but I've realized that the average isn't very useful for some parameters, because the data span a large range and a few outliers skew the average disproportionately.

So now instead of finding the average for each parameter, I want to find the values that are most prevalent.

For example, take this dataset (representing the duration in microseconds for 23 acoustic waveforms that occurred in one test).

 duration = [212 61 276 213 188 62 212 275 214 212 32 62 62 250 63 61 96 77 64 32 62 213 64];

The average for this is 133. However, just by looking at the data (and confirmed via a histogram), it's clear that there are two values that are much more prevalent (around 212 and 62) than the rest, so the average of 133 doesn't really reveal much. I'm looking to write a script that automatically picks out these values that occur significantly more often than other values in the dataset. To simplify my question, let's say I only want my script to find the one value that occurs most often (in this case, 62).

To start with, my thought process was to make a histogram and then find the x-axis value of the bin with the highest intensity.

 r = round(range(duration));
 h = histogram(duration, round(r/4))   % the number of bins must be a positive integer

If you plot this histogram, it looks like the most prevalent value is in the bin that represents 60-64 microseconds. From there, I can just split the difference and say that 62 microseconds is the most prevalent duration in this set of waveforms.

I'm looking for code that will give me this. I know this may not be statistically rigorous: the values within that bin may be unevenly distributed, so taking the bin midpoint (62 microseconds) may not give the most accurate estimate. For my current purposes, that is okay.
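
For the simplified goal stated above (the single value that occurs most often), a histogram is not strictly required. The following is a minimal sketch, assuming the durations are exact repeated numbers (integers here) rather than noisy measurements: the built-in `mode` function returns the most frequent value, and `unique` combined with `accumarray` gives a full tally. The names `mostCommon`, `vals`, `counts`, and `ranked` are illustrative only.

```matlab
duration = [212 61 276 213 188 62 212 275 214 212 32 62 62 250 63 61 96 77 64 32 62 213 64];

% Most frequent exact value (if several values tie, mode returns the smallest).
mostCommon = mode(duration);

% Tally every distinct value and rank the values by frequency.
[vals, ~, idx] = unique(duration);       % vals is sorted in ascending order
counts = accumarray(idx(:), 1).';        % counts(k) = number of occurrences of vals(k)
[sortedCounts, order] = sort(counts, 'descend');
ranked = [vals(order); sortedCounts];    % row 1: values, row 2: how often each occurs
```

Note that this counts exact matches only, so 61, 62, and 63 are treated as three different values. Binning or smoothing is what groups such near-equal durations together.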

2 Comments

Naveen 2015-7-14


I have found a way to do this. It might not be the most efficient way, but it works for what I want. Basically, the histogram function returns an object with several properties, including an array that lists the boundaries of each bin (h.BinEdges) and an array that lists the count for each bin (h.Values), that is, how many of the raw data points fall inside that bin. The histogram itself is basically a bar graph of h.Values against h.BinEdges.

So I'm finding all of the local maxima in Values and then using each maximum's index to look up the corresponding BinEdges. I then replace that maximum with 0 so that the next pass finds the second-highest local maximum. The end result is that durationmax1, durationmax2, and durationmax3 hold the three most prevalent values of duration.

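% Bin the durations; the bin count is chosen so each bin is about 4 microseconds wide.
r = round(range(duration));
h = histogram(duration,round(r/4));
% Local maxima of the bin counts; locs holds the bin index of each peak.
[pks,locs] = findpeaks(h.Values);
% Tallest peak: record its index, then zero it so the next pass skips it.
count = 0;
for i = 1:length(pks)
    if(pks(i) == max(pks))
        count = i;
        pks(i) = 0;
        break
    end
end
% Midpoint of the bin that holds the tallest peak.
durationmax1 = (h.BinEdges(locs(count)) + h.BinEdges(locs(count) + 1))/2;
% Second-tallest peak.
for i = 1:length(pks)
    if(pks(i) == max(pks))
        count = i;
        pks(i) = 0;
        break
    end
end
durationmax2 = (h.BinEdges(locs(count)) + h.BinEdges(locs(count) + 1))/2;
% Third-tallest peak.
for i = 1:length(pks)
    if(pks(i) == max(pks))
        count = i;
        pks(i) = 0;
        break
    end
end
durationmax3 = (h.BinEdges(locs(count)) + h.BinEdges(locs(count) + 1))/2;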
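
The three repeated loops above can probably be collapsed into a single sort of the peak heights. The following is a sketch of that idea, not the original poster's code. It assumes explicit 4-microsecond bin edges from 32 to 276 (chosen here so the binning is reproducible), and the names `edges`, `peakCenters`, `nKeep`, and `durationmax` are illustrative. As in the original, `findpeaks` requires the Signal Processing Toolbox.

```matlab
duration = [212 61 276 213 188 62 212 275 214 212 32 62 62 250 63 61 96 77 64 32 62 213 64];

% Assumed binning: explicit 4-microsecond bins from 32 to 276.
edges = 32:4:276;
h = histogram(duration, edges);

% Local maxima of the bin counts; locs holds the bin index of each peak.
[pks, locs] = findpeaks(h.Values);

% Midpoint of the bin that holds each peak.
peakCenters = (h.BinEdges(locs) + h.BinEdges(locs + 1)) / 2;

% Rank the peaks from tallest to shortest and keep at most three.
[~, order] = sort(pks, 'descend');
nKeep = min(3, numel(order));
durationmax = peakCenters(order(1:nKeep));
```

For this data set the first two entries of `durationmax` are 62 and 214. One caveat: `findpeaks` does not report the endpoints of the signal as peaks, so the first and last bins are never candidates with this approach.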

Image Analyst 2015-7-15

Without seeing your signals and histograms, it's too much of a brain strain tonight (for me at least) to try to imagine what they look like. You'd probably get more answers if you showed some plots of your signals and histogram.


Answer 1

the cyclist 2015-7-14


https://ww2.mathworks.cn/matlabcentral/answers/229843-how-to-find-the-peak-most-prevalent-values-in-a-set-of-data-max-x-value-of-histogram#answer_186033


Here is another method that might be helpful. Kernel smoothing methods estimate where values are most likely to fall ("these values are likely to be from around here") in a non-parametric way, that is, without assuming a particular distribution.

The ksdensity function from the Statistics and Machine Learning Toolbox implements this. Here is one way to handle your problem.

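% Estimate the smoothed density at the points xi.
[peakDensity,xi] = ksdensity(duration)
% Find the highest point of the density and the location where it occurs.
[maxDensity,peakIndex] = max(peakDensity)
peakLocation = xi(peakIndex)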

This shows you what the smoothed density looks like:

figure
ksdensity(duration)
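
To make the idea behind kernel smoothing concrete, here is a minimal hand-rolled sketch of a Gaussian kernel density estimate that uses only base MATLAB. It is not how `ksdensity` is implemented internally. The bandwidth `bw = 10` and the grid `x = 0:300` are arbitrary assumptions chosen for this data set; `ksdensity` picks its own default bandwidth and evaluation points, so its curve and peak location will generally differ.

```matlab
duration = [212 61 276 213 188 62 212 275 214 212 32 62 62 250 63 61 96 77 64 32 62 213 64];

bw = 10;        % assumed kernel bandwidth, in microseconds
x  = 0:300;     % assumed evaluation grid, in microseconds

% Each data point contributes a Gaussian bump centered on its value;
% the density estimate is the average of those bumps.
% (x - duration.') relies on implicit expansion, available in R2016b or later.
z = (x - duration.') / bw;      % numel(duration)-by-numel(x) matrix
density = sum(exp(-0.5 * z.^2), 1) / (numel(duration) * bw * sqrt(2*pi));

% Location of the highest point of the smoothed density.
[~, peakIndex] = max(density);
peakLocation = x(peakIndex);
```

A smaller bandwidth follows the data more closely and produces more, narrower peaks. A larger one merges nearby clusters, so the reported peak location depends on that choice.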


Answer 2

Steven Lord 2015-7-15


https://ww2.mathworks.cn/matlabcentral/answers/229843-how-to-find-the-peak-most-prevalent-values-in-a-set-of-data-max-x-value-of-histogram#answer_186076

Use HISTCOUNTS instead of HISTOGRAM. HISTCOUNTS returns the bin counts directly (as well as the bin edges) so you can then use MAX or SORT on those counts.
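
A minimal sketch of this suggestion, assuming 4-microsecond bins aligned to multiples of 4 (the bin width and the names `binWidth`, `edges`, `centers`, `peakCenter`, and `topCenters` are illustrative choices, not part of the answer):

```matlab
duration = [212 61 276 213 188 62 212 275 214 212 32 62 62 250 63 61 96 77 64 32 62 213 64];

% Assumed bin layout: 4-microsecond bins that span the data.
binWidth = 4;
edges = binWidth*floor(min(duration)/binWidth) : binWidth : binWidth*ceil(max(duration)/binWidth);

% Counts per bin, computed without creating a figure.
counts = histcounts(duration, edges);
centers = edges(1:end-1) + diff(edges)/2;

% Single most populated bin and its midpoint.
[maxCount, iMax] = max(counts);
peakCenter = centers(iMax);

% Bins ranked from most to least populated; keep the top three midpoints.
[sortedCounts, order] = sort(counts, 'descend');
topCenters = centers(order(1:3));
```

For the data in the question this gives `peakCenter` equal to 62 (7 values in the 60 to 64 bin), followed by the bin centered at 214 (6 values). Sorting raw counts can return adjacent bins from the same cluster, so a peak-finding step may still be useful when bins are narrow.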
