
I have two column vectors, let's call them A and B, and I have created ordered pairs from the values in these two vectors.

I would like to make bins from the A values.

Then I would like to calculate the mean, max, standard deviation of the corresponding B values in the bins created from A values.

I have tried using histcounts, splitapply, and accumarray, but I haven't been able to find a correct solution. Any hints?

The A and B vectors are distance and intensity, respectively.

range_intensity is the combined matrix of these two column vectors.

range_intensity =

[NaN NaN

NaN NaN

NaN NaN

NaN NaN

NaN NaN

NaN NaN

NaN NaN

NaN NaN

NaN NaN

NaN NaN

NaN NaN

NaN NaN

26.040001 0.011764706

26.080000 0.019607844

26.112000 0.023529412

26.232000 0.023529412

26.184000 0.031372551

26.240000 0.027450981

26.260000 0.031372551

26.271999 0.031372551

26.275999 0.031372551

26.316000 0.035294119

26.312000 0.035294119

26.351999 0.031372551

26.351999 0.031372551

26.372000 0.031372551

26.424000 0.031372551

26.424000 0.031372551

26.452000 0.031372551

26.480000 0.039215688

26.496000 0.035294119

26.572001 0.031372551

26.552000 0.035294119

26.604000 0.031372551

26.620001 0.035294119

26.680000 0.035294119

26.684000 0.035294119

26.719999 0.035294119

26.747999 0.027450981

26.784000 0.031372551

26.820000 0.031372551

26.848000 0.027450981

26.875999 0.031372551

26.872000 0.031372551

26.920000 0.027450981

26.944000 0.027450981

26.972000 0.031372551

27.020000 0.031372551

27.044001 0.027450981

27.115999 0.035294119

27.132000 0.031372551

27.164000 0.031372551

27.184000 0.035294119]

edges = [0:0.5:250];

[distance_count, indx] = histc(range_intensity(:,1), edges);

% function res=my_mean_omitnan(in)

% res=mean(in,'omitnan');

% end

mean = accumarray(indx+1, range_intensity(indx+1,2), [],@(x)mean(x,'omitnan'));

max = accumarray(indx+1, range_intensity(indx+1,2), [], @max);

std = accumarray(indx+1, range_intensity(indx+1,2), [], @std);

bar(max);

hold on;

plot(mean);

grid on;

One problem is that the length of the edges vector does not match the length of the mean and max vectors, so I can't plot the mean and max against the edges.

There are also NaN values in the two vectors, which should be discarded when calculating the mean, max, and standard deviation.

Furthermore, what would be the best way to visualize this data?

Thanks in advance.

Adam Danz
on 13 Nov 2019

Edited: Adam Danz
on 13 Nov 2019

Generally, the edges should cover the span of your data, no more and no less, with one exception: the final edge should be slightly larger than your maximum value so that the final bin doesn't absorb extra values.

Binning your data

I suggest using discretize() to group the values in column 1 into discrete groups. The line below uses the range of your data to determine the range of bin edges.

edges = floor(min(range_intensity(:,1))) : .5 : ceil((max(range_intensity(:,1))+.001)*10/5)*5/10;

bins = discretize(range_intensity(:,1), edges);

The code above uses floor() to define the minimum bin edge. Bins are 0.5 units wide. It uses ceil() to define the maximum bin edge, but to ensure that the max edge doesn't fall on your maximum data value, it adds 0.001 and then rounds up to the nearest 0.5 (hence the *10/5)*5/10).
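To make the edge arithmetic concrete, here is a minimal sketch on a handful of made-up distances (the vector d is hypothetical, loosely modelled on the values in the question). Multiplying by 2 and dividing by 2 is the same operation as *10/5 and *5/10.

```matlab
% Hypothetical sample distances; the NaN mimics the missing readings.
d = [NaN; 26.04; 26.48; 26.972; 27.184];

% min and max skip NaN by default, so no extra filtering is needed here.
lo = floor(min(d));                     % 26
hi = ceil((max(d) + 0.001) * 2) / 2;    % 27.5, i.e. rounded up to the next 0.5
edges = lo:0.5:hi;                      % [26 26.5 27 27.5]

% discretize returns one bin index per element; NaN inputs stay NaN.
bins = discretize(d, edges);            % [NaN; 1; 1; 2; 3]
```

The NaN entries in bins are worth keeping in mind, because they have to be handled before or during the grouping step.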

Computing group statistics

If you have the Statistics and Machine Learning Toolbox, use grpstats() to compute grouped statistics.

[meanVal, maxVal, stdVal] = grpstats(range_intensity(:,2),bins,{@mean, @max, @std});

If you do not have access to the stats and ML toolbox, use splitapply() (or accumarray or other alternatives) to compute your grouped stats.

meanVal = splitapply(@mean,range_intensity(:,2), bins); % Repeat for other stats
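For the accumarray alternative mentioned above, a minimal sketch might look like the following. The matrix ri is hypothetical sample data standing in for range_intensity, and the choice to filter NaN rows first and to fill empty bins with NaN is an assumption about what the asker wants, not part of the original answer.

```matlab
% Hypothetical sample data: column 1 = distance, column 2 = intensity.
ri = [NaN NaN; 26.04 0.01; 26.48 0.03; 26.972 0.02; 27.184 0.04];
edges = 26:0.5:27.5;
bins = discretize(ri(:,1), edges);

% Keep only rows where both the bin index and the intensity are defined.
valid = ~isnan(bins) & ~isnan(ri(:,2));
g = bins(valid);
v = ri(valid, 2);

% Force one output row per bin; bins that received no data are set to NaN.
nBins = numel(edges) - 1;
meanVal = accumarray(g, v, [nBins 1], @mean, NaN);
maxVal  = accumarray(g, v, [nBins 1], @max,  NaN);
stdVal  = accumarray(g, v, [nBins 1], @std,  NaN);
```

Fixing the output size to the number of bins means the results always have numel(edges)-1 rows, even when some bins are empty, which avoids the length mismatch described in the question.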

Plotting the results

By definition, there is always one more bin edge than there are bins. One way to plot binned data is to compute the bin centers and use those as the x-values.

binCenters = edges(2:end) - (edges(2)-edges(1))/2;

If the bin edges were set up correctly following the steps above, you should end up with a vector of binCenters that is the same size as your grouped stats values. Plotting is then as simple as

figure()

bar(binCenters,maxVal)

hold on

plot(binCenters, meanVal,'ms')

grid on
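As one possible alternative for visualizing the spread, the per-bin mean can be drawn with error bars of one standard deviation. This is only a sketch: the meanVal and stdVal numbers below are made up for illustration, and errorbar is a suggestion of mine rather than something from the steps above.

```matlab
% Hypothetical per-bin results for three 0.5-wide bins.
edges   = 26:0.5:27.5;
meanVal = [0.020; 0.025; 0.040];
stdVal  = [0.014; 0.005; 0.000];

% One center per bin, so numel(binCenters) equals numel(edges) - 1.
binCenters = edges(2:end) - (edges(2) - edges(1))/2;   % [26.25 26.75 27.25]

figure()
errorbar(binCenters, meanVal, stdVal, 'o-')   % mean with +/- one std
xlabel('distance (bin center)')
ylabel('intensity')
grid on
```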

Adam Danz
on 13 Nov 2019

Guillaume
on 13 Nov 2019

"I don't know why I keep forgetting about groupsummary()"

Probably because you have the stats toolbox and I don't.

groupsummary will return as many rows as numel(unique(bins)), so if some bin indices are not present, these will indeed be skipped. The second output of groupsummary gives you the bins matching the rows of the first output, so:

[meanmaxstd, bin] = groupsummary(range_intensity(:,2), bins, {'mean', 'max', 'std'});

edit: Or put the whole lot (range_intensity and bins) into a table and you'll get everything as one neat table as output (including number of elements used for each bin).
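A minimal sketch of the table approach, assuming a release that includes groupsummary. The matrix ri and the variable names distance, intensity and bin are hypothetical choices for illustration.

```matlab
% Hypothetical sample data: column 1 = distance, column 2 = intensity.
ri = [NaN NaN; 26.04 0.01; 26.48 0.03; 26.972 0.02; 27.184 0.04];
edges = 26:0.5:27.5;

T = table(ri(:,1), ri(:,2), 'VariableNames', {'distance', 'intensity'});
T.bin = discretize(T.distance, edges);

% Drop rows that fell outside every bin (NaN distance) before grouping.
T = T(~isnan(T.bin), :);

% One row per bin that actually occurs, with GroupCount and the three stats.
S = groupsummary(T, 'bin', {'mean', 'max', 'std'}, 'intensity');
```

The result S contains the columns bin, GroupCount, mean_intensity, max_intensity and std_intensity, so the bin index travels with each row of statistics.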

Adam Danz
on 13 Nov 2019

This gives me the idea of building a function recommendation engine that skims all available command history and custom functions/scripts to get a sense of what functions a user typically uses and then recommends related functions that have rarely been used recently. The goal would be to expose the user to new functions outside of their repertoire.

Suppose the engine would search the user's content and list the top 500 most commonly used functions. For each function recognized by MATLAB (i.e., not custom functions), the engine could reference the "See Also" section of the function's official documentation page to list related functions and would eliminate ones that are already in the top 500. The ones that are left over could be ranked by the number of times they appeared across all of the 500 functions (or at least those that had a documentation page that included a "See Also" section).

For example, if size() and length() are frequently used, both of those functions list numel() in their documentation pages which would then be recommended to the user to check out. It would be a dirty, ugly mess but machine learning algorithms have made products out of uglier messes.

I'll put that on my (long) list of spare-time ideas....
