Get indices of any quantile of a column

Hello everybody,
I am currently trying to sort a large (101x1168) matrix. I always sort by the first column, on which the following three columns depend. I want to be able to get the indices of, for example, the top 10 percent of the values, or of the values between the 0.3 and 0.4 quantiles of the first column, so that I can address those rows with a function. So far I have used several sortrows() calls, but this takes a long time to run. It is important to know that the length of the columns may vary (some of the columns have more NaNs than others), so it would be amazing if the solution ignored NaNs (maybe a combination of quantile() and find()?).
Here is an example of what I need:
Col. 1 Col. 2 Col. 3 Col. 4
15 18 12 32
14 23 19 12
10 7 18 12
9 34 12 13
11 19 3 17
I now want to know the index and the value of the top 20% of the values in the first column. In this case they would be 1 and 15. If implemented correctly, I would be able to get a vector output with all the data.
Any help is truly appreciated! Many thanks and kind regards, A.Goe

Answers (1)

Image Analyst 2016-8-26
If you have the Statistics and Machine Learning Toolbox, there is prctile(). Would that help?
Y = prctile(X,p) returns percentiles of the values in a data vector or matrix X for the percentages p in the interval [0,100]. If X is a vector, then Y is a scalar or a vector with the same length as the number of percentiles required (length(p)). Y(i) contains the p(i) percentile.
If X is a matrix, then Y is a row vector or a matrix, where the number of rows of Y is equal to the number of percentiles required (length(p)). The ith row of Y contains the p(i) percentiles of each column of X.
For multidimensional arrays, prctile operates along the first nonsingleton dimension of X.
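As a rough illustration of how prctile() thresholds could be turned into row indices, here is a minimal sketch using the example matrix from the question. The variable names (`limits`, `topRows`, `bandRows`) are made up for this example, and prctile() needs the Statistics and Machine Learning Toolbox. The thresholds themselves may not be values in the data, but comparing the column against them still selects only real rows.

```matlab
% Example data from the question (5 rows x 4 columns).
data = [15 18 12 32; 14 23 19 12; 10 7 18 12; 9 34 12 13; 11 19 3 17];
col1 = data(:, 1);

% prctile treats NaN as missing, so NaNs do not shift the thresholds.
topLimit = prctile(col1, 80);        % threshold for the top 20 %
limits   = prctile(col1, [30 40]);   % thresholds for the 0.3-0.4 band

% Comparisons with NaN are false, so NaN rows drop out of the masks.
topRows  = find(col1 > topLimit);                          % rows in the top 20 %
bandRows = find(col1 >= limits(1) & col1 <= limits(2));    % rows inside the band

topData  = data(topRows, :);    % all four columns for the top rows
bandData = data(bandRows, :);   % all four columns for the band rows
```

No sorting is involved, so the row indices refer directly to the original matrix. Whether a value sitting exactly on a threshold should count as inside the band is a choice; the comparison operators above are one option.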
  2 Comments
A. Goeh 2016-8-27
Hello, first of all, thank you for your answer. I tried prctile(); the problem is that the results are not necessarily values that can be found in the original dataset, so I can't search for the indices of the results. I'm thinking about splitting the vector (column) into pieces of equal length and searching for the first and last index, although without much success so far, to be honest.
Image Analyst 2016-8-27
If the values must be in your data, then you can use cumsum() to create the cdf, then use find to find the value. Untested code:
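col1 = data(:, 1);
validRows = find(~isnan(col1)); % Ignore NaNs
% Sort, keeping track of where each value came from
[sorted, order] = sort(col1(validRows), 'ascend');
% Empirical cdf: fraction of values <= sorted(k).
% (A cumsum of the values themselves would give the share of the total, not the cdf.)
cdf = cumsum(ones(size(sorted))) / numel(sorted);
% Find first sorted position inside the top 20 %
k = find(cdf > 0.8, 1, 'first');
dataValue = sorted(k);
index = validRows(order(k)); % Row index in the original data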
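The same sort-and-find idea can be extended to any quantile band without a toolbox. The following is only a sketch under some assumptions: the band is defined by rank fraction (position in the sorted, NaN-free column divided by the number of valid values), the names `lo`, `hi`, `rankFrac` and `keep` are invented for this example, and tied values may end up split across neighbouring bands.

```matlab
% Example data from the question (5 rows x 4 columns).
data = [15 18 12 32; 14 23 19 12; 10 7 18 12; 9 34 12 13; 11 19 3 17];
col1 = data(:, 1);

% Band of rank fractions to select: (lo, hi].
% lo = 0.8, hi = 1.0 is the top 20 %; lo = 0.3, hi = 0.4 is the 0.3-0.4 band.
lo = 0.8;
hi = 1.0;

validRows = find(~isnan(col1));                     % ignore NaNs
[sorted, order] = sort(col1(validRows), 'ascend');  % order maps sorted -> valid rows
n = numel(sorted);
rankFrac = (1:n)' / n;                  % fraction of values <= sorted(k)
keep = rankFrac > lo & rankFrac <= hi;  % sorted positions inside the band

rows     = validRows(order(keep));  % row indices in the original matrix
values   = sorted(keep);            % the matching values, all taken from the data
bandData = data(rows, :);           % all four columns for those rows

% For the example data: rows = 1, values = 15
```

Because every selected value comes from the sorted column itself, the result always exists in the data, and a single sort() call replaces repeated sortrows() calls.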
