ranking (ordering) values with repeats

Hello Community,
I'm hoping some of you have a clever solution to this problem. I'm looking for a fast and efficient way to rank (order) a vector of numbers in a particular way when repeated values arise.
To make it simple, suppose I have a row vector:
data = [-1 2 0 -2 0]
I know I can rank them using the 3rd output of "unique":
>> [~,~,rnk] = unique(data)
rnk =
2 4 3 1 3
What I like about this is that it assigns the same rank to the repeated zeros. What I don't like about this is that the top rank is now "4" even though I have 5 values. I would prefer this:
>> rnk = myrank(data)
rnk =
2 5 3 1 3
I've also played around with the second output of "sort" quite a bit, but since this output gives the indices of the sorted values within the original array, there is no simple way (that I've found) to associate the same rank with repeated values.
I'm just wondering if there is something simple that I'm missing.
Thanks!

Accepted Answer

Oleg Komarov 2012-3-23
If you have the Statistics Toolbox (though it's kind of overkill):
floor(tiedrank([-1 2 0 -2 0]))
ans =
2 5 3 1 3
Otherwise:
data = [-1 2 0 -2 0];
% Sort data
[srt, idxSrt] = sort(data);
% Find where the repetitions are
idxRepeat = [false diff(srt) == 0];
% Rank with ties but without skipping
rnkNoSkip = cumsum(~idxRepeat);
% Preallocate rank
rnk = 1:numel(data);
% Adjust for ties (and skip)
rnk(idxRepeat) = rnkNoSkip(idxRepeat);
% Sort back
rnk(idxSrt) = rnk
rnk =
2 5 3 1 3
4 Comments
sunbeam 2013-3-6
There is something wrong here; for
data = [1 -3 -3 2 23 23];
The result is
rnk =
3 1 1 4 5 4
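The mismatch comes from rnkNoSkip: it numbers the distinct values (1, 2, 3, ...), which equals the wanted rank only when no tie has occurred earlier in the sorted data. A possible fix, sketched below and not part of the original answer, looks up the sorted position at which each run of equal values starts (firstIdx and rnkSorted are names introduced here for illustration; a row vector is assumed, as in the answer):

```matlab
data = [1 -3 -3 2 23 23];
% Sort data
[srt, idxSrt] = sort(data);
% Mark every element that repeats its predecessor in sorted order
idxRepeat = [false, diff(srt) == 0];
% Sorted position at which each run of equal values starts
firstIdx = find(~idxRepeat);
% cumsum gives the run number of each sorted element; use it to look up
% the start position of that run
rnkSorted = firstIdx(cumsum(~idxRepeat));
% Sort back to the original order
rnk = zeros(size(data));
rnk(idxSrt) = rnkSorted;
% rnk is [3 1 1 4 5 5]
```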
Tommaso Fornaciari 2016-12-4
Hi, is there a way to assign tied observations two different consecutive ranks? Following on from the original question, the output I would need is 2 5 4 1 3 or 2 5 3 1 4.
Thank you
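One way to get distinct consecutive ranks is to rely on sort being stable, meaning equal elements keep their original order. The sketch below is a suggestion rather than something posted in the thread; rnkFirst and rnkLast are illustrative names, and it produces both of the orderings asked about:

```matlab
data = [-1 2 0 -2 0];
n = numel(data);

% Earlier occurrence gets the lower rank: sort is stable, so equal values
% keep their original left-to-right order.
[~, idx] = sort(data);
rnkFirst = zeros(1, n);
rnkFirst(idx) = 1:n;          % [2 5 3 1 4]

% Later occurrence gets the lower rank: break ties on descending position
% by sorting on (value, -position).
[~, idx] = sortrows([data(:), -(1:n).']);
rnkLast = zeros(1, n);
rnkLast(idx) = 1:n;           % [2 5 4 1 3]
```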


More Answers (4)

Raph 2015-5-4
It should also work with sort() and ismember():
data_sorted = sort(data);
[~, rnk] = ismember(data,data_sorted)
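This works because ismember reports, for each element of its first argument, the lowest index at which that element occurs in the second argument. That is the documented behavior in current releases; releases up to R2012b (and the 'legacy' flag) are understood to return the highest index instead, which would change the result. A quick check, added here as an illustration, on the vector that trips up the accepted answer:

```matlab
data = [1 -3 -3 2 23 23];
data_sorted = sort(data);                 % [-3 -3 1 2 23 23]
% For each element of data, the lowest index where it appears in
% data_sorted, i.e. the rank with ties sharing the smallest position.
[~, rnk] = ismember(data, data_sorted);   % [3 1 1 4 5 5]
```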
1 Comment
Bradley Stiritz 2016-5-28
Very impressive, Raph! Thanks for your contribution. Excellent use of built-in vectorized functions.



sunbeam 2013-3-6
This should work. I couldn't figure out how to do it without a loop, but at least this only loops over the duplicate entries. Someone let me know if you come up with a better way.
function outrank = rankWithDuplicates(data,mode)
% R = rankWithDuplicates(data,mode) ranks the values in the data variable
% according to size, allowing for duplicates. Whereas sort actually
% rearranges the input, and therefore duplicates get assigned different
% indices, rankWithDuplicates will simply output the rank order allowing
% ties for duplicate entries. For example,
%
% rankWithDuplicates([1 1 5 8 8 10])
%
% will output [1 1 3 4 4 6]; and if these entries are shuffled like
%
% rankWithDuplicates([8 1 5 1 10 8])
%
% the output will be [4 1 3 1 6 4].
%
% INPUT: data, a vector of real numbers.
% mode, an optional input which can be 'ascend' or 'descend'
%
% OUTPUT: the rank order of the input data.
%
if nargin==1
    mode='ascend';
end
% Work on a row vector
[~,b]=size(data);
if b==1
    data=data';
end
% Sort data
[srt, idxSrt] = sort(data,mode);
% Mark each element that equals its predecessor in sorted order
idxRepeat = [false diff(srt) == 0];
% Loop over the duplicates only and copy the rank of the preceding element.
% I'm not sure if this is necessary but it's the only way I could get it
% done.
rnk = 1:numel(data);
loopidx=find(idxRepeat);
for i=loopidx
    rnk(i)=rnk(i-1);
end
% Return ranks in the original order of the data
outrank(idxSrt)=rnk;
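The loop can be avoided, for example with cummax (available in newer MATLAB releases, R2014b onward as far as the documentation indicates, so after this answer was posted). A minimal sketch under that assumption, with sortDir standing in for the mode argument: zero out the positions of repeated values and let the running maximum carry each run's start position forward.

```matlab
data = [8 1 5 1 10 8];
sortDir = 'ascend';                        % or 'descend'

[srt, idxSrt] = sort(data(:).', sortDir);  % work on a row vector
idxRepeat = [false, diff(srt) == 0];       % true where a value repeats its predecessor

pos = 1:numel(srt);
pos(idxRepeat) = 0;                        % keep only the start of each run
rnk = cummax(pos);                         % carry the start position across the run

outrank = zeros(1, numel(srt));
outrank(idxSrt) = rnk;                     % back to the original order
% outrank is [4 1 3 1 6 4]
```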

Jeyamugan T 2017-4-7
I wrote this code for another purpose, but it may be useful for this problem.
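function [rkList]=arrayRankEx(O)
% O is a row vector; rkList(k) is the rank of O(k), ties share the lowest rank
cO=sort(O);
n=size(O,2);
rkList=zeros(1,n);
in=1;                 % position in the sorted vector, i.e. the rank to assign
while(in<=n)
    out=1;
    co=0;             % how many elements of O equal cO(in)
    while(out<=n && in<=n)
        if(O(out)==cO(in))
            rkList(out)=in;
            co=co+1;
        end
        out=out+1;
    end
    in=in+co;         % skip past the ties so the next rank leaves a gap
end
end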
>>[5 7 -2 1 -1 0 0 1 5 3]
ans =
5 7 -2 1 -1 0 0 1 5 3
>> arrayRankEx([5 7 -2 1 -1 0 0 1 5 3])
ans =
8 10 1 5 2 3 3 5 8 7
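The same ranks follow directly from the definition in use here: an element's rank is one plus the number of strictly smaller elements. As a compact alternative (a sketch, not the poster's code; less is an illustrative name), the pairwise comparison can be done with bsxfun. Like the double loop above it needs on the order of n^2 comparisons, and it also builds an n-by-n matrix, so it only suits modest n.

```matlab
O = [5 7 -2 1 -1 0 0 1 5 3];
% less(i,j) is true when O(i) < O(j)
less = bsxfun(@lt, O(:), O(:).');
% Column sums count the elements strictly smaller than each O(j)
rkList = sum(less, 1) + 1;
% rkList is [8 10 1 5 2 3 3 5 8 7]
```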

Benjamin Levy 2017-11-16
Not sure if this is still a 'live' thread, but the code should report these ranks for ascending order: 2.0000 5.0000 3.5000 1.0000 3.5000.
Now, suppose your data set is
data = [ 11 20 2 14 15 11 13 20 7 9 1 5 17 ...
7 5 16 3 5 20 ];
Your answer for ascending order (correcting for ties), using sortrows([ data' ranks ],2), should provide column 1 = data, column 2 = ranks:
1.0000 1.0000
2.0000 2.0000
3.0000 3.0000
5.0000 5.0000
5.0000 5.0000
5.0000 5.0000
7.0000 7.5000
7.0000 7.5000
9.0000 9.0000
11.0000 11.0000
11.0000 11.0000
11.0000 11.0000
13.0000 13.0000
14.0000 14.0000
15.0000 15.0000
16.0000 16.0000
17.0000 17.0000
20.0000 19.0000
20.0000 19.0000
20.0000 19.0000
Note that there are several sections in the sorted data where runs of the same integer follow one another (e.g., ...5 5 7 7 ).
Using your code and my data set, and the same final sort, I have (column 1 data, column 2 ranks):
1 1
2 2
3 3
5 4
5 4
5 4
7 5
11 7
7 7
11 7
9 9
11 10
13 13
20 13
20 13
14 14
15 15
16 16
17 17
20 18
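Average ranks of the kind shown in the first table (ties share the mean of the positions they occupy) can also be computed without the Statistics Toolbox. The following is a sketch using accumarray, with grp and avgPos as names made up for the example, run on the original five-element vector:

```matlab
data = [-1 2 0 -2 0];
[srt, idxSrt] = sort(data(:));
% Group number of each sorted element (equal values share a group)
grp = cumsum([true; diff(srt) ~= 0]);
% Mean sorted position within each group
avgPos = accumarray(grp, (1:numel(srt)).', [], @mean);
% Map the group means back to the original order
ranks = zeros(size(data));
ranks(idxSrt) = avgPos(grp);
% ranks is [2 5 3.5 1 3.5]
```

Flooring such average ranks, as in floor(tiedrank(...)), gives the lowest tied rank only for two-way ties: three equal values at sorted positions 4 to 6 average to 5, and floor leaves that at 5 rather than 4.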
1 Comment
Nataraja M 2018-3-26
Hello Sir, I used the above command, sortrows([ data' ranks ],2), to rank vectors from highest to lowest, but I get the error "Not enough input arguments". Can you please help me solve this error? Thank you.
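The sortrows call needs ranks to exist as a variable with one entry per element of data. One plausible cause of the error (an assumption, since the session is not shown) is that ranks was never computed, so MATLAB resolved the name to something else that expects arguments. A sketch that ranks from largest to smallest and then builds the two-column table, reusing the sort/ismember idea from earlier in the thread (tbl is an illustrative name):

```matlab
data = [-1 2 0 -2 0];
% Rank 1 goes to the largest value; ties share the lowest rank
[~, ranks] = ismember(data, sort(data, 'descend'));   % [4 1 2 5 2]
% Column 1 = data, column 2 = ranks, rows ordered by rank
tbl = sortrows([data(:) ranks(:)], 2);
```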
