ranking (ordering) values with repeats
50 次查看(过去 30 天)
显示 更早的评论
Hello Community,
Im hoping some of you have a clever solution to this problem. Im looking for fast and efficient way to rank (order)a vector of numbers in a particular way when repeated values arise.
To make it simple, suppose I have a row vector:
data = [-1 2 0 -2 0]
I know I can rank them using the 3rd output of "unique":
>> [~,~,rnk] = unique(data)
rnk =
2 4 3 1 3
What I like about this is that it assigns the same rank to the repeated zeros. What I don't like about this is that the top rank is now "4" even though I have 5 values. I would prefer this:
>> rnk = myrank(data)
rnk =
2 5 3 1 3
Ive also played around with the second output of "sort" quite a bit, but since this output produces indicies of the sorted values within the original array, there is no simple way (that I've found) to associate the same rank with repeated values.
Im just wondering if there is something simple that Im missing.
Thanks!
0 个评论
采纳的回答
Oleg Komarov
2012-3-23
If you have the Statistics Toolbox, but it's kinda an overkill:
floor(tiedrank([-1 2 0 -2 0]))
ans =
2 5 3 1 3
Otherwise:
data = [-1 2 0 -2 0];
% Sort data
[srt, idxSrt] = sort(data);
% Find where are the repetitions
idxRepeat = [false diff(srt) == 0];
% Rank with tieds but w/o skipping
rnkNoSkip = cumsum(~idxRepeat);
% Preallocate rank
rnk = 1:numel(data);
% Adjust for tieds (and skip)
rnk(idxRepeat) = rnkNoSkip(idxRepeat);
% Sort back
rnk(idxSrt) = rnk
rnk =
2 5 3 1 3
4 个评论
sunbeam
2013-3-6
There is something wrong here; for
data = [1 -3 -3 2 23 23];
The result is
rnk =
3 1 1 4 5 4
Tommaso Fornaciari
2016-12-4
Hi is there a way to assign equal observations two different subsequent ranks? following on the original question, the output i would need is 2 5 4 1 3 or 2 5 3 1 4
Thank you
更多回答(4 个)
Raph
2015-5-4
It should also work with sort() and ismember()
data_sorted = sort(data);
[~, rnk] = ismember(data,data_sorted)
1 个评论
Bradley Stiritz
2016-5-28
Very impressive, Raph! Thanks for your contribution. Excellent use of built-in vectorized functions.
sunbeam
2013-3-6
This should work. I couldn't figure out how to do it without a loop, but at least this only loops over the duplicate entries. Someone let me know if you come up with a better way.
function outrank = rankWithDuplicates(data,mode)
% R = rankWithDuplicates(data,mode) ranks the values in the data variable
% according to size, allowing for duplicates. Whereas sort actually
% rearranges the input, and therefore duplicates get assigned different
% indices, rankWithDuplicates will simply output the rank order allowing
% ties for duplicate entries. For example,
%
% rankWithDuplicates([1 1 5 8 8 10])
%
% will output [1 1 3 4 4 6]; and if these entries are shuffled like
%
% rankWithDuplicates([8 1 5 1 10 8])
%
% the output will be [4 1 3 1 6 4].
%
% INPUT: data, a vector of real numbers.
% mode, an optional input which can be 'ascend' or 'descend'
%
% OUTPUT: the rank order of the input data.
%
if nargin==1
mode='ascend';
end
[~,b]=size(data);
if b==1
data=data';
end
% Sort data
[srt, idxSrt] = sort(data,mode);
% Find where are the repetitions and negate
idxRepeat = [false diff(srt) == 0];
% Loop through where there are duplicates and maintain the rank.
% I'm not sure if this is necessary but it's the only way I could get it
% done.
rnk = 1:numel(data);
loopidx=find(idxRepeat>0);
for i=loopidx
rnk(i)=rnk(i-1);
end
% Return order according to original sort
outrank(idxSrt)=rnk;
0 个评论
Jeyamugan T
2017-4-7
I wrote this code for some other purpose but it may useful for this problem.
function [rkList]=arrayRankEx(O)
cO=sort(O);
n=size(O,2);
rkList=zeros(1,n);
in=1;
while(in<=n)
out=1;
co=0;
while(out<=n && in<=n)
if(O(out)==cO(in))
rkList(out)=in;
co=co+1;
end
out=out+1;
end
in=in+co;
end
end
>>[5 7 -2 1 -1 0 0 1 5 3]
ans =
5 7 -2 1 -1 0 0 1 5 3
>> arrayRankEx([5 7 -2 1 -1 0 0 1 5 3])
ans =
8 10 1 5 2 3 3 5 8 7
0 个评论
Benjamin Levy
2017-11-16
Not sure if this is still a 'live' thread, but the code should report these ranks for ascending order: 2.0000 5.0000 3.5000 1.0000 3.5000.
Now, suppose your data set is data = [ 11 20 2 14 15 11 13 20 7 9 1 5 17... 7 5 16 3 5 20 ]; Your answer for ascending order (correcting for ties), using sortrows([ data' ranks ],2), should provide column 1 = data, column 2 = ranks:
1.0000 1.0000
2.0000 2.0000
3.0000 3.0000
5.0000 5.0000
5.0000 5.0000
5.0000 5.0000
7.0000 7.5000
7.0000 7.5000
9.0000 9.0000
11.0000 11.0000
11.0000 11.0000
11.0000 11.0000
13.0000 13.0000
14.0000 14.0000
15.0000 15.0000
16.0000 16.0000
17.0000 17.0000
20.0000 19.0000
20.0000 19.0000
20.0000 19.0000
Note that there are several sections in the sorted data wherein there are consecutive runs of same integers (e.g., ...5 5 7 7 ).
Using your code and my data set, and the same final sort, I have (column 1 data, column 2 ranks):
1 1
2 2
3 3
5 4
5 4
5 4
7 5
11 7
7 7
11 7
9 9
11 10
13 13
20 13
20 13
14 14
15 15
16 16
17 17
20 18
1 个评论
Nataraja M
2018-3-26
Hello Sir I used above command sortrows([ data' ranks ],2) for ranking vectors from maximum to lowest, but facing error like Not enough input arguments. Can you please help me to solve this error Thank you
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!