Matlab find unique column-combinations in matrix and respective index

203 次查看(过去 30 天)
I have a large matrix with with multiple rows and a limited (but larger than 1) number of columns containing values between 0 and 9 and would like to find an efficient way to identify unique row-wise combinations and their indices to then build sums (somehwat like a pivot logic). Here is an example of what I am trying to achieve:
a =
1 2 3
2 2 3
3 2 1
1 2 3
3 2 1
uniqueCombs =
1 2 3
2 2 3
3 2 1
numOccurrences =
2
1
2
indizies:
[1;4]
[2]
[3;5]
From matrix a, I want to first identify the unique combinations (row-wise), then count the number occurrences / identify the row-index of the respective combination.
I have achieved this through generating strings with num2str and strcat, but this method appears to be very slow. Along these thoughts I have tried to find a way to form a new unique number through concatenating the values horizontally, but Matlab does not seem to support this (e.g. from [1;2;3] build 123). Sums won't work because they would remove the possibility to identify unique combinations. Any suggestions on how to best achieve this? Thanks!

采纳的回答

Guillaume
Guillaume 2017-3-22
More or less the same as Jan's, using accumarray instead of splitapply (I'm still old school!):
A = [ 1 2 3
2 2 3
3 2 1
1 2 3
3 2 1];
[B, ~, ib] = unique(A, 'rows');
numoccurences = accumarray(ib, 1);
indices = accumarray(ib, find(ib), [], @(rows){rows}); %the find(ib) simply generates (1:size(a,1))'
  4 个评论
Guillaume
Guillaume 2017-3-23
编辑:Guillaume 2017-3-23
I suspect that accumarray will be faster as it is built-in compiled code whereas splitapply is m code, but I haven't conducted any test.
Note: for the indices,
indices = accumarray(ib, (1:numel(ib))', [], @(rows){rows});
is probably slightly faster, just not as concise.
Jan
Jan 2017-3-23
编辑:Jan 2017-3-23
@Guillaume: I compare this with cellfun: In older versions Matlab contained the C-sources for this Mex function. Here calling a function handle is very expensive, because the Matlab tier has to be called. Therefore the implicitely defined methods provided by strings are much faster: 'length', 'isclass' etc.
Then using a compiled Mex function is not a real benefit, because mexCallMATLAB has some overhead. This might concern accumarray also. I guess that your accumarray approach is faster than the loop, but I know that it looks very cryptic ;-)
But now I can leave the speculations and run a test: With
A = randi([1, 100], 1e5, 3); % Test data
my loop takes 14.75 seconds, your accumarray approach takes 0.44 seconds. The results differ in the order of the indices. So perhaps this is wanted:
[B, iB, iA] = unique(A, 'rows');
indices = accumarray(iA, (1:numel(iA)).', [], @(r){sort(r)});
The result is clear: @Benvaulter, please unaccept my answer and select Guillaume's, and of course use it also to save time and energy.

请先登录,再进行评论。

更多回答(1 个)

Jan
Jan 2017-3-22
编辑:Jan 2017-3-23
A = [ 1 2 3; ...
2 2 3; ...
3 2 1; ...
1 2 3; ...
3 2 1];
[B, iB, iA] = unique(A, 'rows');
G = unique(iA);
numOccurrences = splitapply(@sum, iA, G);
I cannot test a method to obtain the indices list as wanted. I assume this works with splitapply also. A simple loop approach at least:
n = length(G);
indices = cell(1, n);
for k = 1:n
indices{k} = find(iA == G(k));
end
[EDITED] Code is tested now. Use the much faster solution of Guillaume for productive work.

类别

Help CenterFile Exchange 中查找有关 Matrix Indexing 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by