Fast method for findings which rows of a matrix are contained in another

1 次查看(过去 30 天)
I have two matrices, A and B. The rows of B are a subset of the rows of A, and I want to find the row indices of the rows in A that correpsond to rows in B. I can do this using the intersect or ismember functions, but these are both much too slow for my purposes. I need to perform this calculation hundreds of thousands of times.
A is typically varies from a matrix, and can go up to having several hundred rows (still with 8 columns). B also has 8 columns, and varies from 1 row up to the number of rows in A. Are there any other methods I can use to find the rows of A that are also in B?
Alternatively, I perform a set of operations to the matrix A to come up with the matrix B. Is there a way I can use this to my advantage to find the rows of the final matrix (which would be B) relative to the original matrix A.

采纳的回答

Stephen23
Stephen23 2020-8-27
编辑:Stephen23 2020-8-27
>> A = [1,2,3,4;5,6,7,8;9,10,11,12]
A =
1 2 3 4
5 6 7 8
9 10 11 12
>> B = [5,6,7,8;9,10,11,12]
B =
5 6 7 8
9 10 11 12
Using permute and test for equality (for versions >=R2016b bsxfun is not required):
>> X = any(all(bsxfun(@eq,A,permute(B,[3,2,1])),2),3)
X =
0
1
1
>> find(X) % optional
ans =
2
3

更多回答(1 个)

Bruno Luong
Bruno Luong 2020-8-27
编辑:Bruno Luong 2020-8-27
Not know how much my code is faster than ISMEMBER (which IMO pretty fast) (EDIT: it's indeed > 3 time faster)
% Generate random A, B such that rows of B are subset of that of A
A = randi(10,1000,8);
B = randi(10,1000,8);
A = [A;B];
A = A(randperm(end),:);
tic
% iA contains rows in A that matches with B
[~,is] = sortrows([A;B]);
isB = is>size(A,1);
ismatch = ~isB(1:end-1)&isB(2:end);
iA = is(ismatch); % row index of A that match B
%iB = is([false;ismatch])-size(A,1); % matching index
%isequal(A(iA,:),B(iB,:)) % meaning isequal(A(iA,:),B(iB,:)) is TRUE
toc % Elapsed time is 0.000656 seconds.
Do you have something unchanged during the loop? Is there any special properties of A and B (sorted already)? If yes it might exist faster methods using those charracteristics.
  2 个评论
Alexander Holmes
Alexander Holmes 2020-8-27
编辑:Alexander Holmes 2020-8-27
Ismember probably is quite fast, it just wasn't fast enough for me the way I had written my code.
That said, I was being silly and could apply the same operations I used to get B to an array of indices without then having to use ismember at all. Thank you for your answer though!
Bruno Luong
Bruno Luong 2020-8-27
编辑:Bruno Luong 2020-8-27
My code is about 3.8 times faster than ismember for 2000/1000 rows
% Generate random A, B such that rows of B are subset of that of A
A = randi(10,1000,8);
B = randi(10,1000,8);
A = [A;B];
A = A(randperm(end),:);
ntries = 1000;
tic
for k=1:ntries
% iA contains rows in A that matches with B
[~,is] = sortrows([A;B]);
isB = is>size(A,1);
ismatch = ~isB(1:end-1)&isB(2:end);
iA = is(ismatch); % row index of A that match B
%iB = is([false;ismatch])-size(A,1); % matching index
%isequal(A(iA,:),B(iB,:)) % meaning isequal(A(iA,:),B(iB,:)) is TRUE
end
tsort=toc; % Elapsed time is 0.000656 seconds.
tic
for k=1:ntries
tf = ismember(A, B, 'rows');
iA = find(tf);
end
tismember=toc;
tic
for k=1:ntries
X = any(all(bsxfun(@eq,A,permute(B,[3,2,1])),2),3);
end
teq=toc;
fprintf('tsort = %f [s]\n', tsort);
fprintf('tismember = %f [s]\n', tismember);
fprintf('teq = %f [s]\n', teq);
% fprintf('tsort/tismember = %f\n', tsort/tismember);
% fprintf('tismember/tsort = %f\n', tismember/tsort);
Result for 2000/1000 rows
tsort = 0.252900 [s]
tismember = 0.934710 [s]
teq = 12.506212 [s]
Result of 12/6 rows
%A = randi(6,10,8);
%B = randi(6,10,8);
%A = [A;B];
%A = A(randperm(end),:);
tsort = 0.009770 [s]
tismember = 0.126905 [s]
teq = 0.006746 [s]

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Logical 的更多信息

标签

产品

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by