speed up intersect, setdiff functions
显示 更早的评论
Hi
I have some code, which when I examine using the Matlab Profiler shows that the majority of the time is spent on the intersect and setdiff functions, which are invoked many times during a looping process. I was wondering if there was a clever way to speed this up? For example is it possible to somehow vectorize the following code to avoid invoking the inbuilt set functions? Thanks for your help!
tic;
size=50000;
a=round(100*rand(size,1));
b=round(100*rand(size,1));
intersect(a,b);
setdiff(a,b);
toc;
1 个评论
Matt Fig
2012-11-15
Your best bet might be to avoid performing these operations in a loop in the first place...
采纳的回答
更多回答(2 个)
Jonathan Epperl
2012-11-15
If your sets aren't much larger than your example, and are indeed made up of positive integers, then those two functions here can get you a drastic increase in speed. The idea isn't mine, afaik Kevin Murphy of the Bayes Net TB came up with it. Anyway, 2 functions which pretty much exactly replace setdiff and intersect (as long as you work with positive integers):
function C = MY_intersect(A,B)
if ~isempty(A)&&~isempty(B)
P = zeros(1, max(max(A),max(B)) ) ;
P(A) = 1;
C = B(logical(P(B)));
else
C = [];
end
and
function Z = MY_setdiff(X,Y)
if ~isempty(X)&&~isempty(Y)
check = false(1, max(max(X), max(Y)));
check(X) = true;
check(Y) = false;
Z = X(check(X));
else
Z = X;
end
Obviously, if you're confident your inputs will make sense you can get rid of the if stuff.
tic;
size=50000;
a=round(100*rand(size,1));
b=round(100*rand(size,1));
intersect(a,b);
setdiff(a,b);
toc;
% Elapsed time is 0.028182 seconds.
tic;
size=50000;
a=round(100*rand(size,1));
b=round(100*rand(size,1));
MY_intersect(1+a,1+b)-1;
MY_setdiff(1+a,1+b)-1;
toc;
% Elapsed time is 0.016935 seconds.
The computation times vary a lot though, but the 2nd block is always at least around 30% faster. You can see how it plays out in your code and let us know.
3 个评论
Unfortunately, MY_setdiff does not quite behave the same as setdiff, also not for positive integers. Consider: setdiff([1 2 2 1], [1]) = 2 and MY_setdiff([1 2 2 1], [1]) = 2 2. Hence, you would have to include a Z = unique(Z) which reduces speed. However, MY_setdiff generally still seems to be faster for pos. integers
Jonathan Epperl
2014-8-18
That's a good point, I never realized that.
Chinmaya KA
2020-9-9
Hi Jonathan, how can we intersect multiple arrays with positive integers using your code?
Sean de Wolski
2012-11-15
0 个投票
Look at ismember and ismembc (two output form).
You will have to implement the and/or/xor logic yourself, but these should quickly give you the indices you need to do so.
1 个评论
Ethan Kyzivat
2018-5-9
I found that ismember also calls unique(), so may not be much faster...
类别
在 帮助中心 和 File Exchange 中查找有关 Logical 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!