How to speed up pairwise difference calculation between vectors under certain condition?

8 次查看(过去 30 天)
I have two double arrays A (size: Nx1) and B (Mx1) both with large increasing values (from ~1e9 to ~1e13; not sure this is relevant).
I am trying to compute the pairwise differences between the two vectors but only for pairs that respect a certain condition, that is the absolute value of the difference between the two values is smaller than t.
I first tried to brute force calculate all the pairwise differences but as you would expect a NxM array cannot be stored on a regular computer; it is around a few hundred of terabytes.
So far I have implemented a rather straightforward solution which works fine with small N and M.
Low = 1e9; % lower bound for values in A & B
Upp = 1e13; % upper bound for values in A & B
N = 1e4; % number of elements in A and B (assuming they are equal, which is not the case)
A = sort((Upp-Low)*rand(N,1)+Low,'ascend');
B = sort((Upp-Low)*rand(N,1)+Low,'ascend');
maxdif = 1000000; % maximum difference
PrWD = cell(1,N);
tic
for i = 1:N
% pairwise difference if |B-A(i)|<=maxdif
PrWD{i} = A(i) - B(abs(B-A(i))<=maxdif)';
end
toc
However, I have A and B arrays which are both very large (N and M is the order of a few tens of millions) such that this operation takes a few hours in practice.
I tried arrayfun or bsxfun on small subsets (N=M=1e5) but it does not speed up the process.
At this point, I understand that this is extremely demanding as calculating all the pairwise differences or even the indices verifying the condition seem to scale with N*M.
Is there any (Matlab) way to speed up this process? Maybe taking into acccount the fact that values are increasing, or that not all differences need to be calculated?
  1 个评论
Jan
Jan 2019-6-5
编辑:Jan 2019-6-5
You can improve the abs(B-A(i))<=maxdif part by locating the lower and upper indices iteratively. I will provide some code soon.
Dummy comment just to let me find this thread again: Jans magic message

请先登录,再进行评论。

采纳的回答

Jan
Jan 2019-6-8
Instead of creating the huge index matrix abs(B-A(i))<=maxdif in each iteration, it is cheaper to search the lower and upper limits of the matching elements iteratively:
tic
PrWD2 = cell(1,N);
ini = 1;
fin = ini;
for i = 1:N
% Find initial and final index:
while B(ini) < A(i) - maxdif && ini < N
ini = ini + 1;
end
while B(fin) <= A(i) + maxdif && fin < N
fin = fin + 1;
end
if ini <= fin - 1
PrWD2{i} = A(i) - B(ini:fin - 1).';
end
end
toc
% PrWD2(cellfun('isempty', PrWD2)) = {zeros(1,0)};
% isequal(PrWD, PrWD2)
With your example and N=1e5 this is about 1100 times faster than the original code.
This code replies the empty matrix [], if no element is matching, while the original code creates a [1 x 0] matrix. For a comparison of the results, the empty matrices are adjusted, but this is not needed in the productive code most likely.

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Matrix Indexing 的更多信息

产品


版本

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by