Fastest way to index large arrays

I have two sets of arrays, A and B. The "A" arrays have about 1 million elements and the "B" arrays about 65 thousand. For every element in A, I need to find the element in B with the same (Phi, Theta) pair and pull the related value from VarB. Here's a crude minimal working example:
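PhiA = round(359*rand(1,1e6));               % 1e6 integer Phi values in 0..359
ThetaA = round(179*rand(1,1e6));             % 1e6 integer Theta values in 0..179
PhiB = repmat(0:359,1,180);                  % all 360*180 (Phi,Theta) pairs:
ThetaB = reshape(repmat(0:179,360,1),1,[]);  % Phi varies fastest, Theta slowest
VarB = 1:180*360;                            % value associated with each B pair
out = nan(1e6,1);                            % preallocated result
tic
for loop = 1:length(PhiA)
    % compare against all 64,800 elements of B to find the matching pair
    idx = PhiA(loop) == PhiB & ThetaA(loop) == ThetaB;
    out(loop) = VarB(idx);
end
toc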
Given the size of the arrays, this is slow: over 40 seconds on my machine. The profiler tells me that the two lines inside the for loop are the slowest in my code and, surprisingly, they split the time almost exactly 50/50.
This is actually my already faster version: originally A and B were tables, and the profiler told me that the slow operations were reading from and storing into the tables. Switching to arrays sped things up a little, but not as much as I had hoped.
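A rough operation count suggests why the loop is slow and why the two lines cost about the same. Assuming N = 10^6 elements in A and M = 360 × 180 = 64,800 in B, every iteration evaluates two comparisons and an & over all M elements of B on the first line, and the logical index VarB(idx) has to scan the whole M-element mask on the second, so each line performs on the order of

$$N \times M = 10^{6} \times 64\,800 = 6.48 \times 10^{10}$$

elementwise operations in total. Any approach that avoids scanning B for every element of A should cut this to roughly N operations.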
How could I make this faster?

Accepted Answer

dpb 2022-10-4
With the lookup arrays structured as they are, you don't need a lookup at all; you can compute the index into VarB directly:
fnRow=@(phi,theta)phi+360*theta+1;   % linear index into VarB for integer phi in 0..359, theta in 0..179
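The formula follows from how B is laid out. PhiB cycles through 0:359 fastest and ThetaB advances by one every 360 elements, so element k of B (1-based) holds

$$\phi_k = (k-1) \bmod 360, \qquad \theta_k = \left\lfloor \frac{k-1}{360} \right\rfloor$$

Because 0 ≤ φ_k ≤ 359, these are just the remainder and quotient of k − 1 divided by 360, so inverting gives

$$k = \phi + 360\,\theta + 1,$$

which is exactly fnRow; the +1 accounts for MATLAB's 1-based indexing. The formula only holds for integer angles in 0..359 and 0..179.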
With this, the whole loop reduces to a single vectorized indexing operation:
PhiA = round(359*rand(1,1e6));
ThetaA = round(179*rand(1,1e6));
PhiB = repmat(0:359,1,180);
ThetaB = reshape(repmat(0:179,360,1),1,[]);
VarB = 1:180*360;
tic
out=VarB(fnRow(PhiA,ThetaA));   % one indexing call for all 1e6 elements, no loop
toc
Elapsed time is 0.012699 seconds.
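If B were not laid out on such a regular grid, no formula like fnRow would exist and a lookup would still be needed. One vectorized option, sketched here as an alternative rather than as part of the answer above, is ismember with the 'rows' option, which matches all (Phi, Theta) pairs of A against those of B in a single call. The sketch uses a smaller N than the question and, on this grid-shaped data, checks that it agrees with fnRow.

```matlab
% Vectorized lookup for when (PhiB, ThetaB) is NOT a regular grid.
% ismember(...,'rows') matches every (Phi,Theta) row of A against the rows of B.
PhiA   = round(359*rand(1,1e5));         % smaller N than the question, for speed
ThetaA = round(179*rand(1,1e5));
PhiB   = repmat(0:359,1,180);
ThetaB = reshape(repmat(0:179,360,1),1,[]);
VarB   = 1:180*360;

% tf(i) is true if row i of A occurs in B; loc(i) is its index in B (0 if absent)
[tf, loc] = ismember([PhiA.' ThetaA.'], [PhiB.' ThetaB.'], 'rows');

out = nan(numel(PhiA),1);                % NaN wherever A has no match in B
out(tf) = VarB(loc(tf));

% On this grid-structured B the result agrees with the direct formula
fnRow = @(phi,theta) phi + 360*theta + 1;
assert(isequal(out, VarB(fnRow(PhiA,ThetaA)).'))
```

Since loc is 0 for unmatched pairs, only the tf entries are assigned and unmatched rows stay NaN.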
Vittorio Picco 2022-10-5
Yeah, it worked out. I can round the values to integers, so that's not a problem. The problem was that array A occasionally contains NaN, which are entries I need to skip; they produce NaN indices, which made the last line fail. I dealt with it by appending a dummy value to the end of VarB and replacing the NaN indices with the index of this new entry, which made the out= assignment work. Then I replaced the dummy entries with NaN again. All of that could be done without for loops, so my execution time remained almost unaffected. I wonder how you would have dealt with it. I'm not good at anonymous functions, so I never think of them.
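The dummy-index workaround described above can be sketched as follows. This is an illustration on a few hand-picked values, not the poster's actual code; as a small variation, it appends NaN itself as the dummy value, so the final step of replacing dummy entries with NaN is folded into the lookup.

```matlab
% Dummy-index workaround for NaN entries in A (illustrative data).
PhiA   = [10 NaN 359 0];
ThetaA = [5  7   NaN 179];
VarB   = 1:180*360;
fnRow  = @(phi,theta) phi + 360*theta + 1;

VarBx = [VarB NaN];               % append a dummy entry (here NaN) at the end
idx   = fnRow(PhiA, ThetaA);      % NaN wherever Phi or Theta is NaN
idx(isnan(idx)) = numel(VarBx);   % point those elements at the dummy entry
out   = VarBx(idx);               % dummy is already NaN, so no clean-up pass
```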
dpb 2022-10-5
I probably would simply have used logical indexing to select only the valid entries before computing the index:
isOK=all(isfinite(A),2);   % true for rows of A that contain no NaN/Inf
out=VarB(fnRow(PhiA(isOK),ThetaA(isOK)));
The above assumes A is the array that may contain missing values and keeps only its rows with no missing (NaN or Inf) entries.
If out must have the same number of rows as A, preallocate it (for example with NaN) and assign into the valid positions with out(isOK)=... . Without the preallocation, out(isOK)=... only grows out as far as the last non-missing row of A, so it comes out short whenever the last N rows are missing. You may have no way to know that won't happen, so defensive coding would preallocate.
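A minimal sketch of the preallocated version, assuming PhiA and ThetaA are separate row vectors as in the question (the values are illustrative, with the last element deliberately missing):

```matlab
% Keep out the same length as A: preallocate with NaN, then fill only the
% elements whose Phi and Theta are both finite.
PhiA   = [10 NaN 359 0 NaN];      % last element missing on purpose
ThetaA = [5  7   NaN 179 3];
VarB   = 1:180*360;
fnRow  = @(phi,theta) phi + 360*theta + 1;

isOK = isfinite(PhiA) & isfinite(ThetaA);
out  = nan(size(PhiA));           % preallocate: same size as A, NaN = missing
out(isOK) = VarB(fnRow(PhiA(isOK), ThetaA(isOK)));
```

Because out is created at full size before the assignment, the trailing missing element stays as NaN instead of being dropped.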
If the data are instead stored as separate vectors, as in the example in the question, then
isOK=all(isfinite([PhiA.' ThetaA.']),2);
looks a little cryptic but is fast, and it is easier to write than combining separate conditions on each vector with &.
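A quick check, on illustrative values, that the concatenated form and the two-condition form select the same elements. It also shows why isfinite has to be applied before all: all returns a logical, and isfinite of a logical is always true, so the reversed order flags nothing as missing.

```matlab
PhiA   = [10 NaN 359 0 Inf];
ThetaA = [5  7   NaN 179 3];

isOK1 = all(isfinite([PhiA.' ThetaA.']), 2);   % column: one flag per (Phi,Theta) pair
isOK2 = isfinite(PhiA) & isfinite(ThetaA);     % row: same flags via two conditions and &
assert(isequal(isOK1, isOK2.'))

% Reversed order: all() yields logical values, which are always finite
assert(all(isfinite(all([PhiA.' ThetaA.'], 2))))
```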

Category: Matrix Indexing

Release: R2022a
