How to efficiently index into memory and call functions within for loops
2 次查看(过去 30 天)
显示 更早的评论
Hello all, I have a code that I would like to speed up a little bit. There's one section where I input an array that has multiple dimentions (4+) and a few of these dimensions are orders of magnitude different in size. I read somewhere that I should always put the largest dimension last but from this quick test code I made that doesn't seem to be the case. The fastest situation on my computer is when a=a_cell{4} which has size (1000, 10000,10). It should also be noted for cases 5 and 6 I'm getting almost 8x slower performance vs the optimal 4th case. Is there any documentation as to why this would be the case, i.e.how matlab would pull memory/cache memory/call functions with the different dimention sizes?
a = rand(10000,1000,10);
possible_perms = perms([1 2 3]);
for n=1:size(possible_perms,1)
a_cell{n} = permute(a,possible_perms(n,:));
end
a_cell = a_cell';
b = zeros(10000);
clc
a = a_cell{1};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,:,n),'all');
end
end
toc
a = a_cell{2};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,n,:),'all');
end
end
toc
a = a_cell{3};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,:,n),'all');
end
end
toc
a = a_cell{4};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,n,:),'all');
end
end
toc
a = a_cell{5};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(n,:,:),'all');
end
end
toc
a = a_cell{6};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(n,:,:),'all');
end
end
toc
5 个评论
Bruno Luong
2023-11-29
编辑:Bruno Luong
2023-11-29
I guess it is NOT the sum, but the indexing a(n,:,:) or a(:,n,:) or a(:,:,n) that takes extra time.
The memory content of the original array a and such indexing is different (shuffled), so (I guess) MATLAB have to copy and rearrange the memory content BEFORE calling lowlevel sum to perform the iterative plus operations.
This comparison is NOT the same as if you compare purely caching as you put it in the question. What your code does is indexing AND calling the function. This is NOT nested for-loop access to array to retreive iteratively scalars.
采纳的回答
Bruno Luong
2023-11-30
编辑:Bruno Luong
2023-11-30
When you index a hybrid slide (j positive integer scalar)
% x = ...
% j = ...
xidx = x(:,:,:,j,:,:)
(here I take a 6D array as example), MATLAB probably does the following
s = size(x)
p = prod(s(1:3)) % length of the head dimensions
q = prod(s(5:6)) % length of the tail dimensions
xx = reshape(x, [p s(4) q]);
xidx = zeros(p, 1, q); % aloocate array that stores the result
for k = 1:q
% xidx(:,1,k) = xx(:,j,k);
memcpy(xidx(:,1,k), xx(:,j,k), p*sizeof(class(x))); % This is lowlevel fast memory copying, google it if you know to learn more
% copy a conyguous memory block of p numbers
end
s(4) = 1;
xidx = reshape(xidx, s);
Note that p*q = numel(xidx) = numel(x)/size(x,4), so constant under array permutation.
This is fastest when the array x is setup so that p is largest and q is smallest (including 1 value, i.e. j is at the last dimension of x).
In your case it's the for-loop with a(:,:,n) that would be the fastest.
1 个评论
Bruno Luong
2023-11-30
编辑:Bruno Luong
2023-11-30
The above rule seems to be applicable here (I assume the differences of times of sum in various loops are negligible (I also modified the array size to make the effect more pronounced):
test
function test
a = rand(10000,100,100); % Changed by Bruno, was rand(10000,1000,10), 10 is too small to see the effect
possible_perms = perms([1 2 3]);
for n=1:size(possible_perms,1)
a_cell{n} = permute(a,possible_perms(n,:));
end
a_cell = a_cell';
b = zeros(1,10000); % Bruno fix the allocation, vector NOT array
clc
a = a_cell{1};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,:,n),'all');
end
end
toc % Elapsed time is 0.904677 seconds.
a = a_cell{2};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,n,:),'all');
end
end
toc % Elapsed time is 1.212156 seconds..
a = a_cell{3};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,:,n),'all');
end
end
toc % Elapsed time is 0.871692 seconds.
a = a_cell{4};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,n,:),'all');
end
end
toc % Elapsed time is 1.132273 seconds.
a = a_cell{5};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(n,:,:),'all');
end
end
toc % Elapsed time is 6.790436 seconds.
a = a_cell{6};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(n,:,:),'all');
end
end
toc % Elapsed time is 6.875394 seconds.
% Bruno method
a = a_cell{3};
tic
a = reshape(a, [], size(a,3));
for k=1:10
b = sum(a,1);
end
toc % Elapsed time is 0.145326 seconds.
end
更多回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Loops and Conditional Statements 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!