Why is pagemtimes slower than just coding up the matrix multiplication?
I noticed that pagemtimes is slower than expanding the equation and coding it by hand, on both CPU and GPU. On the GPU it is exceptionally slow.
Here is an example, coded two different ways. Note the performance of pagemtimes on the CPU alone.
clear all
Nx = 100;
Ny = 100;
Nz = 100;
[A1,A2,A3,B1,B2,B3,C1,C2,C3,...
E11,E12,E13,E21,E22,E23,E31,E32,E33,...
F11,F12,F13,F21,F22,F23,F31,F32,F33] = deal(rand(Nx,Ny,Nz));
tic
for i = 1:20
%% Electric Field Update
C1 = F11.*(A1.*E11+B1)+F12.*(A2.*E12+B2)+F13.*(A3.*E13+B3);
C2 = F21.*(A2.*E21+B2)+F22.*(A2.*E22+B2)+F23.*(A3.*E23+B3);
C3 = F31.*(A3.*E31+B3)+F32.*(A3.*E32+B3)+F33.*(A3.*E33+B3);
end
toc
[A,B,C] = deal(rand(3,1,Nx,Ny,Nz));
[E,F] = deal(rand(3,3,Nx,Ny,Nz));
tic
for i = 1:20
C = pagemtimes(F,(B+pagemtimes(E,A)));
end
toc
Without pagemtimes - "Elapsed time is 0.032141 seconds."
With pagemtimes - "Elapsed time is 0.325006 seconds."
Using gpuArray() on the variables in the deal() calls, the pagemtimes version is slower still:
Without pagemtimes - "Elapsed time is 0.012688 seconds."
With pagemtimes - "Elapsed time is 5.357220 seconds."
Why is this function so slow?
1 Comment
Rik
2021-5-31
You're using different data and inputs for both strategies. Can you fix the bugs in my code below? You can only compare the times if you use the same size inputs. Otherwise you could do most pre-processing before you start your timer.
You should also be aware that tic,toc is only valid for longer times and should only be used for a first order estimate. If you want to truly compare performance you need the timeit function.
Nx = 100;
Ny = 100;
Nz = 100;
[A,B,C] = deal(rand(3,1,Nx,Ny,Nz));
[E,F] = deal(rand(3,3,Nx,Ny,Nz));
x1=direct(A,B,E,F);
x2=pagemtimes_version(A,B,E,F);
x=abs(x1-x2);
max(x(:)) %This should be very close to 0
timeit(@()direct(A,B,E,F))
timeit(@()pagemtimes_version(A,B,E,F))
function C=direct(A,B,E,F)
%% Electric Field Update
%This version is probably incorrect
C(1,1,:,:,:) = ...
F(1,1,:,:,:).*(A(1,1,:,:,:).*E(1,1,:,:,:)+B(1,1,:,:,:)) +...
F(1,2,:,:,:).*(A(2,1,:,:,:).*E(1,2,:,:,:)+B(2,1,:,:,:)) +...
F(1,3,:,:,:).*(A(3,1,:,:,:).*E(1,3,:,:,:)+B(3,1,:,:,:));
C(2,1,:,:,:) = ...
F(2,1,:,:,:).*(A(2,1,:,:,:).*E(2,1,:,:,:)+B(2,1,:,:,:)) +...
F(2,2,:,:,:).*(A(2,1,:,:,:).*E(2,2,:,:,:)+B(2,1,:,:,:)) +...
F(2,3,:,:,:).*(A(3,1,:,:,:).*E(2,3,:,:,:)+B(3,1,:,:,:));
C(3,1,:,:,:) = ...
F(3,1,:,:,:).*(A(3,1,:,:,:).*E(3,1,:,:,:)+B(3,1,:,:,:)) +...
F(3,2,:,:,:).*(A(3,1,:,:,:).*E(3,2,:,:,:)+B(3,1,:,:,:)) +...
F(3,3,:,:,:).*(A(3,1,:,:,:).*E(3,3,:,:,:)+B(3,1,:,:,:));
end
function C=pagemtimes_version(A,B,E,F)
C = pagemtimes(F,(B+pagemtimes(E,A)));
end
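For reference, the direct version above sums F(i,j).*(A(j).*E(i,j)+B(j)) over j, which is not the same quantity as F*(B + E*A), so the two functions should not be expected to agree as written. A minimal sketch of an expansion that does match the pagemtimes call is shown below. It assumes a release with pagemtimes and implicit expansion; the names D, C_direct, C_paged and maxErr are illustrative, the grid is smaller than in the thread to keep the check quick, and the inputs are drawn independently because deal(rand(...)) hands the same array to every output.

```matlab
% Sketch: expand C = F*(B + E*A) for every 3x3 / 3x1 page and compare
% against the pagemtimes formulation. Grid size is an assumption.
Nx = 20; Ny = 20; Nz = 20;

% Independent random inputs (deal(rand(...)) would make A==B and E==F).
A = rand(3,1,Nx,Ny,Nz);
B = rand(3,1,Nx,Ny,Nz);
E = rand(3,3,Nx,Ny,Nz);
F = rand(3,3,Nx,Ny,Nz);

% Inner step D = B + E*A: column j of E is scaled by element j of A.
D = B + E(:,1,:,:,:).*A(1,1,:,:,:) ...
      + E(:,2,:,:,:).*A(2,1,:,:,:) ...
      + E(:,3,:,:,:).*A(3,1,:,:,:);

% Outer step C = F*D, expanded the same way.
C_direct = F(:,1,:,:,:).*D(1,1,:,:,:) ...
         + F(:,2,:,:,:).*D(2,1,:,:,:) ...
         + F(:,3,:,:,:).*D(3,1,:,:,:);

% Reference result from pagemtimes.
C_paged = pagemtimes(F, B + pagemtimes(E, A));

% Largest absolute difference; should be at round-off level.
maxErr = max(abs(C_direct(:) - C_paged(:)));
disp(maxErr)
```

With both versions producing the same result, a timing comparison between them becomes meaningful.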
Answers (1)
Sulaymon Eshkabilov
2021-5-29
Hi,
There is no need to run this calculation in a loop, since every pass just repeats the same work:
tic
%for i = 1:20
%% Electric Field Update
C1 = F11.*(A1.*E11+B1)+F12.*(A2.*E12+B2)+F13.*(A3.*E13+B3);
C2 = F21.*(A2.*E21+B2)+F22.*(A2.*E22+B2)+F23.*(A3.*E23+B3);
C3 = F31.*(A3.*E31+B3)+F32.*(A3.*E32+B3)+F33.*(A3.*E33+B3);
% end
toc
Good luck.
2 Comments
DGM
2021-5-29
I think the point of the loop is to sample multiple passes so that the average execution time can be observed more clearly.
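The repeated sampling can also be left to timeit, which calls the function several times and reports the median. The sketch below is one possible setup, not the original poster's code: the handles paged and expanded are hypothetical names, the expanded form uses sum with implicit expansion and pagetranspose rather than the nine-term expressions from the question, and the grid is smaller than the thread's 100^3. The GPU branch only runs if a supported GPU is available, and uses gputimeit because GPU work is asynchronous and plain tic/toc can misreport it.

```matlab
% Sketch: time both formulations on identical inputs.
Nx = 20; Ny = 20; Nz = 20;               % assumed grid size, smaller than the thread's
A = rand(3,1,Nx,Ny,Nz);
B = rand(3,1,Nx,Ny,Nz);
E = rand(3,3,Nx,Ny,Nz);
F = rand(3,3,Nx,Ny,Nz);

% C = F*(B + E*A) via pagemtimes.
paged = @() pagemtimes(F, B + pagemtimes(E, A));

% Same product written as element-wise multiply plus a sum over columns.
expanded = @() sum(F .* pagetranspose(B + sum(E .* pagetranspose(A), 2)), 2);

% timeit runs each handle repeatedly and returns the median time in seconds.
tCpuPaged    = timeit(paged);
tCpuExpanded = timeit(expanded);
fprintf('CPU pagemtimes: %.6f s, expanded: %.6f s\n', tCpuPaged, tCpuExpanded);

% Optional GPU timing; skipped when no supported GPU is present.
if canUseGPU
    Ag = gpuArray(A); Bg = gpuArray(B);
    Eg = gpuArray(E); Fg = gpuArray(F);
    tGpuPaged = gputimeit(@() pagemtimes(Fg, Bg + pagemtimes(Eg, Ag)));
    fprintf('GPU pagemtimes: %.6f s\n', tGpuPaged);
end
```

Timings depend heavily on the machine, the release, and the array sizes, so the numbers are only meaningful relative to each other on the same system.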