Speeding up code which integrates a 2D gpuArray matrix using Simpson's Rule
1 次查看(过去 30 天)
显示 更早的评论
I have a (real) 2D gpuArray, which I am using as part of a larger code, and now am trying to also integrate the array using the Composite Simpson Rule inside my main loop (several 10000 iterations at least). A MWE looks like the following:
%%%%%%%%%%%%%%%%%% MAIN CODE %%%%%%%%%%%%%%%%%%
Ny = 501; % Dimensions of matrix M
Nx = 501; %
dx = 0.1; % Grid spacings
dy = 0.2; %
M = rand(Ny, Nx, 'gpuArray'); % Initialise a matrix
for k = 1:10000
% M = function1(M) % Apply some other functions to M
% ... etc ...
I = simpsons_integration_2D(M, dx, dy, Nx, Ny); % Now integrate M
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%% INTEGRATOR %%%%%%%%%%%%%%%%%%
function I = simpsons_integration_2D(F, dx, dy, Nx, Ny)
% Integrate the 2D function F with Nx columns and Ny rows, and grid spacings
% dx and dy using Simpson's rule.
% Integrate along x direction (vertically) --> IX is a vector afterwards
sX = sum( F(:,1:2:Nx-2) + 4*F(:,2:2:(Nx-1)) + F(:,3:2:Nx) , 2);
IX = dx/3 * sX;
% Integrate along y direction --> I is a scalar afterwards
sY = sum( IX(1:2:Ny-2) + 4*IX(2:2:(Ny-1)) + IX(3:2:Ny) , 1);
I = dy/3 * sY;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The operation of performing the integration is around 850 µs, which is currently a significant part of my code. This was measured using
f = @() simpsons_integration_2D(M, dx, dy, Nx, Ny);
t = gputimeit(f)
Is there a way to reduce the execution time for integrating the gpuArray matrix?
(The graphics card is the Nvidia Quadro P4000)
采纳的回答
Jan
2021-2-10
编辑:Jan
2021-2-10
These are faster at least on CPU - I do not have graphics hardware in my laptop which supports GPU arrays:
function I = simpsons_integration_2D_v2(F, dx, dy, Nx, Ny)
% Avoid summation of large intermediate arrays:
sX = F(:,1) + 2 * sum(F(:, 3:2:(Nx-2)), 2) + ...
4 * sum(F(:, 2:2:(Nx-1)), 2) + F(:,Nx);
sY = sX(1) + 2 * sum(sX(3:2:(Ny-2))) + ...
4 * sum(sX(2:2:(Ny-1))) + sX(Ny);
I = dx * dy / 9 * sY;
end
function I = simpsons_integration_2D_v3(F, dx, dy, Nx, Ny)
% Perform summation as dot product:
v = [1, repmat([4, 2], 1, (Nx - 3) / 2), 4, 1];
sX = F * v';
if Nx ~= Ny
v = [1, repmat([4, 2], 1, (Ny - 3) / 2), 4, 1];
end
sY = v * sX;
I = dx * dy / 9 * sY;
end
Compared timeit output on my mobil i5:
0.00111591015 original
0.00066621015 without large intermediate array
0.00012221015 as matrix multiplication
So this is faster on my old 2 core i5 than on your GPU. Maybe this gives a speedup on your GPU also. I assume, in version v3 the vector v must be created on the GPU also.
3 个评论
Jan
2021-2-11
编辑:Jan
2021-2-11
On the CPU it is slightly faster to perform it in one step:
s = vY * F * vX.';
I = s * dx * dy / 9;
But this might be an the effect of the JIT acceleration.
Another approach, which is 50% slower than the v3 version on the CPU:
vX = [1, repmat([4, 2], 1, (Nx - 3) / 2), 4, 1].';
vY = [1, repmat([4, 2], 1, (Ny - 3) / 2), 4, 1];
vXY = vY.' * vX.';
function I = simpsons_integration_2D_v5(F, dx, dy, vXY)
s = F(:).' * vXY(:);
I = s * dx * dy / 9;
end
更多回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 GPU Computing 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!