Improving the speed of a double for loop through vectorization or changing summation indices
2 次查看(过去 30 天)
显示 更早的评论
I'm writing a double for loop that computes a Pade summation subject to specific coefficients and parameters. I'm curious if there is a more efficient way to write my double for loop. I think that I should avoid writing both cos(theta)^(p-q) and sin(theta)^q as the power operation is expensive.
Here is my current implementation (and three other implementations that are slightly faster).
clc; clear all;
% Test data for function
epsilon = 0.3;
delta = 0.1;
N = 500;
M = 500;
% My code passes through complex coefficients of size (N+1,M+1)
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
% Form the ctilde
N_M_min = min(N,M);
coeff = zeros(N_M_min+1,1);
coeff2 = zeros(N_M_min+1,1);
coeff3 = zeros(N_M_min+1,1);
coeff4 = zeros(N_M_min+1,1);
ntests = 100; % number of test iterations to average exec time
% Method 1 - Original Implementation
tic
for N = 1:ntests
for p=0:N_M_min
for q=0:p
coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;
end
end
end
toc/ntests
% Method 2
tic
for N = 1:ntests
for p=0:N_M_min
c2 = 1;
for q=0:p
if (q == 0)
c2 = 1;
else
c2 = c2*sin(theta);
end
coeff2(p+1) = coeff2(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*c2;
end
end
end
toc/ntests
% Method 3
tic
c1 = cos(theta).^(0:N_M_min);
c2 = sin(theta).^(0:N_M_min);
for N = 1:ntests
for p=0:N_M_min
for q=0:p
coeff3(p+1) = coeff3(p+1) + c(p-q+1,q+1)*c1(p-q+1)*c2(q+1);
end
end
end
toc/ntests
% Method 4 - Fastest but slightly less accurate than method 3
tic
for N = 1:ntests
vander = (cos(theta).^(0:N_M_min).') .* (sin(theta).^(0:N_M_min));
for p=0:N_M_min
for q=0:p
coeff4(p+1) = coeff4(p+1) + c(p-q+1,q+1)*vander(p-q+1,q+1);
end
end
end
toc/ntests
diff = norm(coeff-coeff2,inf); % is tiny, 3.3e-21
diff2 = norm(coeff-coeff3,inf); % is zero
diff3 = norm(coeff-coeff4,inf); % is tiny 3.5e-15 (not zero as in method 3)
My second implementation runs almost twice as fast locally. The third and fourth implementation increase the speed by around a factor of 4.
The average executive time for method 1 is 0.0589 seconds.
The average executive time for method 2 is 0.0392 seconds.
The average executive time for method 3 is 0.0186 seconds.
The average executive time for method 4 is 0.0116 seconds.
However, I don't know how to speed up the calculation for the coefficients cos(theta)^(p-q). Is there a more efficient way of writing this double for loop through avoiding the power operation or using vectorization (my Matlab code will call this subroutine more than one hundred thousand times)?
0 个评论
采纳的回答
David Hill
2021-7-15
编辑:David Hill
2021-7-15
What is rep doing? Seems to just be multiplying.
for rep = 1:10000
for p=0:N_M_min
for q=0:p
coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;%there is no rep in the equations
end
end
end
I believe all your nested loops can be consolidated as such:
epsilon = 0.3;
delta = 0.1;
N = 50;
M = 50;
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
N_M_min = min(N,M);
tic;
c=rot90(c);
coeff=10000*arrayfun(@(x)sum(diag(c,x-N_M_min)'.*cos(theta).^(x:-1:0).*sin(theta).^(0:x)),0:N_M_min);
toc;
This runs in about 0.003 s.
更多回答(0 个)
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!