Improving the speed of a double for loop through vectorization or changing summation indices

Question

Matthew Kehoe 2021-7-15

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/879613-improving-the-speed-of-a-double-for-loop-through-vectorization-or-changing-summation-indices

编辑： Matthew Kehoe 2021-7-22

I'm writing a double for loop that computes a Pade summation subject to specific coefficients and parameters. I'm curious if there is a more efficient way to write my double for loop. I think that I should avoid writing both cos(theta)^(p-q) and sin(theta)^q as the power operation is expensive.

Here is my current implementation (and three other implementations that are slightly faster).

clc; clear all;
% Test data for function
epsilon = 0.3;
delta = 0.1;
N = 500;
M = 500;
% My code passes through complex coefficients of size (N+1,M+1)
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
% Form the ctilde
N_M_min = min(N,M);
coeff = zeros(N_M_min+1,1);
coeff2 = zeros(N_M_min+1,1);
coeff3 = zeros(N_M_min+1,1);
coeff4 = zeros(N_M_min+1,1);
ntests = 100; % number of test iterations to average exec time
% Method 1 - Original Implementation
tic
for N = 1:ntests
  for p=0:N_M_min
    for q=0:p
      coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;
    end
  end
end
toc/ntests
% Method 2
tic
for N = 1:ntests
  for p=0:N_M_min
    c2 = 1;
    for q=0:p
      if (q == 0)
        c2 = 1;
      else
        c2 = c2*sin(theta);
      end
      coeff2(p+1) = coeff2(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*c2;
    end
  end
end
toc/ntests
% Method 3
tic
c1 = cos(theta).^(0:N_M_min);
c2 = sin(theta).^(0:N_M_min);
for N = 1:ntests
  for p=0:N_M_min
    for q=0:p
      coeff3(p+1) = coeff3(p+1) + c(p-q+1,q+1)*c1(p-q+1)*c2(q+1);
    end
  end
end
toc/ntests
% Method 4 - Fastest but slightly less accurate than method 3
tic
for N = 1:ntests
  vander = (cos(theta).^(0:N_M_min).') .* (sin(theta).^(0:N_M_min));
  for p=0:N_M_min
    for q=0:p
      coeff4(p+1) = coeff4(p+1) + c(p-q+1,q+1)*vander(p-q+1,q+1);
    end
  end
end
toc/ntests
diff = norm(coeff-coeff2,inf); % is tiny, 3.3e-21
diff2 = norm(coeff-coeff3,inf); % is zero
diff3 = norm(coeff-coeff4,inf); % is tiny 3.5e-15 (not zero as in method 3)

My second implementation runs almost twice as fast locally. The third and fourth implementation increase the speed by around a factor of 4.

The average executive time for method 1 is 0.0589 seconds.
The average executive time for method 2 is 0.0392 seconds.
The average executive time for method 3 is 0.0186 seconds.
The average executive time for method 4 is 0.0116 seconds.

However, I don't know how to speed up the calculation for the coefficients cos(theta)^(p-q). Is there a more efficient way of writing this double for loop through avoiding the power operation or using vectorization (my Matlab code will call this subroutine more than one hundred thousand times)?

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

David Hill 2021-7-15

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/879613-improving-the-speed-of-a-double-for-loop-through-vectorization-or-changing-summation-indices#answer_747783

编辑：David Hill 2021-7-15

在 MATLAB Online 中打开

What is rep doing? Seems to just be multiplying.

for rep = 1:10000
  for p=0:N_M_min
    for q=0:p
      coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;%there is no rep in the equations
    end
  end
end

I believe all your nested loops can be consolidated as such:

epsilon = 0.3;
delta = 0.1;
N = 50;
M = 50;
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
N_M_min = min(N,M);
tic;
c=rot90(c);
coeff=10000*arrayfun(@(x)sum(diag(c,x-N_M_min)'.*cos(theta).^(x:-1:0).*sin(theta).^(0:x)),0:N_M_min);
toc;

This runs in about 0.003 s.

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

Matthew Kehoe 2021-7-15

编辑：Matthew Kehoe 2021-7-22

在 MATLAB Online 中打开

I only included rep for testing purposes (I shouldn't have included it and have edited my original post). The 51 coefficient values are different as

clear all;
clc;
epsilon = 0.3;
delta = 0.1;
N = 50;
M = 50;
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
N_M_min = min(N,M);
coeff = zeros(N_M_min+1,1);
% Method 1 - Original Matlab code
tic
  for p=0:N_M_min
    for q=0:p
      coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;
    end
  end
toc
% Method 2
tic
c=rot90(c);
coeff2=arrayfun(@(x)sum(diag(c,x-N_M_min)'.*cos(theta).^(x:-1:0).*sin(theta).^(0:x)),0:N_M_min);
coeff2=coeff2';
toc
diff = norm(coeff - coeff2,inf); % returns 0.6371
% The average execution time for method 1 is 0.088334 seconds.
% The average execution time for method 1 is 0.305050 seconds.

It appears that calling arrayfun() is slower than writing for loops.

请先登录，再进行评论。

Improving the speed of a double for loop through vectorization or changing summation indices

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

更多回答（0 个）

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

Improving the speed of a double for loop through vectorization or changing summation indices

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

1 个评论 显示 -1更早的评论隐藏 -1更早的评论

更多回答（0 个）

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

1 个评论
显示 -1更早的评论隐藏 -1更早的评论