Extremely slow nested for loop

Question

0 个投票

Hi,

I have number of functions that I use to find roots to Neumann problem. One of the functions contains nested for loops which runs forever.

Please see below:

The function consists of three nested for loops as:

function [Lsn] = coefFourier_ll_log(k,ll)
N=length(ll);
ll_n=coefFourier1(k*ll);
lambda=zeros(1,N-1); lambda(N/2+1:N-1)=-0.5./(1:N/2-1);
lambda(1:N/2-1)=fliplr(lambda(N/2+1:N-1)); 
M=N/2-1; Lsn=zeros(N-1,N-1);
for s=-M:M
   for n=-M:M
       rmin=max(-M,max(-M-n,s-M)); 
       rmax=min(s+M,min(M-n,M));
       for r=rmin:rmax
          Lsn(s+M+1,n+M+1)=Lsn(s+M+1,n+M+1)+ll_n(s-r+M+1)*ll_n(n+r+M+1)*lambda(r+M+1);
       end
   end
end
Lsn=0.5*Lsn;

This function receives k(double), ll(1D array) and returns Lsn(Matrix). For the given time profile, the system size is 512, which means M is 255.

How I can vectorize it to increase its speed? I need to run this group of functions for couple of hundreds times and eventually with even bigger system size.

Thank you for your time.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Follow Question

Answer 1

Jan 2018-3-21

在 MATLAB Online 中打开

0 个投票

Start with a slightly cleaned version:

function Lsn = coefFourier_ll_log(k,ll)
N      = length(ll);
ll_n   = coefFourier1(k*ll);
lambda = zeros(1, N-1);
lambda(N/2+1:N-1)  = -0.5 ./ (1:N/2-1);
lambda(N/2-1:-1:1) = lambda(N/2+1:N-1);  % Instead of FLIPLR
M   = N/2-1;
M1  = M + 1;
Lsn = zeros(N-1, N-1);
for s = -M:M
   for n = -M:M
       rmin = max(-M, max(-M-n, s-M)); 
       rmax = min(s+M, min(M-n, M));
       Lsn(s+M1, n+M1) = Lsn(s+M1, n+M1) + ...
                         sum(ll_n(s+M1-rmin:s+M1-rmax) .* ...
                             ll_n(n+M1+rmin:n+M1+rmax) .* ...
                             lambda(M1+rmin:M1+rmax));
   end
end
Lsn = 0.5 * Lsn;

Here the inner loop from rmin:rmax way vectorized. Please post the timing before and after the changes measure by tic/toc. The profiler disables the JIT acceleration, such that it is less useful. Providing some test input of a relevant size (e.g. created by rand) would allow us to run the tests by our own.

5 个评论
显示 3更早的评论隐藏 3更早的评论

ttopal 2018-3-21

在 MATLAB Online 中打开

Thank you for your effort Jan. I have deleted the function unrelated to the for-loop issue, created a random vector of complex numbers for testing purposes.

Unfortunately, I am getting matrix dimensions must agree error.

Here is the code that I run:

N=256;
ll_n=rand(1,N) + 1i*rand(1,N);
tic
lambda=zeros(1,N-1); lambda(N/2+1:N-1)=-0.5./(1:N/2-1);
lambda(1:N/2-1)=fliplr(lambda(N/2+1:N-1)); 
M=N/2-1; Lsn=zeros(N-1,N-1);
for s=-M:M
   for n=-M:M
       rmin=max(-M,max(-M-n,s-M)); 
       rmax=min(s+M,min(M-n,M));
       for r=rmin:rmax
          Lsn(s+M+1,n+M+1)=Lsn(s+M+1,n+M+1)+ll_n(s-r+M+1)*ll_n(n+r+M+1)*lambda(r+M+1);
       end
   end
end
toc
tic
lambda = zeros(1, N-1);
lambda(N/2+1:N-1)  = -0.5 ./ (1:N/2-1);
lambda(N/2-1:-1:1) = lambda(N/2+1:N-1);  % Instead of FLIPLR
M   = N/2-1;
M1  = M + 1;
Lsn = zeros(N-1, N-1);
for s = -M:M
 for n = -M:M
     rmin = max(-M, max(-M-n, s-M)); 
     rmax = min(s+M, min(M-n, M));
     Lsn(s+M1, n+M1) = Lsn(s+M1, n+M1) + ...
                       sum(ll_n(s+M1-rmin:s+M1-rmax) .* ...
                           ll_n(n+M1+rmin:n+M1+rmax) .* ...
                           lambda(M1+rmin:M1+rmax));
 end
end
toc

ttopal 2018-3-22

在 MATLAB Online 中打开

Thank you Jan. Do you think it is possible to use Ngrid for the first two loops? Actually, the first two matrices have sort of repeating form. For example;

M = 1  
rmin =
       0    -1    -1
       0    -1    -1
       0     0     0
rmax =  
       0     0     0
       1     1     0
       1     1     0
M=2
rmin =
     0    -1    -2    -2    -2
     0    -1    -2    -2    -2
     0    -1    -2    -2    -2
     0    -1    -1    -1    -1
     0     0     0     0     0
rmax =
     0     0     0     0     0
     1     1     1     1     0
     2     2     2     1     0
     2     2     2     1     0
     2     2     2     1     0

Jan 2018-3-22

While rmin and rmax have a specific structure, which could be exploited, the actual indices are e.g. s-rmin+M1 and change with each iteration. Therefore I do not see a way to use the pattern in the outer loops.

It would be useful if you explain, what the operation does. Maybe it is much faster to calculate it by conv or conv2.

请先登录，再进行评论。

Extremely slow nested for loop

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

5 个评论
显示 3更早的评论隐藏 3更早的评论

更多回答（0 个）

类别

标签

Community Treasure Hunt

Extremely slow nested for loop

0 个评论 显示 -2更早的评论 隐藏 -2更早的评论

采纳的回答

5 个评论 显示 3更早的评论 隐藏 3更早的评论

更多回答（0 个）

类别

标签

另请参阅

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

5 个评论
显示 3更早的评论隐藏 3更早的评论