how to vectorize a for-loop or use parfor to speed it up?

Question

sasha 2014-8-10

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/147026-how-to-vectorize-a-for-loop-or-use-parfor-to-speed-it-up

评论： Jan 2014-8-10

I have this for loop right now:

     for x = 2:Nx_max-1
          for y = 2:Ny_max-1
              lp(x,y,t1) = (1/(2*h^2))*dxy(x,y)*p(x+1,y+1,t1) + ...
              (dxx(x,y)/(h^2) - vx(x,y)/(2*h) - (x-Nx-1)/(2*gamma))* p(x+1,y,t1) - ...
              (1/(2*h^2))*dxy(x,y)*p(x+1,y-1,t1) + (dyy(x,y)/(h^2) - ...
              vy(x,y)/(2*h) - (y-Ny-1)/(2*gamma))*p(x,y+1,t1) - ...
              (2*dxx(x,y) + 2*dyy(x,y))*(1/h^2)*p(x,y,t1) + ...
              (dyy(x,y)/(h^2) + vy(x,y)/(2*h) +(y-Ny-1)/(2*gamma))*p(x,y-1,t1) - ...
              (dxy(x,y)/(2*h^2))*p(x-1,y+1,t1) +(dxx(x,y)/(h^2) +vx(x,y)/(2*h) +...
              (x-Nx-1)/(2*gamma))*p(x-1,y,t1) + (dxy(x,y)/(2*h^2))*p(x-1,y-1,t1);
          end
      end

I asked a question previously where I didn't actually write out the full function here, and I was recommended to vectorize it for speed by replacing x by 2:(Nx_max-1) and y by 2:(Ny_max-1). This is causing problems because it looks like the multiplications would cause it to be matrix multiplication instead of multiplying the specific elements of the matrix that I'm interested in mylitplying together. And I also have a section where there ends up being a vector (2:(Nx_max-1)) multiplying a matrix. I don't want the end result to be a vector (as it would in regular matlab notation), I want each element of the matrix in that case to be multiplied by its x value.

Is there a way to vectorize this or somehow speed up the process?

I had thought about parallel computation, but it looks like I can't nest parfor loops.

Any suggestions would be appreciated

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Jan 2014-8-10

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/147026-how-to-vectorize-a-for-loop-or-use-parfor-to-speed-it-up#answer_148057

编辑：Jan 2014-8-10

在 MATLAB Online 中打开

An elementwise multiplication is performed by the .* operator, while * is a matrix multiplication. This should solve your problem already. Please post your trial to vectorize the code instead of letting us create it from scratch.

An optimization of code depends on the input values. It matters if Nx_max is 10 or 1e7. It matters if dxy and gamma are arrays or functions. So please post a version of your code, which can be started by copy&paste. Otherwise suggestions contains a lot of guessing.

But if guessing is required, consider the standard tips fopr loop optimization at first:

avoid repeated calculations
pre-allocate the output

E.g. (1/(2*h^2)) is calculated nearly Ny_max * Nx_max times. So better create a variable before the loop to carry the result. So befpre creating the vectorized version, measure the speed with a cleaner version like this:

hs =  h^2;
h2  = 2 * h;
hs2 = 2 * hs;
c1  = 1/hs2;
g2  = 2 * gamma;
lp = zeros(Nx_max-1, Ny_max-1, t1);
for x = 2:Nx_max-1
  c2 = (x-Nx-1)/g2;
  for y = 2:Ny_max-1
    lp(x,y,t1) = c1 * dxy(x,y) * p(x+1,y+1,t1) + ...
       (dxx(x,y) / hs - vx(x,y) / h2 - c2) * p(x+1,y,t1) - ...
       c1 * dxy(x,y) * p(x+1,y-1,t1) + ...
       (dyy(x,y)/hs - vy(x,y)/h2 - (y-Ny-1)/g2) * p(x,y+1,t1) - ...
       (2 * dxx(x,y) + 2 * dyy(x,y)) * p(x,y,t1) / hs + ...
       (dyy(x,y)/hs + vy(x,y)/h2 +(y-Ny-1)/g2) * p(x,y-1,t1) - ...
       dxy(x,y) / hs2 * p(x-1,y+1,t1) + ...
       (dxx(x,y)/hs + vx(x,y) / h2 + c2) * p(x-1,y,t1) + ...
       dxy(x,y) / hs2 * p(x-1,y-1,t1);
  end
end

Then, in a next step, replace the loop over x by replacing all "x" by "(2:Nx_max-1)" and the operators "*" and "/" by their elementwise versions ".*" and "./". Test, if the result is still not changed except for rounding errors. Fix "(2:Nx_max-1)+1" to the faster "3:Nx_max", because in the 2nd version the index vector is not calculated exlicitly.

Finally you can try to replace the inner loop also, but perform a speed test afterwards. A complete vectorization is not necessarily the fastest version.

4 个评论
显示 2更早的评论隐藏 2更早的评论

sasha 2014-8-10

编辑：sasha 2014-8-10

在 MATLAB Online 中打开

element-wise did not work because x - Ny-1 would be written as a vector of size Nx_max-2 by 1. While all other terms are Nx_max-2 by Ny_max-2. This gives an error stating that matrix dimensions do not match.

There are 4 of these problematic terms, the first of which is in the second line of the definition of my new matrix (the last term of that line).

Here are the two things I tried for that specific term (ignoring the 1/(2*gamma) because that is just a constant and is not causing any problems):

((2:(Nx_max-1)) -Nx-1) * p((2:(Nx_max-1))+1, (2:(Ny_max-1)),t1)

The problem with this isn't that it gave me an error message. What it did was give me a column vector (not what I want) my result should be a matrix of the same dimensions as p. What I want to happen is that for all the x=2 terms of p to be multiplied by (2-Nx-1), for all the x=3 terms of p to be multiplied by (3-Nx-1), for all the x=4 terms of p to be multiplied by (4-Nx-1), etc. This should be a matrix with the same dimensions as p.

Also, because it ends with a vector it can not be added to any of the other terms that end up as matrices (because they need to be matrices).

Then I tried:

x1 = 2:(Nx_max-1);
y1 = 2:(Ny_max-1);
x1 .* p(:,:,t1) - (Nx+1) * p(:,:,t1)

The second part of this term is fine. But the first part gives an error message saying that matrix dimensions need to match. As you can see, they don't because x1 is a vector and p is a matrix. Thus element wise multiplication can't happen because the number of elements do not match.

Does this make more sense now?

Note: the actual expressions for vx, vy, p, dxx, dxy, and dyy don't really matter. What is important is that they are all Nx_max by Ny_max matrices (and then I only want to deal with the bulk elements, ie none of the edges of the matrices) so I work with Nx_max-2 by Ny_max-2 values.

I know that the for loop gives me the correct answer, but it is slow. I have been testing the new attempts against the values of the for loop. Basically running both attempts at the same time (giving them different names) and then seeing if they give the same result by subtracting them from one another.

Jan 2014-8-10

在 MATLAB Online 中打开

Here a guessed version of a vectorized outer loop:

hs =  h^2;
h2  = 2 * h;
hs2 = 2 * hs;
g2  = 2 * gamma;
Nx1 = Nx_max - 1;
Nx2 = Nx_max - 2;
c2  = ((2 - Nx-1):(Nx_max - 2 - Nx)) / g2;
lp  = zeros(Nx1, Ny_max-1, t1);
for y = 2:Ny_max-1
  c3 = dxy(2:Nx1,y) ./ hs2;
  c4 = dxy(2:Nx1,y) ./ hs2;
  c5 = vy(2:Nx1,y)  ./ h2;
  c6 = vx(2:Nx1,y)  ./ h2;
  c7 = dyy(2:Nx1,y) ./ hs;
  lp(2:Nx1, y, t1) = ...
    c3 .* p(3:Nx_max,y+1,t1) + ...
    (dxx(2:Nx1,y) ./ hs - c6 - c2) .* p(3:Nx_max,y,t1) - ...
    c3 .* p(3:Nx_max,y-1,t1) + ...
    (c7 - c5 - (y-Ny-1) ./ g2)    .* p(2:Nx1,y+1,t1) - ...
    (dxx(2:Nx1,y) + dyy(2:Nx1,y)) .* p(2:Nx1,y,t1) .* (2 ./ hs) + ...
    (c7 + c5 + (y-Ny-1) ./ g2)    .* p(2:Nx1,y-1,t1) - ...
    c4 .* p(1:Nx2,y+1,t1) + ...
    (dxx(2:Nx1,y) ./ hs + c6 + c2) .* p(1:Nx2,y,t1) + ...
    c4 .* p(1:Nx2,y-1,t1);
end

Unfortunately I'm in doubt that this produces still the same results, because this kind of editing is prone to typos. But you see the strategy and can replace the repeated calculations step by step by your own and check, if teh results are not changed.

请先登录，再进行评论。

how to vectorize a for-loop or use parfor to speed it up?

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

回答（1 个）

4 个评论
显示 2更早的评论隐藏 2更早的评论

另请参阅

类别

标签

Community Treasure Hunt

how to vectorize a for-loop or use parfor to speed it up?

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

回答（1 个）

4 个评论 显示 2更早的评论隐藏 2更早的评论

另请参阅

类别

标签

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

4 个评论
显示 2更早的评论隐藏 2更早的评论