strange performance behavior - microbenchmark

Question

Michal 2022-11-30

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/1867223-strange-performance-behavior-microbenchmark

Edited: Bruno Luong 2022-11-30

There are two implementations of the same algorithm:

function B = alg1( B, R, alpha )
% Fast computation of L1 distance transform
K = numel(B);
% forward pass
for k=2:K
    B(k) = min( B(k), B(k-1) + alpha * (R(k) - R(k-1)));
end
% backward pass
for k=K-1:-1:1
    B(k) = min( B(k), B(k+1) + alpha * (R(k+1) - R(k)));
end
end

and

function B = alg2( B, R, alpha )
% Fast computation of L1 distance transform
alphaRdiff = alpha*diff(R);
K = numel(B);
% forward pass
for k=2:K
    B(k) = min( B(k), B(k-1) + alphaRdiff(k-1));
end
% backward pass
for k=K-1:-1:1
    B(k) = min( B(k), B(k+1) + alphaRdiff(k));
end
end
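
To make the two passes concrete, here is a tiny worked example. It is only a sketch under assumptions that are not in the post: R is sorted in increasing order (so the result can be checked against a brute-force lower envelope), the values are small integers so the comparison is exact, and the helper name twoPass is hypothetical and simply repeats the loops of alg1.

```matlab
% Tiny worked example of the two-pass recurrence (assumes R is increasing).
B     = [0 10 10 0];   % initial costs (made-up values)
R     = [0 1 2 3];     % sample positions, sorted (assumption)
alpha = 1;             % slope of the L1 penalty

out = twoPass(B, R, alpha);
disp(out)              % values: 0 1 1 0

% Brute-force check: out(k) = min over j of B(j) + alpha*|R(k) - R(j)|.
% Relies on implicit expansion (R2016b or later).
brute = min(B.' + alpha*abs(R - R.'), [], 1);
assert(isequal(out, brute))

function B = twoPass(B, R, alpha)
% Same recurrence as alg1 (hypothetical helper name).
K = numel(B);
for k = 2:K                      % forward pass
    B(k) = min(B(k), B(k-1) + alpha*(R(k) - R(k-1)));
end
for k = K-1:-1:1                 % backward pass
    B(k) = min(B(k), B(k+1) + alpha*(R(k+1) - R(k)));
end
end
```

The forward pass propagates the cheap entry on the left to the right, and the backward pass does the same from the right, which is why the two interior entries drop from 10 to 1.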

These two algorithms differ in only one respect: alg2 removes the term

alpha * (R(k) - R(k-1))

from both for-loops and instead precomputes the scaled differences of the vector R once:

alphaRdiff = alpha*diff(R);

So alg2 should be faster, because it performs only half as many evaluations of alpha*(R(k)-R(k-1)).

But the two algorithms run at nearly the same speed (the unmodified alg1 is sometimes even faster); see:

N = 1e5;
B = rand(1,N);
R = rand(1,N);
tic;A = alg1( B, R, 1/3 );toc
tic;A_= alg2( B, R, 1/3 );toc
isequal(A,A_)
Elapsed time is 0.037654 seconds.
Elapsed time is 0.029562 seconds.
ans =
logical
1
N = 1e8;
B = rand(1,N); % inputs regenerated at the larger size, as the timings below imply
R = rand(1,N);
tic;A = alg1( B, R, 1/3 );toc
tic;A_= alg2( B, R, 1/3 );toc
Elapsed time is 1.444549 seconds.
Elapsed time is 1.563630 seconds.

What is the explanation of this strange behavior?

Answer 1

Bruno Luong 2022-11-30

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/1867223-strange-performance-behavior-microbenchmark#answer_1116173

Edited: Bruno Luong 2022-11-30

My guess is that in the first code the intermediate result does not have to be stored in memory. It can stay in a processor register or in the first-level cache. So even though the operation is performed twice, it is still faster.
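
A rough back-of-the-envelope count is consistent with this guess. The snippet below only does arithmetic on array sizes; the 8 bytes per element is the size of a MATLAB double, and the reading that memory traffic offsets the saved arithmetic is an assumption, not a measurement.

```matlab
% Rough size of the temporary array that alg2 creates (arithmetic only).
N = 1e8;                                 % problem size used in the timings
bytesPerDouble = 8;                      % a MATLAB double occupies 8 bytes
tempBytes = bytesPerDouble * (N - 1);    % alphaRdiff has N-1 elements
fprintf('alphaRdiff: %.2f GB\n', tempBytes / 1e9);   % prints alphaRdiff: 0.80 GB
```

Inside the loops, alg1 streams through R while alg2 streams through alphaRdiff, so both read about the same amount of data per pass. alg2 additionally has to read R and write this 0.8 GB array before the loops start, which may cost about as much as the multiplications it saves.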

I wrote alg3 and alg4, which replace min with an if test, and they seem even faster.
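
One caveat when swapping min for if: the two forms agree only when no NaN is involved. MATLAB's min ignores NaN by default, while any comparison with NaN is false. A minimal sketch with made-up values (the variable names are hypothetical):

```matlab
b = NaN;  c = 1;          % hypothetical current entry and candidate value

viaMin = min(b, c);       % min skips NaN by default, so this yields 1

viaIf = b;
if c < b                  % a comparison with NaN is always false
    viaIf = c;
end                       % viaIf therefore stays NaN

fprintf('min: %g, if: %g\n', viaMin, viaIf);   % prints min: 1, if: NaN
```

With inputs generated by rand, as in the timings here, no NaN occurs and both forms return the same result.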

N = 1e8;
B = rand(1,N);
R = rand(1,N);
tic;A = alg1( B, R, 1/3 );toc % Elapsed time is 0.819863 seconds.
tic;A= alg2( B, R, 1/3 );toc  % Elapsed time is 0.852902 seconds.
tic;A= alg3( B, R, 1/3 );toc  % Elapsed time is 0.619154 seconds.
tic;A= alg4( B, R, 1/3 );toc  % Elapsed time is 0.564323 seconds.
function B = alg1( B, R, alpha )
% Fast computation of L1 distance transform
K = numel(B);
% forward pass
for k=2:K
    B(k) = min( B(k), B(k-1) + alpha * (R(k) - R(k-1)));
end
% backward pass
for k=K-1:-1:1
    B(k) = min( B(k), B(k+1) + alpha * (R(k+1) - R(k)));
end
end
function B = alg2( B, R, alpha )
% Fast computation of L1 distance transform
alphaRdiff = alpha*diff(R);
K = numel(B);
% forward pass
for k=2:K
    B(k) = min( B(k), B(k-1) + alphaRdiff(k-1));
end
% backward pass
for k=K-1:-1:1
    B(k) = min( B(k), B(k+1) + alphaRdiff(k));
end
end
function B = alg3( B, R, alpha )
% Fast computation of L1 distance transform
alphaRdiff = alpha*diff(R,1,2);
K = numel(B);
% forward pass
for k=2:K
    C = B(k-1) + alphaRdiff(k-1);
    if C < B(k)
        B(k) = C;
    end
end
% backward pass
for k=K-1:-1:1
    C = B(k+1) + alphaRdiff(k);
    if C < B(k)
        B(k) = C;
    end
end
end
function B = alg4( B, R, alpha )
K = numel(B);
% forward pass
for k=2:K
    C = B(k-1) + alpha * (R(k) - R(k-1));
    if C < B(k)
        B(k) = C;
    end
end
% backward pass
for k=K-1:-1:1
    C = B(k+1) + alpha * (R(k+1) - R(k));
    if C < B(k)
        B(k) = C;
    end
end
end
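
Single tic/toc runs like those above include warm-up effects and timer noise. Below is a sketch of the same comparison using MATLAB's timeit, which calls the function several times and returns the median of the measurements. The size N = 1e6 and the names minPass and ifPass are assumptions chosen to keep the run short; no timings are quoted because they depend on the machine and MATLAB release.

```matlab
% Sketch: compare the min-based and if-based loops with timeit.
N = 1e6;                 % assumed size, smaller than the 1e8 used in the post
B = rand(1, N);
R = rand(1, N);
alpha = 1/3;

tMin = timeit(@() minPass(B, R, alpha));
tIf  = timeit(@() ifPass(B, R, alpha));
fprintf('min-based: %.4f s, if-based: %.4f s\n', tMin, tIf);

% For NaN-free input both variants perform the same arithmetic.
sameResult = isequal(minPass(B, R, alpha), ifPass(B, R, alpha));
assert(sameResult)

function B = minPass(B, R, alpha)
% Same loops as alg1 (hypothetical name).
K = numel(B);
for k = 2:K
    B(k) = min(B(k), B(k-1) + alpha*(R(k) - R(k-1)));
end
for k = K-1:-1:1
    B(k) = min(B(k), B(k+1) + alpha*(R(k+1) - R(k)));
end
end

function B = ifPass(B, R, alpha)
% Same loops as alg4 (hypothetical name).
K = numel(B);
for k = 2:K
    C = B(k-1) + alpha*(R(k) - R(k-1));
    if C < B(k)
        B(k) = C;
    end
end
for k = K-1:-1:1
    C = B(k+1) + alpha*(R(k+1) - R(k));
    if C < B(k)
        B(k) = C;
    end
end
end
```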

2 Comments

Michal 2022-11-30

Edited: Michal 2022-11-30

Thanks for the answer! These results significantly change MATLAB programming best practice. Now it is really hard to decide how to implement even simple algorithms.

Your best solution (alg4) eliminates all built-in functions...??!!

Bruno Luong 2022-11-30

"Your best solution (alg4) eliminates all built-in functions"

Ten years ago this was not the case.

The rule of thumb now is that a for-loop with basic arithmetic is fast and competitive.
