Computation time significantly differs for 2D, 3D array indexing when solving the same problem. Best approach to address performance?

Question

Corey Hoydic 2024-6-14

1
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/2128871-computation-time-significantly-differs-for-2d-3d-array-indexing-when-solving-the-same-problem-best

评论： Corey Hoydic 2024-6-17

I wrote a simple code that calculates the covariance of images along different directions and lag spacings and returns a 2D map of covariance values varying by lags(h1,h2). The original program I wrote accepts only 2D inputs:

function [Correlogram] = ExhaustiveCovarianceFunction(X,lags)
[ny,nx,~]=size(X);
Covariance=NaN([2*lags+1 2*lags+1]);
Correlogram=NaN([2*lags+1 2*lags+1]);
for hy=-lags:lags
    for hx=-lags:lags
        
        sumprod=0;
        sum1=0;
        sum2=0;
        countpairs=0;
        for j=1:ny
            for i=1:nx
                if hy+j<=0 || hy+j>ny || hx+i<=0 || hx+i>nx
                    continue
                else                    
                    sumprod=X(j,i)*X(j+hy,i+hx) + sumprod;
                    sum1=X(j,i) + sum1;
                    sum2=X(j+hy,i+hx) + sum2;
                    countpairs=countpairs+1;
                end
            end
        end
        sumprod=sumprod/(countpairs);
        sum1=sum1/(countpairs);
        sum2=sum2/(countpairs);
        Covariance(hy+lags+1,hx+lags+1)=sumprod - sum1*sum2; 
        %Standardize covariance values to correlogram
        mean1=sum1;
        mean2=sum2;
        sumvar1=0;
        sumvar2=0;
            for j=1:ny
                for i=1:nx
                    if hy+j<=0 || hy+j>ny || hx+i<=0 || hx+i>nx
                        continue
    
                    else                    
                        sumvar1=(X(j,i)-mean1)^2 + sumvar1;
                        sumvar2=(X(j+hy,i+hx)-mean2)^2 + sumvar2;
                    end
                end
            end
            sumvar1=sumvar1/countpairs;
            sumvar2=sumvar2/countpairs;
            Correlogram(hy+lags+1,hx+lags+1)=Covariance(hy+lags+1,hx+lags+1)/sqrt(sumvar1*sumvar2); 
    end
end
end

I wanted to extend this code to allow for up to 3D image inputs, but also allow 2D or 1D inputs. Some other changes for optimization were made (dynamically updating bounds, removing if/else statement):

function [Correlogram] = ExhaustiveCovarianceFunction3D(X,lags)
[ny,nx,nz]=size(X);
%lags is now a vector
Covariance=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
Correlogram=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
for hz=-lags(3):lags(3)
    for hy=-lags(1):lags(1)
        for hx=-lags(2):lags(2)
            hvector = [hy hx hz];
            if hvector(1)<0
                ymin=abs(hvector(1))+1;
                ymax=ny;
            else
                ymin=1;
                ymax=ny-hvector(1);
            end
            if hvector(2)<0
                xmin=abs(hvector(2))+1;
                xmax=nx;
            else
                xmin=1;
                xmax=nx-hvector(2);
            end
            if hvector(3)<0
                zmin=abs(hvector(3))+1;
                zmax=nz;
            else
                zmin=1;
                zmax=nz-hvector(3);
            end
            sumprod=0;
            sum1=0;
            sum2=0;
            for k=zmin:zmax
                for j=ymin:ymax
                    for i=xmin:xmax
                        sumprod=X(j,i,k)*X(j+hy,i+hx,k+hz) + sumprod;
                        sum1=X(j,i,k) + sum1;
                        sum2=X(j+hy,i+hx,k+hz) + sum2;
                    end
                end
            end
            countpairs = numel(ymin:ymax)*numel(xmin:xmax)*numel(zmin:zmax);      
            sumprod=sumprod/(countpairs);
            sum1=sum1/(countpairs);
            sum2=sum2/(countpairs);
            Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=sumprod - sum1*sum2; 
            %Standardize covariance values to correlogram
            mean1=sum1;
            mean2=sum2;
            sumvar1=0;
            sumvar2=0;
            for k=zmin:zmax
                for j=ymin:ymax
                    for i=xmin:xmax
                        sumvar1=(X(j,i,k)-mean1)^2 + sumvar1;
                        sumvar2=(X(j+hy,i+hx,k+hz)-mean2)^2 + sumvar2;
                    end
                end
            end
            sumvar1=sumvar1/countpairs;
            sumvar2=sumvar2/countpairs;
            Correlogram(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)/sqrt(sumvar1*sumvar2); 
        end
    end
end
end

I first compared these codes for a 2D input image (nz=1) and noticed that my new function runs much slower than the original function:

ExhaustiveCovarianceFunction(testslice,25): Elapsed time is 0.057659 seconds.
ExhaustiveCovarianceFunction3D(testslice,[25,25,0]): Elapsed time is 19.251866 seconds.

After some testing, this led me to realize that indexing a 3D array is much slower than a 2D array in MATLAB, and that this has something to do with how memory is accessed for higher-dimensional arrays. I found out that, if I added a third dimension to the original ExhaustiveCovarianceFunction with index 1 that never changed (eg, X(j,i,1), with no other changes, the code slowed down significantly:

2D array index X(j,i): Elapsed time is 0.059815 seconds.
3D array index X(j,i,1): Elapsed time is 18.849217 seconds.

The idea that we cannot work with higher-dimensional arrays in MATLAB if we want to have optimal performance led me to an approach where, in my new function, I convert the 2D and 3D arrays to a 1D array and map the nD array indices to 1D:

function index1D = convert3Dto1D_index(j, i, k, ny, nx)
    index1D = j + (i - 1) * ny + (k - 1) * ny * nx;
end
function [Correlogram] = ExhaustiveCovarianceFunction3D(Xinput,lags)
X = flatten3D(Xinput);
[ny,nx,nz]=size(Xinput);
%lags is now a vector
Covariance=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
Correlogram=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
for hz=-lags(3):lags(3)
    for hy=-lags(1):lags(1)
        for hx=-lags(2):lags(2)
            hvector = [hy hx hz];
            if hvector(1)<0
                ymin=abs(hvector(1))+1;
                ymax=ny;
            else
                ymin=1;
                ymax=ny-hvector(1);
            end
            if hvector(2)<0
                xmin=abs(hvector(2))+1;
                xmax=nx;
            else
                xmin=1;
                xmax=nx-hvector(2);
            end
            if hvector(3)<0
                zmin=abs(hvector(3))+1;
                zmax=nz;
            else
                zmin=1;
                zmax=nz-hvector(3);
            end
            sumprod=0;
            sum1=0;
            sum2=0;
            for k=zmin:zmax
                for j=ymin:ymax
                    for i=xmin:xmax
                        sumprod=X(convert3Dto1D_index(j, i, k, ny, nx))*X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx)) + sumprod;
                        sum1=X(convert3Dto1D_index(j, i, k, ny, nx)) + sum1;
                        sum2=X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx)) + sum2;
                    end
                end
            end
            countpairs = numel(ymin:ymax)*numel(xmin:xmax)*numel(zmin:zmax);      
            sumprod=sumprod/(countpairs);
            sum1=sum1/(countpairs);
            sum2=sum2/(countpairs);
            Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=sumprod - sum1*sum2; 
            %Standardize covariance values to correlogram
            mean1=sum1;
            mean2=sum2;
            sumvar1=0;
            sumvar2=0;
            for k=zmin:zmax
                for j=ymin:ymax
                    for i=xmin:xmax
                        sumvar1=(X(convert3Dto1D_index(j, i, k, ny, nx))-mean1)^2 + sumvar1;
                        sumvar2=(X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx))-mean2)^2 + sumvar2;
                    end
                end
            end
            sumvar1=sumvar1/countpairs;
            sumvar2=sumvar2/countpairs;
            Correlogram(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)/sqrt(sumvar1*sumvar2); 
        end
    end
end
end

This approach has better performance, but is still slower than the original 2D code:

2D: Elapsed time is 0.163243 seconds.
3D, 3D index: Elapsed time is 18.336108 seconds.
3D, 1D index: Elapsed time is 0.431890 seconds.

The optimized code with 1D array indexing is still about 3x slower. Perhaps calling convert3Dto1D_index(j, i, k, ny, nx) every time the image index is called is leading to this expense.These computational concerns are important for larger images (eg, 256x256x128) and for higher-order statistics that will be developed.

Is there a smarter approach to addressing this problem? I cannot figure out how to speed up this 3D code any more. Ideally, I seek to make the 3D/general code to have comparable speeds to the 2D "hard-coded" problem when a 2D image is input. If this is not possible, separate subroutines for 2D and 3D images is likely my best bet.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

surya venu 2024-6-17

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/2128871-computation-time-significantly-differs-for-2d-3d-array-indexing-when-solving-the-same-problem-best#answer_1472556

在 MATLAB Online 中打开

Hi,

The performance hit when moving from 2D to 3D indexing, and your subsequent optimization using a 1D indexing approach, is insightful. Here are a few suggestions that could help you further optimize your code or approach the problem differently:

1. Precompute 1D Indices

Instead of calling "convert3Dto1D_index" within your nested loops, you could precompute the 1D indices for all relevant (j, i, k) combinations before entering the loops. This would reduce the computational overhead of repeatedly calling a function to perform the index conversion.

% Precompute 1D indices
indices1D = zeros(ny, nx, nz);
for k = 1:nz
    for j = 1:ny
        for i = 1:nx
            indices1D(j, i, k) = convert3Dto1D_index(j, i, k, ny, nx);
        end
    end
end

Then, use these precomputed indices directly in your loops.

2. Vectorization Where Possible

MATLAB excels at matrix and vector operations. Wherever possible, try to use vectorized operations instead of loops. For example, computations involving "sumprod", "sum1", and "sum2" might be vectorizable depending on the operations involved. For more information, check out: https://www.mathworks.com/help/matlab/matlab_prog/vectorization.html

3. Logical Indexing for Boundary Conditions

Instead of adjusting "ymin, ymax, xmin, xmax, zmin, and zmax" for boundary conditions, consider using logical indexing or masks that can be applied to your data. This can sometimes simplify the code and reduce conditional checks within the loop. For more information, check out: https://blogs.mathworks.com/steve/2008/01/28/logical-indexing/

4. Compile to MEX

For computationally intensive parts of your code, consider converting them to a MEX function written in C or C++. MEX functions can significantly outperform equivalent MATLAB code due to the compiled nature of C/C++ and the ability to optimize at a lower level. For more information about MEX functions, check out: https://www.mathworks.com/help/matlab/call-mex-functions.html

Hope it helps.

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

Corey Hoydic 2024-6-17

在 MATLAB Online 中打开

Hello Surya,

I was able to address each of your suggestions (minus compiling with MEX). The results are interesting:

1. Precompute 1D Indices

I calculated the indices for all locations in the image before entering any of the nested for loops. Then, I modified my summations from

sumprod=X(convert3Dto1D_index(j, i, k, ny, nx))*X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx)) + sumprod;
sum1=X(convert3Dto1D_index(j, i, k, ny, nx)) + sum1;
sum2=X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx)) + sum2;

to

sumprod=X(indices1D(j, i, k))*X(indices1D(j+hy, i+hx, k+hz)) + sumprod;
sum1=X(indices1D(j, i, k)) + sum1;
sum2=X(indices1D(j+hy, i+hx, k+hz)) + sum2;

Unfortunately, the performance hit is quite significant:

Original code: 0.491754 seconds
Precomputed index: 18.954286 seconds

My best guess is that this relates to my original problem that I faced: calling a value from a 3D array is much slower than calling the function convert3Dto1D_index(j, i, k, ny, nx)) each time. I would conclude that it is best to avoid any 3D array indexing (except when vectorizing, see below).

2. Vectorization Where Possible

I vectorized the summation loops in two different ways

Vectorization, 3D (original) index:

k=zmin:zmax;
j=ymin:ymax;
i=xmin:xmax;
sumprod=sum(sum(sum(X(j,i,k).*X(j+hy,i+hx,k+hz))));
sum1=sum(sum(sum(X(j,i,k))));
sum2=sum(sum(sum(X(j+hy,i+hx,k+hz))));

Vectorization, 1D index (indices are pre-computed):

k=zmin:zmax;
j=ymin:ymax;
i=xmin:xmax;
idx_1 = indices1D(j,i,k);
idx_2 = indices1D(j+hy,i+hx,k+hz);
sumprod=sum(sum(sum(X(idx_1).*X(idx_2))));
sum1=sum(sum(sum(X(idx_1))));
sum2=sum(sum(sum(X(idx_2))));

Some benchmarking of results follows. Note that the 2D code is the same as the 3D code, but hard-codes only the first two dimensions X(j,i).

For a 2D image (400x400):

2D code, 1D index: 161.380283 seconds
2D code, 2D index: 12.208195 seconds
2D code, vectorized, 1D index: 78.073599 seconds.
2D code, vectorized, 2D index, 22.776255 seconds
3D code, 1D index: 161.752561 seconds
3D code, 3D index: too slow.. wont bother
3D code, vectorized, 1D index: 78.179389 seconds
3D code, vectorized, 3D index: 24.832623 seconds

For a 3D input (51x51x11):

1D index: 40.293362 seconds
3D index: slow
Vectorized, 1D index: 10.849903 seconds
Vectorized, 3D index: 3.418844 seconds

Interestingly:

The 3D code benefits from vectorization for performance, but the 2D code does not
The 1D indexing approach has approximately the same computational load for a 2D image sent through the 2D and 3D codes. This makes sense, since we ditch the 3D array indexing early on.
Vectorizing performs better when the original array dimensions are used (don't convert to 1D index).

3. Logical Indexing for Boundary Conditions

I did not address this, but the original version of my code implemented bounds checks and naively iterated over all i,j,k. Removing this and adding the dynamic bounds proved to be faster, since

if hvector(1)<0
    ymin=abs(hvector(1))+1;
    ymax=ny;
else
    ymin=1;
    ymax=ny-hvector(1);
end
if hvector(2)<0
    xmin=abs(hvector(2))+1;
    xmax=nx;
else
    xmin=1;
    xmax=nx-hvector(2);
end
if hvector(3)<0
    zmin=abs(hvector(3))+1;
    zmax=nz;
else
    zmin=1;
    zmax=nz-hvector(3);
end

is rather inexpensive overall.

At this point, it is looking like I am going to want separate subroutines for 2D and 3D inputs, which I wanted to avoid. The 2D code does not benefit from 1D indexing or vectorization. The 3D code benefits from both 1D indexing and vectorization, with the best performance when vectorizing and using the original 3D index. I was hoping to have a general formulation which would perform best (for both 2D and 3D inputs), but I am not yet there. Thank you for the guidance, please let me know if you have any further thoughts.

请先登录，再进行评论。

Computation time significantly differs for 2D, 3D array indexing when solving the same problem. Best approach to address performance?

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

回答（1 个）

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

Computation time significantly differs for 2D, 3D array indexing when solving the same problem. Best approach to address performance?

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

回答（1 个）

1 个评论 显示 -1更早的评论隐藏 -1更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

1 个评论
显示 -1更早的评论隐藏 -1更早的评论