Computation time significantly differs for 2D, 3D array indexing when solving the same problem. Best approach to address performance?
12 次查看(过去 30 天)
显示 更早的评论
I wrote a simple code that calculates the covariance of images along different directions and lag spacings and returns a 2D map of covariance values varying by lags(h1,h2). The original program I wrote accepts only 2D inputs:
function [Correlogram] = ExhaustiveCovarianceFunction(X,lags)
[ny,nx,~]=size(X);
Covariance=NaN([2*lags+1 2*lags+1]);
Correlogram=NaN([2*lags+1 2*lags+1]);
for hy=-lags:lags
for hx=-lags:lags
sumprod=0;
sum1=0;
sum2=0;
countpairs=0;
for j=1:ny
for i=1:nx
if hy+j<=0 || hy+j>ny || hx+i<=0 || hx+i>nx
continue
else
sumprod=X(j,i)*X(j+hy,i+hx) + sumprod;
sum1=X(j,i) + sum1;
sum2=X(j+hy,i+hx) + sum2;
countpairs=countpairs+1;
end
end
end
sumprod=sumprod/(countpairs);
sum1=sum1/(countpairs);
sum2=sum2/(countpairs);
Covariance(hy+lags+1,hx+lags+1)=sumprod - sum1*sum2;
%Standardize covariance values to correlogram
mean1=sum1;
mean2=sum2;
sumvar1=0;
sumvar2=0;
for j=1:ny
for i=1:nx
if hy+j<=0 || hy+j>ny || hx+i<=0 || hx+i>nx
continue
else
sumvar1=(X(j,i)-mean1)^2 + sumvar1;
sumvar2=(X(j+hy,i+hx)-mean2)^2 + sumvar2;
end
end
end
sumvar1=sumvar1/countpairs;
sumvar2=sumvar2/countpairs;
Correlogram(hy+lags+1,hx+lags+1)=Covariance(hy+lags+1,hx+lags+1)/sqrt(sumvar1*sumvar2);
end
end
end
I wanted to extend this code to allow for up to 3D image inputs, but also allow 2D or 1D inputs. Some other changes for optimization were made (dynamically updating bounds, removing if/else statement):
function [Correlogram] = ExhaustiveCovarianceFunction3D(X,lags)
[ny,nx,nz]=size(X);
%lags is now a vector
Covariance=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
Correlogram=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
for hz=-lags(3):lags(3)
for hy=-lags(1):lags(1)
for hx=-lags(2):lags(2)
hvector = [hy hx hz];
if hvector(1)<0
ymin=abs(hvector(1))+1;
ymax=ny;
else
ymin=1;
ymax=ny-hvector(1);
end
if hvector(2)<0
xmin=abs(hvector(2))+1;
xmax=nx;
else
xmin=1;
xmax=nx-hvector(2);
end
if hvector(3)<0
zmin=abs(hvector(3))+1;
zmax=nz;
else
zmin=1;
zmax=nz-hvector(3);
end
sumprod=0;
sum1=0;
sum2=0;
for k=zmin:zmax
for j=ymin:ymax
for i=xmin:xmax
sumprod=X(j,i,k)*X(j+hy,i+hx,k+hz) + sumprod;
sum1=X(j,i,k) + sum1;
sum2=X(j+hy,i+hx,k+hz) + sum2;
end
end
end
countpairs = numel(ymin:ymax)*numel(xmin:xmax)*numel(zmin:zmax);
sumprod=sumprod/(countpairs);
sum1=sum1/(countpairs);
sum2=sum2/(countpairs);
Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=sumprod - sum1*sum2;
%Standardize covariance values to correlogram
mean1=sum1;
mean2=sum2;
sumvar1=0;
sumvar2=0;
for k=zmin:zmax
for j=ymin:ymax
for i=xmin:xmax
sumvar1=(X(j,i,k)-mean1)^2 + sumvar1;
sumvar2=(X(j+hy,i+hx,k+hz)-mean2)^2 + sumvar2;
end
end
end
sumvar1=sumvar1/countpairs;
sumvar2=sumvar2/countpairs;
Correlogram(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)/sqrt(sumvar1*sumvar2);
end
end
end
end
I first compared these codes for a 2D input image (nz=1) and noticed that my new function runs much slower than the original function:
- ExhaustiveCovarianceFunction(testslice,25): Elapsed time is 0.057659 seconds.
- ExhaustiveCovarianceFunction3D(testslice,[25,25,0]): Elapsed time is 19.251866 seconds.
After some testing, this led me to realize that indexing a 3D array is much slower than a 2D array in MATLAB, and that this has something to do with how memory is accessed for higher-dimensional arrays. I found out that, if I added a third dimension to the original ExhaustiveCovarianceFunction with index 1 that never changed (eg, X(j,i,1), with no other changes, the code slowed down significantly:
- 2D array index X(j,i): Elapsed time is 0.059815 seconds.
- 3D array index X(j,i,1): Elapsed time is 18.849217 seconds.
The idea that we cannot work with higher-dimensional arrays in MATLAB if we want to have optimal performance led me to an approach where, in my new function, I convert the 2D and 3D arrays to a 1D array and map the nD array indices to 1D:
function index1D = convert3Dto1D_index(j, i, k, ny, nx)
index1D = j + (i - 1) * ny + (k - 1) * ny * nx;
end
function [Correlogram] = ExhaustiveCovarianceFunction3D(Xinput,lags)
X = flatten3D(Xinput);
[ny,nx,nz]=size(Xinput);
%lags is now a vector
Covariance=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
Correlogram=NaN([2*lags(1)+1, 2*lags(2)+1, 2*lags(3)+1]);
for hz=-lags(3):lags(3)
for hy=-lags(1):lags(1)
for hx=-lags(2):lags(2)
hvector = [hy hx hz];
if hvector(1)<0
ymin=abs(hvector(1))+1;
ymax=ny;
else
ymin=1;
ymax=ny-hvector(1);
end
if hvector(2)<0
xmin=abs(hvector(2))+1;
xmax=nx;
else
xmin=1;
xmax=nx-hvector(2);
end
if hvector(3)<0
zmin=abs(hvector(3))+1;
zmax=nz;
else
zmin=1;
zmax=nz-hvector(3);
end
sumprod=0;
sum1=0;
sum2=0;
for k=zmin:zmax
for j=ymin:ymax
for i=xmin:xmax
sumprod=X(convert3Dto1D_index(j, i, k, ny, nx))*X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx)) + sumprod;
sum1=X(convert3Dto1D_index(j, i, k, ny, nx)) + sum1;
sum2=X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx)) + sum2;
end
end
end
countpairs = numel(ymin:ymax)*numel(xmin:xmax)*numel(zmin:zmax);
sumprod=sumprod/(countpairs);
sum1=sum1/(countpairs);
sum2=sum2/(countpairs);
Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=sumprod - sum1*sum2;
%Standardize covariance values to correlogram
mean1=sum1;
mean2=sum2;
sumvar1=0;
sumvar2=0;
for k=zmin:zmax
for j=ymin:ymax
for i=xmin:xmax
sumvar1=(X(convert3Dto1D_index(j, i, k, ny, nx))-mean1)^2 + sumvar1;
sumvar2=(X(convert3Dto1D_index(j+hy, i+hx, k+hz, ny, nx))-mean2)^2 + sumvar2;
end
end
end
sumvar1=sumvar1/countpairs;
sumvar2=sumvar2/countpairs;
Correlogram(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)=Covariance(hy+lags(1)+1,hx+lags(2)+1, hz+lags(3)+1)/sqrt(sumvar1*sumvar2);
end
end
end
end
This approach has better performance, but is still slower than the original 2D code:
- 2D: Elapsed time is 0.163243 seconds.
- 3D, 3D index: Elapsed time is 18.336108 seconds.
- 3D, 1D index: Elapsed time is 0.431890 seconds.
The optimized code with 1D array indexing is still about 3x slower. Perhaps calling convert3Dto1D_index(j, i, k, ny, nx) every time the image index is called is leading to this expense.These computational concerns are important for larger images (eg, 256x256x128) and for higher-order statistics that will be developed.
Is there a smarter approach to addressing this problem? I cannot figure out how to speed up this 3D code any more. Ideally, I seek to make the 3D/general code to have comparable speeds to the 2D "hard-coded" problem when a 2D image is input. If this is not possible, separate subroutines for 2D and 3D images is likely my best bet.
0 个评论
回答(1 个)
surya venu
2024-6-17
Hi,
The performance hit when moving from 2D to 3D indexing, and your subsequent optimization using a 1D indexing approach, is insightful. Here are a few suggestions that could help you further optimize your code or approach the problem differently:
1. Precompute 1D Indices
Instead of calling "convert3Dto1D_index" within your nested loops, you could precompute the 1D indices for all relevant (j, i, k) combinations before entering the loops. This would reduce the computational overhead of repeatedly calling a function to perform the index conversion.
% Precompute 1D indices
indices1D = zeros(ny, nx, nz);
for k = 1:nz
for j = 1:ny
for i = 1:nx
indices1D(j, i, k) = convert3Dto1D_index(j, i, k, ny, nx);
end
end
end
Then, use these precomputed indices directly in your loops.
2. Vectorization Where Possible
MATLAB excels at matrix and vector operations. Wherever possible, try to use vectorized operations instead of loops. For example, computations involving "sumprod", "sum1", and "sum2" might be vectorizable depending on the operations involved. For more information, check out: https://www.mathworks.com/help/matlab/matlab_prog/vectorization.html
3. Logical Indexing for Boundary Conditions
Instead of adjusting "ymin, ymax, xmin, xmax, zmin, and zmax" for boundary conditions, consider using logical indexing or masks that can be applied to your data. This can sometimes simplify the code and reduce conditional checks within the loop. For more information, check out: https://blogs.mathworks.com/steve/2008/01/28/logical-indexing/
4. Compile to MEX
For computationally intensive parts of your code, consider converting them to a MEX function written in C or C++. MEX functions can significantly outperform equivalent MATLAB code due to the compiled nature of C/C++ and the ability to optimize at a lower level. For more information about MEX functions, check out: https://www.mathworks.com/help/matlab/call-mex-functions.html
Hope it helps.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Matrix Indexing 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!