Indexing a range of a 3d matrix incredibly slow?
13 次查看(过去 30 天)
显示 更早的评论
I have a function that does several different operations between a source and destination matrix (each 3d), generally inserting a small source into a target area of a large destination. I need to run this function thousands-tens of thousands of times, so speed is valuable. However, when profiling I have found that virtuall all of the runtime is on the line that performs the final operation, where there's a lot of range indexing in 3d. example:
dest(dx,dy,dz) = source(sx,sy,sz) + dest(dx,dy,dz);
where all of di and si are ranges. That line alone takes ~99% of the runtime per call (taking ~6 seconds for 100 test runs). Is this sort of indexing operation inherently super slow? I find the slow speed odd, especially when binarizing each of those same 3d regions is ~30 times faster.
Is there a huge inefficiency i'm not seeing, or some trick to make this process faster?
EDIT: the rest of the code, where the ranges are generated
[d1, d2, d3]=size(dest);
[s1, s2, s3]=size(source);
%dest
dx=max(1,coord(1)):min(d1,coord(1)+s1-1);
dy=max(1,coord(2)):min(d2,coord(2)+s2-1);
dz=max(1,coord(3)):min(d3,coord(3)+s3-1);
%source
sx=max(-coord(1)+2,1):min(d1-coord(1)+1,s1);
sy=max(-coord(2)+2,1):min(d2-coord(2)+1,s2);
sz=max(-coord(3)+2,1):min(d3-coord(3)+1,s3);
note that the ranges are generated in order to find the overlap between the arrays, given some location coord relative to the destination array, that is still fully inside both arrays to avoid out-of-bounds problems.
3 个评论
回答(1 个)
Jan
2022-6-13
If you post the relevant part of the code and some matching input data, it is very likely, that this forum can provide some improvements. With the currently givendetails, the best answer I know is:
If the memory access is the bottleneck of your code, it is the bottleneck of you code.
dest(dx,dy,dz) = source(sx,sy,sz) + dest(dx,dy,dz)
This line creates a copy of source(sx,sy,sz) ate first, than dest(dx,dy,dz), adds them and inserts them in dest. Indexing with vectors requires a range-check for each element.
Are the ranges sx, sy, ... contiguous? Then this would be much faster: source(sx(1):sx(end), sy(1):sy(end), sz(1):sz(end)) etc.
3 个评论
Jan
2022-6-13
编辑:Jan
2022-6-13
Try this:
[d1, d2, d3]=size(dest);
[s1, s2, s3]=size(source);
dxi = max(1, coord(1))
dxf = min(d1, coord(1)+s1-1);
dyi = max(1, coord(2));
dyf = min(d2, coord(2)+s2-1);
dzi = max(1, coord(3));
dzf = min(d3, coord(3)+s3-1);
%source
... the same here...
dest(dxi:dxf, dyi:dyf, dzi:dzf) = ... same for source and dest
This saves the times for producing the index vectors. I do not expect that this causes a dramatic speedup.
It would be useful to have a minimal working example, which includes the bottleneck of your code. It is less smart, if I invent such a piece of code, because my guesses of the details might be misleading. But let me start:
dest = rand(1000, 1000, 281);
source = rand(4000, 2000, 312);
cord = [17, 81, 90];
Something like that?!?
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Graphics Performance 的更多信息
产品
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!