Vectorized code slower than loops?
3 次查看(过去 30 天)
显示 更早的评论
This question is a bit an offspring from an other one, but I have the following two codes:
maxN = 100;
levels = maxN+1;
xElements = 101;
umn = complex(zeros(levels, levels)); % cleaning
bessels = ones(1201, 1201, 101); % 1.09 GB
negMcontainer = ones(1201, 1201, 100);
posMcontainer = negMcontainer;
tic
for j = 1 : xElements
for i = 1 : xElements
for n = 1 : 2 : maxN
nn = n + 1;
mm = 1;
m = 1:2:n;
numOfEl = ceil(n/2);
umn(nn, mm:mm+numOfEl-1) = bessels(i, j, nn) * posMcontainer(i, j, m);
end
end
end
toc
tic
for j = 1 : xElements
for i = 1 : xElements
for n = 1 : 2 : maxN
nn = n + 1;
mm = 1;
for m = 1 : 2 : n
umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
mm = mm + 1;
end
end
end
end
toc
And it tourns out, that loops version is faste >2x. Why is that so? I know that i happens if vectorization requiers large temporary variables, but (it seems) it is not true here.
And generally, what (other than parfor) can I do to speed up this code?
Best regards, Alex
1 个评论
Alexandra Harkai
2016-9-2
Not sure about the speedup possibilities just yet, but regarding the vectorisation, this may be helpful in seeing where the vector/loop implementations make a difference: http://www.matlabtips.com/matlab-is-no-longer-slow-at-for-loops/
采纳的回答
per isakson
2016-9-2
编辑:per isakson
2016-9-3
Given
- Matlab stores matrices in column-major order.
- bessels and posMcontainer are both large
Possibly the transport of data between the memory and the cpu will be more efficient (the caches will work better) if
umn(nn, mm:mm+numOfEl-1) = bessels(i, j, nn) * posMcontainer(i, j, m);
was replaced by
umn(mm:mm+numOfEl-1,nn) = bessels(nn, i, j) * posMcontainer(m, i, j);
The same should apply to the "all-for-loop-case".
 
result =runperf('NestedLoops.m');
fullTable = vertcat(result.Samples);
varfun(@mean,fullTable,'InputVariables' ...
,'MeasuredTime','GroupingVariables','Name')
ans =
Name GroupCount mean_MeasuredTime
__________________ __________ _________________
NestedLoops/test 4 1.3266
NestedLoops/test_1 4 0.88148
NestedLoops/test_2 4 0.49775
where NestedLoops.m contains
X=rand(100,100,2000);
for ii=1:100
for jj=1:100
X(ii,jj,:)=10*X(ii,jj,:);
end
end
X=rand(100,100,2000);
for jj=1:100
for ii=1:100
X(ii,jj,:)=10*X(ii,jj,:);
end
end
X=rand(2000,100,100);
for jj=1:100
for ii=1:100
X(:,ii,jj)=10*X(:,ii,jj);
end
end
The "differences" between the "cases" are actually larger, since
>> tic, X=rand(100,100,2000);, toc
Elapsed time is 0.355542 seconds.
6 个评论
per isakson
2016-9-3
编辑:per isakson
2016-9-5
Thanks, but TLNR.
Neither do I, however I get the impression that Coder switches the order of the loops to account for the difference in major order.
"slowed down a bit in .mex"   Now, I believe that one should code for column-major in Matlab and that Coder adapts the C-code to row-major. However, it puzzles me that the difference in C is only "a bit", since in Matlab it's significant.
更多回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Logical 的更多信息
产品
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!