Vectorized code slower than loops?

3 次查看(过去 30 天)
This question is a bit an offspring from an other one, but I have the following two codes:
maxN = 100;
levels = maxN+1;
xElements = 101;
umn = complex(zeros(levels, levels)); % cleaning
bessels = ones(1201, 1201, 101); % 1.09 GB
negMcontainer = ones(1201, 1201, 100);
posMcontainer = negMcontainer;
tic
for j = 1 : xElements
for i = 1 : xElements
for n = 1 : 2 : maxN
nn = n + 1;
mm = 1;
m = 1:2:n;
numOfEl = ceil(n/2);
umn(nn, mm:mm+numOfEl-1) = bessels(i, j, nn) * posMcontainer(i, j, m);
end
end
end
toc
tic
for j = 1 : xElements
for i = 1 : xElements
for n = 1 : 2 : maxN
nn = n + 1;
mm = 1;
for m = 1 : 2 : n
umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
mm = mm + 1;
end
end
end
end
toc
And it tourns out, that loops version is faste >2x. Why is that so? I know that i happens if vectorization requiers large temporary variables, but (it seems) it is not true here.
And generally, what (other than parfor) can I do to speed up this code?
Best regards, Alex
  1 个评论
Alexandra Harkai
Alexandra Harkai 2016-9-2
Not sure about the speedup possibilities just yet, but regarding the vectorisation, this may be helpful in seeing where the vector/loop implementations make a difference: http://www.matlabtips.com/matlab-is-no-longer-slow-at-for-loops/

请先登录,再进行评论。

采纳的回答

per isakson
per isakson 2016-9-2
编辑:per isakson 2016-9-3
Given
  • Matlab stores matrices in column-major order.
  • bessels and posMcontainer are both large
Possibly the transport of data between the memory and the cpu will be more efficient (the caches will work better) if
umn(nn, mm:mm+numOfEl-1) = bessels(i, j, nn) * posMcontainer(i, j, m);
was replaced by
umn(mm:mm+numOfEl-1,nn) = bessels(nn, i, j) * posMcontainer(m, i, j);
The same should apply to the "all-for-loop-case".
&nbsp
And finally the test from Columns and Rows are not the same with an additional case. (R2016a,
result =runperf('NestedLoops.m');
fullTable = vertcat(result.Samples);
varfun(@mean,fullTable,'InputVariables' ...
,'MeasuredTime','GroupingVariables','Name')
ans =
Name GroupCount mean_MeasuredTime
__________________ __________ _________________
NestedLoops/test 4 1.3266
NestedLoops/test_1 4 0.88148
NestedLoops/test_2 4 0.49775
where NestedLoops.m contains
X=rand(100,100,2000);
for ii=1:100
for jj=1:100
X(ii,jj,:)=10*X(ii,jj,:);
end
end
X=rand(100,100,2000);
for jj=1:100
for ii=1:100
X(ii,jj,:)=10*X(ii,jj,:);
end
end
X=rand(2000,100,100);
for jj=1:100
for ii=1:100
X(:,ii,jj)=10*X(:,ii,jj);
end
end
The "differences" between the "cases" are actually larger, since
>> tic, X=rand(100,100,2000);, toc
Elapsed time is 0.355542 seconds.
  6 个评论
Alex Kurek
Alex Kurek 2016-9-3
I do not know C language. But if you want, it is here: https://www.dropbox.com/s/69bf8fj7lc6cnbc/fisherComputer.c?dl=0
per isakson
per isakson 2016-9-3
编辑:per isakson 2016-9-5
Thanks, but TLNR.
Neither do I, however I get the impression that Coder switches the order of the loops to account for the difference in major order.
"slowed down a bit in .mex" &nbsp Now, I believe that one should code for column-major in Matlab and that Coder adapts the C-code to row-major. However, it puzzles me that the difference in C is only "a bit", since in Matlab it's significant.

请先登录,再进行评论。

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Logical 的更多信息

产品

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by