Vectorized code slower than loops?
This question is an offshoot of another one, but I have the following two pieces of code:
maxN = 100;
levels = maxN+1;
xElements = 101;
umn = complex(zeros(levels, levels)); % cleaning 
bessels = ones(1201, 1201, 101);    % 1.09 GB
negMcontainer = ones(1201, 1201, 100);
posMcontainer = negMcontainer;
tic
for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            nn = n + 1;
            mm = 1;
            m = 1:2:n;
            numOfEl = ceil(n/2);
            umn(nn, mm:mm+numOfEl-1) = bessels(i, j, nn) * posMcontainer(i, j, m);
        end
    end
end
toc
tic
for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            nn = n + 1;
            mm = 1;
            for m = 1 : 2 : n
                umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
                mm = mm + 1;
            end
        end
    end
end
toc
It turns out that the loop version is more than 2x faster. Why is that? I know that this happens if vectorization requires large temporary variables, but (it seems) that is not the case here.
And generally, what (other than parfor) can I do to speed up this code?
Best regards, Alex
1 comment
  Alexandra Harkai
      
 2016-9-2
				Not sure about the speedup possibilities just yet, but regarding the vectorisation, this may be helpful in seeing where the vector/loop implementations make a difference: http://www.matlabtips.com/matlab-is-no-longer-slow-at-for-loops/
Accepted answer
  per isakson
      
      
 2016-9-2
        
      Edited: per isakson
      
      
 2016-9-3
  
Given that
- MATLAB stores arrays in column-major order, and
- bessels and posMcontainer are both large,
the transfer of data between memory and the CPU may be more efficient (the caches will work better) if
umn(nn, mm:mm+numOfEl-1) = bessels(i, j, nn) * posMcontainer(i, j, m);
were replaced by
umn(mm:mm+numOfEl-1,nn) = bessels(nn, i, j) * posMcontainer(m, i, j);
with the dimensions of bessels and posMcontainer reordered to match, so that the index being sliced (m) runs along the first dimension. The same should apply to the all-for-loop case.
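The reordering above can be sketched as follows. This is only an illustration under assumptions: the array sizes are small stand-ins, and the names besselsP, posP, umnA and umnB are hypothetical, not from the original code. It permutes the arrays once and checks that both layouts give the same values for one (i, j) pair. The reshape calls just make the shapes explicit.

```matlab
% Small stand-in arrays; the sizes are illustrative assumptions.
nx = 5; maxN = 8;
bessels       = rand(nx, nx, maxN+1);
posMcontainer = rand(nx, nx, maxN);

% One-off reordering: move the third dimension to the front so that
% elements along m are adjacent in memory (column-major storage).
besselsP = permute(bessels,       [3 1 2]);   % besselsP(nn, i, j)
posP     = permute(posMcontainer, [3 1 2]);   % posP(m, i, j)

umnA = zeros(maxN+1, maxN+1);   % original layout:   umn(nn, mm)
umnB = zeros(maxN+1, maxN+1);   % transposed layout: umn(mm, nn)
i = 2; j = 3;                   % a single (i, j) pair for the check
for n = 1:2:maxN
    nn = n + 1;
    m = 1:2:n;
    numOfEl = numel(m);
    % Original: strided access along the third dimension.
    umnA(nn, 1:numOfEl) = reshape(bessels(i, j, nn) * posMcontainer(i, j, m), 1, []);
    % Reordered: contiguous access along the first dimension.
    umnB(1:numOfEl, nn) = besselsP(nn, i, j) * posP(m, i, j);
end
sameResult = isequal(umnA, umnB.');   % umnB is the transpose of umnA
```

The permute calls copy the full arrays, so this only pays off if the reordered arrays are built once (or created in that layout from the start) and then reused across the loops.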
 

result = runperf('NestedLoops.m');
fullTable = vertcat(result.Samples);
varfun(@mean, fullTable, 'InputVariables', 'MeasuredTime', ...
       'GroupingVariables', 'Name')
ans = 
           Name           GroupCount    mean_MeasuredTime
    __________________    __________    _________________
    NestedLoops/test      4             1.3266          
    NestedLoops/test_1    4             0.88148          
    NestedLoops/test_2    4             0.49775
where NestedLoops.m contains
X=rand(100,100,2000);
for ii=1:100
    for jj=1:100
        X(ii,jj,:)=10*X(ii,jj,:);
    end
end
X=rand(100,100,2000);
for jj=1:100
    for ii=1:100
        X(ii,jj,:)=10*X(ii,jj,:);
    end
end
X=rand(2000,100,100);
for jj=1:100
    for ii=1:100
        X(:,ii,jj)=10*X(:,ii,jj);
    end
end
The relative differences between the cases are actually larger than the table suggests, since each measured time also includes creating X:
>> tic, X=rand(100,100,2000);, toc
Elapsed time is 0.355542 seconds.
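To separate the allocation from the loop cost, one option is to time only the loops. A rough sketch with tic/toc, assuming a smaller third dimension (nz = 200 instead of 2000) to keep the run short; the names nz, Y, tStrided and tContiguous are hypothetical, and the timings will depend on the machine and MATLAB version:

```matlab
nz = 200;                         % assumed smaller than the 2000 used above
X = rand(100, 100, nz);           % allocation kept outside the timed region
Y = permute(X, [3 1 2]);          % same data, long dimension first

tic
for jj = 1:100
    for ii = 1:100
        X(ii, jj, :) = 10 * X(ii, jj, :);   % strided slices
    end
end
tStrided = toc;

tic
for jj = 1:100
    for ii = 1:100
        Y(:, ii, jj) = 10 * Y(:, ii, jj);   % contiguous columns
    end
end
tContiguous = toc;

fprintf('strided: %.4f s, contiguous: %.4f s\n', tStrided, tContiguous);

% Both loops scale the same data, so the results must agree.
sameData = isequal(X, permute(Y, [2 3 1]));
```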
6 comments
  per isakson
      
      
 2016-9-3
				
      Edited: per isakson
      
      
 2016-9-5
  
			Thanks, but TLNR.
Neither do I, however I get the impression that Coder switches the order of the loops to account for the difference in major order.
"slowed down a bit in .mex"   Now, I believe that one should code for column-major order in MATLAB and that Coder adapts the C code to row-major order. However, it puzzles me that the difference in C is only "a bit", since in MATLAB it is significant.
More answers (0)