Why is the vectorized version of this simple local-maxima detection code significantly slower (roughly 2-3 times) than the for-loop version?
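X = rand(100000,1000);   % 100000-by-1000 test matrix
[I,J] = size(X);

% For-loop version: process one column at a time
Ind = false(I,J);
for j = 1:J
    Ind(:,j) = diff( sign( diff([0; X(:,j); 0]) ) ) < 0;
end

% Vectorized version: process the whole matrix at once along dimension 1
Ind_ = diff(sign(diff([zeros(1,J);X;zeros(1,J)],1,1)),1,1) < 0;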
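
For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It assumes a smaller matrix than the one above (my choice, so it runs quickly) and uses plain tic/toc, which is a fairly coarse measurement. The variable names tLoop and tVec are hypothetical and only used for illustration. It also checks that both versions return the same result.

```matlab
% Smaller matrix than in the question so the sketch runs quickly (assumption)
X = rand(10000,100);
[I,J] = size(X);

% For-loop version: one column at a time
tic
Ind = false(I,J);
for j = 1:J
    Ind(:,j) = diff( sign( diff([0; X(:,j); 0]) ) ) < 0;
end
tLoop = toc;

% Vectorized version: whole matrix at once along dimension 1
tic
Ind_ = diff(sign(diff([zeros(1,J);X;zeros(1,J)],1,1)),1,1) < 0;
tVec = toc;

% Both versions should flag exactly the same elements
assert(isequal(Ind, Ind_));
fprintf('loop: %.3f s, vectorized: %.3f s\n', tLoop, tVec);
```

A single tic/toc run can be noisy, so the ratio may vary between runs and machines. Wrapping each version in a function and calling timeit on it would likely give more stable numbers.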