I am very confused by the performance difference. fun_A is much faster than fun_B. Anybody has any ideas as to why? I am running R2013a. (I looked into a similar question that concerned matrix multiplication, but this should be much simpler: it only uses ranged indexing!)
function [out] = fun_A(permutation, in)
NoU = length(permutation);
NoF = length(in) / NoU;
out = zeros(1, length(in));
for i = 1:NoU
lhs_start = (i-1)*NoF;
rhs_start = (permutation(i)-1)*NoF
for j = 1:NoF
out(lhs_start+j) = in(rhs_start+j);
end
end
end
function[out] = fun_B(permutation, in)
NoU = length(permutation);
NoF = length(in) / NoU;
out = zeros(1, length(in));
for i = 1:NoU
lhs_start = (i-1)*NoF + 1;
lhs_end = i*NoF;
rhs_start = (permutation(i)-1)*NoF + 1;
rhs_end = permutation(i)*NoF;
out(lhs_start:lhs_end) = in(rhs_start:rhs_end);
end
end
If you need an understanding of what happens here, assume that permutation = [3 1 2] and that in = [a1 a2 b1 b2 c1 c2], i.e. that we have 3 users (call them a,b,c) with two features each (hence a1,a2 and so on). I want to create and return the following: out = [c1 c2 a1 a2 b1 b2], i.e. to rearrange my users according to the permutation but of course have their corresponding features in succession.
- Typical values are : NoU = 6, NoF = 5.
- fun_A (or fun_B) is supposed to be called more than a million times
- I only care about the performance of the lines that have comments next to them. Please don't point out inefficiencies that are relevant to the calculations of NoU, NoF, the temp variables such as lhs_start etc. These appear here just to make this a standalone example