# Permutations Inversions Counting ... speed up

Michal Kvasnicka on 23 Oct 2017
Edited: John D'Errico on 23 Oct 2017
Hi, this is my code for counting permutation inversions. Any idea how to optimize it to get some additional speedup for a large set of permutations? See my previous question: here
function count = PermInvsCount(p)
% Count the inversions in each row of p, where p is an
% nR-by-nE array of permutations (one permutation per row).
[nR,nE] = size(p);
count = zeros(nR,1);
for i = 1:nE-1
    for j = i+1:nE
        % Add 1 to each row where the pair (i,j) is out of order.
        count = count + (p(:,i) > p(:,j));
    end
end
end
Some testing data:
[~,perms] = sort(rand(1000000,60),2);
tic;c= PermInvsCount(perms);toc
Elapsed time is 1.500214 seconds.

John D'Errico on 23 Oct 2017
Edited: John D'Errico on 23 Oct 2017
Why do people always think there is something they could have done that was faster? You are doing something on the order of 2 billion comparisons, and as many additions. 1.5 seconds does not seem wildly out of line for that.
If you need a speedup, you might look into the parallel toolbox. The fact is, however, that your code is already multithreaded. A quick test on my machine showed all of my cores running when I execute that code. MATLAB automatically sets many operations up for multithreading IF it sees a gain by doing so. In this case, it did. That means you won't gain by forcing MATLAB to use the spare cores on your computer, because it uses them already.
You can check the value of
maxNumCompThreads
to make sure that MATLAB is able to use all of your cores. (Hyper-threaded cores don't give you any real boost, so don't go there.)
If you can push those computations onto your GPU, there might be some gain. I can't say for sure, since I don't have the Parallel Computing Toolbox.
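If you do have that toolbox, a minimal sketch of the GPU version might look like the following. This is untested (again, I don't have the toolbox), and it assumes a supported GPU; gpuArray moves the data onto the device, arithmetic on gpuArray operands then runs there, and gather brings the result back:

```matlab
% Untested sketch: requires the Parallel Computing Toolbox and a supported GPU.
pg = gpuArray(p);                        % move the permutation array to the GPU
countg = zeros(size(pg,1), 1, 'gpuArray'); % accumulator, allocated on the GPU
nE = size(pg,2);
for i = 1:nE-1
    for j = i+1:nE
        countg = countg + (pg(:,i) > pg(:,j)); % elementwise compare, on the GPU
    end
end
count = gather(countg);                  % copy the result back to host memory
```

Whether this wins depends on whether the elementwise comparisons are big enough to amortize the kernel-launch overhead of roughly nE^2/2 small operations.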
So, you need to do a few billion computations there. You are already doing them. Doing them in some different order won't help a lot.
Ok, I can already hear you wondering whether (or why) this version should have been faster:
function count = PermInvsCount2(p)
% Partially vectorized variant: compare column i against
% all later columns at once, summing across the row.
[nR,nE] = size(p);
count = zeros(nR,1);
for i = 1:nE-1
    count = count + sum(p(:,i) > p(:,(i+1):end), 2);
end
end
As it turns out, that code, even though it is in theory more fully vectorized, also does more work: it has to extract sub-arrays of different sizes on every pass through the loop.
The loop overhead here is simply not important. These loops below are trivial, compared to what happens inside the loop.
for i = 1:nE-1
    for j = i+1:nE
    end
end
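If you want to verify that on your own machine, a simple comparison using the test data from the question might look like this (timeit is in base MATLAB and handles warm-up and repetition for you):

```matlab
% Compare the two versions on the same data.
[~,perms] = sort(rand(1000000,60),2);
t1 = timeit(@() PermInvsCount(perms));
t2 = timeit(@() PermInvsCount2(perms));
fprintf('double loop: %.3f s   partially vectorized: %.3f s\n', t1, t2);
% Sanity check: both versions must produce identical counts.
assert(isequal(PermInvsCount(perms), PermInvsCount2(perms)))
```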
So unless you can use the Parallel Computing Toolbox, and you have a decent GPU, I don't see you getting much of a gain.