Why is gpuArray/knnsearch so slow?

I want to accelerate my code using the GPU. In my case, using knnsearch with a gpuArray is very slow.
Here is a code snippet to test:
%% Sample data
ptCloud = pcread('teapot.ply');
%% Option 1: CPU
points = ptCloud.Location;
k = 50;
tic;
[idxCPU, distCPU] = knnsearch(points, points, 'K', k);
tCPU = toc;
%% Option 2: GPU
pointsOnGPU = gpuArray(ptCloud.Location);
% 'K' can stay an ordinary CPU scalar; it does not need to be a gpuArray
tic;
[idxGPU, distGPU] = knnsearch(pointsOnGPU, pointsOnGPU, 'K', k);
wait(gpuDevice);  % GPU calls run asynchronously; let them finish before toc
tGPU = toc;
%% Option 3: ptCloud.findNearestNeighbors in a for loop
ind = zeros(k, ptCloud.Count);
dists = zeros(k, ptCloud.Count);
tic;
for i = 1:ptCloud.Count
    [ind(:,i), dists(:,i)] = findNearestNeighbors(ptCloud, ptCloud.Location(i,:), k);
end
tOption3 = toc;
My result:
I use MATLAB R2017b on Ubuntu 16.04. CPU: Intel® Core™ i7-6700HQ @ 2.60GHz. GPU (output of gpuDevice):
CUDADevice with properties:
Name: 'Quadro M1000M'
Index: 1
ComputeCapability: '5.0'
SupportsDouble: 1
DriverVersion: 9
ToolkitVersion: 8
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.0978e+09
AvailableMemory: 1.5902e+09
MultiprocessorCount: 4
ClockRateKHz: 1071500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
I expected Option 2 to be the fastest. Can anyone explain why this is not the case? My actual point cloud contains many more points, and there the difference is considerably greater.

Hao Zhang 2018-12-13
Hi, the GPU version of knnsearch seems to use a brute-force method. You can check this by passing in a point cloud large enough to run out of memory: MATLAB then reports an error in the pdist2 function, which tries to compute all pairwise distances.
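For a sense of scale (a rough estimate, assuming single-precision input as returned by ptCloud.Location): a full N-by-N distance matrix needs 4*N^2 bytes, so N = 100,000 points would need 4e10 bytes (40 GB), far more than the roughly 2.1 GB (TotalMemory) of the Quadro M1000M above. The CPU call, by contrast, defaults to a kd-tree (NSMethod 'kdtree') for Euclidean data with 10 or fewer columns, so it never forms that matrix. If you still want to try the GPU, one possible workaround is to process the query points in blocks. The function below is a minimal sketch of that idea; chunkedBruteForceKnn is a hypothetical helper, not part of any toolbox, and it uses the expansion ||q - x||^2 = ||q||^2 + ||x||^2 - 2*q*x' with base MATLAB operations that also accept gpuArray inputs instead of calling pdist2. It has not been benchmarked against knnsearch, and neighbours with tied distances may come out in a different order.

```matlab
function [idx, dist] = chunkedBruteForceKnn(X, Q, k, chunkSize)
%CHUNKEDBRUTEFORCEKNN Hypothetical helper, not a MathWorks function.
%   [IDX, DIST] = CHUNKEDBRUTEFORCEKNN(X, Q, K, CHUNKSIZE) returns, for each
%   row of Q (M-by-D), the indices into X (N-by-D) of its K nearest rows and
%   the matching Euclidean distances, sorted in ascending order. Query rows
%   are processed CHUNKSIZE at a time, so only a CHUNKSIZE-by-N block of
%   distances exists at once instead of the full M-by-N matrix.
%   X and Q may be ordinary arrays or gpuArrays; K must not exceed N.
    if nargin < 4
        chunkSize = 256;   % assumed default; tune to the available memory
    end
    % Distances are translation invariant; centring the data reduces
    % cancellation error in the expansion below (implicit expansion, R2016b+).
    mu = mean(X, 1);
    X = X - mu;
    Q = Q - mu;
    xx = sum(X.^2, 2).';              % 1-by-N squared norms of reference rows
    M = size(Q, 1);
    nChunks = ceil(M / chunkSize);
    idxParts = cell(nChunks, 1);
    distParts = cell(nChunks, 1);
    for c = 1:nChunks
        blk = (c - 1) * chunkSize + 1 : min(c * chunkSize, M);
        Qb = Q(blk, :);
        % Squared distances ||q||^2 + ||x||^2 - 2*q*x' for this block only
        d2 = sum(Qb.^2, 2) + xx - 2 * (Qb * X.');
        d2 = max(d2, 0);               % clamp small negative rounding errors
        [d2Sorted, order] = sort(d2, 2);  % ascending along each row
        idxParts{c} = order(:, 1:k);
        distParts{c} = sqrt(d2Sorted(:, 1:k));
    end
    idx = vertcat(idxParts{:});
    dist = vertcat(distParts{:});
end
```

Called as [idx, dist] = chunkedBruteForceKnn(pointsOnGPU, pointsOnGPU, k, 256), the outputs stay on the GPU until you gather them. Peak memory per block is a few times chunkSize*N elements (the distance block, its sorted copy, and the double-precision index matrix from sort), so lower chunkSize if the GPU still runs out of memory. The total work remains proportional to M*N, so for large clouds this may still be slower than the CPU kd-tree.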


