I don't know the algorithmic complexity of the kernel density function, but I know that one needs to do function evaluations at not just the input points, but "nearby" points as well. I can imagine that things blow up fast.
I can imagine it might be faster to use ksdensity at only, say, 1/10 of the points, and then interpolate (using, e.g. interp1) at the points in between. I don't know how much accuracy you would lose, though.
