K-means for big data using preconditioning and sparsification, implemented in Matlab. The package has three main features:
(1) It is an efficient implementation: in some cases it is 100x faster than Matlab's K-means at the same accuracy. It also incorporates recent research, such as using K-means++ for the initialization (note: Matlab's R2015 K-means now uses K-means++ too). The code is well documented and follows the conventions of Matlab's K-means function where possible.
(2) Optionally, you can enable the precondition-and-sample feature, a novel method that allows efficient processing when the datasets are extremely large and slow to work with.
(3) For datasets that are a few TB in size, you can use the read-from-disk option so that the entire matrix is never loaded into RAM at once.
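To give a feel for the precondition-and-sample idea in (2), here is a minimal sketch. It is not the package's code and it makes several assumptions: data points are stored as columns, the preconditioner is a random sign flip followed by an orthonormal Hadamard transform, and sparsification keeps `m` entries per column chosen uniformly at random. The variable names (`X`, `Y`, `S`, `m`) are purely illustrative.

```matlab
% Illustrative sketch only; NOT the package's implementation.
rng(0);                               % reproducible random numbers
p = 16;  n = 1000;                    % p features (power of 2 so hadamard(p) exists), n points
X = randn(p, n);                      % toy data, one point per column (assumed layout)

% Precondition (assumed form): random sign flips, then an orthonormal Hadamard transform.
d = 2 * (rand(p, 1) > 0.5) - 1;       % random +/-1 signs
H = hadamard(p) / sqrt(p);            % orthonormal, so column norms and distances are preserved
Y = H * bsxfun(@times, d, X);         % preconditioned data

% Sparsify (assumed form): keep m of the p entries in each column, chosen uniformly at random.
m = 4;
S = zeros(p, n);
for j = 1:n
    keep = randperm(p, m);            % indices of the entries to keep in column j
    S(keep, j) = Y(keep, j);
end
S = sparse(S);                        % store only the kept entries

fprintf('Fraction of entries kept: %.2f\n', nnz(S) / numel(S));
```

The intuition behind preconditioning first is that an orthonormal mixing step spreads each point's energy across all coordinates, so discarding most entries at random loses less information than it would on the raw data.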
Installation is easy: run `setup_kmeans.m`, which installs the mex files for you if necessary and sets up the appropriate paths.
Stephen Becker (2020). Sparsified K-Means (https://github.com/stephenbeckr/SparsifiedKMeans), GitHub.
Fixed typos in the description; no change to the code (but the GitHub version is updated regularly).