MATLAB Answers

Initial centroids selection - Kmeans

Asked by Salad Box on 26 Sep 2019
Latest activity Edited by Adam
on 26 Sep 2019
Hi,
Am I allowed to choose k initial centroids that are not contained in the original data set, in other words, without using random sampling?
For instance, in the two graphs below, the coloured points in the middle are my original data set.
  • In the left graph, the 5 red points are the initial centroids I selected using my own method.
  • In the right graph, the initial centroids will be evenly distributed on the magenta circle. Note that although my original data set contains only positive numbers, some initial centroids will have negative values in this case, depending on where they sit on the circle.
I wonder whether I have made any fundamental mistakes that I am not yet aware of in selecting initial centroids with the two methods proposed above.
Even if there are no fundamental mistakes, are there any disadvantages to selecting initial centroids in these two ways?


1 Answer

Answer by Adam
on 26 Sep 2019
Edited by Adam
on 26 Sep 2019
 Accepted Answer

doc kmeans
shows the
idx = kmeans(X,k,Name,Value)
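function signature. If you look at the options for the 'Name', 'Value' pairs you will see that 'Start' allows you to input your own starting positions, as a numeric matrix with one row per centroid. Those rows do not have to be points from your data set.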
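As a minimal sketch of the 'Start' option (the data X and the centroid values in C0 are made up for illustration, and kmeans needs the Statistics and Machine Learning Toolbox):

```matlab
% Toy data, invented for this sketch: two loose groups of 2-D points.
rng(0);                                   % fix the seed so the toy data is reproducible
X = [randn(50,2) + 2; randn(50,2) + 6];   % 100-by-2 data matrix

k  = 2;
C0 = [0 0; 10 10];                        % k-by-p starting centroids, none of them rows of X

% Pass the starting centroids through the 'Start' name-value pair.
[idx, C] = kmeans(X, k, 'Start', C0);     % idx: cluster index per row, C: final centroids
```

C0 must have k rows and as many columns as X. Nothing here requires its rows to come from X.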
As for what makes a valid choice, the simplest way is to try them and find out. In some cases they may not converge to where you want, in others they may. Without random initialisation, though, it is a 100% deterministic algorithm, so a single run gives the one answer for each set of starting centroids (although there are, of course, infinitely many ways to place evenly distributed points around that circle, since the whole set can be rotated).
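A rough sketch of the circle idea from the question, assuming 2-D data. The data, the radius factor 1.5 and the rotation offset theta0 are hypothetical choices, not something taken from the question's figures:

```matlab
% Hypothetical all-positive 2-D data.
rng(1);
X = rand(200,2) * 10;

k      = 5;
center = mean(X, 1);                                   % centre of the circle
radius = 1.5 * max(sqrt(sum((X - center).^2, 2)));     % circle lying outside all the data
theta0 = 0;                                            % rotation offset (hypothetical parameter)
theta  = theta0 + 2*pi*(0:k-1)'/k;                     % k evenly spaced angles

% k-by-2 starting centroids on the circle; some coordinates may be negative.
C0 = center + radius * [cos(theta) sin(theta)];

% With a fixed 'Start' matrix there is no random initialisation,
% so two runs should give the same clustering.
[idx1, C1] = kmeans(X, k, 'Start', C0);
[idx2, C2] = kmeans(X, k, 'Start', C0);
sameResult = isequal(idx1, idx2) && isequal(C1, C2);
```

Changing theta0 rotates the starting points around the circle, which is one way to try several of the possible placements and compare where each converges.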
