How to use testing data to validate kmeans?

Hello there,
I have some data in 8 text files, and I would like to group similar ones into the same classes. I am using k-means for now. I would like to use 5 of the files for training and 3 of them for testing. I have used the kmeans command to get k classes; however, I do not know how to validate my results. In other words, I do not know how to use my testing data to calculate the error. I would appreciate it if somebody could help me. Thanks in advance.

Accepted Answer

Image Analyst 2014-3-23
If you do not know the "ground truth" of your data, then there's no way to tell if a classification is "wrong". The only thing you can do (I think) is to classify your "unknown" data and measure how far each point is from the means of the classes. For example, let's say you had a cluster of data, "class #1", around 30 +/- 5, and a second cluster, "class #2", at 100 +/- 20. You run kmeans with 2 classes and it tells you about those two classes, with the means at 30 and 100. Now you have a data point in the "non-training" set of data and it has a value of 70. You can say that the 70 belongs to class #2, since it's 40 from the class #1 mean and 30 from the class #2 mean. You can do the same for all the other data in your test sets.
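A minimal sketch of that idea in MATLAB, under stated assumptions: the data here are synthetic 1-D values made up to mirror the 30 +/- 5 and 100 +/- 20 example, and the variable names (`trainData`, `testData`, `centroids`) are hypothetical, not taken from the original question. It uses `kmeans` and `pdist2` from the Statistics and Machine Learning Toolbox. It does not compute a classification error, since no true labels are available; it only reports how far each test point lies from its nearest cluster mean.

```matlab
% Synthetic training data (an assumption for illustration):
% two 1-D clusters near 30 and 100, as in the example above.
rng(0);                                   % fixed seed so the run is repeatable
trainData = [30 + 5*randn(50,1); 100 + 20*randn(50,1)];

% Held-out test points (made up for illustration).
testData = [70; 28; 115];

% Fit k-means on the training data only.
k = 2;
[~, centroids] = kmeans(trainData, k, 'Replicates', 5);

% Distance from every test point to every centroid
% (rows: test points, columns: centroids).
D = pdist2(testData, centroids);

% Assign each test point to its nearest centroid and keep that distance.
[distToNearest, testLabels] = min(D, [], 2);

% Summarize how well the test data fit the clusters found in training.
disp(table(testData, testLabels, distToNearest));
fprintf('Mean distance to assigned centroid: %.2f\n', mean(distToNearest));
```

With real data, each row of `trainData` and `testData` would be one observation read from the text files. Comparing the mean test distance with the same quantity computed on the training data gives a rough sense of whether the test files fit the clusters found in training.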
3 Comments
Image Analyst 2014-3-23
To accurately get the error you have to know the true values, don't you? And you don't know those. So all you have is a guess.
