What exactly does the sumd output of the kmeans function calculate?
I am doing basic experiments with the kmeans function. As a very simple example, say I have a data set of 4 items with a single attribute, which is their value:
Data=[1;2;3;4];
If I want to split this data set into 2 clusters, I should get one centroid at 1.5 and another at 3.5:
[idx,C,sumd]=kmeans(Data,2)
C =
1.5000
3.5000
and that is what I get. However, to my understanding, sumd in this case should be:
abs(1-1.5)+abs(2-1.5) or abs(3-3.5)+abs(4-3.5)
ans =
1
but I am getting sumd as:
sumd =
0.5000
0.5000
for both clusters, instead of 1 for each.
My question is: what exactly does sumd calculate?
Accepted Answer
Ameer Hamza
2018-5-8
Edited: Ameer Hamza
2018-5-8
According to the documentation of kmeans(), it uses the squared Euclidean distance by default, so sumd holds the within-cluster sums of squared point-to-centroid distances. You should calculate it like this:
abs(1-1.5).^2+abs(2-1.5).^2 or abs(3-3.5).^2+abs(4-3.5).^2
ans =
0.5 (both cases)
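The same arithmetic can be checked by hand without calling kmeans at all. This is only a sketch: the assignment vector idx is assumed to match the centroids reported in the question, and the variable names sumdSquared and sumdAbs are made up for illustration.

```matlab
% Manual recomputation for the example in the question (not kmeans itself).
Data = [1; 2; 3; 4];
C    = [1.5; 3.5];     % centroids reported in the question
idx  = [1; 1; 2; 2];   % ASSUMED assignment consistent with those centroids

k = size(C, 1);
sumdSquared = zeros(k, 1);   % squared Euclidean distance (the kmeans default)
sumdAbs     = zeros(k, 1);   % absolute distance (what the question expected)
for j = 1:k
    d = Data(idx == j) - C(j);      % deviations of cluster j's points from its centroid
    sumdSquared(j) = sum(d.^2);     % 0.25 + 0.25 = 0.5
    sumdAbs(j)     = sum(abs(d));   % 0.5 + 0.5 = 1
end

disp(sumdSquared)   % matches the sumd values shown in the question
disp(sumdAbs)       % matches the hand calculation in the question
```

The only difference between the two results is whether each deviation is squared before summing.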
More Answers (1)
the cyclist
2018-5-8
That's because the default distance metric is the squared Euclidean distance, which kmeans uses both for the minimization and for the values it reports in sumd. See the 'Distance' input parameter.
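If sums of absolute distances are what you want, one option is to pass 'cityblock' as the 'Distance' value. The sketch below does that and recomputes the sums by hand; sumdCheck is just an illustrative name. Keep in mind that kmeans starts from random initial centroids, and for this tiny data set the partitions {1,2},{3,4} and {1},{2,3,4} have the same total city-block cost, so the returned clusters (and their order) may vary between runs.

```matlab
Data = [1; 2; 3; 4];

% Request the city-block (L1) metric instead of the default 'sqeuclidean'.
[idx, C, sumd] = kmeans(Data, 2, 'Distance', 'cityblock');

% Recompute the per-cluster sums by hand to confirm what sumd reports.
sumdCheck = zeros(2, 1);
for j = 1:2
    sumdCheck(j) = sum(abs(Data(idx == j) - C(j)));
end

disp([sumd, sumdCheck])   % the two columns should agree
```

If the run returns the same split as in the question, each entry of sumd would be 1, which is the value the original hand calculation produced.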