K-mean for Wine data set

Question

1 vote

Hi,

I ran the K-means algorithm on the Wine data set from the UCI repository. This data set contains chemical analyses of 178 wines derived from three different cultivars. Wine type is based on 13 continuous features.

Here's the command:

load 'wine_data.txt';

[IDX,C,sumd,D] = kmeans(wine_data,3, ...
    'start','sample', ...
    'Replicates',100, ...
    'maxiter',1000, 'display','final');

The final best total sum of distances is 2.37069e+06. This is far from the K-means solution reported in the literature, which is around 18,061. Is MATLAB's K-means solution stuck in a local minimum? Please advise. Thanks.

1 comment

the cyclist 2013-8-27

For anyone who is interested in helping out on this one, the data set is here: http://archive.ics.uci.edu/ml/datasets/Wine


Answer 1

Shashank Prasanna 2013-8-27

0 votes

Ganesh, what distance metric does the 'literature' use?

The kmeans default is 'sqEuclidean'. You have to make sure you are comparing the same metric. Try changing it to cityblock or any of the other options:

http://www.mathworks.com/help/stats/kmeans.html
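As a rough illustration of why the metric matters, the sketch below runs kmeans twice with identical settings and only the 'Distance' option changed. The matrix X is a synthetic stand-in invented for this example, not the wine data; substitute wine_data to try it on the real set. It assumes the Statistics and Machine Learning Toolbox is available.

```matlab
% Synthetic stand-in for the data (assumption: replace X with wine_data).
rng(0);                                  % reproducible random numbers
X = [randn(50,3); randn(50,3) + 5];      % two well-separated groups, 3 features

% Same clustering settings, two different distance metrics.
[~,~,sumdSq]   = kmeans(X, 2, 'Distance','sqeuclidean', 'Replicates',10);
[~,~,sumdCity] = kmeans(X, 2, 'Distance','cityblock',   'Replicates',10);

% sumd holds the within-cluster sums of point-to-centroid distances,
% so its total is only comparable between runs that use the same metric.
totalSq   = sum(sumdSq);
totalCity = sum(sumdCity);
fprintf('sqeuclidean total: %.4g\n', totalSq);
fprintf('cityblock total:   %.4g\n', totalCity);
```

The two totals measure different quantities, so a mismatch between them says nothing about whether either run converged to a poor solution.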


Answer 2

Ganesh 2013-8-27

0 votes

Thanks for the reply, Shashank. The literature used 'sqEuclidean', and so did I.

1 comment

tryhard 2013-8-29

Could you post a link to the relevant article? I get the same result you do. It seems they might have performed some sort of pre-processing on the data.


Answer 3

gheorghe gardu 2015-11-1

0 votes

Could you post the MATLAB code that you used? Thank you in advance.


Answer 4

Paul Munro 2023-2-21

0 votes

The large distance sum you report makes me think that you did not rescale the data. Variable 13 is in the thousands and will overwhelm the effect of the other variables. You will probably get better results if you rescale the variables separately (Z scoring for example).
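A minimal sketch of that rescaling, assuming the Statistics and Machine Learning Toolbox (for zscore and kmeans). The matrix X here is synthetic and made up for illustration, with one column deliberately placed on a much larger scale; replace it with the 178-by-13 wine_data matrix to try it on the real data. Whether z-scoring reproduces the figure quoted from the literature is not something this sketch can confirm.

```matlab
% Synthetic stand-in (assumption: replace X with the wine_data matrix).
rng(0);                                      % reproducible random numbers
X = [randn(60,2), 1000 + 300*randn(60,1)];   % third column is on a much larger scale

% Per-column standard deviations show which variables dominate the distance.
colStd = std(X);
disp(colStd);

% z-score each column separately: subtract its mean, divide by its std.
Xz = zscore(X);

% Cluster raw and rescaled data with otherwise identical settings.
[~,~,sumdRaw] = kmeans(X,  3, 'Replicates',20);
[~,~,sumdZ]   = kmeans(Xz, 3, 'Replicates',20);

fprintf('total (raw):      %.4g\n', sum(sumdRaw));
fprintf('total (z-scored): %.4g\n', sum(sumdZ));
```

With squared Euclidean distance, a column whose values are in the hundreds or thousands contributes squared differences many orders of magnitude larger than columns with values near 1, so the raw total is driven almost entirely by that column.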

