K-Means: Indices of C in idx

Question

Maurizio Cimino 2019-7-17

https://ww2.mathworks.cn/matlabcentral/answers/472154-k-means-indices-of-c-in-idx

Edited: Maurizio Cimino 2019-7-17

Hello, I'm using the built-in kmeans function, but there is one thing I don't understand. After calling the function, I get two outputs: idx and C. The first contains, for each observation, the index of the cluster it has been assigned to. The second is the matrix containing the locations of all the centroids. Is there a way to know the indices of the centroids C inside idx? For example, I would like to know the index inside idx of C(:,1), and so on.

Thank you very much.

Answer 1

the cyclist 2019-7-17

https://ww2.mathworks.cn/matlabcentral/answers/472154-k-means-indices-of-c-in-idx#answer_383631

I'm not certain I fully understand your question, but I'll make a guess.

The centroid is not (necessarily) one of the points of the original dataset, so none of the rows of idx correspond to the centroid itself.

The centroid locations are given by the rows of C (not the columns). The row C(1,:) is the centroid for the cluster of points with idx=1. The row C(2,:) is the centroid for the cluster of points with idx=2. And so on.
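
As a quick check of this correspondence, here is a minimal sketch. It assumes the default squared Euclidean distance, for which each centroid is the mean of the points in its cluster. The data matrix X and the variable names are made up for illustration and are not from the question.

```matlab
% Hypothetical example data: two well-separated groups of 2-D points
X = [0 0; 1 0; 0 1; 10 10; 11 10; 10 12];
k = 2;
[idx, C] = kmeans(X, k);   % requires Statistics and Machine Learning Toolbox

maxDiff = zeros(k, 1);
for j = 1:k
    members = X(idx == j, :);          % observations assigned to cluster j
    clusterMean = mean(members, 1);    % mean of those observations
    maxDiff(j) = max(abs(C(j, :) - clusterMean));  % compare with row j of C
end
disp(maxDiff)   % expected to be zero up to round-off
```

Row j of C is compared only against the observations with idx == j, which is the link between the two outputs.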

If that doesn't answer your question, maybe you could comment with some clarification.

the cyclist 2019-7-17

Here is a trivial example where the centroids are not in the dataset:

x = [-2 -1 1 2]';
y = [0 0 0 0]';
[idx,C] = kmeans([x y],2);
figure
hold on
% Plot data points
scatter(x,y,[],idx)
% Plot centroids
h(1) = plot(C(1,1),C(1,2),'.');
h(2) = plot(C(2,1),C(2,2),'.');
set(h,'MarkerSize',24)

I am certain that MATLAB is not going to output which data point (from the original data) is the centroid, because it is not guaranteed to be in the dataset.
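
To see this on the same four points, one could test whether any row of C also appears as a row of the data. This is only a sketch: the names X and isDataPoint are introduced here for illustration, and the comparison is exact, so it is meant for this small example rather than for arbitrary floating-point data.

```matlab
x = [-2 -1 1 2]';
y = [0 0 0 0]';
X = [x y];
[idx, C] = kmeans(X, 2);   % requires Statistics and Machine Learning Toolbox

% For each row of C, test whether it exactly matches some row of X
isDataPoint = ismember(C, X, 'rows');
disp(isDataPoint')
```

When the clusters come out as {-2, -1} and {1, 2}, the centroids sit at x = -1.5 and x = 1.5, so neither entry of isDataPoint should be true.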

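You could call kmeans with the syntax

[idx,C,sumd,D] = kmeans(X,k)

and, for each cluster, find the point with the minimum distance to its centroid. D is an n-by-k matrix whose entry D(i,j) is the distance from point i to centroid j.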
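
A minimal sketch of that idea follows, assuming the default distance option (for which D holds squared Euclidean distances). The data matrix X and the names minDist and nearestIdx are hypothetical and chosen only for illustration.

```matlab
% Hypothetical example data: two well-separated groups of 2-D points
X = [0 0; 1 0; 0 1; 10 10; 11 10; 10 12];
k = 2;
[idx, C, sumd, D] = kmeans(X, k);   % D(i,j): distance from point i to centroid j

% Minimum of each column of D: nearestIdx(j) is the row of X closest to centroid j
[minDist, nearestIdx] = min(D, [], 1);

for j = 1:k
    fprintf('Centroid %d: nearest observation is row %d of X, at [%g %g]\n', ...
        j, nearestIdx(j), X(nearestIdx(j), 1), X(nearestIdx(j), 2));
end
```

Here nearestIdx(j) indexes into both X and idx, so idx(nearestIdx(j)) gives the cluster of that observation. If two observations are equally close to a centroid, min returns the first one.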

Maurizio Cimino 2019-7-17

Edited: Maurizio Cimino 2019-7-17

Thank you for your reply.

Okay, very good idea. That is what I did: for each column of D, I find the minimum element, which corresponds to the data point nearest to that centroid.

Thank you very much.
