# K-Means Matlab cluster assignment

Anastasis Pk on 18 Feb 2020
Answered: Image Analyst on 19 Feb 2020
Hello,
I ran the K-Means algorithm on a data set, and it correctly finds 4 different clusters, but the cluster numbers are wrong. To be more specific, I would like the cluster numbers to be assigned in increasing order.
E = evalclusters(c,'kmeans','DaviesBouldin','klist',[3:10])
kidx = kmeans(c,E.OptimalK);

Image Analyst on 19 Feb 2020
Anastasis, below is a full demo of how to sort the labels according to how far the cluster centroid is from the origin, and how to relabel the class numbers so that class 1 will be closest and class 4 will be farthest away from the origin.
Don't be afraid of the length of the code. It's actually simple but it just looks long because I had to put in code to make some sample clustered data (which you won't need), and has code at the end to double-check/verify the results (which you won't need), as well as tons of comments to help explain it to you (which you should probably leave in).
% Demo to show how you can redefine the class numbers assigned by kmeans() to different numbers.
% In this demo, the original, arbitrary class numbers will be reassigned a new number
% according to how far the cluster centroid is from the origin.
% Author Image Analyst, Feb. 2020.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clearvars;
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 18;
%-------------------------------------------------------------------------------------------------------------------------------------------
% CREATE SAMPLE DATA.
% Make up 4 clusters with 150 points each.
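pointsPerCluster = 150;
spread = 0.03; % Standard deviation of the points in each cluster (assumed value; spread was used below without being defined).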
offsets = [0.3, 0.5, 0.7, 0.9];
% offsets = [0.62, 0.73, 0.84, 0.95];
xa = spread * randn(pointsPerCluster, 1) + offsets(1);
ya = spread * randn(pointsPerCluster, 1) + offsets(1);
xb = spread * randn(pointsPerCluster, 1) + offsets(2);
yb = spread * randn(pointsPerCluster, 1) + offsets(2);
xc = spread * randn(pointsPerCluster, 1) + offsets(3);
yc = spread * randn(pointsPerCluster, 1) + offsets(3);
xd = spread * randn(pointsPerCluster, 1) + offsets(4);
yd = spread * randn(pointsPerCluster, 1) + offsets(4);
x = [xa; xb; xc; xd];
y = [ya; yb; yc; yd];
xy = [x, y];
%-------------------------------------------------------------------------------------------------------------------------------------------
% K-MEANS CLUSTERING.
% Now do kmeans clustering.
% Determine what the best k is:
evaluationObject = evalclusters(xy, 'kmeans', 'DaviesBouldin', 'klist', [3:10])
% Do the kmeans with that k:
[assignedClass, clusterCenters] = kmeans(xy, evaluationObject.OptimalK);
clusterCenters % Echo to command window
% Do a scatter plot with the original class numbers assigned by kmeans.
hfig = figure;
subplot(1, 2, 1);
gscatter(x, y, assignedClass);
legend('FontSize', fontSize, 'Location', 'northwest');
grid on;
xlabel('x', 'fontSize', fontSize);
ylabel('y', 'fontSize', fontSize);
title('Original Class Numbers Assigned by kmeans()', 'fontSize', fontSize);
hfig.WindowState = 'maximized'; % Maximize the figure window so that it takes up the full screen.
%-------------------------------------------------------------------------------------------------------------------------------------------
% SORTING ALGORITHM
% Sort the clusters according to how far each cluster center is from the origin.
% First get the distance of each cluster center (as reported by the kmeans function) from the origin.
distancesFromOrigin = sqrt(clusterCenters(:, 1) .^ 2 + clusterCenters(:, 2) .^2)
%-------------------------------------------------------------------------------------------------------------------------------------------
% NOW GET NEW CLASS NUMBERS ACCORDING TO THAT SORTING ALGORITHM.
% Now, say for example, that you want to give the classes numbers according to how far from the origin they are.
% Determine what the new order to sort them in should be:
[sortedDistances, sortOrder] = sort(distancesFromOrigin, 'ascend') % Sort the centroid distances from the origin.
% Get new class numbers for each point since, for example,
% what used to be class 4 will now be class 1 since class 4 is closest to the origin.
% (The actual numbers may change for each run since kmeans is based on random initial sets.)
% Instantiate a vector that will tell each point what its new class number will be.
newClassNumbers = zeros(length(x), 1);
% For each class, find out where it is
for k = 1 : size(clusterCenters, 1)
% First find out what points have this current class,
% and where they are by creating this logical vector.
currentClassLocations = assignedClass == k;
% Now assign all of those locations to their new class.
newClassNumber = find(k == sortOrder); % Find index in sortOrder where this class number appears.
fprintf('Initially the center of cluster %d is (%.2f, %.2f), %.2f from the origin.\n', ...
k, clusterCenters(k, 1), clusterCenters(k, 2), distancesFromOrigin(k)); % Column 1 is x, column 2 is y.
fprintf(' Relabeling all points in initial cluster #%d to cluster #%d.\n', k, newClassNumber);
% Do the relabeling right here:
newClassNumbers(currentClassLocations) = newClassNumber;
end
% Plot the clusters with their new labels and colors.
subplot(1, 2, 2);
gscatter(x, y, newClassNumbers);
grid on;
xlabel('x', 'fontSize', fontSize);
ylabel('y', 'fontSize', fontSize);
title('New Class Numbers', 'fontSize', fontSize);
legend('FontSize', fontSize, 'Location', 'northwest');
% Basically, we're done now.
%-------------------------------------------------------------------------------------------------------------------------------------------
% DOUBLE CHECK, VERIFICATION, PROOF.
% To verify, let's get the mean (x,y) of each class after the relabeling.
fprintf('Now, after relabeling:\n');
for k = 1 : size(clusterCenters, 1)
% First find out what points have this class.
% and where they are by creating this logical vector.
currentClassLocations = newClassNumbers == k;
% Now compute the mean x and mean y of the points that carry this new class number.
meanx(k) = mean(x(currentClassLocations));
meany(k) = mean(y(currentClassLocations));
fprintf('The center of cluster %d is (%.2f, %.2f).\n', k, meanx(k), meany(k));
end
% cc = [assignedClass, newClassNumbers]; % Class assignments, side-by-side.
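The relabeling loop in the demo can also be written without a loop. The block below is only a minimal sketch of that idea: the values of clusterCenters and assignedClass are made up here to stand in for what kmeans() would return, and newLabelFor is a hypothetical helper name that does not appear in the demo. It builds a lookup table (the inverse of sortOrder) and indexes into it with the old labels.

```matlab
% Made-up stand-ins for the outputs of kmeans() (assumed values for illustration only).
clusterCenters = [0.9 0.9; 0.3 0.3; 0.7 0.7; 0.5 0.5]; % One row per cluster: [x, y].
assignedClass = [1; 2; 2; 3; 4; 1];                     % Original label of each point.

% Distance of each cluster center from the origin.
distancesFromOrigin = sqrt(sum(clusterCenters .^ 2, 2));

% sortOrder(j) is the OLD label of the cluster that should become NEW label j.
[~, sortOrder] = sort(distancesFromOrigin, 'ascend');

% Invert that permutation: newLabelFor(oldLabel) gives the new label.
newLabelFor = zeros(size(sortOrder));
newLabelFor(sortOrder) = (1 : numel(sortOrder))';

% Relabel every point in one indexing step.
newClassNumbers = newLabelFor(assignedClass);

% Reorder the centers so that row j belongs to new class j.
sortedCenters = clusterCenters(sortOrder, :);
```

With these made-up inputs, old cluster 2 (closest to the origin) becomes class 1 and old cluster 1 (farthest) becomes class 4. The loop version in the demo gives the same labels; it is just easier to follow step by step.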

KSSV on 18 Feb 2020
Edited: KSSV on 18 Feb 2020
If idx holds the cluster indices and P holds your points, you can plot each cluster separately like this:
figure
hold on
for i = 1:4
plot(P(idx==i,1),P(idx==i,2),'.') ;
end
legend({'1','2','3','4'})

Anastasis Pk on 18 Feb 2020
The problem is that the indices in kidx are assigned in the wrong order. I want them to be assigned in increasing centroid order.
KSSV on 18 Feb 2020
That is not a problem. Get the centroids, sort them, and then sort the indices accordingly.
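A minimal sketch of that suggestion, under some assumptions: in practice the centroids would come from the second output of kmeans (for example [kidx, C] = kmeans(c, E.OptimalK)), but here C and kidx are made-up values so the snippet runs on its own. "Increasing centroid order" is interpreted as sorting the centroid rows by their first coordinate (ties broken by the second), which is only one possible reading. The names order, newIdx and sortedC are hypothetical.

```matlab
% Made-up stand-ins for [kidx, C] = kmeans(c, E.OptimalK) (assumed values for illustration only).
C = [5 1; 2 7; 9 3; 2 4];    % One centroid per row.
kidx = [1; 3; 2; 4; 4; 1];   % Cluster index of each data point, as assigned by kmeans.

% Sort the centroid rows. order(j) is the old index of the centroid that ends up in row j.
[sortedC, order] = sortrows(C);

% For each point, find where its old index sits in order. That position is its new index.
[~, newIdx] = ismember(kidx, order);
```

After this, newIdx == 1 selects the points of the cluster whose centroid sorts first, and sortedC(1, :) is that centroid.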