Hello,

I ran the k-means algorithm on a data set, and it correctly determines that there are 4 different clusters, but the cluster numbers are wrong. To be more specific, I would like it to assign the cluster labels in increasing order.

E = evalclusters(c,'kmeans','DaviesBouldin','klist',[3:10])

kidx = kmeans(c,E.OptimalK);

Image Analyst
on 19 Feb 2020

Anastasis, below is a full demo of how to sort the labels according to how far the cluster centroid is from the origin, and how to relabel the class numbers so that class 1 will be closest and class 4 will be farthest away from the origin.

Don't be afraid of the length of the code. It's actually simple; it only looks long because I had to put in code to make some sample clustered data (which you won't need), code at the end to double-check/verify the results (which you won't need), and lots of comments to help explain it to you (which you should probably leave in).

Adapt as needed.

% Demo to show how you can redefine the class numbers assigned by kmeans() to different numbers.

% In this demo, the original, arbitrary class numbers will be reassigned a new number

% according to how far the cluster centroid is from the origin.

% Author Image Analyst, Feb. 2020.

clc; % Clear the command window.

close all; % Close all figures (except those of imtool.)

clearvars;

workspace; % Make sure the workspace panel is showing.

format long g;

format compact;

fontSize = 18;

%-------------------------------------------------------------------------------------------------------------------------------------------

% CREATE SAMPLE DATA.

% Make up 4 clusters with 150 points each.

pointsPerCluster = 150;

spread = 0.03;

offsets = [0.3, 0.5, 0.7, 0.9];

% offsets = [0.62, 0.73, 0.84, 0.95];

xa = spread * randn(pointsPerCluster, 1) + offsets(1);

ya = spread * randn(pointsPerCluster, 1) + offsets(1);

xb = spread * randn(pointsPerCluster, 1) + offsets(2);

yb = spread * randn(pointsPerCluster, 1) + offsets(2);

xc = spread * randn(pointsPerCluster, 1) + offsets(3);

yc = spread * randn(pointsPerCluster, 1) + offsets(3);

xd = spread * randn(pointsPerCluster, 1) + offsets(4);

yd = spread * randn(pointsPerCluster, 1) + offsets(4);

x = [xa; xb; xc; xd];

y = [ya; yb; yc; yd];

xy = [x, y];

%-------------------------------------------------------------------------------------------------------------------------------------------

% K-MEANS CLUSTERING.

% Now do kmeans clustering.

% Determine what the best k is:

evaluationObject = evalclusters(xy, 'kmeans', 'DaviesBouldin', 'klist', [3:10])

% Do the kmeans with that k:

[assignedClass, clusterCenters] = kmeans(xy, evaluationObject.OptimalK);

clusterCenters % Echo to command window

% Do a scatter plot with the original class numbers assigned by kmeans.

hfig = figure;

subplot(1, 2, 1);

gscatter(x, y, assignedClass);

legend('FontSize', fontSize, 'Location', 'northwest');

grid on;

xlabel('x', 'fontSize', fontSize);

ylabel('y', 'fontSize', fontSize);

title('Original Class Numbers Assigned by kmeans()', 'fontSize', fontSize);

hfig.WindowState = 'maximized'; % Maximize the figure window so that it takes up the full screen.

%-------------------------------------------------------------------------------------------------------------------------------------------

% SORTING ALGORITHM

% Sort the clusters according to how far each cluster center is from the origin.

% First get the distance of each cluster center (as reported by the kmeans function) from the origin.

distancesFromOrigin = sqrt(clusterCenters(:, 1) .^ 2 + clusterCenters(:, 2) .^2)

% NOW GET NEW CLASS NUMBERS ACCORDING TO THAT SORTING ALGORITHM.

% Now, say for example, that you want to give the classes numbers according to how far from the origin they are.

% Determine what the new order to sort them in should be:

[sortedDistances, sortOrder] = sort(distancesFromOrigin, 'ascend') % Sort the centroid distances from the origin.

% Get new class numbers for each point since, for example,

% what used to be class 4 will now be class 1 since class 4 is closest to the origin.

% (The actual numbers may change for each run since kmeans is based on random initial sets.)

% Instantiate a vector that will tell each point what its new class number will be.

newClassNumbers = zeros(length(x), 1);

% For each class, find out where it is

for k = 1 : size(clusterCenters, 1)

% First find out what points have this current class,

% and where they are by creating this logical vector.

currentClassLocations = assignedClass == k;

% Now assign all of those locations to their new class.

newClassNumber = find(k == sortOrder); % Find index in sortOrder where this class number appears.

fprintf('Initially the center of cluster %d is (%.2f, %.2f), %.2f from the origin.\n', ...

k, clusterCenters(k, 1), clusterCenters(k, 2), distancesFromOrigin(k)); % Column 1 is x, column 2 is y.

fprintf(' Relabeling all points in initial cluster #%d to cluster #%d.\n', k, newClassNumber);

% Do the relabeling right here:

newClassNumbers(currentClassLocations) = newClassNumber;

end

% Plot the clusters with their new labels and colors.

subplot(1, 2, 2);

gscatter(x, y, newClassNumbers);

grid on;

xlabel('x', 'fontSize', fontSize);

ylabel('y', 'fontSize', fontSize);

title('New Class Numbers', 'fontSize', fontSize);

legend('FontSize', fontSize, 'Location', 'northwest');

% Basically, we're done now.

% DOUBLE CHECK, VERIFICATION, PROOF.

% To verify, let's get the mean (x,y) of each class after the relabeling.

fprintf('Now, after relabeling:\n');

for k = 1 : size(clusterCenters, 1)

% First find out what points have this class.

% and where they are by creating this logical vector.

currentClassLocations = newClassNumbers == k;

% Now compute the mean x and mean y of the points in this class.

meanx(k) = mean(x(currentClassLocations));

meany(k) = mean(y(currentClassLocations));

fprintf('The center of cluster %d is (%.2f, %.2f).\n', k, meanx(k), meany(k));

end

% cc = [assignedClass, newClassNumbers]; % Class assignments, side-by-side.
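If the relabeling loop feels long, the same mapping can be sketched without a loop by inverting the sort order. This is only a minimal sketch, not a replacement for the demo above: the small clusterCenters and assignedClass arrays here are made-up stand-ins for the outputs of kmeans(), and oldToNew and sortedCenters are hypothetical variable names.

```matlab
% Hypothetical stand-ins for the outputs of kmeans() (not real clustering results).
clusterCenters = [0.9 0.9; 0.3 0.3; 0.7 0.7; 0.5 0.5];
assignedClass = [1; 2; 3; 4; 2; 1];

% Distance of each cluster center from the origin.
distancesFromOrigin = sqrt(sum(clusterCenters .^ 2, 2));

% sortOrder(j) is the OLD class number that should become NEW class number j.
[~, sortOrder] = sort(distancesFromOrigin, 'ascend');

% Invert that permutation so that oldToNew(oldLabel) gives the new label.
oldToNew = zeros(size(sortOrder));
oldToNew(sortOrder) = 1 : numel(sortOrder);

% Relabel every point in one indexing step.
newClassNumbers = oldToNew(assignedClass);

% Reorder the centers so that row j belongs to new class j.
sortedCenters = clusterCenters(sortOrder, :);
```

With the made-up data above, old class 2 (closest to the origin) becomes class 1 and old class 1 (farthest) becomes class 4. The loop version in the demo does the same thing; it is just easier to follow step by step.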


KSSV
on 18 Feb 2020

Edited: KSSV
on 18 Feb 2020

If idx holds the cluster indices returned by kmeans and P holds your points (one point per row), you can plot each cluster separately like this:

figure

hold on

for i = 1:4

plot(P(idx==i,1),P(idx==i,2),'.') ;

end

legend({'1','2','3','4'})
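The loop above hard-codes 4 clusters. Since the number of clusters comes from evalclusters in the question, a variant that reads the labels from idx might look like the sketch below. The tiny P and idx arrays are made-up sample data, and labels and legendText are hypothetical variable names.

```matlab
% Made-up sample data standing in for your points and kmeans() indices.
P = [0.30 0.30; 0.31 0.29; 0.70 0.70; 0.69 0.72];
idx = [1; 1; 2; 2];

% Find which cluster labels actually occur, instead of assuming 1:4.
labels = unique(idx);

figure
hold on
for i = 1 : numel(labels)
    % Logical mask of the points that belong to this cluster.
    inThisCluster = idx == labels(i);
    plot(P(inThisCluster, 1), P(inThisCluster, 2), '.');
end
hold off

% Build one legend entry per cluster label.
legendText = arrayfun(@(v) sprintf('%d', v), labels(:)', 'UniformOutput', false);
legend(legendText)
```

This only changes how the clusters are drawn; it does not reorder the labels themselves.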
