How can I reassign clusters based on similarity or any other method?
Hi @Med Future,
Can you share your code on this forum?
Also, please elaborate on what you meant when you wrote:
"I have already tried the K means clustering but it does not provide a results"
Hi @Med Future,
I have modified the code you shared on the forum so that it reassigns clusters based on similarity.
% Define cell1 and cell2 (numeric matrices, one data point per row)
cell1 = [1, 2, 3; 4, 5, 6]; % Example data for cell1
cell2 = [7, 8, 9; 10, 11, 12]; % Example data for cell2
% Normalize each row to unit length for cosine similarity
cell1_norm = cell1 ./ sqrt(sum(cell1.^2, 2));
cell2_norm = cell2 ./ sqrt(sum(cell2.^2, 2));
% Compute the cosine similarity matrix
similarity_matrix = cell1_norm * cell2_norm';
% Average similarity score
similarity_score = mean(similarity_matrix(:));
% Display the similarity score
fprintf('Average Cosine Similarity Score: %f\n', similarity_score);
% Define the threshold for similarity to reassign clusters
similarity_threshold = 0.9;
if similarity_score > similarity_threshold
    % Combine the data from both cells
    combinedData = [cell1; cell2];
    % Apply K-means clustering
    k = 2; % Define the number of clusters 'k'
    [idx, C] = kmeans(combinedData, k);
    % Calculate centroid distances for cluster reassignment
    centroid_distances = pdist(C); % Calculate pairwise distances between centroids
    avg_distance = mean(centroid_distances); % Calculate the average centroid distance
    % Reassign clusters if centroid distances exceed a certain threshold
    centroid_threshold = 5; % Define a threshold for centroid distances
    if avg_distance > centroid_threshold
        % Calculate the pairwise distances between data points and centroids
        distances = pdist2(combinedData, C);
        % Find the minimum distance for each data point
        [~, min_indices] = min(distances, [], 2);
        % Update the cluster assignments in 'idx' based on the minimum distances
        idx = min_indices;
    end
    % Iterate over the clusters and check for different features
    unique_clusters = unique(idx); % Get the unique cluster labels
    num_clusters = numel(unique_clusters); % Get the number of clusters
    for i = 1:num_clusters
        cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
        % Check for different features within the cluster
        % (kmeans needs at least 2 points to form 2 subclusters)
        if size(cluster_data, 1) >= 2 && any(range(cluster_data, 1) > 1)
            % Split the cluster into subclusters with similar features
            subclusters = kmeans(cluster_data, 2);
            % Update the cluster assignments in 'idx' for the subclusters
            idx(idx == unique_clusters(i)) = subclusters + max(idx);
        end
    end
    % Merge clusters with similar features
    unique_clusters = unique(idx); % Get the updated unique cluster labels
    num_clusters = numel(unique_clusters); % Get the updated number of clusters
    for i = 1:num_clusters
        cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
        % Check for similar features with other clusters
        for j = i+1:num_clusters
            other_cluster_data = combinedData(idx == unique_clusters(j), :); % Get the data points for the other cluster
            % Check for similar features: largest point-to-point distance below 1
            pair_distances = pdist2(cluster_data, other_cluster_data);
            if max(pair_distances(:)) < 1
                % Merge the clusters into a single cluster
                idx(idx == unique_clusters(j)) = unique_clusters(i);
            end
        end
    end
    % Display the updated clustering results
    figure;
    gscatter(combinedData(:,1), combinedData(:,2), idx);
    title('Modified Clustering Results');
    % Save the modified clustering results
    save('modified_clustered_data.mat', 'idx', 'combinedData');
else
    fprintf('Similarity score is less than %f, not reassigning clusters.\n', similarity_threshold);
end
I will go through the code step by step to explain how it achieves this. First, the code defines two matrices, cell1 and cell2, which contain example data for clustering, one data point per row. Despite their names they are ordinary numeric arrays, not MATLAB cell arrays. They represent the clusters that need to be reassigned based on similarity.
cell1 = [1, 2, 3; 4, 5, 6]; % Example data for cell1
cell2 = [7, 8, 9; 10, 11, 12]; % Example data for cell2
Next, the code scales each row of cell1 and cell2 to unit Euclidean length. With unit-length rows, the dot product of two rows equals their cosine similarity.
cell1_norm = cell1 ./ sqrt(sum(cell1.^2, 2));
cell2_norm = cell2 ./ sqrt(sum(cell2.^2, 2));
After normalizing, the code computes the cosine similarity matrix between cell1_norm and cell2_norm. Entry (i, j) of this matrix is the cosine similarity between row i of cell1 and row j of cell2.
similarity_matrix = cell1_norm * cell2_norm';
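As a quick sanity check on this step, here is a minimal sketch that compares one entry of the matrix against the textbook cosine formula and against pdist2 with the 'cosine' distance (which returns one minus the cosine similarity). The names A, B, S, direct and S_alt are my own for illustration and are not part of the code above.

```matlab
% Sketch: confirm that row-normalize-then-multiply gives cosine similarity.
A = [1, 2, 3; 4, 5, 6];      % same example data as cell1
B = [7, 8, 9; 10, 11, 12];   % same example data as cell2

% Approach used in the answer: unit-length rows, then a matrix product
A_norm = A ./ sqrt(sum(A.^2, 2));
B_norm = B ./ sqrt(sum(B.^2, 2));
S = A_norm * B_norm';        % S(i, j) = cosine similarity of A(i,:) and B(j,:)

% Textbook formula for a single pair of rows
direct = dot(A(1, :), B(1, :)) / (norm(A(1, :)) * norm(B(1, :)));

% Alternative: pdist2 'cosine' is a distance, so subtract it from 1
S_alt = 1 - pdist2(A, B, 'cosine');

fprintf('S(1,1) = %.6f, direct = %.6f\n', S(1, 1), direct);
fprintf('Largest difference from pdist2 version: %g\n', max(abs(S(:) - S_alt(:))));
```

All three routes should agree up to floating-point rounding, so you can use whichever reads best in your own script.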
To determine the average similarity score between the clusters, the code calculates the mean of all elements in the similarity matrix.
similarity_score = mean(similarity_matrix(:));
The code then displays the average cosine similarity score.
fprintf('Average Cosine Similarity Score: %f\n', similarity_score);
Next, the code defines a similarity threshold. If the similarity score is greater than the threshold, the clusters will be reassigned based on similarity.
similarity_threshold = 0.9;
If the score exceeds the threshold, the code stacks the two data sets into one matrix and runs K-means on it with k = 2.
if similarity_score > similarity_threshold
    % Combine the data from both cells
    combinedData = [cell1; cell2];
    % Apply K-means clustering
    k = 2; % Define the number of clusters 'k'
    [idx, C] = kmeans(combinedData, k);
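One thing to keep in mind at this step: kmeans starts from random initial centroids, so the labels (and sometimes the grouping) can change from run to run. A minimal sketch of how you might make the result repeatable is shown below. The seed value 1 and the choice of 5 replicates are assumptions on my side, not values from the original code.

```matlab
% Sketch: repeatable K-means on the combined example data.
rng(1);                                   % fix the random seed (assumed value)
X = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12];
k = 2;

% 'Replicates' reruns K-means from several random starts and keeps the
% solution with the lowest total within-cluster sum of distances.
[idx, C, sumd] = kmeans(X, k, 'Replicates', 5);

fprintf('Cluster labels: %s\n', mat2str(idx'));
fprintf('Total within-cluster distance: %.2f\n', sum(sumd));
```

Calling rng with the same seed before kmeans should give the same assignment each time, which makes the later split and merge steps easier to debug.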
The code then calculates the pairwise distances between the cluster centroids. If the average centroid distance exceeds a threshold (5 here), each data point is reassigned to its nearest centroid.
    centroid_distances = pdist(C); % Calculate pairwise distances between centroids
    avg_distance = mean(centroid_distances); % Calculate the average centroid distance
    % Reassign clusters if centroid distances exceed a certain threshold
    centroid_threshold = 5; % Define a threshold for centroid distances
    if avg_distance > centroid_threshold
        % Calculate the pairwise distances between data points and centroids
        distances = pdist2(combinedData, C);
        % Find the minimum distance for each data point
        [~, min_indices] = min(distances, [], 2);
        % Update the cluster assignments in 'idx' based on the minimum distances
        idx = min_indices;
    end
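To make the nearest-centroid reassignment concrete, here is a small sketch with hand-picked centroids. The matrix C below is hypothetical (I chose the means of the first two and last two rows) so that the result does not depend on a random kmeans run.

```matlab
% Sketch: assign each point to its nearest centroid.
X = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12];
C = [2.5, 3.5, 4.5; 8.5, 9.5, 10.5];   % hypothetical centroids

D = pdist2(X, C);            % D(i, j) = Euclidean distance from point i to centroid j
[~, idx] = min(D, [], 2);    % column index of the smallest distance in each row

disp(idx');                  % displays 1 1 2 2
```

When C comes from a converged kmeans run, this step will usually reproduce the labels kmeans already returned, so it matters mostly if you adjust the centroids yourself beforehand.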
The code then iterates over the clusters and checks for different features within each cluster. If any feature in a cluster spans a range greater than 1, the cluster is split into two subclusters, which receive new labels above the current maximum. The size check skips clusters with fewer than two points, because kmeans cannot split a single point into two clusters.
    unique_clusters = unique(idx); % Get the unique cluster labels
    num_clusters = numel(unique_clusters); % Get the number of clusters
    for i = 1:num_clusters
        cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
        % Check for different features within the cluster
        % (kmeans needs at least 2 points to form 2 subclusters)
        if size(cluster_data, 1) >= 2 && any(range(cluster_data, 1) > 1)
            % Split the cluster into subclusters with similar features
            subclusters = kmeans(cluster_data, 2);
            % Update the cluster assignments in 'idx' for the subclusters
            idx(idx == unique_clusters(i)) = subclusters + max(idx);
        end
    end
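Here is a minimal sketch of splitting one cluster on its own, so you can see how the new labels are formed. The starting labels and the variable names members and sub are assumptions for illustration. The size check is there because kmeans with k = 2 needs at least two points.

```matlab
% Sketch: split cluster 1 into two subclusters with fresh labels.
X = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12];
idx = [1; 1; 2; 2];                 % assumed starting labels

members = (idx == 1);               % logical mask of the cluster to split
cluster_data = X(members, :);

% Split only if there are at least 2 points and some feature spans more than 1
if size(cluster_data, 1) >= 2 && any(range(cluster_data, 1) > 1)
    rng(1);                         % assumed seed, for repeatability
    sub = kmeans(cluster_data, 2);  % sub contains the values 1 and 2
    idx(members) = sub + max(idx);  % new labels start above the current maximum
end

disp(idx');
```

Because the new labels are offset by max(idx), they cannot collide with labels that are already in use, at the cost of leaving gaps in the numbering.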
After splitting, the code merges clusters with similar features. It compares every pair of clusters, and if the largest distance between any point of one and any point of the other is below 1, the second cluster takes the label of the first.
    unique_clusters = unique(idx); % Get the updated unique cluster labels
    num_clusters = numel(unique_clusters); % Get the updated number of clusters
    for i = 1:num_clusters
        cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
        % Check for similar features with other clusters
        for j = i+1:num_clusters
            other_cluster_data = combinedData(idx == unique_clusters(j), :); % Get the data points for the other cluster
            % Check for similar features: largest point-to-point distance below 1
            pair_distances = pdist2(cluster_data, other_cluster_data);
            if max(pair_distances(:)) < 1
                % Merge the clusters into a single cluster
                idx(idx == unique_clusters(j)) = unique_clusters(i);
            end
        end
    end
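After splitting and merging, the labels in idx are usually no longer consecutive (for example 2, 3, 4, 7). If you prefer labels 1..K, one option is to renumber them with the third output of unique. This is an optional extra step, not something the code above does, and the example labels here are made up.

```matlab
% Sketch: renumber arbitrary cluster labels to consecutive integers 1..K.
idx = [3; 4; 2; 2; 7];                    % made-up labels with gaps

% The third output maps each element to its position in the sorted unique list
[old_labels, ~, idx_compact] = unique(idx);

disp(old_labels');     % displays 2 3 4 7
disp(idx_compact');    % displays 2 3 1 1 4
```

Points that shared a label before still share one afterwards. Only the numbering changes.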
Finally, the code plots the data points colored by their assigned cluster (using the first two features) and saves the results to modified_clustered_data.mat. The else branch reports when the similarity score is too low to reassign.
    % Display the updated clustering results
    figure;
    gscatter(combinedData(:,1), combinedData(:,2), idx);
    title('Modified Clustering Results');
    % Save the modified clustering results
    save('modified_clustered_data.mat', 'idx', 'combinedData');
else
    fprintf('Similarity score is less than %f, not reassigning clusters.\n', similarity_threshold);
end
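If you want to inspect the saved results later, a minimal sketch of a save and load round trip is shown below. It writes to a temporary file rather than modified_clustered_data.mat so it can be run on its own. The labels are assumed example values, and the per-cluster count with accumarray is my own addition.

```matlab
% Sketch: save clustering results, load them back, and count points per cluster.
idx = [1; 1; 2; 2];                                   % assumed example labels
combinedData = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12];

file = [tempname, '.mat'];                            % temporary file name
save(file, 'idx', 'combinedData');

S = load(file);                                       % struct with fields idx and combinedData
counts = accumarray(S.idx, 1);                        % counts(c) = number of points with label c
delete(file);                                         % clean up the temporary file

disp(counts');                                        % displays 2 2
```

Note that accumarray expects positive integer labels, which is what kmeans and the steps above produce.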
In a nutshell, this modified code is capable of reassigning clusters based on similarity. It combines the two data sets when they are similar enough, splits clusters with different features, and merges clusters with similar features. The code uses the K-means clustering algorithm and cosine similarity to achieve this. Please see the attached plot along with the test results.
I hope this answers your question.