Clustering a square matrix with GMM
Dear All,
My code is not working the way I want it to. Clustering with a GMM seems harder than I thought. Perhaps you can help.
I have a 100x100 matrix, and I want to cluster the data points it represents. I want to use the soft GMM algorithm for this, but I don't know in advance how many clusters there will be. What I can say, however, is that there is a certain relationship between the factors (the x and y dimensions of the 2D matrix). The entries of the matrix are not binary, and the input matrix is square. The result should be the number of clusters and which elements are assigned to each cluster, and I would like this to be shown graphically. Furthermore, I would like the code to try every candidate value of k (the number of clusters) once and then return the best result, i.e. the one with the optimal number of clusters.
Ideally, MATLAB would mark all data points that belong together with a coloured circle in the final plot.
My code looks like this:
% Create a 100x100 matrix of random numbers
matrix_size = 100;
data_matrix = rand(matrix_size);
% Try different numbers of clusters
min_clusters = 1;
max_clusters = 20;
AIC = zeros(1, max_clusters);
BIC = zeros(1, max_clusters);
gmds = cell(1, max_clusters);
options = statset('MaxIter', 100); % Maximum iterations for the clustering
% New
for k = min_clusters:max_clusters
    gmm = fitgmdist(data_matrix(:), k, 'Options', options);
    gmds{k - min_clusters + 1} = gmm; % Changed index for gmds
    AIC(k) = gmm.AIC;
    BIC(k) = gmm.BIC;
end
% Choose the number of clusters based on AIC or BIC
[~, num_clusters_AIC] = min(AIC);
[~, num_clusters_BIC] = min(BIC);
disp(['Number of clusters based on AIC: ', num2str(num_clusters_AIC)]);
disp(['Number of clusters based on BIC: ', num2str(num_clusters_BIC)]);
% Choose the number of clusters based on one of the criteria
num_clusters = num_clusters_AIC; % or num_clusters_BIC
% Run the GMM with the selected number of clusters
gmm = gmds{num_clusters};
% Get the cluster assignments
cluster_idx = cluster(gmm, data_matrix(:));
disp('mean points are at:');
disp(gmm.mu)
disp('covariances are:');
disp(gmm.Sigma)
disp('Components Proportions are:');
disp(gmm.ComponentProportion)
%% plot the results
x1 = linspace(min(X(:,1))-2, max(X(:,1))+2, 500);
x2 = linspace(min(X(:,2))-2, max(X(:,2))+2, 500);
[x1grid,x2grid] = meshgrid(x1,x2);
X0 = [x1grid(:) x2grid(:)];
mahalDist = mahal(gmfit,X0);
figure;
h1=gscatter(X(:,1),X(:,2),clusterind);
hold on
plot(gmfit.mu(:,1),gmfit.mu(:,2),'kx','LineWidth',2,'MarkerSize',10)
threshold = sqrt(chi2inv(0.99,2));
for m = 1:k
    idx = mahalDist(:,m)<=threshold;
    Color = h1(m).Color;
    plot(X0(idx,1),X0(idx,2),'.','Color',Color,'MarkerSize',1);
end
legend off;
title('GMM fitted')
Answers (1)
Aman
2024-3-15
Hi Alexander,
As I understand it, you want to find the ideal number of clusters and then cluster your data using a Gaussian mixture model (GMM).
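The code that you have shared uses the AIC and BIC values to find the ideal number of clusters, and then the Mahalanobis distance of each grid point to each component to draw the cluster regions. The plotting part of the code is incorrect: it refers to X, gmfit and clusterind, which are never defined; its loop uses k, which after the fitting loop equals max_clusters rather than the selected number of clusters; and it assumes two-dimensional data, whereas the data matrix has a hundred columns and the model is actually fit to data_matrix(:), a single column of 10,000 values.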
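For comparison, here is a minimal sketch of that plotting section running with the variables it expects. It assumes hypothetical two-dimensional data X (three made-up point clouds, not your matrix) and a fixed k = 3, and it needs the Statistics and Machine Learning Toolbox. Because mahal returns squared Mahalanobis distances, the sketch compares them against chi2inv(0.99,2), so each coloured region approximates the 99% probability region of its component.

```matlab
% Hypothetical 2-D data: three point clouds (N-by-2), for illustration only
rng(1);                                     % reproducible random numbers
X = [randn(100,2)*0.5 + [0 0];
     randn(100,2)*0.5 + [3 3];
     randn(100,2)*0.5 + [0 4]];

k = 3;                                      % number of components for this sketch
gmfit = fitgmdist(X, k, 'Replicates', 5, 'RegularizationValue', 1e-6);
clusterind = cluster(gmfit, X);             % hard assignment of each point

% Grid covering the data, used to draw the coloured regions
x1 = linspace(min(X(:,1))-2, max(X(:,1))+2, 500);
x2 = linspace(min(X(:,2))-2, max(X(:,2))+2, 500);
[x1grid, x2grid] = meshgrid(x1, x2);
X0 = [x1grid(:) x2grid(:)];
mahalDist = mahal(gmfit, X0);               % SQUARED Mahalanobis distance to each component

figure;
h1 = gscatter(X(:,1), X(:,2), clusterind);  % one line handle per non-empty cluster
hold on
plot(gmfit.mu(:,1), gmfit.mu(:,2), 'kx', 'LineWidth', 2, 'MarkerSize', 10)
threshold = chi2inv(0.99, 2);               % squared-distance cutoff for ~99% region in 2-D
for m = 1:k
    idx = mahalDist(:,m) <= threshold;      % grid points inside component m's region
    plot(X0(idx,1), X0(idx,2), '.', 'Color', h1(m).Color, 'MarkerSize', 1);
end
hold off
legend off
title('GMM fitted')
```

With real data, X would be your N-by-2 point matrix and k the number of clusters you selected. gscatter returns one line per non-empty group, so h1(m) only matches component m if every component receives at least one point.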
Since you want to determine the number of clusters graphically, it is better to plot AIC and BIC against the number of clusters (an elbow curve) and pick the k at which the curve stops dropping steeply, i.e. the elbow. You can refer to the code below, which plots both curves.
% Create a 100x100 matrix of random numbers
matrix_size = 100;
data_matrix = rand(matrix_size);
% Try different numbers of clusters
min_clusters = 1;
max_clusters = 20;
num_k = max_clusters - min_clusters + 1;   % number of candidate values of k
AIC = zeros(1, num_k);
BIC = zeros(1, num_k);
gmds = cell(1, num_k);
options = statset('MaxIter', 100); % Maximum iterations for the clustering
for k = min_clusters:max_clusters
    % data_matrix(:) flattens the matrix into a 10000x1 column,
    % so every entry is treated as one scalar observation
    gmm = fitgmdist(data_matrix(:), k, 'Options', options);
    pos = k - min_clusters + 1;            % position of this k in the result arrays
    gmds{pos} = gmm;
    AIC(pos) = gmm.AIC;
    BIC(pos) = gmm.BIC;
end
% Plotting the elbow curve for AIC
figure;
plot(min_clusters:max_clusters, AIC, '-o');
xlabel('Number of clusters (k)');
ylabel('AIC');
title('Elbow Curve using AIC');
% Optionally, also plot the elbow curve for BIC in a new figure
figure;
plot(min_clusters:max_clusters, BIC, '-o');
xlabel('Number of clusters (k)');
ylabel('BIC');
title('Elbow Curve using BIC');
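If you also want the code to pick k automatically and show which matrix elements end up in which cluster, the following is a minimal sketch under the same assumption as above (each entry of data_matrix is one scalar observation). It selects k by the minimum BIC and, as a purely heuristic alternative, by a simple elbow rule: the point of the normalized BIC curve farthest from the straight line joining its endpoints. The elbow rule, the variable names and the 'RegularizationValue'/'Replicates'/'MaxIter' settings are assumptions of this sketch, not part of the code above. posterior gives the soft GMM memberships, and reshape maps the hard labels back onto the 100x100 grid.

```matlab
rng(0);                                     % reproducible random numbers
matrix_size = 100;
data_matrix = rand(matrix_size);
X = data_matrix(:);                         % 10000-by-1 column, column-major order

k_range = 1:20;
BIC = zeros(size(k_range));
models = cell(size(k_range));
options = statset('MaxIter', 500);
for n = 1:numel(k_range)
    models{n} = fitgmdist(X, k_range(n), 'Options', options, ...
        'RegularizationValue', 1e-6, 'Replicates', 3);
    BIC(n) = models{n}.BIC;
end

% Criterion 1: the k with the smallest BIC
[~, iBest] = min(BIC);
k_bic = k_range(iBest);

% Criterion 2 (heuristic elbow): normalize both axes to [0,1], then take the
% point with the largest distance to the line through the first and last points
kn = (k_range - k_range(1)) / (k_range(end) - k_range(1));
bn = (BIC - min(BIC)) / (max(BIC) - min(BIC));
p1 = [kn(1) bn(1)];
p2 = [kn(end) bn(end)];
d = abs((p2(1)-p1(1))*(p1(2)-bn) - (p1(1)-kn)*(p2(2)-p1(2))) / norm(p2 - p1);
[~, iElbow] = max(d);
k_elbow = k_range(iElbow);
fprintf('Min-BIC k = %d, elbow k = %d\n', k_bic, k_elbow);

% Map the min-BIC model back onto the matrix
gm = models{iBest};
labels = cluster(gm, X);                    % hard label for every matrix entry
P = posterior(gm, X);                       % soft memberships, one column per component
label_map = reshape(labels, matrix_size, matrix_size);  % same layout as data_matrix
[rows1, cols1] = find(label_map == 1);      % row/column indices of entries in cluster 1

figure;
imagesc(label_map);
axis image
colorbar
title(sprintf('Cluster of each matrix element (k = %d, min BIC)', k_bic));
```

Because data_matrix here is uniform random noise, it has no real cluster structure, so do not read much into the k that either criterion selects; with your real matrix the curves should be more informative.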
I hope this helps!