How to find the feature distribution in kmeans clustering

I am trying to do kmeans clustering on the data available to me. The data consists of information for each student (56 students in total) and their features, such as scores for each subject and other metrics like a performance parameter. There are 39 features in total for each student, so the data matrix is 56-by-39. I used kmeans clustering to group the students into two clusters. I have attached the result of the clustering in the figure below. The data is plotted along the principal components. I want to know how the features are distributed across these clusters. Something like: score1 is high (above a certain value) in cluster 1 and low in cluster 2, while score2 is low in cluster 1 and high in cluster 2. Is there a way to know how the features are distributed in these two clusters? I want to find the features that contribute to each kmeans cluster.
I used the idx = kmeans(X,k) function in MATLAB.

Answers (1)

Image Analyst 2022-2-10
Edited: Image Analyst 2022-2-10
You can call pca() to get the coefficients (loadings) and the scores. Each column of the coefficient matrix corresponds to one principal component: the first column represents PC1, and its 39 values are the weights of the 39 original features in that PC. You can also ask pca() for the percentage of the total variance explained by each principal component (the fifth output, explained), for example PC1 explains 60% of the variance and PC2 explains 30%. To see which original features drive a PC (a score, or a performance metric like days of class missed or whatever), look at the features with the largest absolute weights in that PC's column.
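As a rough illustration of those pca() outputs, here is a minimal sketch. The matrix X is random stand-in data with the same 56-by-39 shape as in the question, not the real student data, and the variable names are only for illustration.

```matlab
% Hypothetical stand-in for the 56-by-39 student matrix (assumption, not real data).
rng(0);
X = randn(56, 39);

% coeff     : one column per PC, one row per original feature (the loadings)
% score     : coordinates of each student along each PC
% explained : percentage of total variance captured by each PC
[coeff, score, ~, ~, explained] = pca(X);

% Cumulative percentage of variance captured by the leading PCs
cumExplained = cumsum(explained);

fprintf('coeff is %d-by-%d, score is %d-by-%d\n', size(coeff), size(score));
fprintf('PC1 and PC2 together explain %.1f%% of the variance\n', cumExplained(2));
```

Note that explained has one entry per principal component, not one per original feature.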
I'm not sure why you're doing kmeans on PCs in the first place. Seems weird to me. I mean, the PCs are uncorrelated with each other by construction, so plotting any of them against another tends to look like a random shotgun blast, kind of like yours does. There is only very weak correlation, as expected. So why do clustering on them? If anything, you'd do kmeans on the original data, not the principal components.
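One possible way to see how the features differ between clusters when kmeans is run on the original data is to compare the cluster centroids feature by feature. This is only a sketch under assumptions: X is random stand-in data, the shift added to features 1 to 5 exists only to give the demo some structure, and standardizing with zscore() is my own choice in case the scores and other metrics are on different scales.

```matlab
% Hypothetical stand-in data (assumption): 56 students, 39 features.
rng(1);
X = randn(56, 39);
X(1:28, 1:5) = X(1:28, 1:5) + 3;   % artificial group difference, demo only

% Standardize so that every feature has comparable scale (assumption).
Z = zscore(X);

% Cluster the original (standardized) features, not the PCs.
k = 2;
[idx, C] = kmeans(Z, k, 'Replicates', 10);

% Row j of C is the centroid of cluster j, in standard-deviation units.
% A large gap means the feature is high in one cluster and low in the other.
centroidGap = C(1, :) - C(2, :);
[~, order] = sort(abs(centroidGap), 'descend');

% Per-cluster means in the original units, for easier reading.
rawMeans = zeros(k, size(X, 2));
for j = 1:k
    rawMeans(j, :) = mean(X(idx == j, :), 1);
end

% Show the five features that differ most between the two clusters.
top = order(1:5)';
disp(table(top, rawMeans(1, top)', rawMeans(2, top)', ...
    'VariableNames', {'Feature', 'MeanCluster1', 'MeanCluster2'}));
```

A positive centroidGap entry means the feature is higher in cluster 1; a negative entry means it is higher in cluster 2. The cluster numbering itself is arbitrary and can change between runs.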
  6 Comments
Dhruvin Naik 2022-2-15
I did PCA on each of the two clusters and got the principal components for both. Can you please tell me how I should compare the principal components from the two clusters and map them back to the original features, so that I can tell whether a given feature is more dominant in cluster one or cluster two?
Image Analyst 2022-2-15
The coefficients (the first output returned by pca()) give you that: they are the relative weights of the original variables used when building each PC from the original variable values.
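A minimal sketch of reading those coefficients per cluster follows. It assumes random stand-in data in place of the real 56-by-39 matrix, and it compares absolute weights because the sign of a principal component is arbitrary. The table layout and variable names are hypothetical.

```matlab
% Hypothetical stand-in data and cluster labels (assumption, not real data).
rng(2);
X = randn(56, 39);
idx = kmeans(X, 2, 'Replicates', 5);

% pca() needs at least two observations per cluster to return any component.
assert(all(accumarray(idx, 1) >= 2), 'Each cluster needs at least 2 students.');

% First output of pca() is the coefficient matrix: rows are original
% features, columns are principal components.
coeff1 = pca(X(idx == 1, :));
coeff2 = pca(X(idx == 2, :));

% Absolute weight of each original feature in PC1 of each cluster.
w1 = abs(coeff1(:, 1));
w2 = abs(coeff2(:, 1));

comparison = table((1:39)', w1, w2, w1 - w2, ...
    'VariableNames', {'Feature', 'Cluster1', 'Cluster2', 'Difference'});
comparison = sortrows(comparison, 'Difference', 'descend');

% Features weighted most heavily in cluster 1's PC1 relative to cluster 2's.
disp(comparison(1:5, :));
```

Keep in mind that these weights describe the directions of greatest variance within each cluster, which is not the same thing as a feature being high or low in that cluster.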
