How to reduce the number of unique values in a matrix?
I would like to reduce the number of unique values in my matrix to a fixed number. If I just round my values, I still get too many unique values. For instance, I would like to be able to group the matrix values into maybe 10 groups (= 10 unique values). I would like the value of each group to relate to the original values, for instance as the mean of all the values in the group. My original idea was to do something like k-means clustering, but I don't think this can be done with data in a matrix.
Is there a way to do this?
Accepted Answer
Stephen23
2017-4-27
Edited: Stephen23
2017-4-27
Although your data is arranged in a matrix, the matrix is a red herring: what you actually want is a simple 1D clustering of the values themselves, irrespective of their position in the matrix. This is simple, as k-means clustering can be done on any number of dimensions, including on 1D data. So convert your matrix to a vector, apply kmeans, and then use the indices to allocate the values to the clusters. Then simply reshape to get back the matrix shape.
Here is a complete working example, with just two clusters for clarity:
>> inp = [1,9,8,8;9,8,8,1;1,8,1,9;7,8,2,1]
inp =
1 9 8 8
9 8 8 1
1 8 1 9
7 8 2 1
>> [idx,vec] = kmeans(inp(:),2);
>> out = reshape(vec(idx),size(inp))
out =
1.1667 8.2000 8.2000 8.2000
8.2000 8.2000 8.2000 1.1667
1.1667 8.2000 1.1667 8.2000
8.2000 8.2000 1.1667 1.1667
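The same steps extend to the ten groups asked about in the question. The following is only a sketch under some assumptions: the input is random illustrative data rather than the asker's matrix, the seed is fixed with rng so the run is repeatable, and the 'Replicates' option is added because kmeans starts from random centroids and can otherwise settle on a poor local solution. kmeans requires the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: quantize a matrix to k values using 1-D k-means.
% Assumes the Statistics and Machine Learning Toolbox is installed.
rng(0);                          % fix the seed so the run is repeatable
inp = rand(20, 20);              % illustrative data, not from the question
k = 10;                          % desired number of unique values

% Cluster the values as a column vector; keep the best of 5 random starts.
[idx, vec] = kmeans(inp(:), k, 'Replicates', 5);

% Replace every element by the centroid (mean) of its cluster and
% restore the original matrix shape.
out = reshape(vec(idx), size(inp));

numUnique = numel(unique(out));  % at most k
```

Because each centroid is the mean of the values assigned to it, out satisfies the requirement that every group value relates to the original values.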
More Answers (1)
Adam
2017-4-27
Edited: Adam
2017-4-27
vals = ceil( 10 * vals / max( vals(:) ) );
3 Comments
Adam
2017-4-27
Well, once you have your 10 unique labels you can use them as indices into the original values and replace the labels with the average of those values e.g.
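labels = ceil( 10 * vals / max( vals(:) ) );  % labels 1..10 (assumes vals > 0)
newVals = vals;
for n = 1:10
    mask = ( labels == n );  % labels kept separate so a group mean is never mistaken for a label
    newVals( mask ) = mean( vals( mask ) );
end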
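For data that may contain zeros or negative values, the same equal-width idea can be sketched without a loop. This is an alternative formulation, not Adam's code: it assumes bin edges spanning min to max (a choice made here for illustration), uses discretize (available since R2015a) to assign labels, and uses accumarray to compute the per-bin means. The sample matrix is made up.

```matlab
% Sketch: equal-width binning with per-bin means, no loop.
vals  = [0.5 2.1 9.7; 3.3 3.1 10.0; -0.2 9.9 2.9];  % illustrative data
nBins = 10;

% Equal-width edges covering the full data range.
edges  = linspace(min(vals(:)), max(vals(:)), nBins + 1);

% Bin index (1..nBins) for every element; the last bin includes the maximum.
labels = discretize(vals, edges);

% Mean of the original values in each bin; empty bins are filled with NaN.
groupMeans = accumarray(labels(:), vals(:), [nBins 1], @mean, NaN);

% Look up the bin mean for every element (result has the shape of labels).
newVals = groupMeans(labels);
```

Empty bins never appear in newVals, since no element carries their label, so the result can have fewer than nBins unique values.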
Stephen23
2017-4-27
Edited: Stephen23
2017-4-27
I also considered rounding as per Adam's answer, but this has the disadvantage that then the cluster values are linearly spaced, and this might not best represent the actual cluster values. Consider clusters centered around 0, 3, and 10: rounding would split the 3 cluster into 0 and 5... this might not be the desired effect.
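A small made-up data set illustrates the difference. The values below are assumptions chosen to sit near 0, 3, and 10; rounding to the nearest multiple of 5 stands in for the linearly spaced levels, and kmeans (Statistics and Machine Learning Toolbox) is run with several replicates because its random start can otherwise give a poor grouping.

```matlab
% Sketch: linearly spaced rounding versus 1-D k-means on three clumps.
rng(0);                                   % repeatable k-means start
vals = [-0.2 0.1 0.3 2.4 2.9 3.2 3.6 9.8 10.1 10.3];  % illustrative data

% Linearly spaced levels 0, 5, 10: the clump near 3 is split,
% 2.4 is mapped to 0 while 2.9, 3.2 and 3.6 are mapped to 5.
rounded = 5 * round(vals / 5);

% k-means places the three levels at cluster means found in the data.
[idx, centers] = kmeans(vals(:), 3, 'Replicates', 10);
clustered = centers(idx).';               % row vector, same layout as vals
```

Comparing rounded with clustered shows which values each method groups together.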