problem with hierarchical clustering
2 次查看(过去 30 天)
显示 更早的评论
Hi guys i need your help to solve this problem. I need to make a Hierarchical Clustering for university purposes. I write the code:
a = [.2 0 0 .2 .6 0; 0 0 0 0 1 0; 0 .11 0 .11 .22 .56; 0 0 .25 0 0 .75; .1 0 0 .3 .5 .1; .06 .16 .08 .12 .04 .53]; b = [0 0 0 .2 .4 .4; 0 .24 .06 .06 .29 .35; 0 0 0 0 0 0; .33 .33 0 0 0 .33; .05 0 .09 .27 .45 .14; .21 .11 0 .11 .24 .34]; c = [.3 .1 0 .1 0 .5; 0 0 0 0 0 0; 0 0 0 0 0 0; .21 .12 .18 .24 .06 .18; 0 0 0 0 0 0; .29 .14 .04 .25 .11 .18]; d = [.18 .18 .09 .18 .09 .27; 0 0 .17 .33 0 .5; 0 0 .29 .29 0 .43; 0 .18 .09 .09 0 .64; 0 0 0 0 0 0; .15 .14 .05 .18 .01 .47]; e = [ 0 0 0 0 0 0; .18 .18 .27 .18 .18 0; 0 0 0 0 0 0; 0 0 0 0 0 0; .11 .2 .17 .03 .2 .29; .2 0 0 0 .1 .7]; f = [ 0 0 0 0 0 0; 0 .43 0 .14 .14 .29; 0 0 0 0 0 0; 0 0 0 1 0 0; 0 0 0 0 0 0; .18 .09 .05 .14 .14 .41]; g = [ .21 .07 0 .21 .21 .29; 0 0 .67 0 0 .33; .05 .11 .37 .21 .11 .16; .14 .09 .05 .23 .23 .27; .13 .11 .18 .11 .13 .33; 0 0 .25 .25 .50 .0]; h = [0 0 0 0 0 0; .15 .2 .15 .2 .15 .15; .11 .1 .11 0 0 .67; 0 0 0 0 0 1; .11 .11 .22 .11 .11 .33; .27 .18 .14 .14 .14 .14]; i = [.1 0 .1 .1 .4 .3; 0 0 0 0 0 0; .12 .04 .08 .08 .42 .25; 0 0 0 0 0 0; .32 .09 .12 .03 .27 .17; .17 0 .02 .13 .43 .26]; l = [.29 0 0 .07 0 .64; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 1; 0 0 0 1 0 0; .31 .04 .00 .02 .13 .51]; m = [0 .5 0 0 0 .5; 0 0 0 0 0 1; 0 0 0 0 0 1; .09 .18 0 0 .09 .64; 0 0 0 0 0 1; .08 .06 .01 .13 .05 .68]; n = [.26 .09 .00 .04 .17 .43; 0 0 0 0 0 0; 0 0 0 0 0 0; .29 0 0 0 .29 .43; .17 0 0 .17 .17 .5; .20 .04 .02 .1 .24 .41]; o = [0 .25 0 .25 0 .5; 0 0 0 0 0 0;0 0 0 0 0 0; .27 0 0 .14 .09 .5; 0 0 0 0 0 0; .41 .05 .05 .19 .08 .22]; p = [.25 0 0 .17 .08 .5; 0 0 0 0 0 0;0 0 0 0 0 0;1 0 0 .33 0 .57; 0 0 0 0 0 0; .06 .06 00 .25 .00 .62]; q = [.29 .24 0 .1 .14 .24; 1 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; .33 .04 0 .06 .43 .14; 0 0 0 0 .33 .67]; r = [.22 .04 .09 .04 .35 .26; 0 0 0 0 0 0; .25 .05 .05 .05 .25 .35; 0 0 .14 .14 .33 .33; 0 0 .08 .15 .23 .54; .17 .06 .17 .11 .17 .33]; s = [.37 .21 00 .16 00 .26; 0 .5 0 0 0 .5; 0 0 0 0 0 0; 0 0 0 0 0 0; .45 0 0 .23 .05 .27; .05 .05 .05 0 .18 .68]; t = [ 0 0 0 .67 .17 .17; 0 0 0 0 0 0; 0 0 0 0 0 0;0 0 0 0 0 0; .20 0 0 .2 0 .6; .22 .02 .04 .07 0 .64]; u = [ .08 00 .08 .25 0 .58; 0 0 0 0 0 0; 0 0 0 0 0 0; .08 0 .08 .33 0 .5; 0 0 0 0 0 0; .13 .03 .06 .1 0 .67]; v = [.11 0 .11 0 .33 .44; 0 0 0 0 0 0; 0 0 0 0 0 1; 0 0 0 0 0 0; 0 0 .07 .07 .22 .63; .09 .04 .09 .07 .20 .52];
all_data = [a;b;c;d;e;f;g;h;i;l;m;n;o;p;q;r;s;t;u;v]; size(all_data) Y = pdist(all_data); Z = linkage(Y,'average'); dendrogram(Z);
I wrote twenty matrices, why diagram (dendrogram) returns me over twenty nodes? where am I wrong?
sorry for my bad english.
2 个评论
Brendan Hamm
2016-9-30
What happens on the first iteration is that 2 points are combined into a new node and then distance is measured from this node to others.
回答(1 个)
Kelly Kearney
2016-9-30
You concatenated 20 6 x 6 matrices. This results in 120 rows in your matrix, hence 120 leaves in your dendrogram (assuming you actually show all the leaves... by default, dendrogram truncates the number of displayed leaves to 30).
2 个评论
Kelly Kearney
2016-10-3
Sure, if that makes sense for your application. You'll just need to reshape the matrix so all 36 property variables are in a single row for each leaf.
I would start by changing your variable naming system. The hard-coded alphabetical names are just asking for trouble. A 3D matrix or cell array will let you store the same data with much easier indexing.
So instead of
a = [.2 0 0 .2 .6 0; 0 0 0 0 1 0; 0 .11 0 .11 .22 .56; 0 0 .25 0 0 .75; .1 0 0 .3 .5 .1; .06 .16 .08 .12 .04 .53];
b = [0 0 0 .2 .4 .4; 0 .24 .06 .06 .29 .35; 0 0 0 0 0 0; .33 .33 0 0 0 .33; .05 0 .09 .27 .45 .14; .21 .11 0 .11 .24 .34];
c = [.3 .1 0 .1 0 .5; 0 0 0 0 0 0; 0 0 0 0 0 0; .21 .12 .18 .24 .06 .18; 0 0 0 0 0 0; .29 .14 .04 .25 .11 .18];
d = [.18 .18 .09 .18 .09 .27; 0 0 .17 .33 0 .5; 0 0 .29 .29 0 .43; 0 .18 .09 .09 0 .64; 0 0 0 0 0 0; .15 .14 .05 .18 .01 .47];
try this:
ndata = 4;
var = nan(6,6,ndata);
var(:,:,1) = [.2 0 0 .2 .6 0; 0 0 0 0 1 0; 0 .11 0 .11 .22 .56; 0 0 .25 0 0 .75; .1 0 0 .3 .5 .1; .06 .16 .08 .12 .04 .53];
var(:,:,2) = [0 0 0 .2 .4 .4; 0 .24 .06 .06 .29 .35; 0 0 0 0 0 0; .33 .33 0 0 0 .33; .05 0 .09 .27 .45 .14; .21 .11 0 .11 .24 .34];
var(:,:,3) = [.3 .1 0 .1 0 .5; 0 0 0 0 0 0; 0 0 0 0 0 0; .21 .12 .18 .24 .06 .18; 0 0 0 0 0 0; .29 .14 .04 .25 .11 .18];
var(:,:,4) = [.18 .18 .09 .18 .09 .27; 0 0 .17 .33 0 .5; 0 0 .29 .29 0 .43; 0 .18 .09 .09 0 .64; 0 0 0 0 0 0; .15 .14 .05 .18 .01 .47];
(I'm only demonstrating with 4 variables for now, but you should list all 20).
Now you can easily reshape that array into the nleaf x nproperty array needed by pdist (20 x 36, in your example):
Y = pdist(reshape(var,[],ndata)');
Z = linkage(Y,'average');
dendrogram(Z, 0);
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!