Optimal leaf ordering for subset of leaf nodes

Hello,
I am trying to use the optimalleaforder function on a large number of elements. This is expensive, and I only need to order a limited number of leaf nodes.
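n = 10000;                                            % number of observations
X = rand(n,10);                                       % random 10-dimensional data
eucD = pdist(X);                                      % pairwise Euclidean distances (vector form)
tree = linkage(eucD,'average');                       % average-linkage hierarchical tree
tic; leafOrder = optimalleaforder(tree,eucD); toc     % time the optimal ordering of all n leaves
dendrogram(tree,50,'Reorder',leafOrder)               % plot collapsed to 50 leaf nodes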
It appears that the evaluation time scales as ~O(n^3), which is what it should be according to the reference cited in the documentation [1].
This seems fair enough, but I was wondering if calculating the optimal order for 10,000 terminal leaf nodes is necessary and worthwhile, if I only want to plot a dendrogram with, say, 50 leaf nodes.
If I have understood correctly, it seems that all I would need is the distance matrix for the limited number of leaf nodes I want to include in the dendrogram. Is there a straightforward way to find this? Or to directly use a limited number of leaf nodes in the ordering algorithm?
Thanks,
Ben
Ref: [1] Bar-Joseph, Z., Gifford, D.K., and Jaakkola, T.S. (2001). Fast optimal leaf ordering for hierarchical clustering. Bioinformatics 17, Suppl 1:S22–9. PMID: 11472989.
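
One way the idea in the question could be sketched is shown below. This is a rough, hedged sketch rather than a confirmed solution. It assumes the tree is monotonic (as with average linkage), so that the last p-1 rows of the linkage output form a tree on p collapsed clusters. It also assumes that the mean pairwise distance between two collapsed clusters is an acceptable stand-in for the distance between the corresponding leaf nodes. The variable names top, leafIds, treeC, label, Dc and orderC are made up for illustration, and the resulting order is optimal only for the reduced tree, which is not necessarily the same as the full optimal order restricted to 50 nodes.

```matlab
n = 2000;                        % smaller than the 10,000 in the post, to keep the demo quick
p = 50;                          % number of leaf nodes wanted in the dendrogram
X = rand(n,10);
eucD = pdist(X);
tree = linkage(eucD,'average');

% The last p-1 merges define a tree whose p leaves are collapsed clusters
% (assumes a monotonic tree, which holds for average linkage).
top = tree(end-p+2:end, :);
ids = top(:,1:2);
isLeaf = ids <= 2*n - p;         % nodes created before the last p-1 merges
leafIds = unique(ids(isLeaf));   % the p collapsed leaf nodes, in the original numbering

% Renumber: collapsed leaves become 1..p, later merges become p+1..2p-1.
[~, newIds] = ismember(ids, leafIds);
newIds(~isLeaf) = ids(~isLeaf) - 2*n + 2*p;
treeC = [newIds, top(:,3)];

% Assign every original point to its collapsed leaf by pushing labels
% down the remaining (earlier) merges, parents before children.
label = zeros(2*n-1, 1);
label(leafIds) = 1:p;
for r = n-p:-1:1
    label(tree(r,1:2)) = label(n+r);
end
T = label(1:n);

% Mean pairwise distance between collapsed clusters.
% Note: squareform(eucD) is n-by-n (about 0.8 GB in double precision for n = 10,000).
D = squareform(eucD);
M = sparse((1:n)', T, 1, n, p);  % n-by-p cluster membership indicator
sz = full(sum(M,1))';            % cluster sizes
Dc = full(M' * D * M) ./ (sz * sz');
Dc = (Dc + Dc') / 2;             % remove round-off asymmetry
Dc(1:p+1:end) = 0;               % zero the diagonal

% Optimal ordering on the reduced problem only (p leaves instead of n).
orderC = optimalleaforder(treeC, Dc);
dendrogram(treeC, 0, 'Reorder', orderC)
```

The leaf labels in this plot are the reduced indices 1..p; leafIds maps them back to node numbers in the original tree, and T gives the collapsed leaf for each original observation.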

Answers (0)
