Why does using graphminspantree() result in large memory use?
I'm attempting to generate a minimum spanning tree with graphminspantree(). The input is a complete graph, i.e. a full distance matrix. My full dataset has about 212,000 rows/columns, but so far I have been unable to produce any usable result beyond small examples (a few hundred to a few thousand rows/columns) because memory consumption is enormous. I have access to a machine with 768 GB of RAM (not a typo). There, graphminspantree() spent about 20 minutes allocating memory, and I had to terminate MATLAB at 95% memory use when the machine began swapping. The input for that run was an 80k-row subset of my full 212k rows of data.
Some benchmarks:
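rows = 1600;                        % subset size; varied per run (see table below)
X = load('mydatafile.csv');
D = pdist(X(1:rows,:));             % condensed pairwise Euclidean distances
tic; [t,p] = graphminspantree(sparse(squareform(D))); toc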
rows      time     memory
 1600       2 s    <1 GB
 3200       9 s    ~2 GB
 6400      36 s    ~8 GB
12800     160 s   ~34 GB
25600     727 s  ~107 GB
extrapolation:
212000 rows ~13h >>2TB
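For anyone checking the extrapolation: the arithmetic below assumes that both runtime and memory grow roughly quadratically with the number of rows n, which is what the benchmark suggests (each doubling of n multiplies time by about 4.5 and memory by about 4). It scales the last measured point (25600 rows) up to 212000 rows, and also gives the size of the dense double-precision distance matrix on its own. The quadratic model is an assumption fitted to the table, not a documented property of graphminspantree().

```latex
\begin{align*}
\left(\frac{212000}{25600}\right)^{2} &\approx 8.28^{2} \approx 68.6 \\
t(212000) &\approx 727\,\mathrm{s} \times 68.6 \approx 49\,900\,\mathrm{s} \approx 13.9\,\mathrm{h} \\
M(212000) &\approx 107\,\mathrm{GB} \times 68.6 \approx 7.3\,\mathrm{TB} \\
M_{\text{dense}} &= 212000^{2} \times 8\,\mathrm{B} \approx 3.6 \times 10^{11}\,\mathrm{B} \approx 360\,\mathrm{GB}
\end{align*}
```

So even before graphminspantree() does any work, squareform(D) alone would need roughly 360 GB for the full data set.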
While I could tolerate ~13 hours of runtime, multiple TB of RAM is a bit much to ask for. Are there any alternative/more efficient ways to generate a minimum spanning tree in MATLAB?
For reference, a quick comparison with an implementation in R indicated no runaway memory use, but an extrapolated runtime of about 4 years for the full data set. Also not exactly amazing.
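One possible direction, sketched here and not tested on the full data set: because the graph is complete and the weights are plain Euclidean distances between rows of X, Prim's algorithm can compute each row of distances on the fly instead of storing the matrix. The function name primMstOnTheFly and its outputs are hypothetical, not part of any toolbox. It assumes Euclidean distance (the pdist default) and implicit expansion (R2016b or later). Working memory beyond X is a few vectors of length n plus one n-by-d temporary, while runtime stays quadratic in n.

```matlab
function [edges, weights] = primMstOnTheFly(X)
% Hypothetical helper: Prim's algorithm on the implicit complete graph
% whose edge weights are Euclidean distances between the rows of X.
% Returns the n-1 tree edges as [parent, child] index pairs and their weights.
n = size(X, 1);
inTree  = false(n, 1);     % true once a point has joined the tree
best    = inf(n, 1);       % cheapest known distance from the tree to each point
parent  = zeros(n, 1);     % tree vertex that realises best(i)
edges   = zeros(n - 1, 2);
weights = zeros(n - 1, 1);

current = 1;               % arbitrary start vertex
inTree(current) = true;
for k = 1:n-1
    % Distances from the newest tree vertex to all points (n-by-1).
    d = sqrt(sum((X - X(current, :)).^2, 2));

    % Relax: keep the shorter connection for points not yet in the tree.
    closer = ~inTree & d < best;
    best(closer)   = d(closer);
    parent(closer) = current;

    % Attach the closest outside point to the tree.
    [weights(k), next] = min(best);
    edges(k, :)  = [parent(next), next];
    inTree(next) = true;
    best(next)   = inf;    % never pick this point again
    current      = next;
end
end
```

On a small subset, sum(weights) should agree with the total edge weight of the tree returned by graphminspantree(), which would be a reasonable sanity check before trying a larger run.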