Tall vs distributed array
I see that we have tall and distributed arrays.
Tall divides data into chunks.
Distributed also divides data into chunks!
What's the difference here?
And how is either of these connected to parallel computing?
Answers (1)
Edric Ellis
2018-5-14
Both tall and distributed arrays are designed for processing large amounts of data, but they have somewhat different capabilities.
distributed arrays are held in memory, spread across several MATLAB worker processes, so the largest distributed array you can create is limited by the total amount of physical memory you have. distributed arrays are also more oriented towards dense and sparse linear algebra. They require Parallel Computing Toolbox, and are most effective when used with MATLAB Distributed Computing Server (which lets you distribute the data across multiple machines).
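To make the linear algebra point concrete, here is a minimal sketch of building and solving with a distributed array. It assumes Parallel Computing Toolbox is installed and a pool can be started; the matrix size and variable names are arbitrary choices for illustration.

```matlab
% Sketch only: requires Parallel Computing Toolbox.
pool = gcp;                       % start (or reuse) a parallel pool

n = 500;                          % arbitrary size, chosen for illustration
A = rand(n, 'distributed');       % built directly in the workers' memory
b = sum(A, 2);                    % row sums; the result is also distributed

x = A \ b;                        % the linear solve runs on the workers
xLocal = gather(x);               % copy the result back to the client

% Because b holds the row sums of A, x should be close to a vector of ones.
residual = norm(xLocal - ones(n, 1));
```

The array only ever exists in the workers' memory, so n is bounded by what the pool can hold in total.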
The data for tall arrays exists on disk, and so their size is not limited by the amount of memory you have available. However, as the name implies, tall arrays can be large only in the first dimension. tall arrays are more geared towards data analytics. tall arrays ship with MATLAB itself, but there is enhanced support in both Parallel Computing Toolbox (which enables parallel processing in a single computer) and MATLAB Distributed Computing Server (which enables parallel processing across a cluster, including Hadoop/Spark clusters).
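As a rough illustration of the disk-backed model, the sketch below creates a tall table from a datastore and runs a deferred calculation on it. The file name and variable names are made up for the example, and the file is tiny; a real use case would point the datastore at data too large for memory.

```matlab
% Sketch only: runs in plain MATLAB, no toolbox needed.
mapreducer(0);                    % evaluate in the local session, not on a pool

% Write a small demo file (hypothetical name) to stand in for a large data set.
csvFile = fullfile(tempdir, 'tall_demo.csv');
T = table((1:1000)', rand(1000, 1), 'VariableNames', {'id', 'value'});
writetable(T, csvFile);

ds = datastore(csvFile);          % the data stays on disk
tt = tall(ds);                    % tall table; rows are read in chunks

m = mean(tt.value);               % deferred: nothing has been computed yet
m = gather(m);                    % triggers the pass over the file
```

The number of rows is unknown to MATLAB until the data has actually been read, which is why operations are queued and only evaluated on gather.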
3 comments
Edric Ellis
2018-5-15
The fundamental difference is where the data is held once you've created the array. distributed arrays are more restricted in size because the contents are always in memory, but they are more capable. tall arrays can be much larger - as long as you have the disk space.
