Making use of multiple hard drives to avoid I/O bottlenecks?

I am reading in a lot of data (1.5 TB), so I would like to minimize the time spent on disk I/O.
  • I have 4 NVMe drives (2 TB each)
  • I have 'a lot' of RAM (128 GB, which may in fact not be that much)
  • I have data that I would like to postprocess in MATLAB
  • I am using parfor loops to read the data
Typically, I would put all the data on one drive. Even though NVMe drive I/O is quite fast (~4000 MB/s), my question is:
  • Would it make sense to distribute the data to be postprocessed across all 4 drives, so that MATLAB reads from all of them, in order to minimize I/O bottlenecks?
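As a rough back-of-envelope check on what is at stake, the figures quoted above can be turned into read-time estimates. This is only a sketch: it assumes decimal units (1 TB = 1e6 MB), purely sequential reads at the quoted rate, and perfect scaling across drives, none of which real hardware is guaranteed to deliver. The variable names are illustrative.

```matlab
% Back-of-envelope read-time estimate using the figures quoted in the question.
% Assumptions: decimal units, sequential reads at the quoted rate, and ideal
% scaling when the data is split evenly over all drives.
totalDataMB   = 1.5e6;   % 1.5 TB of data
driveRateMBps = 4000;    % quoted read rate per NVMe drive
numDrives     = 4;

timeOneDrive  = totalDataMB / driveRateMBps;                % seconds, one drive
timeAllDrives = totalDataMB / (driveRateMBps * numDrives);  % seconds, ideal split

fprintf('One drive:   %.0f s\n', timeOneDrive);
fprintf('Four drives: %.0f s\n', timeAllDrives);
```

Under these assumptions a single drive needs about 375 s for one pass over the data, and an ideal four-way split needs roughly 94 s, so the upper bound on the gain is a factor of four per pass.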

Accepted Answer

Walter Roberson 2022-6-20
You should ideally distribute the data to different drives and distribute the drives to different controllers.
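A minimal sketch of what this could look like with parfor, assuming the files have already been copied to four separate mount points. The paths, the `*.bin` pattern, the raw-byte read, and the helper `interleaveLists` are hypothetical placeholders, not anything from the original question. Note that parfor does not guarantee the order in which iterations run, so interleaving the file list only makes it likely, not certain, that the workers spread their reads over all drives.

```matlab
% Sketch: read files spread over several drives inside a parfor loop.
% The mount points and the '*.bin' pattern are hypothetical placeholders.
driveRoots = {'/mnt/nvme0/data', '/mnt/nvme1/data', ...
              '/mnt/nvme2/data', '/mnt/nvme3/data'};

% Collect the file list from each drive separately.
perDrive = cell(size(driveRoots));
for d = 1:numel(driveRoots)
    listing = dir(fullfile(driveRoots{d}, '*.bin'));
    perDrive{d} = arrayfun(@(f) fullfile(f.folder, f.name), listing, ...
                           'UniformOutput', false);
end

% Interleave the lists so that consecutive iterations refer to different drives.
files = interleaveLists(perDrive);

results = cell(numel(files), 1);
parfor k = 1:numel(files)
    fid = fopen(files{k}, 'r');
    assert(fid ~= -1, 'Could not open %s', files{k});
    raw = fread(fid, Inf, '*uint8');   % raw bytes; replace with your own reader
    fclose(fid);
    results{k} = numel(raw);           % placeholder for the real postprocessing
end

function out = interleaveLists(lists)
% Round-robin merge of several cell arrays of file names.
    maxLen = max(cellfun(@numel, lists));
    out = {};
    for i = 1:maxLen
        for d = 1:numel(lists)
            if i <= numel(lists{d})
                out{end+1} = lists{d}{i}; %#ok<AGROW>
            end
        end
    end
end
```

Whether this actually beats a single drive depends on the controller and lane layout discussed next, so it is worth timing both layouts on a subset of the data before moving all 1.5 TB.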
However, you might be constrained by your architecture. I seem to recall having read about some architectures that could only handle three full-width PCIe devices, so that a fourth one had to run at half speed. You also need to take into account that the other drives on your system will need some lanes. PCIe cannot allocate (for example) 12 lanes to one device and 2 to each of two other devices for a total of 16: if I recall correctly, lanes can only be allocated in powers of 2, so the first device could get 8 and the other two could get 2 each, with the remaining 4 unused.
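The lane arithmetic in that example can be written out as follows. This only illustrates the power-of-two rule as recalled above; it is an assumption for the sake of the example, not a statement about how any particular chipset or motherboard assigns lanes, and the variable names are illustrative.

```matlab
% Illustration of the power-of-two lane rule from the example above.
% This encodes the recalled rule as an assumption; check the motherboard or
% CPU documentation for the real lane layout.
requestedLanes = [12 2 2];   % lanes each device would ideally get
totalLanes     = 16;         % lanes available in this example

% Round each request down to the nearest power of two.
grantedLanes = 2 .^ floor(log2(requestedLanes));
unusedLanes  = totalLanes - sum(grantedLanes);

fprintf('Granted lanes: %s\n', mat2str(grantedLanes));
fprintf('Unused lanes:  %d\n', unusedLanes);
```

With these inputs the first device is granted 8 lanes, the other two get 2 each, and 4 lanes stay unused, matching the numbers in the text.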
You might be interested in some of the Linus Tech Tips videos; in some of them he shows how difficult it can be to max out drives.
The reviews seem to say that among mass-produced drives these days (as opposed to those from very low-volume specialty manufacturers), the Samsung 9x0 drives are close to the best for read rates (though not always the best for write rates compared with some of the small manufacturers).
While I am on the topic: anyone using external enclosures and needing high performance should look seriously at some of the Thunderbolt 4 NAS or DAS units. The performance ratings for well-designed enclosures are sometimes several times what you would get from low-cost mass-market drives.
  3 Comments
Walter Roberson 2022-6-21
If the cluster is cloud computing that is emulating drives over some internal layer, then that is probably something that would require getting a specific service agreement for separate hardware.
If the cluster can give you multiple drives, each on a separate controller, you would typically prefer that. If you are using spinning-platter drives, then two drives per controller is commonly the most efficient arrangement.

Release

R2021b
