How to interpret tocBytes results

I am currently running R2019b, but when I attempt to start parpool using the 'threads' option, it tells me that 'threads' is not a valid option for parpool (the only option is 'local'). I noticed that thread-based pools were introduced in R2020a, so I am guessing this means I need a later MATLAB release to use the 'threads' option with parpool.
According to the decision chart located here, using a thread-based local pool is advantageous if you are running on a single machine and there is a large amount of data being transferred to each worker.
So I have 2 questions:
  1. What is preventing me from being able to use the 'threads' option (Do I need a MATLAB release later than R2019b?)
  2. How do I interpret the results from tocBytes to determine if 'threads' will be a benefit to me?
I am running on a system with dual Xeon Gold 6148 CPUs @ 2.4 GHz, 256 GB RAM, and 20 cores each (40 cores total).
Using 36 workers, tocBytes shows the following data transfer to the workers:
Min: 35 MB, Max: 506 MB, Mean: 153 MB, Median: 95.4 MB
Total (all 36 workers) is 5.5 GB
So, are these numbers considered "large", and would I expect to see a benefit from using thread-based processing?
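For reference, here is a minimal sketch of how figures like the ones above are gathered with ticBytes/tocBytes on a process-based pool; myWorkerTask is a hypothetical placeholder for the real per-iteration computation:
pool = parpool('local', 36);      % process-based pool with 36 workers
results = cell(1, 36);
ticBytes(pool);
parfor k = 1:36
    results{k} = myWorkerTask(k); % placeholder computation
end
transfer = tocBytes(pool);        % one row per worker: [BytesSentToWorkers, BytesReceivedFromWorkers]
sentMB = transfer(:, 1) / 1e6;    % bytes sent to each worker, in MB
fprintf('Min %.1f MB, Max %.1f MB, Mean %.1f MB, Median %.1f MB, Total %.2f GB\n', ...
    min(sentMB), max(sentMB), mean(sentMB), median(sentMB), sum(sentMB)/1e3);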

Accepted Answer

Walter Roberson 2023-7-21
You need R2020a or later to use parpool("threads").
However, if you are using R2021b or later, it is recommended that you use backgroundPool.
Unfortunately, ticBytes() and tocBytes() do not work in parpool("threads") or backgroundPool so it is not possible to use those tools to compare the data transfer.
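Since ticBytes/tocBytes are unavailable on those pools, a rough alternative (a sketch only, with a hypothetical myWorkerTask placeholder) is to time the same parfor under each pool type with tic/toc:
delete(gcp('nocreate'));             % close any existing pool
parpool('local', 36);                % process-based pool
tic
parfor k = 1:36
    outProcess(k) = myWorkerTask(k); % placeholder computation
end
tProcess = toc;
delete(gcp('nocreate'));
parpool('threads');                  % thread-based pool (R2020a or later)
tic
parfor k = 1:36
    outThreads(k) = myWorkerTask(k);
end
tThreads = toc;
fprintf('process pool: %.1f s, thread pool: %.1f s\n', tProcess, tThreads);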
My understanding is that for ordinary numeric classes, shared pointers are used across the different threads, but that if copy-on-write is needed, the newly allocated memory comes from a per-thread memory pool (so that it can be easily released when the parfeval() finishes). However, I have not yet been able to come up with a consistent internal model of how thread pools work that would lead to the same limitations they have in practice; the architectures I have come up with mentally would have fewer limitations than thread pools actually have. Either that, or the architectures I come up with would block all handle objects. I haven't figured out yet what MathWorks is doing that allows some handle objects to work with thread-shared memory but still requires the limitations that are seen in practice.
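To illustrate the shared-memory point, here is a rough sketch (not a rigorous benchmark) that hands a large numeric array to parfeval under each pool type; on the thread pool the array does not have to be serialized and copied to a worker, so the round trip should be noticeably cheaper:
A = rand(2e4);                       % 20000-by-20000 doubles, roughly 3.2 GB
delete(gcp('nocreate'));
parpool('threads');
tic
f = parfeval(@(x) sum(x, 'all'), 1, A);
sThreads = fetchOutputs(f);
tThreads = toc;
delete(gcp('nocreate'));
parpool('local', 4);
tic
f = parfeval(@(x) sum(x, 'all'), 1, A);
sProcess = fetchOutputs(f);
tProcess = toc;
fprintf('thread pool: %.2f s, process pool: %.2f s (includes data transfer)\n', ...
    tThreads, tProcess);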
  6 Comments
Walter Roberson 2023-7-21
Edited: Walter Roberson 2023-7-21
Suppose that you were able to eliminate 100% of the 5.5 gigabytes of data transfer. Taking the theoretical maximum memory bandwidth of a Xeon Gold 6148 (119.21 GiB/s, about 128 GB/s) as the best-case transfer rate, that would reduce your computation time by
format long g
bytes_to_transfer = 5.5 * 10^9;
max_bandwidth_bytes_per_second = 119.21 * 2^30
max_bandwidth_bytes_per_second =
128000762839.04
seconds_to_transfer = bytes_to_transfer / max_bandwidth_bytes_per_second
seconds_to_transfer =
0.0429684939215262
That would be roughly 1/23 of a second, which is less than the measurement error of "44 minutes".
Jim Riggs
Jim Riggs 2023-7-22
Thank you for the analysis. This is very helpful.
