Best read-only data strategy for parfor

3 views (last 30 days)
Robin, 2012-10-18
Hi,
I am using parfor on a grid with 60 workers.
I have some data that will be used read-only within the parfor loop.
I see two options: load it on the machine I am submitting from, so it is serialized and sent across the network (the cluster has dedicated gigE), or load it from disk within the loop.
Can anyone comment on which of these might be the best strategy for different data sizes? The data compresses very well, so it is about 20 MB on disk but more than 1 GB in memory when loaded. How does the speed of loading and uncompressing compare to serialization?
If I have it loaded on the submission machine, is MATLAB clever enough to serialize and send it once to each worker, or will it repeat this on every iteration? Obviously, loading from a file would happen on every iteration.
Any advice is appreciated.
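For concreteness, here is a rough sketch of the two approaches I am comparing (the file names, the shared path, and processChunk are just placeholders):

N = 1000;                                     % number of iterations
results = zeros(1, N);

% Option 1: load once on the client and let parfor broadcast it to the workers
data = load('hugeData.mat');                  % placeholder file name
parfor ii = 1:N
    results(ii) = processChunk(data, ii);     % processChunk is a placeholder
end

% Option 2: load from disk (a shared location) inside the loop, on every iteration
parfor ii = 1:N
    data = load('/shared/hugeData.mat');      % placeholder shared path
    results(ii) = processChunk(data, ii);
end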

Answers (1)

Edric Ellis, 2012-10-18
I would recommend trying my Worker Object Wrapper. It's designed for just this sort of situation. In your case, you should put the files in a location available to the workers, and have them load the data using something like this:
w = WorkerObjectWrapper( @loadHugeData );
The object 'w' is then effectively a handle to the data. When you pass this into a PARFOR loop, the workers can then access the underlying data, like so:
parfor ii = 1:N
    doSomethingWith( w.Value );
end
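For completeness, loadHugeData here would just be an ordinary function (in its own loadHugeData.m) that the workers can call, something like the following sketch; the path is a placeholder, and the idea is that the load should happen once per worker rather than once per parfor iteration:

function data = loadHugeData()
% Load the compressed data from storage that every worker can see.
data = load('/shared/cluster/hugeData.mat');   % placeholder shared path
end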
