Calling parpool with SpmdEnabled = False

9 次查看(过去 30 天)
Hello,
I am having an issue with my University cluster. It is that when one worker crash, for a reason like "communication with the worker is lost probably due to a network problem", then all the remaining workers crash, and it may happen again and again!
Because I am using parfor and it doesn't require communication among the workers like spmd, then I would like all the other workers to finish their job!
I learnt online that you can solve this issue by calling your HPC cluster with the parameter "SpmdEnabled" equals False. What I did but MATLAB ignored my request as you can see in the Warning message at the end.
My question is, how can I solve this issue?
---------------------------------------------------------------------------------------------------------
parpool('HPCServerProfile1', 160, 'SpmdEnabled', false)
Starting parallel pool (parpool) using the 'HPCServerProfile1' profile ...
Warning: Disabling SPMD on parallel pools is not supported on this cluster type. Set 'SpmdEnabled' value to true.
Connected to the parallel pool (number of workers: 160).

采纳的回答

Edric Ellis
Edric Ellis 2020-1-30
Unfortunately, only MJS and Local cluster types support SpmdEnabled = false. You might be able to use the "cluster parfor" approach though - see the documentation. Basically, you would transform your main parfor loop like so:
% Important: do *not* create a parallel pool prior to running this!
% In fact, you may wish to call "delete(gcp('nocreate'))" to be sure.
% Get the cluster object
c = parcluster('HPCServerProfile1');
% Create the parforOptions structure
opts = parforOptions(c);
% Run the parfor loop directly on the cluster with no parallel pool
parfor (idx = 1:10000, opts)
% Body of the parfor loop
end
This should be more robust, however it does incur more overhead than the interactive parpool approach.

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Parallel Computing Fundamentals 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by