parpool() stalls on Xeon Phi x200 with >50 workers
I am evaluating parpool() on my new Intel Xeon Phi "Knights Landing" 7210. I find that parpool('local',NumWorkers) successfully creates a pool for NumWorkers<51, but it stalls and fails for any number equal to or greater than 51.
My system: 64 physical cores | 256 logical cores | 6x16GB memory | OS = CentOS Linux | MATLAB version R2018a
Attempted solutions: (1) changed java heap size between 512MB and 8192MB; (2) set java ThreadStackSize via $MATLAB/bin/glnxa64/java.opts (tried -XX:ThreadStackSize=8192 and 16384); (3) distcomp.feature( 'LocalUseMpiexec', false );
Each worker created by parpool takes about 0.5GB (according to top), such that plenty of system memory is left. Java memory resources also seem not to be depleted.
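As a rough check of the memory claim, here is a minimal sketch that uses only the figures quoted in this post (6x16GB installed, about 0.5GB per worker, 51 workers). The variable names are illustrative, and the estimate ignores memory used by the OS and the MATLAB client.

```matlab
% Back-of-the-envelope memory estimate from the numbers in this post.
% All variable names are illustrative; the per-worker figure is the
% approximate value read from top, not a measured constant.
totalMemoryGB = 6 * 16;      % 6 x 16GB installed memory
perWorkerGB   = 0.5;         % approximate resident size per worker (from top)
numWorkers    = 51;          % smallest pool size that stalls

workersGB  = numWorkers * perWorkerGB;   % memory needed by all workers
headroomGB = totalMemoryGB - workersGB;  % what is left for everything else

fprintf('Workers need about %.1f GB; about %.1f GB remains.\n', ...
    workersGB, headroomGB);
```

Even with generous rounding, the pool should fit in memory several times over, which is why memory exhaustion looks like an unlikely cause.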
Here is a test I ran:
%% parpool() test
distcomp.feature( 'LocalUseMpiexec', false )

% Report the JVM arguments and the memory state before any pool is created
JavaRuntimeSettings = java.lang.management.ManagementFactory.getRuntimeMXBean.getInputArguments
[~,freeSystemMemory] = system('vmstat -s -S M | grep "free memory"')
rJavaObj = java.lang.Runtime.getRuntime;
freeMemory = rJavaObj.freeMemory
totalMemory = rJavaObj.totalMemory
maxMemory = rJavaObj.maxMemory

% Try the largest pool size that works (50) and the smallest that stalls (51)
for NumberOfWorkers = [50, 51]
    tic
    pool = parpool('local',NumberOfWorkers)
    TimeElapsed = toc

    % Report the memory state again while the pool is open
    [~,freeSystemMemory] = system('vmstat -s -S M | grep "free memory"')
    rJavaObj = java.lang.Runtime.getRuntime;
    freeMemory = rJavaObj.freeMemory
    totalMemory = rJavaObj.totalMemory
    maxMemory = rJavaObj.maxMemory

    delete(pool)
end
And here is the output I get:
ans =
logical
0
JavaRuntimeSettings =
[-Xms64m, -XX:NewRatio=3, -Xmx2048m, -XX:MaxDirectMemorySize=2147400000, -XX:+AllowUserSignalHandlers, -Xrs, -XX:ThreadStackSize=16384, -Djava.library.path=/usr/local/MATLAB/R2018a/bin/glnxa64:/usr/local/MATLAB/R2018a/sys/jxbrowser/glnxa64/lib, vfprintf, -XX:ErrorFile=/home/mph/hs_error_pid38489.log, abort, -Duser.language=en, -Duser.country=US, -Dfile.encoding=UTF-8, -XX:ParallelGCThreads=6]
freeSystemMemory =
' 85393 M free memory
'
freeMemory =
313054528
totalMemory =
458752000
maxMemory =
1.9687e+09
Starting parallel pool (parpool) using the 'local' profile ...
connected to 50 workers.
pool =
Pool with properties:
Connected: true
NumWorkers: 50
Cluster: local
AttachedFiles: {}
AutoAddClientPath: true
IdleTimeout: 3 minutes (3 minutes remaining)
SpmdEnabled: true
TimeElapsed =
69.1710
freeSystemMemory =
' 65170 M free memory
'
freeMemory =
351541184
totalMemory =
448266240
maxMemory =
1.9687e+09
Parallel pool using the 'local' profile is shutting down.
Starting parallel pool (parpool) using the 'local' profile ...
connected to 51 workers.
At that point it stalls and I never get the prompt back. Using the top command in a Linux terminal, I can see plenty of idle MATLAB workers.
When I terminate the process (Ctrl+C) within MATLAB, I get the following:
Operation terminated by user during parallel.internal.queue.JavaBackedFuture/waitScalar (line 211)
In parallel.Future>@(o)waitScalar(o,predicate,waitGranularity,deadline)
In parallel.Future/wait (line 292)
ret = all(arrayfun(@(o) waitScalar(o, predicate, waitGranularity, deadline), ...
In parallel.Future/fetchOutputsImpl (line 574)
wait(F);
In parallel.Future/fetchOutputs (line 341)
varargout = fetchOutputsImpl(F(:), nargout, varargin{:});
In parallel.Pool>iPostLaunchSetup (line 674)
mapping = fetchOutputs(parfevalOnAll(pool, @iGetMachineToWorkerMappingAndUnfreezePaths, 1, ...
In parallel.Pool.hBuildPool (line 588)
iPostLaunchSetup(aPool, client.ParallelJob.AdditionalPaths);
In parallel.internal.pool.doParpool (line 18)
pool = parallel.Pool.hBuildPool(constructorArgs{:});
In parpool (line 98)
pool = parallel.internal.pool.doParpool(varargin{:});
In partictoc (line 12)
pool = parpool('local',NumberOfWorkers)
So, what are these workers waiting for, and why? How can I make them do work?
Answers (1)
Sangeetha Jayaprakash
2018-5-21
Hi,
If you are referring to Xeon Phi host processors (as introduced with the Knights Landing architecture), they are compatible with the Parallel Computing Toolbox, like any other x86_64 processor with multiple cores. If you would like to use Xeon Phi coprocessors, those are not currently supported.
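If the host processor is treated like any other multicore x86_64 CPU, the local profile should report it that way. Here is a minimal sketch, assuming Parallel Computing Toolbox is installed, for inspecting what the local profile and MATLAB see on the machine. It only reports settings and does not by itself diagnose the stall described in the question.

```matlab
% Inspect the 'local' cluster profile and MATLAB's thread limit.
% Requires Parallel Computing Toolbox. This only reads settings;
% it does not start a pool.
c = parcluster('local');       % cluster object for the local profile
disp(c.NumWorkers)             % upper limit on pool size for this profile
disp(c.JobStorageLocation)     % folder where job and worker files are kept

nThreads = maxNumCompThreads;  % computational threads MATLAB will use
disp(nThreads)
```

If c.NumWorkers is lower than the pool size requested, it can be raised on the cluster object (c.NumWorkers = 64; saveProfile(c)) before calling parpool(c, n).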