MATLAB R2021b constantly fails to invoke parpool
Hi,
I have a problem in MATLAB R2021b when invoking parpool.
I have a script that uses parpool in several places (4, to be exact).
I invoke parpool using the following snippet:
test_p = gcp('nocreate');
if isempty(test_p)
myPool = parpool('local',64);
end
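One way to make the snippet above more tolerant of intermittent start-up failures is to reuse an existing pool and retry a failed start a few times. This is only a sketch that masks the symptom, not a fix for the underlying cause. The helper name getOrStartPool, the retry count, and the pause length are hypothetical choices, not part of MATLAB.

```matlab
function pool = getOrStartPool(profileName, numWorkers, maxAttempts)
% getOrStartPool  Reuse the current pool or try to start one, with retries.
%   Hypothetical helper for illustration; not part of MATLAB.
%   maxAttempts must be at least 1.
    pool = gcp('nocreate');              % empty if no pool is open
    if ~isempty(pool)
        return;                          % reuse the pool that is already running
    end
    lastErr = [];
    for attempt = 1:maxAttempts
        try
            pool = parpool(profileName, numWorkers);
            return;                      % pool started successfully
        catch err
            lastErr = err;
            warning('getOrStartPool:retry', ...
                'parpool attempt %d of %d failed: %s', ...
                attempt, maxAttempts, err.message);
            delete(gcp('nocreate'));     % discard any half-started pool
            pause(5);                    % arbitrary back-off before retrying
        end
    end
    rethrow(lastErr);                    % all attempts failed
end
```

With this helper on the path, each of the four places in the script could call myPool = getOrStartPool('local', 64, 3); instead of repeating the gcp/parpool check.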
While the first 3 parpool calls succeed without a problem, the 4th one crashes with the following error:
Error using parpool (line 146)
Parallel pool failed to start with the following error. For more detailed information, validate the profile 'local' in the Cluster Profile Manager.
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowWithCause (line 305)
Failed to initialize the interactive session.
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowIfBadParallelJobStatus (line 399)
The interactive communicating job failed with no message.
This unstable behaviour has happened multiple times, and not only with this script.
Sometimes the parpool opens; other times it crashes.
To work around this, I always restart MATLAB and delete ~/.matlab/local_cluster_jobs, but this is only a temporary remedy. The issue persists.
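As an alternative to deleting the job folder by hand, stored jobs can be cleared through the cluster object. The sketch below assumes the profile is named 'local', as in the snippet above. Note that it removes every job stored for that profile, including finished batch jobs whose results have not been fetched.

```matlab
% Sketch: clear stale jobs for the 'local' profile without touching files by hand.
delete(gcp('nocreate'));              % close any open pool first
c = parcluster('local');              % cluster object for the 'local' profile
disp(c.JobStorageLocation);           % folder where job data is kept
fprintf('%d job(s) currently stored\n', numel(c.Jobs));
if ~isempty(c.Jobs)
    delete(c.Jobs);                   % remove all stored jobs for this profile
end
```

Printing JobStorageLocation first makes it easy to confirm that the folder being cleaned is the one that was previously deleted manually.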
Running Validate in the Cluster Profile Manager failed at the parpool stage and produced the following report:
Start Time: Fri Apr 29 01:19:58 EDT 2022
Finish Time: Fri Apr 29 01:20:17 EDT 2022
Running Duration: 0 min 19 sec
Description: Job ran with 64 workers.
Error Report:
Command Line Output:
Debug Log:
Stage: Pool job test (createCommunicatingJob)
Status: Passed
Start Time: Fri Apr 29 01:20:17 EDT 2022
Finish Time: Fri Apr 29 01:20:36 EDT 2022
Running Duration: 0 min 19 sec
Description: Job ran with 64 workers.
Error Report:
Command Line Output:
Debug Log:
Stage: Parallel pool test (parpool)
Status: Failed
Start Time: Fri Apr 29 01:20:36 EDT 2022
Finish Time: Fri Apr 29 01:24:10 EDT 2022
Running Duration: 3 min 34 sec
Description: Failed to initialize the interactive session.
Error Report: Failed to initialize the interactive session.
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowIfBadParallelJobStatus (line 399)
The interactive communicating job failed with no message.
Command Line Output:
Debug Log: CLIENT LOG OUTPUT
Currently connected to: 1
Checking communicating job status.
Session failed to start when creating InteractiveClient. Error: Error using parallel.internal.pool.AbstractInteractiveClient>iThrowWithCause (line 305)
Failed to initialize the interactive session.
Error in parallel.internal.pool.AbstractInteractiveClient/start (line 142)
iThrowWithCause( 'parallel:convenience:FailedToInitializeInteractiveSession', err );
Error in parallel.internal.pool.AbstractClusterPool>iStartClient (line 831)
spmdInitialized = client.start(sessionBuildFcn, sessionInfo, numWorkers, cluster, ...
Error in parallel.internal.pool.AbstractClusterPool.hBuildPool (line 585)
iStartClient(client, sessionInfo, forceSpmdEnabled, cluster, supportRestart, argsList);
Error in parallel.internal.types.ValidationStages>iOpenPoolForCluster (line 456)
aPool = parallel.internal.pool.AbstractClusterPool.hBuildPool('Cluster', cluster, 'NumWorkers', numWorkers);
Error in parallel.internal.types.ValidationStages>@()iOpenPoolForCluster(runInfo)
Error in parallel.internal.types.ValidationStages>iCallWithNoHotlinks (line 336)
[varargout{1:nargout}] = fcn();
Error in parallel.internal.types.ValidationStages>iRunParpoolStage (line 247)
[commandWindowOutput, aPool] = evalc(iWrapForEvalc(openPoolFcn));
Error in parallel.internal.types.ValidationStages/run (line 68)
[eventData, runInfo] = obj.RunFunction(obj, runInfo);
Error in parallel.internal.validator.Validator/runValidationSuite (line 191)
[eventData, stageRunInfo] = currentStage.run(stageRunInfo);
Error in parallel.internal.validator.Validator/validate (line 103)
status = obj.runValidationSuite(profileName, suite);
Error in parallel.internal.ui.AbstractValidationManager/validate (line 36)
obj.Validator.validate(profileName, validationSuite);
Error in parallel.internal.ui.ValidationManager.validateProfile (line 36)
parallel.internal.ui.ValidationManager.getOrCreateInstance().validate(profileName, suite);
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowIfBadParallelJobStatus (line 399)
The interactive communicating job failed with no message.
Failed to run the DisarmableOncleanup callback due to the following error:
Dot indexing is not supported for variables of this type.
What exactly is the problem here?
I am running MATLAB on a CentOS 7 machine with two "Intel(R) Xeon(R) Platinum 8352Y CPU @ 2.20GHz" (total of 64 physical - 128 logical cores) and 1.5TB of RAM.
I would really appreciate your help here as this is severely impacting my work.
Thank you in advance for your help and time!
7 comments
Lin
2022-4-29
Hi, I have exactly the same problem with parpool in R2021b but on a CentOS 8 machine. No iptables/nftables is used. My script sometimes works but sometimes doesn't. It would be great if anyone could help to solve the problem. Thank you.
Answers (1)
Yash
2024-1-17
Hi,
On Windows with MATLAB R2021b, users whose usernames contain non-ASCII characters (for example, extended ASCII characters) can run into problems with the local cluster. Specifically, starting a parallel pool or running independent jobs with commands such as parpool('local') fails with vague messages such as "Failed to initialize the interactive session". This issue is documented in the External Bug Report here: https://www.mathworks.com/support/bugreports/details/2619526
It was fixed in R2021b Update 3 and in R2022a. The bug report also provides a workaround that you can try.
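To see whether the conditions described above could apply to a given installation, a quick check of the installed Update level and of non-ASCII characters in the user name might look like the sketch below. Including prefdir in the check is an assumption on my part (it usually sits under the user's home folder); the bug report itself should be consulted for the exact paths involved.

```matlab
% Sketch: report the installed release/update and look for non-ASCII characters.
r = matlabRelease;                        % available from R2020b onward
fprintf('Release %s, Update %d\n', r.Release, r.Update);

% True if any character lies outside the 7-bit ASCII range.
hasNonAscii = @(s) any(double(s) > 127);

if ispc
    userName = getenv('USERNAME');        % Windows environment variable
else
    userName = getenv('USER');            % Linux/macOS environment variable
end

% prefdir is checked as an assumption; see the lead-in above.
candidates = {userName, prefdir};
for k = 1:numel(candidates)
    fprintf('"%s" contains non-ASCII characters: %d\n', ...
        candidates{k}, hasNonAscii(candidates{k}));
end
```

If the release is R2021b and the reported Update is lower than 3, installing a later update is the first thing to try.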
Hope this helps!
5 comments
Yash
2024-2-8
Edited: Walter Roberson
2024-2-8
The workaround is to use the "-c" startup flag to override MATLAB's default license path with one that contains only ASCII characters. The bug report gives the steps for Windows only, but at the end of the External Bug Report there is this link: https://uk.mathworks.com/matlabcentral/answers/102520-how-do-i-change-the-license-search-location-for-matlab
It has the steps for Windows, macOS, and Linux for the same workaround.
Walter Roberson
2024-2-8
The workaround provided in the bug report is very OS specific. It is mostly accidental that it happens to mention a link that can be used for Linux.