parpool sometimes errors in compiled application

I am compiling an application that uses parallel computing features, and am getting intermittent errors when creating a parallel pool (and also when creating a job using "batch", which is probably the same issue).
This started occurring when I moved from R2020b to R2024b; I did not have any issues when running under R2020b.
The cluster is just the local machine (not a remote cluster).
The same behavior occurs on a machine that does not have MATLAB installed (just the MCR).
Specs:
Windows 10 Pro
2x Intel Xeon E5-2697A (32 cores total)
MATLAB 2024B Update 5
Here's some code that can be compiled to show the issue:
function fnToCompile(nAttempts, nCoresToUse, nThreads)
arguments
    nAttempts = 10
    nCoresToUse = inf
    nThreads = 1
end
if ischar(nAttempts)
    nAttempts = str2double(nAttempts);
end
if ischar(nCoresToUse)
    nCoresToUse = str2double(nCoresToUse);
end
if ischar(nThreads)
    nThreads = str2double(nThreads);
end
nCores = feature('numcores');
nCoresToUse = min(nCores,nCoresToUse);
nCoresToUse = max(nCoresToUse,1);
disp(['Parallel pool will use ' num2str(nCoresToUse) ' cores out of ' num2str(nCores) ' available.']);
feature('numcores');
delete(gcp('nocreate'));
c = parcluster('Processes');
c.NumWorkers = nCores;
c.NumThreads = nThreads;
jobs = findJob(c);
delete(jobs);
disp(c);
nFailures = 0;
for iAttempt = 1:nAttempts
    disp('---------------------');
    disp(datetime('now'));
    disp(['Attempt ' num2str(iAttempt) ' of ' num2str(nAttempts) ' - ' num2str(nFailures) ' failures so far.']);
    delete(gcp('nocreate'));
    try
        jobs = findJob(c);
        delete(jobs);
    catch ME
        disp('Error deleting jobs.');
        disp(ME.message);
        disp(ME.identifier);
        for iStack = 1:length(ME.stack)
            str = sprintf('%25s file: %s, name: %s, line: %u','',ME.stack(iStack).file,ME.stack(iStack).name,ME.stack(iStack).line);
            fprintf('%s\n',str);
        end
    end
    try
        %Try to make a parallel pool
        pool = parpool(c,nCoresToUse);
    catch ME
        %Problem creating the parallel pool
        disp('Parallel pool error.');
        disp(ME.message);
        disp(ME.identifier);
        for iStack = 1:length(ME.stack)
            str = sprintf('%25s file: %s, name: %s, line: %u','',ME.stack(iStack).file,ME.stack(iStack).name,ME.stack(iStack).line);
            fprintf('%s\n',str);
        end
        nFailures = nFailures + 1;
    end
    disp('---------------------');
end
disp([num2str(nFailures) ' failures out of ' num2str(nAttempts)]);
end
And here's what I get when I run this compiled from a command window (username and some paths in this output have been modified):
C:\path\to\fnToCompile\for_redistribution_files_only>fnToCompile.exe 10 32 2
Parallel pool will use 32 cores out of 32 available.
MATLAB detected: 32 physical cores.
MATLAB detected: 64 logical cores.
MATLAB was assigned: 64 logical cores by the OS.
MATLAB is using: 32 logical cores.
MATLAB is not using all logical cores because hyper-threading is enabled.
Local Cluster
Properties:
Profile: Processes
Modified: true
Host: computername
NumWorkers: 32
NumThreads: 2
JobStorageLocation: C:\Users\username\AppData\Local\MathWorks\MatlabRuntimeCache\R2024b\fnToCo9\local_cluster_jobs\R2024b
RequiresOnlineLicensing: false
PreferredPoolNumWorkers: Inf
Associated Jobs:
Number Pending: 0
Number Queued: 0
Number Running: 0
Number Finished: 0
---------------------
19-Mar-2025 10:22:08
Attempt 1 of 10 - 0 failures so far.
Starting parallel pool (parpool) using the 'Processes' profile ...
Parallel pool using the 'Processes' profile is shutting down.
Parallel pool error.
Parallel pool failed to start with the following error.
parallel:cluster:PoolCreateFailed
file: C:\Program Files\MATLAB\R2024b\mcr\toolbox\parallel\cluster\+parallel\@Cluster\parpool.m, name: parpool, line: 79
file: C:\Users\username\AppData\Local\MathWorks\MatlabRuntimeCache\R2024b\fnToCo9\fnToCompile\fnToCompile.m, name: fnToCompile, line: 60
---------------------
---------------------
19-Mar-2025 10:22:19
Attempt 2 of 10 - 1 failures so far.
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 32 workers.
---------------------
---------------------
19-Mar-2025 10:22:45
Attempt 3 of 10 - 1 failures so far.
Parallel pool using the 'Processes' profile is shutting down.
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 32 workers.
---------------------
---------------------
19-Mar-2025 10:23:17
Attempt 4 of 10 - 1 failures so far.
Parallel pool using the 'Processes' profile is shutting down.
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 32 workers.
---------------------
---------------------
19-Mar-2025 10:23:49
Attempt 5 of 10 - 1 failures so far.
Parallel pool using the 'Processes' profile is shutting down.
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 32 workers.
---------------------
---------------------
19-Mar-2025 10:24:20
Attempt 6 of 10 - 1 failures so far.
Parallel pool using the 'Processes' profile is shutting down.
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 32 workers.
---------------------
---------------------
19-Mar-2025 10:24:52
Attempt 7 of 10 - 1 failures so far.
Parallel pool using the 'Processes' profile is shutting down.
Starting parallel pool (parpool) using the 'Processes' profile ...
Parallel pool using the 'Processes' profile is shutting down.
Parallel pool error.
Parallel pool failed to start with the following error.
parallel:cluster:PoolCreateFailed
file: C:\Program Files\MATLAB\R2024b\mcr\toolbox\parallel\cluster\+parallel\@Cluster\parpool.m, name: parpool, line: 79
file: C:\Users\username\AppData\Local\MathWorks\MatlabRuntimeCache\R2024b\fnToCo9\fnToCompile\fnToCompile.m, name: fnToCompile, line: 60
---------------------
---------------------
19-Mar-2025 10:25:09
Attempt 8 of 10 - 2 failures so far.
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 32 workers.
---------------------
---------------------
19-Mar-2025 10:25:35
Attempt 9 of 10 - 2 failures so far.
Parallel pool using the 'Processes' profile is shutting down.
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 32 workers.
---------------------
---------------------
19-Mar-2025 10:26:07
Attempt 10 of 10 - 2 failures so far.
Parallel pool using the 'Processes' profile is shutting down.
Starting parallel pool (parpool) using the 'Processes' profile ...
Parallel pool using the 'Processes' profile is shutting down.
Parallel pool error.
Parallel pool failed to start with the following error.
parallel:cluster:PoolCreateFailed
file: C:\Program Files\MATLAB\R2024b\mcr\toolbox\parallel\cluster\+parallel\@Cluster\parpool.m, name: parpool, line: 79
file: C:\Users\username\AppData\Local\MathWorks\MatlabRuntimeCache\R2024b\fnToCo9\fnToCompile\fnToCompile.m, name: fnToCompile, line: 60
---------------------
3 failures out of 10
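A note on the output above: the line "Parallel pool failed to start with the following error." is ME.message, and the underlying error it refers to is most likely held in ME.cause, which the test function does not print. Below is a minimal sketch of how nested causes could be printed. It uses a stand-in error so that it runs without a pool; the 'demo:...' identifiers and message texts are made up for illustration and are not what parpool actually throws.

```matlab
% Build a stand-in error with one nested cause (hypothetical identifiers/text).
outer = MException('demo:outer', 'Pool failed to start with the following error.');
inner = MException('demo:inner', 'Underlying reason goes here.');
outer = addCause(outer, inner);

try
    throw(outer);
catch ME
    % The extended report is expected to include a "Caused by:" section,
    % which ME.message alone does not show.
    report = getReport(ME, 'extended', 'hyperlinks', 'off');
    disp(report);

    % The causes can also be walked individually.
    for iCause = 1:numel(ME.cause)
        fprintf('Cause %d: %s (%s)\n', iCause, ...
            ME.cause{iCause}.message, ME.cause{iCause}.identifier);
    end
end
```

In fnToCompile, the same getReport call and cause loop would go inside the catch block around parpool. Whether the cause is informative for this particular failure is not known from the output shown here.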
The failures occur more frequently when I request a large number of workers in the pool: the fewer workers requested, the more likely the pool is to open successfully. I get the errors whether I run with one thread per worker or two.
I never get errors if I run the function interactively in the MATLAB command window.
Sometimes the parallel pool starts successfully, and sometimes it fails, so it doesn't feel like a configuration issue like a bad path or missing files for the compiler - I think that would cause errors every time.
Has anyone experienced this? Is there any solution besides a workaround of "try/catch it until it works"?
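For reference, the "try/catch it until it works" workaround could look roughly like the sketch below. It is only a stopgap and does not address the cause. The values of maxTries, pauseSeconds and nWorkers, and the error identifier 'fnToCompile:poolUnavailable', are assumptions chosen for illustration.

```matlab
% Retry parpool a limited number of times, pausing longer after each failure.
maxTries = 5;        % illustrative value
pauseSeconds = 5;    % illustrative value
nWorkers = 2;        % illustrative; the test function uses nCoresToUse

c = parcluster('Processes');
pool = [];
for iTry = 1:maxTries
    try
        delete(gcp('nocreate'));       % close any existing pool first
        pool = parpool(c, nWorkers);
        break;                         % pool started, stop retrying
    catch ME
        fprintf('parpool attempt %d of %d failed: %s\n', iTry, maxTries, ME.message);
        pause(pauseSeconds * iTry);    % simple linear backoff
    end
end

if isempty(pool)
    % Hypothetical identifier, used only in this sketch.
    error('fnToCompile:poolUnavailable', ...
        'Could not start a parallel pool after %d attempts.', maxTries);
end
```

Capping the number of attempts keeps the application from looping forever if the pool can never start on a given machine.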
Thanks,
Michael

Answers (2)

Kautuk Raj 2025-3-25
I see that you are sporadically facing the error "parallel:cluster:PoolCreateFailed" while running a parallel pool in MATLAB R2024b.
I suspect that the "ulimit -u" (user processes) setting is set to something low, like 1024. This limit needs to be raised for the additional workers to start successfully. You can use the following command to do this:
ulimit -u NewValue
Some recommended values can be found in the MathWorks documentation here: https://www.mathworks.com/help/parallel-computing/recommended-system-limits-for-macintosh-and-linux.html
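To see what the limit currently is from inside MATLAB or a compiled application, something like the following sketch could be used. It assumes bash is installed and on the path, and it only applies to Linux and macOS; on Windows it just prints a message.

```matlab
% Query the per-user process limit of the environment this process runs in.
limitText = '';
if isunix
    % Assumes bash is available; other shells may name this option differently.
    [status, out] = system('bash -c "ulimit -u"');
    if status == 0
        limitText = strtrim(out);
        fprintf('Max user processes (ulimit -u): %s\n', limitText);
    else
        disp('Could not query ulimit -u.');
    end
else
    disp('ulimit does not apply on Windows.');
end
```

Note that "ulimit -u NewValue" only affects the shell it is run in and the processes started from that shell, so it has to be set in the shell that launches the application.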
  2 Comments
Michael 2025-3-25
Hi Kautuk, this is on a Windows machine - I do not see an equivalent setting for Windows (only *nix based operating systems).
Caitlin 2025-4-10
I'm seeing this same issue on a Linux machine, and changing ulimit doesn't help.
I'm also using a compiled MATLAB application. It worked fine on R2020b, and now I'm having issues with creating a parallel pool using R2024b Update 2, the only error message being parallel:cluster:PoolCreateFailed.



Steven Lord 2025-4-11
I'm not certain but I'm wondering if this could be a different manifestation of Bug Report 3324846. Are you using release R2024b Update 4 or later? If not, could you try updating to that Update or later and see if the problem persists?
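One way to confirm which release and Update an application is actually running on is to print it at startup. This is a small sketch using matlabRelease (available since R2020b); in a compiled application it should report the MATLAB Runtime in use, though that has not been verified here. The threshold of Update 4 is taken from the post above.

```matlab
% Report the running release and flag R2024b installations older than Update 4.
r = matlabRelease;
fprintf('Running %s Update %d\n', r.Release, r.Update);

needsUpdate = (r.Release == "R2024b") && (r.Update < 4);
if needsUpdate
    disp('This is R2024b before Update 4; consider installing a later Update.');
end
```

Logging this on the machine that has only the runtime installed helps rule out a mismatch between the MATLAB used to compile and the runtime used to run.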
  4 Comments
Michael 2025-4-11
Caitlin, were you getting intermittent errors before? Or was it erroring every time?
Caitlin 2025-4-11
Edited: Caitlin 2025-4-11
It was erroring every time for me, but I'm not on Windows. I was getting the exact same error message as you, which is why I hopped on this thread.

