How to shut down all running workers of parpools?

How can I find and shut down all workers of all parpools that might currently be running?
During debugging I frequently run into crashes and out-of-memory errors. Often, some worker processes keep running, and I would like to know how best to close all of them before starting another script.

Answers (3)

Raymond Norris 2023-3-6
Hi @Felix. If even a single worker crashes, all workers will terminate. Can you elaborate a bit more on a couple of things:
  1. Are you using a local pool or a cluster? If cluster, MJS or your own scheduler (and if so, which)?
  2. Which parallel constructs are you using (parfor, parfeval, etc.)? Can you give a simple example of what might crash? I'm not interested in the details (I'm sure the workers are crashing), more in how you're running the code.
  1 comment
Edric Ellis 2023-3-7
Note that on "local" and MJS clusters, the parallel pool will not necessarily immediately terminate when a single worker crashes. On those clusters, pools that have not yet used spmd can survive losing workers.
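To see whether a pool is still alive after a worker crash, it can be queried without creating a new one. This is only a sketch: it assumes the pool's NumWorkers property reflects the workers currently attached, which is not confirmed in this thread.

```matlab
% Look up the current pool without starting one.
p = gcp('nocreate');

if isempty(p)
    disp('No parallel pool is currently running.');
else
    % Compare this number with the size the pool was started with.
    fprintf('Pool is running with %d worker(s).\n', p.NumWorkers);
end
```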



Edric Ellis 2023-3-7
You can shut down all remaining workers of the currently running pool by executing:
delete(gcp('nocreate'))
There should be no running workers other than in the current pool.
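At the top of a script, this could be used as in the minimal sketch below. It assumes a fresh pool with the default profile is wanted afterwards; the variable name pool is only illustrative.

```matlab
% Close any pool left over from a previous (possibly crashed) run.
% gcp('nocreate') returns the current pool, or an empty array if none
% exists, so this line is safe to run even when no pool is open.
delete(gcp('nocreate'));

% Start a fresh pool using the default profile.
pool = parpool;
```

Because the delete call does nothing when no pool is open, the two lines can stay at the start of a script permanently.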
  1 comment
Davy Figaro 2024-5-16
This shuts down the current parallel pool (created with parpool). How can I stop and clear all the workers without shutting down the pool?



Felix 2023-3-8
  1. I'm using local pools on my machine with default settings. On my machine this defaults to 12 workers.
  2. So far, I'm using parfor and the run command with MultiStart problems. I'll sometimes start a pool before running a script via parpool to reduce runtime of that script.
A simple, somewhat pseudocode example of my Monte Carlo stuff might be:
relevant_input = randn(1000, 1);
relevant_output = nan(height(relevant_input), 1);
param = 10;
parpool;
% Bind the parameter once; the handle then takes a single input value.
my_fun = @(x) elaborate_function(param, x);
parfor h = 1:height(relevant_input)
    relevant_output(h,1) = my_fun(relevant_input(h));
end
% Local function; in a script it has to come after all other code.
function y = elaborate_function(param, x)
y = param*x.*sin(x);
end
Another use case is a MultiStart object, which I use with run:
ms = MultiStart('UseParallel', true, 'Display','iter');
My scripts sometimes crash and I have trouble restarting them, because some workers do not seem to clear their memory when they crash. When I try to restart I get warnings such as:
Starting parallel pool (parpool) using the 'Processes' profile ...
Preserving jobs with IDs: 10 12 13 because they contain crash dump files.
You can use 'delete(myCluster.Jobs)' to remove all jobs created with profile Processes. To create 'myCluster' use 'myCluster = parcluster('Processes')'.
However, these crash dump files and the preserved jobs hog up way too much memory on my machine. I am looking for a couple of lines of code to put at the start of my scripts that look for leftover jobs, such as the ones containing crash dump files, and delete them if they exist, so I don't have to type delete(myCluster.Jobs) myself every time.
  1 comment
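Going by the warning text quoted above, a start-of-script cleanup could be sketched as follows. It assumes the 'Processes' profile named in that warning, and it deletes every job stored for the profile, so it is not suitable if any of those jobs hold results that are still needed. The variable name oldJobs is only illustrative.

```matlab
% Assumes the 'Processes' profile named in the warning message.
myCluster = parcluster('Processes');

% Close the current pool first, if there is one, so that no job is in use.
delete(gcp('nocreate'));

% Remove all jobs stored for this profile, including the ones that were
% preserved because they contain crash dump files.
oldJobs = myCluster.Jobs;
if ~isempty(oldJobs)
    fprintf('Deleting %d leftover job(s).\n', numel(oldJobs));
    delete(oldJobs);
end
```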
Raymond Norris 2023-3-14
I'm confused about how the crash dump files and preserved jobs hog up too much memory. Do you mean disk space?
If a job is running, I'm not sure there would be a crash dump file (until the end). And do you want to delete the crash file or the job? If you're running a parallel pool and the pool crashes, there's no job to delete.


Release

R2022b
