Unable to start parallel pool for more than 12 cores

Hi
My MATLAB version is R2019a, and my server has 8 CPUs (Intel(R) Xeon(R) CPU E7-8860 @ 2.27GHz); each CPU has 10 cores with hyperthreading. I therefore thought I could set "preferred number of workers in a parallel pool" to at most 80. However, whenever I set it higher than 12, MATLAB returns "failed to start parallel pool". This is my cluster profile:
Thanks

Answers (1)

Raymond Norris 2020-12-19
I'm a bit confused about how setting the default size of a parallel pool would throw "failed to start parallel pool", since setting the size in the profile doesn't start a pool. I'm gathering that your server has 8 Intel E7-8860 CPUs with 10 cores/socket plus hyperthreading (that is, the 10 cores don't include the HT logical cores). Where are you running your MATLAB client: on your local workstation or on one of the server nodes?
Although you can run a local pool on a single node of the server, I'm wondering if you're running MATLAB on your local workstation, where there are fewer cores. Run the following in MATLAB on the machine where you're setting the profile.
feature numcores
The local profile provides the settings for a local pool on the machine where the MATLAB client is running. If you want to run the pool of workers on your 80 core/node server, you either need to run MATLAB directly on the server (and use the 'local' profile) or create a new profile in your workstation MATLAB. This new profile would instruct MATLAB how to submit to a scheduler (e.g. MJS, Slurm, etc.) on the cluster.
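To make the check above concrete, here is a minimal sketch that compares the core count MATLAB detects with the limit in the 'local' profile and then opens a pool no larger than that limit. It assumes Parallel Computing Toolbox is installed; the value 80 is only the target mentioned in the question, and the variable names are illustrative.

```matlab
% Sketch: inspect the machine and the local profile, then open a pool
% that does not exceed the profile's worker limit.
requestedWorkers = 80;            % hypothetical target taken from the question

nCores = feature('numcores');     % physical cores MATLAB detects on this machine
c = parcluster('local');          % profile for the machine running the client
fprintf('Detected cores: %d, profile NumWorkers: %d\n', nCores, c.NumWorkers);

% Never ask for more workers than the profile allows.
nWorkers = min(requestedWorkers, c.NumWorkers);

if isempty(gcp('nocreate'))       % start a pool only if none is open
    pool = parpool(c, nWorkers);
end
```

If the detected core count is far below 80, the client is probably not running on the large server, or it has been given only part of that machine.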
If this sounds about right, contact Technical Support (support@mathworks.com) -- they can walk you through the process of submitting parallel jobs on machines other than your local workstation.
  2 Comments
Xiaofan Cui 2020-12-19
Edited: Xiaofan Cui 2020-12-19
Thank you so much for your quick reply, Raymond. The problem occurred while I was running my code. The "parfor" in my code triggered the parallel pool to start. MATLAB then keeps trying to start the parallel pool (sometimes for as long as 1 hour) before it fails and returns this error.
I guess I am using MATLAB on a server node.
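One way to narrow this down is to start the pool explicitly before the parfor loop, so that a failure shows up immediately along with its full error report. The sketch below is only an illustration under assumptions: it needs Parallel Computing Toolbox, and the size 16 is a hypothetical value chosen because it is just above the 12 workers reported to work.

```matlab
% Sketch: open the pool by hand and capture the complete error report
% instead of letting parfor start the pool implicitly.
nWorkers = 16;                    % hypothetical size just above the working limit of 12

delete(gcp('nocreate'));          % close any existing pool first
c = parcluster('local');

try
    tStart = tic;
    pool = parpool(c, nWorkers);
    fprintf('Pool with %d workers started in %.1f s\n', ...
        pool.NumWorkers, toc(tStart));
catch err
    % The extended report often includes the underlying cause of the failure.
    disp(getReport(err, 'extended'));
end
```

Repeating this with a few different sizes shows where startup begins to fail, and the report text is useful to include in a support request.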
Raymond Norris 2021-2-13
If you're running MATLAB on a server node, how many cores did you allocate to it? That is, I'm going to assume you're running under some scheduled environment (e.g. PBS), and if so, can you post your job script? It's possible that you requested only 1-2 cores, but the local profile sees 80, and the pool is contending with other jobs running on the same node.
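A rough way to see how many cores the scheduler actually granted is to read its environment variables from inside the MATLAB session. The sketch below is a guess at a typical setup, not a definitive recipe: the variable names SLURM_CPUS_PER_TASK, PBS_NUM_PPN and NCPUS are assumptions that differ between schedulers and sites, so check your job environment for the names your cluster really sets.

```matlab
% Sketch: look for a scheduler-provided core count and compare it with
% the local profile. The variable names below are assumptions.
candidates = {'SLURM_CPUS_PER_TASK', 'PBS_NUM_PPN', 'NCPUS'};

allocated = NaN;
for k = 1:numel(candidates)
    value = str2double(getenv(candidates{k}));   % NaN if unset or not numeric
    if ~isnan(value)
        allocated = value;
        fprintf('%s = %d\n', candidates{k}, allocated);
        break
    end
end

c = parcluster('local');
if isnan(allocated)
    disp('No scheduler allocation variable found in this session.');
else
    % Size the pool to the allocation, not to the whole node.
    nWorkers = min(allocated, c.NumWorkers);
    fprintf('Suggested pool size: %d (profile limit %d)\n', nWorkers, c.NumWorkers);
end
```

If the allocation is much smaller than the profile's limit, requesting more cores in the job script, or reducing the pool size to match, avoids oversubscribing the node.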
