HPC MATLAB parpool and speed

Question

RUAN YY 2020-9-25

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/599929-hpc-matlab-parpool-and-speed

Answered: RUAN YY 2020-9-25

Hey guys! I am new to the HPCC, and I am now running my MATLAB program on it. I am using parallel computing, i.e. parpool.

Here is the code for my "submit.sh"

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
/opt/hpc/MATLAB/R2019b/bin/matlab -nojvm -nodesktop -r "main_MultiEA;exit;"

The first thing is that I found the speed is similar to my local computer. Should I specify something in the .sh file to change this? And how can I know whether I reach the limit of the resource or not?

The second thing is that I found that the only available parpool is "local", using the "allNames = parallel.clusterProfiles()" command. Should it be different on the HPCC?

The third thing is that when I use "parpool(16)" or "parpool('local',16)" or "parpool("myPool",16)" etc. to try to improve the speed, the program seems to crash. Here is my test.m to test the parpool. I guess the program crashes because no a.mat appears in the directory.

parpool("local",16);
a=0;
parfor i = 1:10
        a = a+1;
end
save a.mat;
exit;

Would you tell me why's that? And how can I improve the speed? Thanks a lot!!

Answer 1

Raymond Norris 2020-9-25

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/599929-hpc-matlab-parpool-and-speed#answer_500569

Hi Ruan,

There are two ways to speed up your code: implicitly and explicitly. You don't have much control over the implicit route; MATLAB decides on its own how best to use your multiple cores. Explicitly, you can vectorize, pre-allocate, write MEX-files, etc. You can also use parallel pools.

Looking at your Slurm job script, make the following change:

/opt/hpc/MATLAB/R2019b/bin/matlab -nojvm -nodesktop -r "main_MultiEA;exit;"

to

/opt/hpc/MATLAB/R2019b/bin/matlab -batch main_MultiEA

-batch replaces -nodesktop, -r, and the trailing "exit". Also drop -nojvm: you'll need the JVM if you use Parallel Computing Toolbox (PCT).

I'd also consider using module if you have it (your module name -- matlab -- might be slightly different)

module load matlab
matlab -batch main_MultiEA

Next, you're requesting from Slurm 2 nodes, with 2 cores per node (total of 4 cores). But MATLAB only runs on a single node, so the 2nd node is of no use. That means when you start the pool of 16 workers, you're running it on 2 cores (or you should be -- might depend if you have cgroups). This is probably why MATLAB is crashing -- you're running out of memory. To write this more flexibly, try

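% getenv returns text; convert it to a number before passing it to parpool.
% SLURM_CPUS_PER_TASK is only set when the job requests --cpus-per-task.
sz = str2double(getenv('SLURM_CPUS_PER_TASK'));
parpool("local",sz);
a=0;
parfor i = 1:10
    a = a+1;
end
save a.mat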

This way, regardless of the cores per node you request, you'll get the right size.
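The mismatch described above (a pool of 16 workers on an allocation with only 2 usable cores) can also be caught in the job script before MATLAB starts. The following is a minimal sketch, not part of the original answer: check_pool_fits and the POOL_WORKERS variable are hypothetical names, while SLURM_CPUS_ON_NODE is the variable Slurm sets inside a job to the number of CPUs allocated on the current node.

```shell
#!/bin/bash
# Sketch: refuse to start a pool larger than the CPUs Slurm allocated on this node.
# check_pool_fits is a hypothetical helper, not part of Slurm or MATLAB.
check_pool_fits() {
    local workers="$1"   # pool size the MATLAB script will ask for
    local cpus="$2"      # CPUs available to the job on this node
    if [ "$workers" -gt "$cpus" ]; then
        echo "pool of $workers workers exceeds $cpus allocated CPUs" >&2
        return 1
    fi
    echo "pool of $workers workers fits in $cpus allocated CPUs"
}

# Inside a job, Slurm sets SLURM_CPUS_ON_NODE; default to 1 when run elsewhere.
# POOL_WORKERS is an assumed variable holding the pool size you intend to request.
check_pool_fits "${POOL_WORKERS:-1}" "${SLURM_CPUS_ON_NODE:-1}"
```

In a job script, this check could sit just before the matlab -batch line so that an oversized pool fails fast with a readable message instead of a crash.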

With that said, there are two things to think about:

- Obviously, you'll see no speed up in your example. There has to be a reasonable amount of work to do.
- Using the "local" profile, the parallel pool will only run "local" to wherever MATLAB is running (on the HPC compute node). If you want to run a larger pool, across nodes, then you'll need to create a Slurm profile with MATLAB Parallel Server.

Raymond

3 comments (1 earlier comment not shown)

Raymond Norris 2020-9-25

test.m is calling save at the end, so when you call test, either via CLI or Slurm, you're going to generate a.mat. Do you not want the MAT-file to be generated? If not, simply comment out the line at the bottom of the file.

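If this doesn't work (getenv returns text, so convert it to a number before passing it to parpool)

sz = str2double(getenv('SLURM_CPUS_PER_TASK'));

then you might try

sz = str2double(getenv('SLURM_JOB_CPUS_PER_NODE'));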
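The same fallback between the two variables can be resolved once in the job script. This is a rough sketch under a few assumptions: slurm_pool_size and POOL_WORKERS are hypothetical names, and the parsing relies on SLURM_JOB_CPUS_PER_NODE sometimes carrying a repeat suffix such as "16(x2)" when a job spans several nodes, so only the leading integer is kept.

```shell
#!/bin/bash
# slurm_pool_size is a hypothetical helper; the variable names it reads are Slurm's own.
slurm_pool_size() {
    local n="${SLURM_CPUS_PER_TASK:-}"
    if [ -z "$n" ]; then
        # SLURM_JOB_CPUS_PER_NODE can look like "16" or "16(x2)"; keep the leading integer.
        n="${SLURM_JOB_CPUS_PER_NODE:-}"
        n="${n%%[!0-9]*}"
    fi
    # Outside a Slurm job neither variable is set; fall back to a single worker.
    echo "${n:-1}"
}

# Hypothetical hand-off: MATLAB could read this with str2double(getenv('POOL_WORKERS')).
export POOL_WORKERS="$(slurm_pool_size)"
echo "pool size: $POOL_WORKERS"
```

Doing the parsing in the shell keeps the MATLAB side to a single str2double call, which matters because str2double returns NaN for a value like "16(x2)".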

What Slurm output/error file is being generated? If your Slurm job script is only specifying the name of the job (Group3), it's possible that:

- You're not requesting enough cores (16 or 17). Add #SBATCH -n 16
- You're not requesting enough memory. Add #SBATCH --mem-per-cpu=2048

For instance:

#!/bin/bash
#SBATCH -J Group3
#SBATCH -N 1                  # Keep all cores on one node (the local pool cannot span nodes)
#SBATCH -n 16                 # Request 16 cores
#SBATCH --mem-per-cpu=2048    # Request 2 GB/core
/opt/hpc/MATLAB/R2019b/bin/matlab -batch test

Otherwise, please paste in the crash.
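To keep the core count and the memory request consistent when experimenting with different pool sizes, the job script can be generated rather than edited by hand. This is only a sketch: write_submit_script is a hypothetical helper, the -N 1 line is an added assumption (a "local" pool stays on one node), and the MATLAB path is the one from the question.

```shell
#!/bin/bash
# write_submit_script is a hypothetical helper that prints a single-node job script.
# Arguments: MATLAB script name, core count, memory per core in MB.
write_submit_script() {
    local script="$1" cores="$2" mem_mb="$3"
    cat <<EOF
#!/bin/bash
#SBATCH -N 1
#SBATCH -n ${cores}
#SBATCH --mem-per-cpu=${mem_mb}
# Total memory requested: $(( cores * mem_mb )) MB
/opt/hpc/MATLAB/R2019b/bin/matlab -batch ${script}
EOF
}

# Print a script for test.m with 16 cores at 2048 MB each (32768 MB in total).
write_submit_script test 16 2048
```

A possible usage is write_submit_script test 16 2048 > submit.sh followed by sbatch submit.sh.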

RUAN YY 2020-9-25

Thank you very much! Let me try!

I want to use the a.mat file to see whether the program crashes or not. That's why I added that "dummy" statement.

Answer 2

RUAN YY 2020-9-25

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/599929-hpc-matlab-parpool-and-speed#answer_500629

I know why there is no .mat file output now.

[Warning: Objects of class 'parallel.cluster.Local' cannot be saved to MATfiles.]

I should check the slurm-JobID.out file, e.g. slurm-21127.out

Any print, warning, or anything else that would normally go to the command line in the GUI is stored in the slurm-JobID.out file.
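A quick way to pull such messages out of the Slurm output file is to search it for warning and error lines. The sketch below is illustrative only: scan_slurm_log is a hypothetical helper, and the demonstration file is a made-up stand-in for a real slurm-JobID.out that contains the warning quoted above.

```shell
#!/bin/bash
# scan_slurm_log is a hypothetical helper: list warning and error lines with line numbers.
scan_slurm_log() {
    local logfile="$1"
    if [ ! -f "$logfile" ]; then
        echo "no such file: $logfile" >&2
        return 2
    fi
    # grep exits with status 1 when nothing matches, which here means a clean log.
    grep -n -i -E 'warning|error' "$logfile"
}

# Demonstration on a temporary file standing in for slurm-JobID.out.
demo_log="$(mktemp)"
printf '%s\n' "placeholder line without problems" \
    "[Warning: Objects of class 'parallel.cluster.Local' cannot be saved to MATfiles.]" > "$demo_log"
scan_slurm_log "$demo_log"
rm -f "$demo_log"
```

On the cluster, the call would be something like scan_slurm_log slurm-21127.out, run from the directory the job was submitted from.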
