HPC MATLAB parpool and speed
Hey guys! I am new to the HPCC, and I am now running my MATLAB program on it. I am using parallel computing, i.e., parpool.
Here is the code for my "submit.sh"
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
/opt/hpc/MATLAB/R2019b/bin/matlab -nojvm -nodesktop -r "main_MultiEA;exit;"
The first thing is that I found the speed is similar to my local computer. Should I specify something in the .sh file to change this? And how can I know whether or not I have reached the limit of the resources?
The second thing is that I found that the only available cluster profile is "local", using the "allNames = parallel.clusterProfiles()" command. Should it be different on the HPCC?
The third thing is that when I use "parpool(16)", "parpool('local',16)", "parpool("myPool",16)", etc. to try to improve the speed, the program seems to crash. Here is my test.m for testing the parpool. I guess the program crashes because there is no a.mat in the directory.
parpool("local",16);
a=0;
parfor i = 1:10
a = a+1;
end
save a.mat;
exit;
Could you tell me why that is? And how can I improve the speed? Thanks a lot!
0 Comments
Accepted Answer
Raymond Norris
2020-9-25
Hi Ruan,
There are two ways to speed up your code: implicitly and explicitly. You don't have much control over the implicit kind; MATLAB will find the best ways to use your multiple cores on its own. Explicitly, you can vectorize, preallocate, write MEX-files, etc. You can also use parallel pools.
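As a rough sketch of what those explicit options look like (the sizes and variable names below are purely illustrative):
% Sketch of explicit speed-ups; sizes and names are illustrative only
n = 1e6;
x = linspace(0, 2*pi, n);
% 1) Preallocate the output instead of growing it inside the loop
y = zeros(1, n);
for k = 1:n
    y(k) = sin(x(k));
end
% 2) Vectorize the same computation
y2 = sin(x);
% 3) parfor: independent iterations are spread across the workers of a
%    parallel pool (MATLAB will usually open one automatically if none is running)
s = zeros(1, 100);
parfor k = 1:100
    s(k) = sum(sin(x) .^ k);
end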
Looking at your Slurm job script, make the following change:
/opt/hpc/MATLAB/R2019b/bin/matlab -nojvm -nodesktop -r "main_MultiEA;exit;"
to
/opt/hpc/MATLAB/R2019b/bin/matlab -batch main_MultiEA
-batch replaces -nodesktop, -r, and the trailing "exit". Also drop -nojvm: you'll need the JVM if you use Parallel Computing Toolbox (PCT).
I'd also consider using module if you have it (your module name -- matlab -- might be slightly different)
module load matlab
matlab -batch main_MultiEA
Next, you're requesting 2 nodes from Slurm, with 2 tasks (cores) per node, for a total of 4 cores. But the MATLAB client and a "local" pool only run on a single node, so the 2nd node is of no use. That means when you start the pool of 16 workers, you're running it on just 2 cores (or you should be; it may depend on whether cgroups are enforced). That oversubscription is probably why MATLAB is crashing: you're running out of memory. To write this more flexibly, try
sz = str2double(getenv('SLURM_CPUS_PER_TASK'));  % getenv returns text, so convert to a number
parpool("local", sz);
a = 0;
parfor i = 1:10
    a = a + 1;   % a is a parfor reduction variable
end
save a.mat
This way, regardless of how many cores per task you request, the pool will be sized to match.
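If you want to be a bit more defensive, a slightly expanded sketch of the same idea follows; the fallback to the local profile's default and the printed message are just my suggestions, not something you have to do:
% Size the pool from the Slurm allocation, with an illustrative fallback
sz = str2double(getenv('SLURM_CPUS_PER_TASK'));  % '' (unset) converts to NaN
if isnan(sz)
    c = parcluster('local');
    sz = c.NumWorkers;                           % fall back to the local profile default
end
p = parpool('local', sz);
fprintf('Slurm granted %d CPUs per task; pool started with %d workers\n', sz, p.NumWorkers);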
With that said, there are two things to think about:
- Obviously, you'll see no speed-up in an example this small; there has to be a reasonable amount of work for the workers to do.
- With the "local" profile, the parallel pool only runs "local" to wherever MATLAB is running (a single HPC compute node). If you want to run a larger pool, across nodes, then you'll need to create a Slurm profile with MATLAB Parallel Server; see the quick check below.
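To see which profiles your cluster installation actually exposes, a quick check (no cluster-specific setup assumed) is:
% List the cluster profiles MATLAB knows about and the current default;
% without MATLAB Parallel Server this is typically just 'local'
allNames = parallel.clusterProfiles()
defaultName = parallel.defaultClusterProfile()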
Raymond
3 Comments
Raymond Norris
2020-9-25
test.m calls save at the end, so when you run test, either from the command line or via Slurm, it will generate a.mat (save creates the file; it doesn't require a.mat to exist beforehand). Do you not want the MAT-file to be generated? If not, simply comment out the line at the bottom of the file.
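If you do want the file but only the result, you could save just that variable instead of the whole workspace (standard save syntax, shown here with your variable a):
% Save only a, rather than everything in the workspace
save('a.mat', 'a');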
If this doesn't work
sz = str2double(getenv('SLURM_CPUS_PER_TASK'));
then you might try
sz = str2double(getenv('SLURM_JOB_CPUS_PER_NODE'));
What Slurm output/error file is being generated? If your Slurm jobscript is only specifying the name of the job (Group3), it's possible that:
- You're not requesting enough cores (16 for the workers, or 17 to leave a core for the MATLAB client). Add #SBATCH -n 16
- You're not requesting enough memory. Add #SBATCH --mem-per-cpu=2048
For instance:
#!/bin/bash
#SBATCH -J Group3
#SBATCH -n 16 # Request 16 cores
#SBATCH --mem-per-cpu=2048 # Request 2 GB/core
/opt/hpc/MATLAB/R2019b/bin/matlab -batch test
Otherwise, please paste in the crash.