parfor errors when exceeding a worker limit on AMD nodes
Hello,
I am trying to run a simple parfor script on the nodes of our cluster. The code works fine until I try to use more than 46 CPUs (workers) at once on one server. Some of our latest nodes have 128 AMD cores. I can run up to 56 cores on our Intel CPU servers (nodes), but on any AMD node I get errors (Java runtime and others) when using more than 46 cores. It would be great to use all 128 cores on these new nodes for our MATLAB code. I have tried increasing memory, and I still get these errors when using more than 46 cores.
I will attach the MATLAB crash dump, code and sbatch files.
My sbatch file (I have tried many, many different parameters):
#!/bin/bash
#SBATCH -J pfor_matlab
#SBATCH -o pfor".%j".out
#SBATCH -e pfor".%j".err
#SBATCH -t 45:00
#SBATCH -N 1
#SBATCH -p normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=48
module load matlab
hostname -s
env | egrep SLURM
matlab -nosplash -nodesktop -r "pfor"
The sbatch job produces this output in the SLURM .err file:
Error using parpool (line 145)
Parallel pool failed to start with the following error. For more detailed
information, validate the profile 'local' in the Cluster Profile Manager.
Error in pfor (line 5)
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line
670)
Failed to initialize the interactive session.
Error using
parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus
(line 781)
The interactive communicating job failed with no message
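To help narrow this down, the job script could also log what the node actually offers before MATLAB starts, next to the existing `env | egrep SLURM` line. The following is only a sketch: `report_limits` is a hypothetical helper name, and it is an assumption (nothing in the error output confirms it) that per-user process or open-file limits are related to the 46-worker threshold.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original job script: print the
# resources visible to the job so they can be compared with the worker count.
report_limits() {
    # Workers requested through SLURM; "unset" when run outside a SLURM job.
    echo "requested_workers=${SLURM_CPUS_PER_TASK:-unset}"
    # CPUs this process is allowed to run on (follows the affinity mask).
    echo "usable_cpus=$(nproc 2>/dev/null || echo unknown)"
    # Per-user soft limits on processes/threads and on open files.
    echo "max_user_processes=$(ulimit -u 2>/dev/null || echo unknown)"
    echo "max_open_files=$(ulimit -n 2>/dev/null || echo unknown)"
}

report_limits
```

Running this on both an Intel and an AMD node with the same `--cpus-per-task` value would show whether the two node types differ in anything other than the core count.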
Thank you for any pointers!
Mark
2 Comments
Walter Roberson
2021-2-9
The volunteers are not likely to know the solution for this; you should open a support case.
Mark PIERCY
2021-2-9