Why do I receive an error that no supported GPU device was found when submitting a job to a MATLAB Parallel Server cluster using Slurm?

The job fails with this error message:
Unable to find a supported GPU device.

Accepted Answer

MathWorks Support Team, 2024-4-10, 0:00
Edited: MathWorks Support Team, 2024-4-10, 19:10
This error may occur if any of the following is true:
  • MATLAB Parallel Server cannot detect the GPU on the worker node
  • Support for the GPUsPerNode property has not been added to the integration scripts
  • The cluster profile does not request the GPU correctly
  • The Slurm configuration does not make any GPUs available
To check whether MATLAB Parallel Server can detect a GPU, run this command on the affected worker node:
Linux
matlab -dmlworker -r "gpuDevice"
Windows
matlab -dmlworker -batch "gpuDevice"
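The same check can be scripted so that it reports a clear result instead of only printing the device or raising an error. The following is a minimal sketch, assuming Parallel Computing Toolbox is installed on the node; the variable names are illustrative and not part of the original answer.

```matlab
% Report whether this MATLAB session can see a supported GPU.
nGpus = gpuDeviceCount;      % number of GPU devices MATLAB detects
hasGpu = nGpus > 0;
if hasGpu
    try
        d = gpuDevice;       % selects the default GPU; errors if it is unsupported
        fprintf('Found %d GPU(s); selected: %s\n', nGpus, d.Name);
    catch err
        hasGpu = false;
        fprintf('GPU present but not usable: %s\n', err.message);
    end
else
    fprintf('No GPU device detected on this node.\n');
end
```

If this reports no GPU when run directly on the node, the problem most likely lies with the node's driver or hardware rather than with Slurm or the cluster profile.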
Use the latest integration scripts with your cluster profile. To let the integration scripts request GPUs, add this code to the file getCommonSubmitArgs.m:
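% GPU: request GPUs per node through Slurm's --gres option
ngpus = validatedPropValue(ap, 'GPUsPerNode', 'double', 0);
if ngpus > 0
    % Optional GPU type; empty by default
    gcard = validatedPropValue(ap, 'GPUCard', 'char', '');
    commonSubmitArgs = sprintf('%s --gres=gpu:%s:%d', commonSubmitArgs, gcard, ngpus);
    % With no GPU type set, collapse "gpu::N" to "gpu:N"
    commonSubmitArgs = strrep(commonSubmitArgs, '::', ':');
end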
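To see what the snippet contributes to the submit command, the formatting step can be tried on its own. This is only a sketch: gresArg is a hypothetical helper that mirrors the sprintf and strrep calls above, and the GPU type 'a100' is an assumed example value, not something your cluster necessarily defines.

```matlab
% Hypothetical helper mirroring the sprintf/strrep logic from getCommonSubmitArgs.m.
% It is only meaningful for ngpus > 0, as in the original if-block.
gresArg = @(gcard, ngpus) strrep(sprintf(' --gres=gpu:%s:%d', gcard, ngpus), '::', ':');

% No GPU type set: the doubled colon is collapsed to a single one.
disp(gresArg('', 2));

% GPU type set ('a100' is an assumed example type name).
disp(gresArg('a100', 1));
```

With an empty GPUCard the result should be a plain count request such as "--gres=gpu:2"; with a type set, the type appears between "gpu" and the count.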
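After adding this code, set the AdditionalProperty "GPUsPerNode" in your cluster profile to the number of GPUs to request per node. Alternatively, add the option "--gres=gpu:%s:%d" to your AdditionalSubmitArgs, replacing %s with the GPU type and %d with the number of GPUs (for example, "--gres=gpu:2" when no GPU type is needed). Use one of these two methods to request GPUs per node.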
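As a rough illustration, the property can also be set programmatically on the cluster object and then verified with a small test job. This sketch assumes a profile created from the Slurm integration scripts; the profile name 'mySlurmCluster' and the GPU type 'a100' are hypothetical, and treating AdditionalSubmitArgs as a field of AdditionalProperties is an assumption about how your integration scripts read it.

```matlab
% 'mySlurmCluster' is a hypothetical profile name; substitute your own.
c = parcluster('mySlurmCluster');

% Method 1: request two GPUs per node through the property read by getCommonSubmitArgs.m.
c.AdditionalProperties.GPUsPerNode = 2;
% c.AdditionalProperties.GPUCard = 'a100';   % optional; 'a100' is an assumed type name

% Method 2 (alternative, assumed property location): pass the Slurm option directly.
% c.AdditionalProperties.AdditionalSubmitArgs = '--gres=gpu:2';

saveProfile(c);   % persist the change to the profile

% Submit a one-output job that counts the GPUs the worker can see.
j = batch(c, @gpuDeviceCount, 1, {});
wait(j);
out = fetchOutputs(j);
fprintf('GPUs visible to the worker: %d\n', out{1});
```

If the job reports zero GPUs even though the node has one, the request is probably not reaching Slurm or Slurm is not exposing the device.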
If neither method works, make sure that the GPUs have been defined in the Slurm configuration files (slurm.conf and gres.conf).
