GPU device is not recognized in the official MATLAB Deep Learning Docker image

Hello,
I am trying to use the MATLAB Deep Learning Docker image (mathworks/matlab-deep-learning:r2023b on Docker Hub) on our HPC server (Slurm-based).
I am using the srun utility to run the Docker image:
srun \
--time=0-02:00:00 --gpus-per-node=1 --container-image=mathworks/matlab-deep-learning:r2023b \
--container-name=matlabDeepLearningGPU --pty bash
When I launch the image, nvidia-smi reports the CUDA version as N/A.
When I run matlab and execute gpuDevice(), I get an error.
I am wondering whether this is an issue with the Docker image provided by MathWorks, with the drivers installed on the host, or with something else.
I get the same error whether I use an NVIDIA GeForce RTX 3090, an NVIDIA H100 NVL, or an RTX 2080 Ti.
Thank you!

Answers (1)

Michael
Michael 2024-9-9
Hi @Mahdi,
Thanks for reaching out about this.
This error looks like the one seen when the container has not been started with the NVIDIA container runtime.
To fix this with Docker, you need to install the nvidia-container-toolkit, then ensure that the driver is installed on the host and that the GPUs are passed into the container at runtime. For Docker this is done by passing --gpus all when running the container; for Singularity it is done by passing --nv.
You should be able to test this if you have interactive access to any machines where these GPUs are available. For more detail in your case, you may need to speak to the system administrators of the HPC you are using to determine whether the correct flags are being passed when the containers are run.
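As a quick sketch of the flags described above (the image tag and nvidia-smi check are illustrative; both commands require the NVIDIA driver on the host, and the Docker one additionally requires the nvidia-container-toolkit):

```shell
# Docker: --gpus all passes every host GPU into the container
# via the NVIDIA container runtime.
docker run --rm --gpus all mathworks/matlab-deep-learning:r2023b nvidia-smi

# Singularity/Apptainer: --nv binds the host NVIDIA driver libraries
# into the container at runtime.
singularity exec --nv docker://mathworks/matlab-deep-learning:r2023b nvidia-smi
```

If nvidia-smi inside the container then shows a real CUDA version instead of N/A, gpuDevice() in MATLAB should be able to detect the GPU.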
Hope that is helpful,
Michael
1 Comment
Mahdi
Mahdi 2024-9-13
Edited: Mahdi 2024-9-13
Hello,
Thank you for the reply. In fact, I couldn't add the -nv option (nor --nv) to my srun command, and I couldn't use it by calling srun .... exec docker run .... Maybe it is a limitation imposed by the administrators or by how Slurm is configured.
I ended up using the nvidia/cuda:12.2.2-cudnn8-devel-ubuntu22.04 image and building a Docker image containing MATLAB on top of it, using the MATLAB installer Dockerfile provided by MathWorks. With this image I could access nvcc and nvidia-smi inside the container, and the GPUs were passed to MATLAB successfully. It worked without adding any other options to the srun command above (apart from changing the container name and image options to point at the new build). So it seems Slurm is passing the GPUs and the drivers are installed on the HPC machine, given that this image worked. Maybe it is a compatibility issue between our Slurm setup and the mathworks/matlab-deep-learning image?
