Is there a way to safely launch multiple instances of MATLAB Compiler code at the same time?

4 views (last 30 days)
Hi,
I run functions compiled with MATLAB Compiler on high-performance computing servers. I'm having issues when I simultaneously launch several jobs of my compiled code that use parallelization. For instance, yesterday I launched 12 jobs of 4 workers each at the same time and 3 of them failed. The error messages for these jobs were:
Job #1
Failed to locate and destroy old interactive jobs.
Error using parallel.Job/delete (line 1295)
The job storage metadata file '/home/username/.mcrCache9.5/main_S0/local_cluster_jobs/R2018b/matlab_metadata.mat' does not exist or is corrupt. For assistance recovering job data, contact MathWorks Support Team. Otherwise, delete all files in the JobStorageLocation and try again.
Job #11
Failed to start pool.
Error using save
Unable to write file /home/username/.mcrCache9.5/main_S0/local_cluster_jobs/R2018b/Job12.in.mat: No such file or directory.
Job #12
Failed to start pool.
Error using parallel.Cluster/createConcurrentJob (line 1136)
Can not write file /home/username/.mcrCache9.5/main_S0/local_cluster_jobs/R2018b/Job12.in.mat.
I'm guessing that when MATLAB or the runtime tries to create the parallel pool, it reads/writes/removes temporary configuration files that are shared by every job, which causes conflicts.
Is there a way for me or my server administrator to fix that? It happens quite often, and it is annoying to have to restart each failed job every time it happens.
Note that I tried to work around it by adding a 45 s delay between jobs, but even with that I'm still having this issue. Also, the job scheduler the server uses is Slurm.
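For what it's worth, one workaround I am considering is to give each job its own JobStorageLocation before opening the pool, so that simultaneous runs don't share the same local_cluster_jobs folder. A rough sketch is below; the use of SLURM_JOB_ID and the temp folder name are just placeholders for my setup, not a confirmed fix:

% Sketch: point each compiled job at its own JobStorageLocation so that
% concurrent runs do not touch the same job metadata files.
jobId = getenv('SLURM_JOB_ID');            % unique per Slurm job (assumed available)
if isempty(jobId)
    jobId = num2str(feature('getpid'));    % fall back to the process ID
end
storageDir = fullfile(tempdir, ['mcr_pool_' jobId]);
if ~exist(storageDir, 'dir')
    mkdir(storageDir);                     % create a private job storage folder
end
c = parcluster('local');                   % local cluster profile used by the runtime
c.JobStorageLocation = storageDir;         % keep this job's metadata separate
parpool(c, 4);                             % 4 workers, as in the example above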

Answers (0)
