System call fails from MATLAB worker

Kevin 2018-2-9
Commented: Kevin 2018-2-12
I am encountering some difficulty with MATLAB workers in R2017a.
Jobs run correctly unless the MATLAB code makes a system call, in which case the task fails with "Unexpected system error: bang: poll [4] Interrupted system call".
For example, if I submit the following:
>> job = sge_cluster.createJob();
>> job.createTask(@() system(''), 0);
>> job.submit();
Then the job runs but produces the following error:
>> disp(job.Tasks(1).Error)
ParallelException with properties:
identifier: 'MATLAB:bang:SystemError'
message: 'Unexpected system error: bang: poll [4] Interrupted system call'
cause: {}
remotecause: {[1×1 MException]}
stack: [1×1 struct]
The cluster and job submission work fine as long as I don't make any system calls. Also, everything works fine with an older version of MATLAB (R2014b). The cluster is mostly RHEL 6.9 (some 7.4).
EDIT: I should clarify that sge_cluster is a parallel.cluster.Generic that submits jobs to a Sun Grid Engine scheduler. If I run the same job on parcluster('local'), the system call works just fine.
I don't seem to be the only one encountering this problem (see 359992-system-call-bizarre-behavior), but it's not clear to me how to apply that answer.
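Since the same task works on parcluster('local') but fails under SGE, one way to narrow things down is to compare the process environment in the two cases. The following is only a minimal sketch: `describe_environment` is a hypothetical helper, not part of MATLAB or SGE, and the handful of values it prints are just a guess at what might differ.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of MATLAB or SGE):
# print a few facts about the environment this shell is running in.
describe_environment() {
    echo "shell_var=${SHELL:-unset}"
    echo "term=${TERM:-unset}"
    echo "home=${HOME:-unset}"
    # An interactive terminal has a tty on stdin; a batch job usually does not.
    if [ -t 0 ]; then
        echo "stdin=tty"
    else
        echo "stdin=not-a-tty"
    fi
}
```

Calling `describe_environment` once from an interactive terminal and once from the job wrapper script (before MATLAB starts) gives two short listings that can be compared line by line.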

Answers (1)

Shashank 2018-2-12
Hi Kevin,
The solution mentioned in the "System call bizarre behavior" thread should work for you.
Sourcing the bash_profile file means that you should execute the following command in the terminal prior to calling MATLAB:
source ~/.bash_profile
Alternatively, as in the example there, you can specify the shell explicitly when connecting over ssh.
Hope this helps.
-Shashank
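If the profile is sourced from a script rather than typed in a terminal, a guarded version may be safer. This is a minimal sketch, assuming the script may run under /bin/sh, where `source` is a bash extension and `.` is the portable spelling. `load_profile` is a hypothetical helper name, not something MATLAB provides.

```shell
#!/bin/sh
# Hypothetical helper: source a profile file only when it is readable,
# and report what happened so the result shows up in the job log.
load_profile() {
    profile="${1:-$HOME/.bash_profile}"
    if [ -r "$profile" ]; then
        # "." is the POSIX equivalent of bash's "source"
        . "$profile"
        echo "sourced $profile"
    else
        echo "skipped $profile (not readable)"
    fi
}
```

In a wrapper script, `load_profile` would be called once before MATLAB is launched. The message it prints makes it easy to confirm from the log whether the profile was actually found on the compute node.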
1 Comment
Kevin 2018-2-12
Hi Shashank,
Including the bash_profile didn't change anything for me. I also tried including the "-nodisplay -nodesktop -noFigureWindows" flags, as mentioned in that answer.
Here is my updated independentJobWrapper.sh:
#!/bin/sh
# This wrapper script is intended to support independent execution.
#
# This script uses the following environment variables set by the submit MATLAB code:
# MDCE_MATLAB_EXE - the MATLAB executable to use
# MDCE_MATLAB_ARGS - the MATLAB args to use
#
# Copyright 2010-2011 The MathWorks, Inc.
echo "Sourcing the bash profile"
source ~/.bash_profile
echo "Executing: ${MDCE_MATLAB_EXE} ${MDCE_MATLAB_ARGS}"
exec "${MDCE_MATLAB_EXE}" ${MDCE_MATLAB_ARGS}
and the log output:
Sourcing the bash profile
Executing: /sw/matlab/R2017a/bin/worker -nodisplay -nodesktop -noFigureWindows
< M A T L A B (R) >
Copyright 1984-2017 The MathWorks, Inc.
R2017a (9.2.0.538062) 64-bit (glnxa64)
February 23, 2017
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
2018-02-12 10:53:42 | About to evaluate task with DistcompEvaluateFileTask
2018-02-12 10:53:42 | Enter distcomp_evaluate_filetask_core
2018-02-12 10:53:42 | Enter distcomp_evaluate_filetask_core/iSetup
2018-02-12 10:53:42 | This process will exit on any fault.
2018-02-12 10:53:42 | This process will exit when its parent process dies.
2018-02-12 10:53:42 | About to call decode function.
2018-02-12 10:53:42 | In parallel.cluster.generic.independentDecodeFcn
2018-02-12 10:53:44 | Setting the desktop client to a new client with username
2018-02-12 10:53:44 | About to construct the storage object using constructor "makeFileStorageObject" and location "PC{}:UNIX{/matlabjobs}:"
2018-02-12 10:53:47 | About to find job and task using locations "Job2" and "Job2/Task1"
2018-02-12 10:53:49 | Setting the TaskEvaluator to the NullEvaluator
2018-02-12 10:53:49 | Setting number of computational threads to 1.
2018-02-12 10:53:49 | MATLAB Drive Enabled 0
2018-02-12 10:53:49 | Completed pre-execution phase
2018-02-12 10:53:49 | About to pPreJobEvaluate
2018-02-12 10:53:51 | About to pPreTaskEvaluate
2018-02-12 10:53:51 | About to add job dependencies
2018-02-12 10:53:51 | > JobPathHelper.addAdditionalPaths
2018-02-12 10:53:51 | > JobPathHelper.getPathsToAdd
2018-02-12 10:53:51 | < JobPathHelper.getPathsToAdd ~isMATLABDriveEnabledOnWorker
2018-02-12 10:53:51 | Not adding path dependencies as there is no change required to the path.
2018-02-12 10:53:51 | < JobPathHelper.addAdditionalPaths
2018-02-12 10:53:51 | Calling clear('functions'), and closing simulink models
2018-02-12 10:53:52 | About to call jobStartup
2018-02-12 10:53:52 | About to call taskStartup
2018-02-12 10:53:52 | About to get evaluation data
2018-02-12 10:53:52 | Begin task function
2018-02-12 10:53:53 | End task function
2018-02-12 10:53:53 | dctEvaluateFunctionArray calling: @()taskPostFcn(runprop.TaskEvaluator) with args
2018-02-12 10:53:53 | About to call taskFinish
2018-02-12 10:53:53 | dctEvaluateFunctionArray done.
2018-02-12 10:53:53 | dctEvaluateFunctionArray calling: iFinishTask with args
2018-02-12 10:53:53 | About to call pPostTaskEvaluate
2018-02-12 10:53:54 | About to call pPostJobEvaluate
2018-02-12 10:53:54 | dctEvaluateFunctionArray done.
2018-02-12 10:53:54 | dctEvaluateFunctionArray calling: removeDirectory with args
2018-02-12 10:53:54 | dctEvaluateFunctionArray done.
2018-02-12 10:53:54 | dctEvaluateFunctionArray calling: removeDirectory with args
2018-02-12 10:53:54 | dctEvaluateFunctionArray done.
2018-02-12 10:53:54 | dctEvaluateFunctionArray calling: iExitFunction with args
2018-02-12 10:53:54 | About to exit MATLAB normally
2018-02-12 10:53:54 | About to exit with code: 0
Perhaps you could explain what the "poll [4] Interrupted system call" error means, or how the behavior in R2017a has changed from previous releases? That might help me debug things from my end.
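For context on the message itself: on Linux, errno 4 is EINTR ("Interrupted system call"), and poll reports it when a signal is caught before any of the requested events occurs. That suggests, without proving, that signal handling differs between the SGE-launched worker and the local one. The sketch below prints the blocked, ignored, and caught signal masks that Linux exposes in /proc; `show_signal_state` is a hypothetical helper, and whether the masks actually differ in this case is an assumption to be checked.

```shell
#!/bin/sh
# Hypothetical diagnostic: show the signal masks of a process on Linux.
# SigBlk = blocked, SigIgn = ignored, SigCgt = caught (hex bitmasks).
show_signal_state() {
    pid="${1:-$$}"
    status_file="/proc/$pid/status"
    if [ -r "$status_file" ]; then
        grep -E '^Sig(Blk|Ign|Cgt):' "$status_file" \
            || echo "no signal lines found in $status_file"
    else
        echo "no $status_file (not Linux, or process gone)"
    fi
}
```

Running `show_signal_state` in the wrapper script just before the `exec` line, and again in an interactive shell on the same node, would show whether the scheduler starts the job with a different set of ignored or blocked signals.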
