Reported Task State not accurate when running on MS HPC grid
Hi,
We are using MATLAB 2018a with the parallel toolbox in conjunction with a MATLAB parallel server, with MS HPC Server 2012 as the scheduler. When we retrieve task states using the following construct, incorrect states are commonly returned:
obj.Job.Tasks.State
For example, when we first start a job it reports pending, then briefly switches to failed before accurately reporting running. Are there any tricks to getting these task states reported properly?
Thanks for any help.
Accepted Answer
Edric Ellis
2019-8-20
Unfortunately, getting accurate state information back from the cluster can be tricky, because there are multiple sources of that information. First, there are the "JobX/TaskY.state.mat" files on disk in your JobStorageLocation. These are created in state pending, the client moves them to queued on submission, and the worker MATLAB processes then set them to running and finally to finished. Second, there is the information that comes back from querying the underlying scheduling system. These two sources can occasionally (and usually transiently) conflict with each other, which leads to spurious states being observed. (Querying the underlying scheduling system is necessary to handle the case where the worker MATLAB crashes before it gets to set the state file to running or finished.)
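If you do need to read the task states directly, one possible client-side workaround for the transient conflicts described above is to accept a reading only after several consecutive reads agree. This is just a sketch, not something built into the toolbox: `job` is assumed to be an already-submitted job object (the `obj.Job` from the question), and `numReads`, `pauseSeconds` and `maxReads` are hypothetical values you would need to tune for your cluster.

```matlab
% Sketch only: assumes `job` is an already-submitted job object.
numReads     = 3;    % hypothetical: consecutive agreeing reads required
pauseSeconds = 2;    % hypothetical: delay between reads, in seconds
maxReads     = 30;   % hypothetical: give up after this many reads

tasks    = job.Tasks;
previous = {tasks.State};   % one state string per task
agreeing = 1;               % number of consecutive identical readings
reads    = 1;

while agreeing < numReads && reads < maxReads
    pause(pauseSeconds);
    current = {tasks.State};
    reads   = reads + 1;
    if isequal(current, previous)
        agreeing = agreeing + 1;
    else
        % The reading changed, so restart the count from this reading.
        agreeing = 1;
        previous = current;
    end
end

if agreeing >= numReads
    stableStates = previous;   % same reading seen numReads times in a row
else
    stableStates = {};         % no stable reading within maxReads reads
end
```

This only filters out short-lived flicker such as a brief failed between pending and running. It cannot tell a spurious state from a genuine one that persists, and it delays every reading by at least `(numReads - 1) * pauseSeconds` seconds.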
If you can, I would recommend using Job.wait as your primary means of waiting for results to become available (perhaps with the timeout parameter). This method ought to be more reliable than querying the task State properties directly, as it performs more detailed (and more expensive) checks.
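A minimal sketch of that approach, assuming `job` is an already-submitted job object (the `obj.Job` from the question). The 60-second timeout is an arbitrary illustrative value, not a recommendation.

```matlab
% Sketch only: assumes `job` is an already-submitted job object.
timeoutSeconds = 60;   % hypothetical timeout for each wait call

% With a timeout, wait returns true if the job reached the requested
% state in time and false if the timeout expired first.
while ~wait(job, 'finished', timeoutSeconds)
    fprintf('Job %d has not finished yet.\n', job.ID);
end

% Once wait has returned true, report any per-task errors.
tasks = job.Tasks;
for k = 1:numel(tasks)
    if ~isempty(tasks(k).ErrorMessage)
        fprintf('Task %d reported an error: %s\n', ...
            tasks(k).ID, tasks(k).ErrorMessage);
    end
end
```

Looping on a bounded `wait` rather than calling it once without a timeout keeps the client responsive, so you can log progress or abandon the job if it takes too long.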