The parallel pool shut down because the client lost connection to worker

Related to the question asked in https://it.mathworks.com/matlabcentral/answers/2058094-aws-matlab-parallel-server-what-is-the-best-strategy, I followed that strategy as a proof of concept (28 simulations on 28 of 32 workers, across 2 machines with 16 workers each), but I encountered the following error:
"The parallel pool shut down because the client lost connection to worker 21. Check the network connection or restart the parallel pool with 'parpool'"
I don't understand why there is a network connection issue, since MATLAB Simulink has been running on AWS Cloud.
Also, how can I continue execution on the remaining parallel workers even if an error occurs?
Thank you very much
  8 comments
Damian Pietrus 2024-1-3
Before we try accessing those logs, I have one more thing for you to try, since the behavior seems similar to an issue I ran into with another user. After starting your MATLAB client, and before running any jobs, run the following command:
setenv('MW_PCT_TRANSPORT_HEARTBEAT_INTERVAL', '600')
In this case, we are setting a communication timeout in the cluster to 600 seconds (10 minutes). Try running your code again and let me know how things go. If it's successful, I'll pass it along to our development team.
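For reference, here is a minimal sketch of how the setting above might be applied and checked from the client session. It only sets the variable on the client, as described in the comment. Closing an already open pool first is an assumption on my part (so that the next pool starts after the variable is set), not something confirmed in this thread.

```matlab
% Set the heartbeat interval (in seconds, per the comment above) on the client.
setenv('MW_PCT_TRANSPORT_HEARTBEAT_INTERVAL', '600');

% Read the value back to confirm it is set in this MATLAB session.
interval = getenv('MW_PCT_TRANSPORT_HEARTBEAT_INTERVAL');
fprintf('Heartbeat interval is set to %s\n', interval);

% Assumption: if a pool is already open, close it so that the next pool
% is started after the variable has been set. gcp('nocreate') returns an
% empty array when no pool exists, so this is safe to call either way.
delete(gcp('nocreate'));

% Then start the pool and run the jobs as usual (for example with parpool).
```

If the value printed is not the one you set, the variable was changed elsewhere in the session before this point.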
Torben Ellegaard Lund
Edited: Torben Ellegaard Lund 2024-6-28
I had a similar problem on my MacBook Air M2 running Sonoma 14.4.1 and MATLAB R2024a (24.1.0.2537033), where a parfor loop kept crashing. After using
setenv('MW_PCT_TRANSPORT_HEARTBEAT_INTERVAL', '600')
some improvement was seen, but it still kept crashing (although after a longer time). I then used the following:
setenv('MW_PCT_TRANSPORT_HEARTBEAT_INTERVAL', '6000')
instead, and I have not experienced any crashes since. The crash used to happen both when storing data on a local SSD hard drive and on an external SSH hard drive connected via USB-C.


Answers (0)
