- Are you on Windows or Linux?
- Are you starting your pool on just one machine, or are you using Parallel Server across multiple nodes?
- Are you running Simulink simulations? If so, could you share some code on how you're submitting the jobs?
The parallel pool shut down because the client lost connection to worker
Related to the question asked in https://it.mathworks.com/matlabcentral/answers/2058094-aws-matlab-parallel-server-what-is-the-best-strategy, I followed that strategy as a PoC (28 simulations on 28 of 32 workers, across 2 machines with 16 workers each), but I encountered the following error:
"The parallel pool shut down because the client lost connection to worker 21. Check the network connection or restart the parallel pool with 'parpool'"
I don't understand why there would be a network connection problem, since MATLAB Simulink is running on AWS Cloud.
Also, how can I continue execution on the remaining parallel workers even if an error occurs?
Thank you very much
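On the second question, one common pattern is to wrap the body of each `parfor` iteration in `try`/`catch`, so that an error raised by one run is recorded and the other iterations still finish. The sketch below is only an illustration: `runOneCase` is a hypothetical placeholder for the real simulation call, and the failure at index 21 is simulated. It covers errors thrown inside an iteration. It is not expected to help with the "lost connection to worker" message itself, because that message reports that the whole pool has shut down.

```matlab
% Sketch: keep the remaining iterations running when one of them errors.
numRuns = 28;                  % matches the 28 simulations in the question
results = cell(1, numRuns);    % one output slot per run
errors  = cell(1, numRuns);    % stays empty for runs that succeed

parfor k = 1:numRuns
    try
        results{k} = runOneCase(k);   % hypothetical stand-in for the real job
    catch err
        errors{k} = err;              % record the failure and carry on
    end
end

% Indices of the runs that failed, for a later retry or inspection.
failed = find(~cellfun(@isempty, errors));
fprintf('Failed runs: %s\n', mat2str(failed));

function out = runOneCase(k)
    % Hypothetical placeholder: fails on purpose for one index.
    if k == 21
        error('demo:simulatedFailure', 'Simulated failure in run %d.', k);
    end
    out = 2 * k;
end
```

The failed indices can then be resubmitted on their own instead of repeating all 28 runs.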
8 Comments
Damian Pietrus
2024-1-3
Before we try accessing those logs, I have one more thing for you to try, since the behavior seems similar to an issue I ran into with another user. After starting up your MATLAB client, and before running any jobs, run the following command:
setenv('MW_PCT_TRANSPORT_HEARTBEAT_INTERVAL', '600')
In this case, we are setting a communication timeout in the cluster to 600 seconds (10 minutes). Try running your code again and let me know how things go. If it's successful, I'll pass it along to our development team.
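A minimal sketch of how the suggestion above might be applied in a fresh client session. The variable name and the value come from the comment. Closing an already running pool is an assumption on my part: it rests on the guess that a pool started before the change will not pick up the new value.

```matlab
% Sketch: set the heartbeat interval on the client before any job runs.
heartbeatVar = 'MW_PCT_TRANSPORT_HEARTBEAT_INTERVAL';
setenv(heartbeatVar, '600');   % value is passed as text (600 seconds)

% Confirm that the client session holds the value.
currentValue = getenv(heartbeatVar);
fprintf('%s = %s\n', heartbeatVar, currentValue);

% Assumption: an existing pool may not see the change, so close it and
% let the next parpool call start a fresh one.
existingPool = gcp('nocreate');
if ~isempty(existingPool)
    delete(existingPool);
end
```

After this, start the pool and submit the jobs as before.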
Torben Ellegaard Lund
2024-6-28
Edited: Torben Ellegaard Lund
2024-6-28
I had a similar problem on my MacBook Air M2 running Sonoma 14.4.1 and MATLAB R2024a (24.1.0.2537033), where a parfor loop kept crashing. After using
setenv('MW_PCT_TRANSPORT_HEARTBEAT_INTERVAL', '600')
some improvement was seen, but it still kept crashing (although after a longer time). I then used the following:
setenv('MW_PCT_TRANSPORT_HEARTBEAT_INTERVAL', '6000')
instead, and I have not experienced any crashes since. The crash used to happen both when storing data on a local SSD-harddrive and on an external SSH-harddrive connected via USB-C.
Answers (0)