Parfor Error: lost connection

George 2012-11-7
Dear all,
I am using a parfor loop in my script. The script works fine on my own computer with MATLAB 2012a, but when I run it on another computer with MATLAB 2011b, it gives me this error:
Error using parallel_function (line 598) The session that parfor is using has shut down
Error in Myscript (line 97) parfor k=1:px
The client lost connection to lab 6. This might be due to network problems, or the interactive matlabpool job might have errored. This is causing: java.io.IOException: An existing connection was forcibly closed by the remote host
Can anyone give me a clue about this problem?
Thank you in advance.
Kind regards, George
  2 Comments
Walter Roberson 2012-11-7
Are both machines running 32-bit MATLAB, or both 64-bit?
Are you attempting to transfer more than 2 GB of data?
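To answer the second question, the in-memory size of a variable can be measured with `whos` before the loop runs. A minimal sketch, assuming the dataset lives in a variable called `data` (a hypothetical name and a stand-in matrix, not taken from the original script):

```matlab
% Hypothetical stand-in for the real dataset; replace with your own variable.
data = rand(1000, 1000);

% whos returns a struct whose 'bytes' field is the in-memory size.
info = whos('data');
sizeGB = info.bytes / 2^30;
fprintf('data occupies %.3f GB\n', sizeGB);

% Flag variables above the 2 GB threshold asked about above.
if sizeGB > 2
    warning('data exceeds 2 GB; consider splitting it before the parfor loop.');
end
```

This only reports the size on the client. How much of it is actually sent to each worker depends on how the variable is used inside the parfor loop.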
George 2012-11-8
Dear Walter,
Both are 64-bit MATLAB running on 64-bit Windows 7 with 24 GB of RAM.
The dataset is around 1.5 GB.
Thank you.
Regards,
George


Answers (2)

Jason Ross 2012-11-7
There are a number of logs you can look at to try to gain some insight. By default they are located in /var/log/mdce on Linux/Mac and in %TEMP%\MDCE\log on Windows. They might tell you what was going on.
Since the connection was "forcibly closed", that could be the result of something at the OS level, and you could take a look at the system logs / event viewer for any clues as to what might be going on.
Without reviewing the logs, though, there's not a lot to go on. There are many reasons something could forcibly shut down.
You can also try running a validation of the cluster (Parallel, Manage, select cluster profile, validate) or run the connectivity tests in Admin Center (matlabroot/toolbox/distcomp/bin/admincenter) to see if there's something off with respect to your setup.
Note I'm assuming that you are using MDCE. It would also be helpful if you could list what OS you are on, too.
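Alongside the logs, the full error can be captured on the client by wrapping the loop in try/catch. A minimal sketch, assuming a placeholder loop body and a made-up value for `px` (only the loop header `parfor k=1:px` comes from the original script):

```matlab
px = 8;                        % hypothetical loop bound
result = zeros(1, px);

try
    parfor k = 1:px
        result(k) = k^2;       % placeholder for the real per-iteration work
    end
catch err
    % Print the identifier and the full report (message plus stack)
    % so they can be compared with the MDCE and OS logs.
    fprintf('parfor failed: %s\n', err.identifier);
    disp(getReport(err));
    rethrow(err);              % keep the original failure visible
end
```

Recording the time of the failure next to this report makes it easier to find the matching entry in the system logs or Event Viewer.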
  4 Comments
George 2012-11-12
Yes, I am using the local scheduler, not a job manager.
I tried running the script on 2012b. Strangely, the first two runs succeeded, but the third gave me a similar error:
The client lost connection to lab 4. This might be due to network problems, or the interactive matlabpool job might have errored.
I found MatlabDesktopCreateError.log in AppData\Local\Temp, but it was created in September.
Any suggestions?
Thank you.
Jason Ross 2012-11-19
I was out of town for a little while -- unfortunately I don't have much of a general suggestion. You might want to contact support, as it might be related to your unique situation somehow.



Francisco 2013-2-5
Edited: Francisco 2013-2-5
Maybe you are not working entirely within the MATLAB environment. For example, you might be making system calls to a 32-bit application that runs in parallel outside 64-bit MATLAB and then reading its results back.
If so, a solution could be to send that application less data, so that it is not always working close to the limit of the memory available to it. Even if it worked for two or three cycles with a large amount of data in memory, a small increase in working memory could disrupt the task running outside MATLAB.
