How to diagnose "Out of Memory during deserialization" when running on high core count

49 views (last 30 days)
Hello MATLAB Community! I'm trying to figure out a parallel computing problem. I'm running the built-in genetic algorithm (ga) on a high-performance cluster, and once I assign more than a certain number of cores the code stops working.
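For context, the call pattern is essentially the standard one below (a simplified, runnable sketch with a toy fitness function; the real fitness function, options, and cluster profile are omitted):

pool = parpool('local', 4);   % on the cluster: parpool with the cluster profile and 280 workers

opts = optimoptions('ga', ...
    'UseParallel', true, ...  % fitness evaluations are distributed via parfor
    'Display', 'iter');

nvars  = 3;
lb     = -5 * ones(1, nvars);
ub     =  5 * ones(1, nvars);
intcon = 1;                   % integer-constrained variable, so ga dispatches to gaminlp

[x, fval] = ga(@(x) sum(x.^2), nvars, [], [], [], [], lb, ub, [], intcon, opts);
delete(pool)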
The pool itself starts correctly (I get the expected "Connected to 48 workers" / "Connected to 280 workers" message), but with 280 workers I first get a series of warnings like this:
Warning: A worker aborted during execution of the parfor loop. The parfor loop
will now run again on the remaining workers.
> In parallel_function (line 599)
  In fcnvectorizer (line 16)
  In gaminlppenaltyfcn
  In gapenalty
  In makeState (line 64)
  In galincon (line 17)
  In gapenalty
  In gaminlp
  In ga (line 366)
and then it finally crashes with this:
Error using fcnvectorizer (line 16)
Out of Memory during deserialization
On my local machine the same code runs fine and needs about 2 GB of RAM per worker. I have assigned 4.5 GB per core on the cluster, so I don't think it's an actual memory shortage. However, all the solutions I found online for this error point to memory issues.
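If it helps with diagnosis, something like the following can report how much memory each worker actually sees (a sketch, assuming a Linux cluster where the free command is available; workerMemReport.m would need to live on a path the workers can reach):

% workerMemReport.m
function reports = workerMemReport()
    f = parfevalOnAll(gcp, @localFree, 1);
    reports = fetchOutputs(f);      % one report string per worker
    disp(reports)
end

function s = localFree()
    [~, txt] = system('free -m');   % this worker's view of total/used/free in MB
    s = string(txt);
end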
Any input is greatly appreciated.
Cheers.
2 Comments
Zhenhao Gong 2023-11-19
Hello, I've hit the same problem, and MATLAB also gives me "Out of Memory during deserialization". By my calculations my code should use at most 20 GB of memory, and the maximum RAM allowed by the cluster is 168 GB, yet in practice the code actually uses 160 GB.
I suspect some matrix operations inside the parfor loop are going wrong.
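One pattern worth checking in that situation: any large array referenced inside parfor is serialized on the client and deserialized once per worker, so peak memory scales with the worker count even when a single copy is small; that matches the "during deserialization" wording of the error. A parallel.pool.Constant transfers the data to each worker once instead (a minimal sketch with hypothetical variable names, not the actual code from this question):

bigMatrix = rand(5000);                  % ~200 MB of doubles, used read-only below
C = parallel.pool.Constant(bigMatrix);   % copied to each worker once
clear bigMatrix                          % drop the client-side copy

results = zeros(1, 100);
parfor k = 1:100
    % C.Value is the worker-local copy; nothing is re-broadcast per iteration
    results(k) = sum(C.Value(k, :));
end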


Answers (0)

Release: R2017b
