The complete error output from the first cluster:
Starting parallel pool (parpool) using the 'local' profile ...
Preserving jobs with IDs: 1 because they contain crash dump files.
You can use 'delete(myCluster.Jobs)' to remove all jobs created with profile local. To create 'myCluster' use 'myCluster = parcluster('local')'.
connected to 4 workers.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     RMSE     |     Loss     |      Rate       |
|========================================================================================|
Error using trainNetwork (line 154)
The parallel pool that SPMD was using has been shut down.
Error in TrainMyUnet (line 19)
[net, info] = trainNetwork(trainSet, trainLabel, myUnet, options);
Error in tarinTask (line 10)
TrainMyUnet;
Caused by:
Error using nnet.internal.cnn.ParallelTrainer/train (line 67)
The parallel pool that SPMD was using has been shut down.
The client lost connection to worker 2. This might be due to network problems,
or the interactive communicating job might have errored.
The GPUs in this cluster are Tesla K80s.