Random shuffling with shuffle() does not work with randomPatchExtractionDatastore objects and/or multi-GPU training
I want to randomly shuffle my partitioned randomPatchExtractionDatastore objects ("workerImds") with the shuffle() command, as shown in the following lines:
iteration = 0;
spmd
    % partition datastore
    workerImds = partition(dsTrain,numWorkers,labindex);
    workerImds.MiniBatchSize = workerMiniBatchSize(labindex);
    % loop over epochs
    for epoch = 1:options.MaxEpochs
        % shuffle data every epoch
        reset(workerImds)
        workerImds = shuffle(workerImds);
        % loop over mini-batches
        while gop(@and,hasdata(workerImds))
            iteration = iteration + 1;
            % read mini-batch of data
            [workerXYBatch,workerImdsInfo] = read(workerImds);
            ...
        end
    end
end
This follows the related documentation: https://de.mathworks.com/help/parallel-computing/train-network-in-parallel-with-custom-training-loop.html
However, every time I start a new training run, the shuffling produces the same order of indices. This can be seen by adding the following lines to the while loop and restarting the training repeatedly:
if iteration == 1 || mod(iteration-1,numIterationsPerEpoch) == 0
    if labindex == 1
        fprintf("\nEpoch: %2.0f Iteration: %4.0f WorkerIndices: %s",epoch,iteration,mat2str(workerImdsInfo.ImageIndices.'))
    end
end
Lab 1:
Epoch: 1 Iteration: 1 WorkerIndices: [16 23 15 21 49]
Epoch: 2 Iteration: 11 WorkerIndices: [40 17 38 24 15]
Epoch: 3 Iteration: 21 WorkerIndices: [49 21 18 11 12]
Epoch: 4 Iteration: 31 WorkerIndices: [24 6 48 42 40]
Epoch: 5 Iteration: 41 WorkerIndices: [18 36 22 20 23]
I get exactly this order of indices ([16 23 15 21 49] ...) every time I run the training. The order differs between epochs, but the whole sequence repeats identically from run to run, so this is not what I would call random shuffling. It behaves more like a "static shuffling algorithm".
The same problem also occurs without multi-GPU training (without the spmd block) when I try to randomly shuffle the normal (non-partitioned) randomPatchExtractionDatastore object "dsTrain".
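The non-partitioned case can be checked with a sketch along the following lines. It assumes dsTrain is the randomPatchExtractionDatastore from above and that read returns an info struct with an ImageIndices field, as in the spmd code. The variable numRepeats and its value are hypothetical and chosen only for illustration. Running the snippet in several separate training starts and comparing the printed orders shows whether the sequence repeats.

```matlab
% Sketch only: assumes dsTrain is the randomPatchExtractionDatastore from above.
% numRepeats is a hypothetical value chosen for illustration.
numRepeats = 3;
for k = 1:numRepeats
    dsShuffled = shuffle(dsTrain);   % request a new random order
    reset(dsShuffled)                % start reading from the beginning
    [~,info] = read(dsShuffled);     % read the first mini-batch only
    fprintf("Shuffle %d: %s\n",k,mat2str(info.ImageIndices.'));
end
```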
However, the problem does not occur when shuffling an ImageDatastore object. In that case, and likewise when shuffling a minibatchqueue with the shuffle() command, the random shuffling works as expected. Unfortunately, in this multi-GPU setup I can use neither a minibatchqueue nor a normal ImageDatastore object for my regression use case.
Looking forward to help and advice on this subject! Thank you!