Trainnet with parallel-CPU mode giving incorrect results

I'm using trainnet to train a convolutional regression network to find the X-Y centroid of a subtle gradient region in an input image. The training data consist of paired 130x326 grayscale images and ground-truth output coordinates. Both the RMSE and the loss reach very small values (e.g., 10^-3) after a few minutes of training on a small dataset. The trained network gives the expected results when trained in single-CPU mode, but when trained in parallel-CPU mode, the predictions are significantly off.
To debug, I scaled back to a very simple network, disabled input normalization, and trained with only two data points, fully expecting the network to memorize the training data perfectly. With single-CPU training, the trained network yields perfect predictions on the training data, as expected. With parallel-CPU training, the trained network does not predict correctly on the same training data. I added a more verbose loss function and confirmed that the reported losses (i.e., those shown by the loss function during training) are consistent with the (Y,T) pairs seen during training, and that the T values are being read correctly from the training data.
It seems that the network returned at the end of training in parallel-CPU mode does not correctly capture the results of the training.
I'm running R2024a on a MacBook Pro (M2 Max), using Apple Accelerate BLAS. (The default BLAS persistently crashed in parallel mode with trainnet.)
Code snippet below...
layers = [
    imageInputLayer([130 326 1],"Name","imageinput","Normalization","none")
    convolution2dLayer([10 10],8,"DilationFactor",[2 2],"Name","conv_1")
    maxPooling2dLayer([2 2],"Name","maxpool_4")
    batchNormalizationLayer
    reluLayer("Name","relu_1")
    convolution2dLayer([2 2],16,"Name","conv_2")
    fullyConnectedLayer(2,"Name","fc")];

opts = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-7, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',500, ...
    'LearnRateDropFactor',.25, ...
    'MaxEpochs',1000, ...
    'Verbose',false, ...
    'ExecutionEnvironment','parallel', ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'OutputNetwork','last-iteration');

% Note: net is not defined in this snippet; presumably it is built from layers.
FOVCnet = trainnet(trainingData,net,@modelLoss,opts);

function loss = modelLoss(Y,T) % verbose loss function used for debugging
    Y                % semicolons omitted on purpose: display predictions
    T                % display targets
    loss = mse(Y,T)  % display the loss for this mini-batch
end
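The single-CPU versus parallel-CPU comparison described above could be reproduced along the following lines. This is only a minimal sketch under stated assumptions: the data are random placeholders rather than the real images, the variable names XTrain, TTrain, envs, and maxErr are hypothetical, the built-in "mse" loss stands in for the custom modelLoss, and the data are passed as in-memory arrays instead of the original trainingData. The "parallel" setting requires Parallel Computing Toolbox.

```matlab
% Placeholder data (hypothetical): two random images and two [x y] targets.
XTrain = rand(130,326,1,2);      % images stacked along the 4th dimension
TTrain = [40 100; 90 250];       % N-by-2 matrix, one coordinate pair per image

% Same architecture as in the question.
layers = [
    imageInputLayer([130 326 1],"Normalization","none")
    convolution2dLayer([10 10],8,"DilationFactor",[2 2])
    maxPooling2dLayer([2 2])
    batchNormalizationLayer
    reluLayer
    convolution2dLayer([2 2],16)
    fullyConnectedLayer(2)];

envs = ["cpu" "parallel"];       % environments to compare
maxErr = zeros(1,numel(envs));   % worst-case error on the training data

for k = 1:numel(envs)
    opts = trainingOptions("sgdm", ...
        "InitialLearnRate",1e-7, ...
        "MaxEpochs",1000, ...
        "Verbose",false, ...
        "ExecutionEnvironment",envs(k), ...
        "OutputNetwork","last-iteration");

    % Train on the two observations only, using the built-in MSE loss.
    net = trainnet(XTrain,TTrain,layers,"mse",opts);

    % Predict on the same two training images and compare with the targets.
    YPred = minibatchpredict(net,XTrain);
    maxErr(k) = max(abs(YPred - TTrain),[],"all");
end

disp(table(envs',maxErr','VariableNames',["Environment" "MaxAbsError"]))
```

If the behavior reported above reproduces, the error in the "parallel" row would be noticeably larger than in the "cpu" row. Weight initialization is random, so the two runs are not expected to match exactly even when both work.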
3 Comments
Matt J 2024-5-25
We can't run the code without trainingData. Please attach your two data point test case in a .mat file (as an arrayDatastore).
Collin Rich 2024-5-25
Here are the two test images and coordinates. (Sorry for not putting in an arrayDatastore; I'm not sure how to put both in a single arrayDatastore. Still learning the ropes...)
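One possible way to put both the images and the coordinates in a single datastore is to combine two arrayDatastore objects. This is a minimal sketch, assuming the images are stacked in a 130x326x1xN array and the coordinates are in an N-by-2 matrix. The variable names X, T, dsX, dsT, and dsTrain and the placeholder values are hypothetical and do not come from the attached files.

```matlab
% Placeholder data standing in for the two test images and coordinates.
X = rand(130,326,1,2);      % images stacked along the 4th dimension
T = [40 100; 90 250];       % one [x y] row per image (hypothetical values)

dsX = arrayDatastore(X,"IterationDimension",4);  % each read returns one image
dsT = arrayDatastore(T);                         % each read returns one row of T
dsTrain = combine(dsX,dsT);                      % each read returns {image, target}

sample = read(dsTrain);     % 1x2 cell: image in sample{1}, coordinates in sample{2}
reset(dsTrain);             % rewind the datastore before using it for training
```

Whether trainnet expects each target as a row or a column vector is worth confirming in the documentation before training on dsTrain.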


Answers (0)

Release

R2024a
