Error executing the example code for training a custom Mask R-CNN using the COCO 2014 dataset

I followed the instructions in "Instance Segmentation Using Mask R-CNN Deep Learning" (ref[1]).
All the code worked perfectly until the last section "Train network" (ref[2]).
iteration = 1;
start = tic;
% Create subplots for the learning rate and mini-batch loss
fig = figure;
[lossPlotter] = helper.configureTrainingProgressPlotter(fig);
% Initialize verbose output
helper.initializeVerboseOutput([]);
% Custom training loop
for epoch = 1:numEpochs
    reset(mbqTrain)
    shuffle(mbqTrain)
    while hasdata(mbqTrain)
        % Get next batch from minibatchqueue
        [X,gtBox,gtClass,gtMask] = next(mbqTrain);
        % Evaluate the model gradients and loss using dlfeval
        [gradients,loss,state] = dlfeval(@networkGradients,X,gtBox,gtClass,gtMask,dlnet,params);
        dlnet.State = state;
        % Compute the learning rate for the current iteration
        learnRate = initialLearnRate/(1 + decay*iteration);
        if(~isempty(gradients) && ~isempty(loss))
            [dlnet.Learnables,velocity] = sgdmupdate(dlnet.Learnables,gradients,velocity,learnRate,momentum);
        else
            continue;
        end
        helper.displayVerboseOutputEveryEpoch(start,learnRate,epoch,iteration,loss);
        % Plot loss/accuracy metric
        D = duration(0,0,toc(start),'Format','hh:mm:ss');
        addpoints(lossPlotter,numdetectMaskRCNN,Iteration,double(gather(extractdata(loss))))
        subplot(2,1,2)
        title(strcat("Epoch: ",num2str(epoch),", Elapsed: "+string(D)))
        drawnow
        iteration = iteration + 1;
    end
end
net = dlnet;
% Save the trained network
modelDateTime = string(datetime('now','Format',"yyyy-MM-dd-HH-mm-ss"));
save(strcat("trainedMaskRCNN-",modelDateTime,"-Epoch-",num2str(numEpochs),".mat"),'net');
First, "numdetectMaskRCNN" is not defined anywhere.
I simply deleted it and re-executed the section. It then shows the following error:
Error using nnet.internal.cnn.dlnetwork/forward (line 239)
Layer 'bn2a_branch2a': Invalid input data. The value of 'Variance' is invalid. Expected input to be positive.
Error in nnet.internal.cnn.dlnetwork/CodegenOptimizationStrategy/propagateWithFallback (line 122)
[varargout{1:nargout}] = fcn(net, X, layerIndices, layerOutputIndices);
Error in nnet.internal.cnn.dlnetwork/CodegenOptimizationStrategy/forward (line 62)
[varargout{1:nargout}] = propagateWithFallback(strategy, functionSlot, @forward, net, X, layerIndices, layerOutputIndices);
Error in nnet.internal.cnn.dlnetwork/DefaultOptimizationStrategy/propagate (line 143)
[varargout{1:nargout}] = inferenceMethod(strategy.CodegenStrategyOriginal,...
Error in nnet.internal.cnn.dlnetwork/DefaultOptimizationStrategy/forward (line 77)
[varargout{1:nargout}] = propagate(strategy, net, X, ...
Error in dlnetwork/forward (line 503)
[varargout{1:nargout}] = strategy.forward(net.PrivateNetwork, x, layerIndices, layerOutputIndices);
Error in networkGradients (line 21)
[YRPNRegDeltas, proposal, YRCNNClass, YRCNNReg, YRPNClass, YMask, state] = forward(...
Error in deep.internal.dlfeval (line 18)
[varargout{1:nout}] = fun(x{:});
Error in dlfeval (line 41)
[varargout{1:nout}] = deep.internal.dlfeval(fun,varargin{:});
I am wondering if there is anything I misunderstood so that the code doesn't work for me.
It will be of great help if this could be figured out or fixed. Thank you!
4 Comments
Claudia De Clemente
Hello, did you find a solution for the high computational cost? I am working with a self-made dataset of about 8,000 images of size 256 x 256 x 3. I estimate that completing 30 epochs would take more than a week, which is crazy...


Accepted Answer

Yi-Ping Hsueh 2021-3-29
(copied from my previous comment to myself...)
I found a solution to this issue from another resource.
The problem comes from negative values returned in "state". The original code is as follows:
[gradients,loss,state] = dlfeval(@networkGradients,X,gtBox,gtClass,gtMask,dlnet,params);
dlnet.State = state;
Replace the last line (dlnet.State = state;) with the following to ensure that all variance values assigned to "dlnet.State" are positive.
idx = dlnet.State.Parameter == "TrainedVariance";
boundAwayFromZero = @(X) max(X, eps('single'));
dlnet.State(idx,:) = dlupdate(boundAwayFromZero, dlnet.State(idx,:));
With this change, the code runs.
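To illustrate what the clamp does, here is a minimal sketch on a plain numeric array rather than the real network state. The sample values in v are made up for illustration. The commented lines at the end show where the same function would sit in the training loop; they assume the state returned by dlfeval is still assigned to dlnet.State before the clamp, since otherwise the updated batch normalization statistics would never reach the network.

```matlab
% Clamp used in the fix: any value below eps('single') is raised to it.
boundAwayFromZero = @(X) max(X, eps('single'));

% Hypothetical variance values, including a zero and a small negative one.
v = single([0.25 0 -1e-8]);
clamped = boundAwayFromZero(v);

% Every entry is now strictly positive, which is what the
% batch normalization layer expects of its variance.
disp(all(clamped > 0))

% Inside the training loop the same function is applied to the
% "TrainedVariance" rows of the state table (assumption: the state
% returned by dlfeval has already been assigned to dlnet.State):
%   dlnet.State = state;
%   idx = dlnet.State.Parameter == "TrainedVariance";
%   dlnet.State(idx,:) = dlupdate(boundAwayFromZero, dlnet.State(idx,:));
```

Values that are already positive pass through unchanged, so the clamp only affects entries that have drifted to zero or below.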
However, I am now facing another problem: training takes a very long time (days), probably because the network is so large. I thought my GPU would be good enough, but even a mini-batch size of 2 requires more GPU memory than I have. For now, I can only run the computation on the CPU.
My GPU is as follows:
Name: 'GeForce GTX 1080'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11.2000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5899e+09
AvailableMemory: 7.4505e+09
MultiprocessorCount: 20
ClockRateKHz: 1771000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
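For anyone who wants to check this before starting a long run, the sketch below reads the free GPU memory and falls back to the CPU when it looks too small. The threshold requiredBytes is a hypothetical placeholder, not a measured requirement of Mask R-CNN, and executionEnvironment is only an illustrative variable name that you would pass on to your own training setup.

```matlab
% Sketch: choose where to train based on the free GPU memory.
% requiredBytes is a hypothetical threshold; tune it for your own network.
requiredBytes = 8e9;
executionEnvironment = "cpu";   % default when no usable GPU is found

if canUseGPU
    g = gpuDevice;              % query the currently selected GPU
    fprintf("%s: %.2f GB free of %.2f GB\n", ...
        g.Name, g.AvailableMemory/1e9, g.TotalMemory/1e9);
    if g.AvailableMemory >= requiredBytes
        executionEnvironment = "gpu";
    end
end

disp("Training on: " + executionEnvironment)
```

This only reports memory that is free at the moment of the call; the actual peak usage during training depends on the image size and mini-batch size.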
I hope this information helps those who want to train their own Mask R-CNN in MATLAB.

More Answers (1)

Aditya Patil 2021-3-29
I have brought the issue to the attention of the relevant developers. It might be fixed in one of the upcoming releases.
