CUDA_ERROR_ILLEGAL_ADDRESS when runnin Faster R-CNN on Matlab 2018b
1 次查看(过去 30 天)
显示 更早的评论
I'm running faster R-CNN in `matlab 2018b` on a Windows 10. I face an exception `CUDA_ERROR_ILLEGAL_ADDRESS` when I increase the number of my training items or when I increase the `MaxEpoch`.
Below are the information of my `gpuDevice`
CUDADevice with properties:
Name: 'GeForce GTX 1050'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 9.2000
ToolkitVersion: 9.1000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 4.2950e+09
AvailableMemory: 3.4635e+09
MultiprocessorCount: 5
ClockRateKHz: 1493000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
And this is my code
latest_index =0;
for i=1:6
load (strcat('newDataset', int2str(i), '.mat'));
len =length(vehicleDataset.imageFilename);
for j=1:len
filename = vehicleDataset.imageFilename{j};
latest_index=latest_index+1;
fulldata.imageFilename{latest_index} = filename;
fulldata.vehicle{latest_index} = vehicleDataset.vehicle{j};
end
end
trainingDataTable = table(fulldata.imageFilename', fulldata.vehicle');
trainingDataTable.Properties.VariableNames = {'imageFilename','vehicle'};
data.trainingDataTable = trainingDataTable;
trainingDataTable(1:4,:)
% Split data into a training and test set.
idx = floor(0.6 * height(trainingDataTable));
trainingData = trainingDataTable(1:idx,:);
testData = trainingDataTable(idx:end,:);
% Create image input layer.
inputLayer = imageInputLayer([32 32 3]);
% Define the convolutional layer parameters.
filterSize = [3 3];
numFilters = 64;
% Create the middle layers.
middleLayers = [
convolution2dLayer(filterSize, numFilters, 'Padding', 1)
reluLayer()
convolution2dLayer(filterSize, numFilters, 'Padding', 1)
reluLayer()
maxPooling2dLayer(3, 'Stride',2)
];
finalLayers = [
fullyConnectedLayer(128)
% Add a ReLU non-linearity.
reluLayer()
fullyConnectedLayer(width(trainingDataTable))
% Add the softmax loss layer and classification layer.
softmaxLayer()
classificationLayer()
];
layers = [
inputLayer
middleLayers
finalLayers
];
% Options for step 1.
optionsStage1 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
% Options for step 2.
optionsStage2 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
% Options for step 3.
optionsStage3 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
% Options for step 4.
optionsStage4 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
options = [
optionsStage1
optionsStage2
optionsStage3
optionsStage4
];
doTrainingAndEval = true;
if doTrainingAndEval
% Set random seed to ensure example training reproducibility.
rng(0);
% Train Faster R-CNN detector. Select a BoxPyramidScale of 1.2 to allow
% for finer resolution for multiscale object detection.
detector = trainFasterRCNNObjectDetector(trainingData, layers, options, ...
'NegativeOverlapRange', [0 0.3], ...
'PositiveOverlapRange', [0.6 1], ...
'BoxPyramidScale', 1.2);
data.detector= detector;
else
% Load pretrained detector for the example.
detector = data.detector;
end
save mix_data data
if doTrainingAndEval
% Run detector on each image in the test set and collect results.
resultsStruct = struct([]);
for i = 1:height(testData)
% Read the image.
I = imread(testData.imageFilename{i});
% Run the detector.
[bboxes, scores, labels] = detect(detector, I);
% Collect the results.
resultsStruct(i).Boxes = bboxes;
resultsStruct(i).Scores = scores;
resultsStruct(i).Labels = labels;
end
% Convert the results into a table.
results = struct2table(resultsStruct);
data.results = results;
save mix_data data
else
% Load results from disk.
results = data.results;
end
% Extract expected bounding box locations from test data.
expectedResults = testData(:, 2:end);
% Evaluate the object detector using Average Precision metric.
[ap, recall, precision] = evaluateDetectionPrecision(results, expectedResults);
% Plot precision/recall curve
figure
plot(recall,precision)
xlabel('Recall')
ylabel('Precision')
grid on
title(sprintf('Average Precision = %.2f', ap))
First it prints the warning multiple time and throws the below exception
> Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
> In trainFasterRCNNObjectDetector (line 320)
In rcnn_trail (line 184)
> Error using -
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
> Error in vision.internal.cnn.layer.SmoothL1Loss/backwardLoss (line 156)
idx = (X > -one) & (X < one);
> Error in nnet.internal.cnn.DAGNetwork/computeGradientsForTraining/efficientBackProp (line 585)
dLossdX = thisLayer.backwardLoss( ...
> Error in nnet.internal.cnn.DAGNetwork>@()efficientBackProp(i) (line 661)
@() efficientBackProp(i), ...
> Error in nnet.internal.cnn.util.executeWithStagedGPUOOMRecovery (line 11)
[ varargout{1:nOutputs} ] = computeFun();
> Error in nnet.internal.cnn.DAGNetwork>iExecuteWithStagedGPUOOMRecovery (line 1195)
[varargout{1:nargout}] = nnet.internal.cnn.util.executeWithStagedGPUOOMRecovery(varargin{:});
> Error in nnet.internal.cnn.DAGNetwork/computeGradientsForTraining (line 660)
theseGradients = iExecuteWithStagedGPUOOMRecovery( ...
> Error in nnet.internal.cnn.Trainer/computeGradients (line 184)
[gradients, predictions, states] = net.computeGradientsForTraining(X, Y,
needsStatefulTraining, propagateState);
> Error in nnet.internal.cnn.Trainer/train (line 85)
[gradients, predictions, states] = this.computeGradients(net, X, response,
needsStatefulTraining, propagateState);
> Error in vision.internal.cnn.trainNetwork (line 47)
trainedNet = trainer.train(trainedNet, trainingDispatcher);
> Error in fastRCNNObjectDetector.train (line 190)
[network, info] = vision.internal.cnn.trainNetwork(ds, lgraph, opts, mapping,
checkpointSaver);
> Error in trainFasterRCNNObjectDetector (line 410)
[stage2Detector, fastRCNN, ~, info(2)] = fastRCNNObjectDetector.train(trainingData, fastRCNN,
options(2), iStageTwoParams(params), checkpointSaver);
> Error in rcnn_trail (line 184)
detector = trainFasterRCNNObjectDetector(trainingData, layers, options, ...
4 个评论
Walter Roberson
2018-11-10
That should be okay. You could experiment with CUDA 10 driver instead, as long as you are not using cudamex() or GPU Coder.
回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Image Data Workflows 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!