CUDA_ERROR​_ILLEGAL_A​DDRESS when runnin Faster R-CNN on Matlab 2018b

2 次查看(过去 30 天)
I'm running faster R-CNN in `matlab 2018b` on a Windows 10. I face an exception `CUDA_ERROR_ILLEGAL_ADDRESS` when I increase the number of my training items or when I increase the `MaxEpoch`.
Below are the information of my `gpuDevice`
CUDADevice with properties:
Name: 'GeForce GTX 1050'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 9.2000
ToolkitVersion: 9.1000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 4.2950e+09
AvailableMemory: 3.4635e+09
MultiprocessorCount: 5
ClockRateKHz: 1493000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
And this is my code
latest_index =0;
for i=1:6
load (strcat('newDataset', int2str(i), '.mat'));
len =length(vehicleDataset.imageFilename);
for j=1:len
filename = vehicleDataset.imageFilename{j};
latest_index=latest_index+1;
fulldata.imageFilename{latest_index} = filename;
fulldata.vehicle{latest_index} = vehicleDataset.vehicle{j};
end
end
trainingDataTable = table(fulldata.imageFilename', fulldata.vehicle');
trainingDataTable.Properties.VariableNames = {'imageFilename','vehicle'};
data.trainingDataTable = trainingDataTable;
trainingDataTable(1:4,:)
% Split data into a training and test set.
idx = floor(0.6 * height(trainingDataTable));
trainingData = trainingDataTable(1:idx,:);
testData = trainingDataTable(idx:end,:);
% Create image input layer.
inputLayer = imageInputLayer([32 32 3]);
% Define the convolutional layer parameters.
filterSize = [3 3];
numFilters = 64;
% Create the middle layers.
middleLayers = [
convolution2dLayer(filterSize, numFilters, 'Padding', 1)
reluLayer()
convolution2dLayer(filterSize, numFilters, 'Padding', 1)
reluLayer()
maxPooling2dLayer(3, 'Stride',2)
];
finalLayers = [
fullyConnectedLayer(128)
% Add a ReLU non-linearity.
reluLayer()
fullyConnectedLayer(width(trainingDataTable))
% Add the softmax loss layer and classification layer.
softmaxLayer()
classificationLayer()
];
layers = [
inputLayer
middleLayers
finalLayers
];
% Options for step 1.
optionsStage1 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
% Options for step 2.
optionsStage2 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
% Options for step 3.
optionsStage3 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
% Options for step 4.
optionsStage4 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
options = [
optionsStage1
optionsStage2
optionsStage3
optionsStage4
];
doTrainingAndEval = true;
if doTrainingAndEval
% Set random seed to ensure example training reproducibility.
rng(0);
% Train Faster R-CNN detector. Select a BoxPyramidScale of 1.2 to allow
% for finer resolution for multiscale object detection.
detector = trainFasterRCNNObjectDetector(trainingData, layers, options, ...
'NegativeOverlapRange', [0 0.3], ...
'PositiveOverlapRange', [0.6 1], ...
'BoxPyramidScale', 1.2);
data.detector= detector;
else
% Load pretrained detector for the example.
detector = data.detector;
end
save mix_data data
if doTrainingAndEval
% Run detector on each image in the test set and collect results.
resultsStruct = struct([]);
for i = 1:height(testData)
% Read the image.
I = imread(testData.imageFilename{i});
% Run the detector.
[bboxes, scores, labels] = detect(detector, I);
% Collect the results.
resultsStruct(i).Boxes = bboxes;
resultsStruct(i).Scores = scores;
resultsStruct(i).Labels = labels;
end
% Convert the results into a table.
results = struct2table(resultsStruct);
data.results = results;
save mix_data data
else
% Load results from disk.
results = data.results;
end
% Extract expected bounding box locations from test data.
expectedResults = testData(:, 2:end);
% Evaluate the object detector using Average Precision metric.
[ap, recall, precision] = evaluateDetectionPrecision(results, expectedResults);
% Plot precision/recall curve
figure
plot(recall,precision)
xlabel('Recall')
ylabel('Precision')
grid on
title(sprintf('Average Precision = %.2f', ap))
First it prints the warning multiple time and throws the below exception
> Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
> In trainFasterRCNNObjectDetector (line 320)
In rcnn_trail (line 184)
> Error using -
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
> Error in vision.internal.cnn.layer.SmoothL1Loss/backwardLoss (line 156)
idx = (X > -one) & (X < one);
> Error in nnet.internal.cnn.DAGNetwork/computeGradientsForTraining/efficientBackProp (line 585)
dLossdX = thisLayer.backwardLoss( ...
> Error in nnet.internal.cnn.DAGNetwork>@()efficientBackProp(i) (line 661)
@() efficientBackProp(i), ...
> Error in nnet.internal.cnn.util.executeWithStagedGPUOOMRecovery (line 11)
[ varargout{1:nOutputs} ] = computeFun();
> Error in nnet.internal.cnn.DAGNetwork>iExecuteWithStagedGPUOOMRecovery (line 1195)
[varargout{1:nargout}] = nnet.internal.cnn.util.executeWithStagedGPUOOMRecovery(varargin{:});
> Error in nnet.internal.cnn.DAGNetwork/computeGradientsForTraining (line 660)
theseGradients = iExecuteWithStagedGPUOOMRecovery( ...
> Error in nnet.internal.cnn.Trainer/computeGradients (line 184)
[gradients, predictions, states] = net.computeGradientsForTraining(X, Y,
needsStatefulTraining, propagateState);
> Error in nnet.internal.cnn.Trainer/train (line 85)
[gradients, predictions, states] = this.computeGradients(net, X, response,
needsStatefulTraining, propagateState);
> Error in vision.internal.cnn.trainNetwork (line 47)
trainedNet = trainer.train(trainedNet, trainingDispatcher);
> Error in fastRCNNObjectDetector.train (line 190)
[network, info] = vision.internal.cnn.trainNetwork(ds, lgraph, opts, mapping,
checkpointSaver);
> Error in trainFasterRCNNObjectDetector (line 410)
[stage2Detector, fastRCNN, ~, info(2)] = fastRCNNObjectDetector.train(trainingData, fastRCNN,
options(2), iStageTwoParams(params), checkpointSaver);
> Error in rcnn_trail (line 184)
detector = trainFasterRCNNObjectDetector(trainingData, layers, options, ...
  4 个评论
Walter Roberson
Walter Roberson 2018-11-10
That should be okay. You could experiment with CUDA 10 driver instead, as long as you are not using cudamex() or GPU Coder.
Ghada Khaled
Ghada Khaled 2018-11-11
This is what I expected but still, I’m getting this error. Could the memory size be an issue? Since I only have 8 Gig of memory

请先登录,再进行评论。

回答(0 个)

类别

Help CenterFile Exchange 中查找有关 GPU Computing 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by