Neural networks - CUDAKernel/setConstantMemory - the data supplied is too big for constant 'hintsD'
3 次查看(过去 30 天)
显示 更早的评论
On R2015a with Parallel Computing Toolbox and Neural Network Toolbox.
Using the following code with GPU Nvidia GeForce GTX980 Ti:
net1 = feedforwardnet(20);
net1.trainFcn = 'trainscg';
x = inputs(1:4284,2:2000)'; % if I reduce this to 2:1900, it will work
t = double(targets'); % casting to double for GPU
t = t(:,1:4284);
% preparing for GPU xg = nndata2gpu(x); tg = nndata2gpu(t);
net1.input.processFcns = {'mapminmax'}; net1.output.processFcns = {'mapminmax'};
net2 = configure(net1,x,t); % Configure with MATLAB arrays
net2 = train(net2,xg,tg);
As you can see, this is not a big dataset. When I run this, it generates this error:
Error using parallel.gpu.CUDAKernel/setConstantMemory The data supplied is too big for constant 'hintsD'.
Error in nnGPU.codeHints (line 33) setConstantMemory(hints.yKernel,'hintsD',hints.double);
Error in nncalc.setup2 (line 13) calcHints = calcMode.codeHints(calcHints);
Error in nncalc.setup (line 17) [calcLib,calcNet] = nncalc.setup2(calcMode,calcNet,calcData,calcHints);
Error in network/train (line 357) [calcLib,calcNet,net,resourceText] = nncalc.setup(calcMode,net,data);
gpuDevice is showing this:
Name: 'GeForce GTX 980 Ti'
Index: 1
ComputeCapability: '5.2'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 6.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 6.4425e+09
AvailableMemory: 5.1520e+09
MultiprocessorCount: 22
ClockRateKHz: 1139500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
As noted in the code above, if I reduce x marginally, it will run.
I don't understand why data of this size would generate a memory error?
Am I missing a step in preparing this for GPU?
0 个评论
采纳的回答
Amanjit Dulai
2017-1-3
I was able to reproduce your issue. The best solution is to do the GPU training a different way by using the 'useGPU' flag. This does not use the shared memory in this way, and side-steps this issue. Your example code would look like this:
net1 = feedforwardnet(20);
net1.trainFcn = 'trainscg';
x = inputs(1:4284,2:2000)';
t = double(targets'); % casting to double for GPU
t = t(:,1:4284);
net1.input.processFcns = {'mapminmax'};
net1.output.processFcns = {'mapminmax'};
net1 = train(net1,x,t,'useGPU','yes');
0 个评论
更多回答(2 个)
Joss Knight
2016-12-19
Constant memory is a special fast read-only cache with 64KB of space. That's enough to store about 8000 elements of double-precision data. Perhaps you want to use shared memory, which will give you 16 or 48MB depending on your device configuration.
2 个评论
Joss Knight
2016-12-20
Sorry, I don't understand this code to know what hints.double is and why it needs to be so large. I'll see if I can get someone who knows to help you.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Sequence and Numeric Feature Data Workflows 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!