- NVIDIA RTX 6000 Ada Generation | Total Memory: 51GB
- NVIDIA GeForce RTX 2080 SUPER | Total Memory: 8GB
gpuArray large sparse arrays. Error codes: "CUSPARSE_INTERNAL_ERROR" / "UNKNOWN_ERROR"
5 次查看(过去 30 天)
显示 更早的评论
I have 2 gpus: the first is a NVIDIA GeForce RTX 3090 Ti and the second is a NVIDIA GeForce RTX 2060 SUPER. I am running on a linux machine with NVIDIA Driver Version: 515.105.01 and CUDA Version: 11.7. I am using Matlab 2022b (update 7). When I create a large sparse gpuArray with the second gpu (smaller) there is no problem, when I repeat using the first gpu (larger) I get error code: UNKNOWN_ERROR or sometimes the error code is CUSPARSE_INTERNAL_ERROR
sample code:
%% Device 2 - small sparse array - no problem
gpuDevice(2)
a = speye(100000,100000);
a = gpuArray(a);
%% Device 2 - large sparse array - no problem
gpuDevice(2)
a = speye(10000000,10000000);
a = gpuArray(a);
%% Device 1 - small sparse array - no problem
gpuDevice(1)
a = speye(100000,100000);
a = gpuArray(a);
%% Device 1 - large sparse array - problem!!
gpuDevice(1)
a = speye(10000000,10000000);
a = gpuArray(a);
Error:
Error using gpuArray
An unexpected error occurred on the device. The error code was: UNKNOWN_ERROR.
Error in gpuTest (line 19)
a = gpuArray(a);
Device 2 is smaller (8 GB) so shouldn't be able to handle larger arrays. Here is it's details:
Name: 'NVIDIA GeForce RTX 2060 SUPER'
Index: 2
ComputeCapability: '7.5'
SupportsDouble: 1
DriverVersion: 11.7000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152 (49.15 KB)
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8370192384 (8.37 GB)
AvailableMemory: 7891910656 (7.89 GB)
MultiprocessorCount: 34
ClockRateKHz: 1680000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
and here are the details of gpu that crashes (the larger one)
Name: 'NVIDIA GeForce RTX 3090 Ti'
Index: 1
ComputeCapability: '8.6'
SupportsDouble: 1
DriverVersion: 11.7000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152 (49.15 KB)
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 25431965696 (25.43 GB)
AvailableMemory: 24284102656 (24.28 GB)
MultiprocessorCount: 84
ClockRateKHz: 1950000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
Any help or explanation would be much appreciated.
0 个评论
回答(2 个)
Ayush
2023-12-26
编辑:Ayush
2023-12-26
I understand that you are getting the errors when you create a large sparse gpuArray with the first GPU, having higher specifications, and not getting any errors when using second gpu, with lower specifications. I tired to reproduce the issue with my 2 GPUs:
The error was reproducible only untill R2022b. From R2023a MATLAB release onwards the error has been fixed.
Thanks
Ayush Jaiswal
Joss Knight
2024-1-3
Hi Joseph. It's hard to be definitive. There were some problems with cusparse and also Windows drivers when supporting the newest GeForce Ampere cards with CUDA 11.2, but I believed this to be fixed in R2023b and CUDA 11.8. Can you raise a support ticket and follow this up with MathWorks Support? Thanks.
0 个评论
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 GPU Computing in MATLAB 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!