gpuArray large sparse arrays. Error codes: "CUSPARSE_INTERNAL_ERROR" / "UNKNOWN_ERROR"

I have two GPUs: the first is an NVIDIA GeForce RTX 3090 Ti and the second is an NVIDIA GeForce RTX 2060 SUPER. I am running on a Linux machine with NVIDIA driver version 515.105.01 and CUDA version 11.7, using MATLAB R2022b (Update 7). When I create a large sparse gpuArray on the second (smaller) GPU there is no problem. When I repeat the same steps on the first (larger) GPU, I get the error code UNKNOWN_ERROR, or sometimes CUSPARSE_INTERNAL_ERROR.
Sample code:
%% Device 2 - small sparse array - no problem
gpuDevice(2)
a = speye(100000,100000);
a = gpuArray(a);
%% Device 2 - large sparse array - no problem
gpuDevice(2)
a = speye(10000000,10000000);
a = gpuArray(a);
%% Device 1 - small sparse array - no problem
gpuDevice(1)
a = speye(100000,100000);
a = gpuArray(a);
%% Device 1 - large sparse array - problem!!
gpuDevice(1)
a = speye(10000000,10000000);
a = gpuArray(a);
Error:
Error using gpuArray
An unexpected error occurred on the device. The error code was: UNKNOWN_ERROR.
Error in gpuTest (line 19)
a = gpuArray(a);
Device 2 has less memory (8 GB), so if anything it should be the one less able to handle large arrays. Here are its details:
Name: 'NVIDIA GeForce RTX 2060 SUPER'
Index: 2
ComputeCapability: '7.5'
SupportsDouble: 1
DriverVersion: 11.7000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152 (49.15 KB)
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8370192384 (8.37 GB)
AvailableMemory: 7891910656 (7.89 GB)
MultiprocessorCount: 34
ClockRateKHz: 1680000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
And here are the details of the GPU that crashes (the larger one):
Name: 'NVIDIA GeForce RTX 3090 Ti'
Index: 1
ComputeCapability: '8.6'
SupportsDouble: 1
DriverVersion: 11.7000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152 (49.15 KB)
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 25431965696 (25.43 GB)
AvailableMemory: 24284102656 (24.28 GB)
MultiprocessorCount: 84
ClockRateKHz: 1950000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
Any help or explanation would be much appreciated.

Answers (2)

Ayush 2023-12-26
Edited: Ayush 2023-12-26
I understand that you get these errors when you create a large sparse gpuArray on the first GPU, which has the higher specifications, but not on the second GPU, which has the lower specifications. I tried to reproduce the issue with my two GPUs:
  1. NVIDIA RTX 6000 Ada Generation | Total Memory: 51GB
  2. NVIDIA GeForce RTX 2080 SUPER | Total Memory: 8GB
The error was reproducible only up to and including R2022b. From the R2023a MATLAB release onwards, the error has been fixed.
Thanks
Ayush Jaiswal
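To check which device and release combination is affected on a given machine, a minimal diagnostic sketch along these lines may help. It is not an official MathWorks procedure: the variable names (`n`, `results`, `entry`) are made up for illustration, the array size is simply the one from the question, and the R2023a threshold is taken from the observation above rather than from release notes.

```matlab
% Diagnostic sketch: try the large sparse transfer on every visible GPU
% and record whether it succeeds. Requires Parallel Computing Toolbox.
n = 1e7;   % same size as the failing case in the question (assumption)

% Releases before R2023a were observed to fail in this thread.
if isMATLABReleaseOlderThan("R2023a")
    fprintf('MATLAB release is older than R2023a; the error may occur.\n');
end

results = struct('Index', {}, 'Name', {}, 'Toolkit', {}, 'Ok', {}, 'Message', {});

for idx = 1:gpuDeviceCount
    d = gpuDevice(idx);               % select (and reset) device idx
    entry = struct('Index', idx, 'Name', d.Name, ...
                   'Toolkit', d.ToolkitVersion, 'Ok', true, 'Message', '');
    try
        a = gpuArray(speye(n));       % host sparse identity -> device
        wait(d);                      % block until the device is idle
        clear a
    catch err
        entry.Ok = false;             % keep the message for the report
        entry.Message = err.message;
    end
    results(end+1) = entry; %#ok<AGROW>
    fprintf('Device %d (%s), toolkit %g: ok = %d %s\n', ...
            entry.Index, entry.Name, entry.Toolkit, entry.Ok, entry.Message);
end
```

Running this once in R2022b and once in a newer release should show whether the failure is tied to the release (and its bundled CUDA toolkit) rather than to the amount of GPU memory.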

Joss Knight 2024-1-3
Hi Joseph. It's hard to be definitive. There were some problems with cuSPARSE, and also with Windows drivers, when supporting the newest GeForce Ampere cards with CUDA 11.2, but I believed these to be fixed in R2023b and CUDA 11.8. Can you raise a support ticket and follow this up with MathWorks Support? Thanks.
