GPU problem CUDA_ERROR_UNKNOWN

Peter 2017-1-4
Moved: Matt J 2023-3-30
I'm running a MATLAB simulation that uses an iterative matrix equation solver. The solver is called on the GPU every few time steps inside a time-stepping loop. This works for a few dozen time steps (although the computations gradually slow down...) until the screen briefly goes black and the simulation crashes with the following error message:
Error using gpuArray/subsasgn
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_UNKNOWN
After this, MATLAB no longer recognizes the GPU device: the command
gpuDevice
results in:
Error using gpuDevice (line 26)
An unexpected error occurred trying to retrieve CUDA device properties. The CUDA error was:
CUDA_ERROR_UNKNOWN
Restarting MATLAB is not sufficient to restore the GPU. Restarting the PC is.
I'm running MATLAB R2016b on Windows 10, using an NVIDIA TITAN X (Pascal) GPU with the newest driver installed.
Do these symptoms suggest a diagnosis to anyone?
4 Comments
Xubin Lin 2020-6-13
Dear Joss,
I also have the same problem.
An error occurred during PTX compilation of <image>.
The information log was:
The error log was:
The CUDA error code was: CUDA_ERROR_ILLEGAL_ADDRESS.
My gpuDevice output is as follows (MATLAB R2019a and CUDA 10.2):
Name: 'GeForce GTX 1060'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11
ToolkitVersion: 10
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 6.4425e+09
MultiprocessorCount: 10
ClockRateKHz: 1670500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Swati Jain 2023-3-30
Moved: Matt J 2023-3-30
I'm facing this error while working in Deep Network Designer.
Please help me solve it.

Accepted Answer

Peter 2017-1-6
Monitoring the GPU revealed that temperature is most probably causing the issue: performance slows down as the temperature rises, and is capped by the temperature.
The GPU crashed when it reached 95 degrees...
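The answer does not say which monitoring tool was used. As a rough sketch, the temperature could also be polled from inside the time-stepping loop, assuming NVIDIA's `nvidia-smi` utility is installed and on the system path. The names `readGpuTemperature`, `parseGpuTemperature`, `maxTempC` and `nSteps` are hypothetical, the 85-degree threshold is an arbitrary choice below the 95 degrees reported above, and GPU id 0 in `nvidia-smi` is assumed to be the device MATLAB is using.

```matlab
% Sketch only: helper names and thresholds are illustrative assumptions,
% and nvidia-smi is assumed to be available on the system path.
maxTempC = 85;     % assumed threshold, below the reported 95-degree crash point
nSteps   = 100;    % placeholder for the real number of time steps

for step = 1:nSteps
    % ... time step, including the gpuArray solver call, goes here ...

    if mod(step, 10) == 0
        wait(gpuDevice);                 % let queued GPU work finish first
        tempC = readGpuTemperature();
        fprintf('step %d: GPU temperature %g C\n', step, tempC);
        while tempC > maxTempC           % NaN compares false, so a failed query cannot hang here
            pause(5);                    % idle for a few seconds to let the GPU cool
            tempC = readGpuTemperature();
        end
    end
end

function tempC = readGpuTemperature()
% Ask nvidia-smi for the temperature of GPU id 0; return NaN if the call fails.
    cmd = 'nvidia-smi --id=0 --query-gpu=temperature.gpu --format=csv,noheader,nounits';
    [status, out] = system(cmd);
    if status == 0
        tempC = parseGpuTemperature(out);
    else
        tempC = NaN;
    end
end

function tempC = parseGpuTemperature(out)
% Convert nvidia-smi text such as '67' plus a newline to a number (NaN if unparseable).
    tempC = str2double(strtrim(out));
end
```

Pausing only treats the symptom; logging the temperature alongside the per-step run time is mainly useful for confirming whether the slowdown and the crash track the temperature on a given machine.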
2 Comments
Vaclav Bocek 2018-4-19
Moved: Matt J 2023-3-30
How did you solve it, please?
Peter 2018-4-23
Moved: Matt J 2023-3-30
I solved it by: 1) placing the GPU more sensibly in the PC case to allow better airflow, and 2) changing the behavior of the cooling fan. By default it only reacts to CPU activity; I believe this can be set in the BIOS. I just made it blow a little harder. This is all very machine-specific, so it will take some investigation on your part to try these options.

More Answers (1)

Matt J 2017-1-4
I've had symptoms like that before. Re-installing/updating the GPU driver fixed it for me, but it was never clear to me what the root cause was.
1 Comment
Peter 2017-1-5
Thanks Matt. I did install the latest drivers (several times now), hoping that would solve the issue, but unfortunately without success.
