Can parfor run a series of GPU programs simultaneously?
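parfor i = 1:9
% for i = 1:9   (serial version of the same loop)
    sim_e(i).ellipticity = i/10;   % sweep the ellipticity from 0.1 to 0.9
    % GMMNLSE_propagate runs its computation on the GPU
    prop_output_elliptical(i) = GMMNLSE_propagate(fiber,initial_pulse,sim_e(i),gain_rate_eqn);
end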
I am running a parfor loop like the one above, where GMMNLSE_propagate is a function that runs on the GPU.
The code crashes; the error report is below. Is it OK to run a series of GPU programs using parfor? Thank you for your help.
Starting parallel pool (parpool) using the 'Processes' profile ...
Preserving jobs with IDs: 1 because they contain crash dump files.
You can use 'delete(myCluster.Jobs)' to remove all jobs created with profile Processes. To create 'myCluster' use 'myCluster = parcluster('Processes')'.
Connected to parallel pool with 14 workers.
--------------------------------------------------------------------------------
Access violation detected at 2023-11-02 14:58:13 +0800
--------------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled - No sandbox or build area path
Crash Mode : continue (default)
Default Encoding : UTF-8
Deployed : false
Graphics Driver : Uninitialized hardware
Graphics card 1 : NVIDIA ( 0x10de ) NVIDIA GeForce RTX 3060 Laptop GPU Version 31.0.15.3713 (2023-8-14)
Interpreter 0 : Executing request: 64657461696C2F44656661756C744D564D2E637070
Java Version : Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
MATLAB Architecture : win64
MATLAB Entitlement ID : 4205217
MATLAB Root : D:\MATLAB
MATLAB Version : 9.14.0.2337262 (R2023a) Update 5
OpenGL : hardware
Operating System : Microsoft Windows 11 Home China
Process ID : 18824
Processor ID : x86 Family 6 Model 154 Stepping 3, GenuineIntel
Session Key : f41c4c38-99fb-43f5-b9cb-be38129f8c29
Window System : Version 10.0 (Build 22621)
Fault Count: 1
Abnormal termination:
Access violation
Current Thread: 'MCR 0 interpreter thread' id 6000
Register State (from fault):
RAX = 0000000000000000 RBX = 0000022d9b00bc10
RCX = 0000022d9c846d30 RDX = 0000022d97b8c4d0
RSP = 0000005687fef490 RBP = 0000022d9c846d30
RSI = 0000022d0ed29c80 RDI = 0000000000000000
R8 = 0000022d3d219480 R9 = 0000000000000001
R10 = 0000000000000003 R11 = 0000005687fef430
R12 = 0000005687fef8c0 R13 = 0000005687fef928
R14 = 0000005687fef6c0 R15 = 0000022d97b8c4d0
RIP = 00007ffb643e1c51 EFL = 00010246
CS = 0033 FS = 0053 GS = 002b
Stack Trace (from fault):
[ 0] 0x00007ffb643e1c51 C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+02104401
[ 1] 0x00007ffb643b58ef C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+01923311
[ 2] 0x00007ffb643b5a36 C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+01923638
[ 3] 0x00007ffb64288681 C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+00689793
[ 4] 0x00007ffb642887d0 C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+00690128
[ 5] 0x00007ffb64289359 C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+00693081
[ 6] 0x00007ffa88341627 D:\MATLAB\bin\win64\cudart64_110.dll+00136743
--------------------------------------------------------------------------------
Access violation detected at 2023-11-02 14:58:13 +0800
--------------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled - No sandbox or build area path
Crash Mode : continue (default)
Default Encoding : UTF-8
Deployed : false
Graphics Driver : Uninitialized hardware
Graphics card 1 : NVIDIA ( 0x10de ) NVIDIA GeForce RTX 3060 Laptop GPU Version 31.0.15.3713 (2023-8-14)
Interpreter 0 : Executing request: 64657461696C2F44656661756C744D564D2E637070
Java Version : Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
MATLAB Architecture : win64
MATLAB Entitlement ID : 4205217
MATLAB Root : D:\MATLAB
MATLAB Version : 9.14.0.2337262 (R2023a) Update 5
OpenGL : hardware
Operating System : Microsoft Windows 11 Home China
Process ID : 12036
Processor ID : x86 Family 6 Model 154 Stepping 3, GenuineIntel
Session Key : 1bc022d5-fec1-4d20-85e6-804c291b35f9
Window System : Version 10.0 (Build 22621)
Fault Count: 1
Abnormal termination:
Access violation
Current Thread: 'MCR 0 interpreter thread' id 1504
Register State (from fault):
RAX = 0000000000000000 RBX = 000001f997c03600
RCX = 000001f997a46fa0 RDX = 000001f9917c6dd0
RSP = 0000000b249ef680 RBP = 000001f997a46fa0
RSI = 000001f90ed6cd10 RDI = 0000000000000000
R8 = 000001f93d278940 R9 = 0000000000000001
R10 = 0000000000000003 R11 = 0000000b249ef620
R12 = 0000000b249efab0 R13 = 0000000b249efb18
R14 = 0000000b249ef8b0 R15 = 000001f9917c6dd0
RIP = 00007ffb643e1c51 EFL = 00010246
CS = 0033 FS = 0053 GS = 002b
Stack Trace (from fault):
[ 0] 0x00007ffb643e1c51 C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+02104401
[ 1] 0x00007ffb643b58ef C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+01923311
[ 2] 0x00007ffb643b5a36 C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+01923638
[ 3] 0x00007ffb64288681 C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+00689793
[ 4] 0x00007ffb642887d0 C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+00690128
[ 5] 0x00007ffb64289359 C:\Windows\system32\DriverStore\FileRepository\nvlti.inf_amd64_106be4074dc4b9cb\nvcuda64.dll+00693081
[ 6] 0x00007ffa88341627 D:\MATLAB\bin\win64\cudart64_110.dll+00136743
Error using setup_stepping_kernel
Please set the GPU you're going to use by setting "sim.gpuDevice.Index".
Error in GMMNLSE_propagate_with_adaptive (line 213)
sim.cuda_MPA_psi_update] = setup_stepping_kernel(sim,Nt,num_modes);
Error in GMMNLSE_propagate (line 74)
foutput = GMMNLSE_propgation_func(fiber, initial_condition, sim, gain_rate_eqn);
Error in nonlinear_coupling (line 86)
parfor i = 1:9
Warning: 2 worker(s) crashed while executing code in the current parallel pool. MATLAB may attempt to run the code again on the remaining workers of the pool, unless an spmd block has run. View the crash dump files to determine what caused the workers to crash.
Accepted Answer
Walter Roberson
2023-11-2
GMMNLSE_propgation_func must currently contain an invocation of a Simulink model. You need to configure that to run with sim() if it does not already do so. You have to set the sim object parameter gpuDevice.Index to the index of the GPU device. You must not start more such workers than you have GPUs: only one worker at a time can use any given GPU.
Unfortunately, you cannot just use the parfor index as the GPU index. Tracking which GPUs are available can be a bit of a nuisance.
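A minimal sketch of the one-worker-per-GPU setup described above. It opens a pool with at most one worker per available GPU, has each worker select its own GPU once inside spmd, and then passes that worker's GPU index to the solver inside parfor. Assumptions: the solver reads the GPU index from sim.gpuDevice.Index (the field its own error message asks for), and the available GPUs are numbered 1 to numGPUs. fiber, initial_pulse, sim_e and gain_rate_eqn are the variables from the question. On a laptop with a single RTX 3060 this creates a one-worker pool, so the nine iterations run one after another.

```matlab
% Sketch: at most one pool worker per GPU; each worker passes its own
% GPU index to the solver.
% ASSUMPTIONS: the solver reads sim.gpuDevice.Index (the field named in
% its error message); available GPUs are numbered 1..numGPUs.
numGPUs = gpuDeviceCount("available");
assert(numGPUs > 0, "No available GPU found.");

delete(gcp("nocreate"));               % close any pool that is already open
parpool("Processes", numGPUs);         % one worker per GPU

spmd
    gpuDevice(spmdIndex);              % worker k selects GPU k (use labindex before R2022b)
end

parfor i = 1:9
    sim_i = sim_e(i);                  % per-iteration copy of the settings
    sim_i.ellipticity = i/10;
    d = gpuDevice;                     % the GPU this worker selected in spmd
    sim_i.gpuDevice.Index = d.Index;   % assumed solver field (see above)
    prop_output_elliptical(i) = GMMNLSE_propagate(fiber, initial_pulse, sim_i, gain_rate_eqn);
end
```

Selecting the GPU once per worker in spmd, rather than deriving it from the loop variable i, avoids the problem mentioned above: parfor does not guarantee which worker runs which iteration.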
7 Comments
Walter Roberson
2023-11-2
sim.gpuDevice.Index is not something that I can find any reference to at the moment.
More Answers (1)
Joss Knight
2024-1-3
It looks like you simply have a bug in your CUDAKernel implementation, probably one that accesses unallocated memory. This leaves the device in a bad state for subsequent GPU execution. Try using NVIDIA Compute Sanitizer to debug it.
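One way to check this diagnosis before reaching for the sanitizer, sketched under the same assumptions as the question's code: run the sweep serially on a freshly reset GPU and synchronize after every call. If the serial loop also errors or crashes MATLAB, the fault is in the GPU code itself rather than in parfor, and the failing iteration is the one to reproduce under Compute Sanitizer. GMMNLSE_propagate and its inputs come from the question; setting sim.gpuDevice.Index is an assumption based on the solver's error message.

```matlab
% Serial diagnostic: the same sweep with no parfor, on one GPU.
g = gpuDevice(1);                      % select GPU 1; selecting by index also resets it
for i = 1:9
    sim_i = sim_e(i);
    sim_i.ellipticity = i/10;
    sim_i.gpuDevice.Index = g.Index;   % assumption: the solver reads this field
    prop_output_elliptical(i) = GMMNLSE_propagate(fiber, initial_pulse, sim_i, gain_rate_eqn);
    wait(g);                           % block until queued GPU work finishes, so an
                                       % asynchronous failure is more likely to be
                                       % reported at the iteration that caused it
    fprintf("Iteration %d finished\n", i);
end
```

Because kernel launches are asynchronous, the MATLAB line where an error appears is not necessarily the one that launched the faulty kernel; the wait call narrows that gap at the cost of some speed.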