MATLAB parallel computing beyond 8 cores causing OS to crash

I'm running on a Linux box using Pop!_OS. It has 24 cores available; we specifically got it to run the MATLAB Parallel Computing Toolbox. When I set the parallel toolbox profile validation to test and verify anything more than 8 cores, the computer will seemingly at random hit a core that causes the entire operating system to crash. Even if I select only 10 of the 24 cores, the machine can hard restart. There appears to be some kind of bug in the MATLAB Parallel Computing Toolbox.
Any thoughts on what I should look into fixing this?
2 Comments
Damian Pietrus 2024-3-28
Hey Cyrus,
I have a few questions to hopefully provide some more context. Could you let me know what version of MATLAB you're using, as well as the type of processor you have? I'm specifically interested in whether it has performance/efficiency cores. I'm also curious when this issue occurs. Is it right when you open a pool of 8+ workers, or when you're running specific code?
You can also reach out to our support team at support@mathworks.com. They can help look at any crash logs.
Cyrus Abdollahi 2024-3-28
Edited: Cyrus Abdollahi 2024-3-28
Hi Damian,
[1] The problem is reproducible when I click on the bottom left corner of MATLAB and go to: "Parallel Preferences".
[2] From there, I click on the "Cluster Profile Manager" button.
[3] A new dialog box opens. From there I click on "Edit" at the bottom right and change the "Number of workers to start on your local machine (NumWorkers)" field from 8 to 10.
[4] Click "Done"
[5] Now I click on the "Validation" tab to the right of "Properties"
[6] I set "Number of workers to use" to 10
[7] Click "Validate" -> sometimes it will work. Sometimes it will crash my entire OS.
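For anyone who wants to reproduce the pool start-up outside the GUI, here is a rough sketch that builds an equivalent MATLAB statement for `matlab -batch`. It only covers opening a pool, which is one stage of what "Validate" runs, so it is not a full substitute for the validation. Assumptions: the local profile is named `Processes` (as in recent releases; older releases call it `local`), `matlab` is on the PATH, and `pool_test_statement` is a made-up helper name.

```shell
# Hypothetical helper: print a MATLAB statement that raises NumWorkers on the
# local cluster object, opens a pool of that size, reports it, and closes it.
# $1 = number of workers to request.
# Assumption: the local cluster profile is called 'Processes'.
pool_test_statement() {
    printf "c = parcluster('Processes'); c.NumWorkers = %d; p = parpool(c, %d); disp(p.NumWorkers); delete(p);" "$1" "$1"
}

# Usage (not run here, assumes matlab is on PATH):
#   matlab -batch "$(pool_test_statement 10)"
```

Running it from a terminal keeps any MATLAB output in the terminal scrollback or a redirected log file rather than in a GUI window, which may help when sending details to support.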
Below is the output of the lscpu command. If you need more specific information on the cores, let me know which Linux command you want me to run and I can share the output with you.
~$ lscpu --all --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 yes 5700.0000 800.0000 1100.009
1 0 0 0 0:0:0:0 yes 5700.0000 800.0000 800.000
2 0 0 1 4:4:1:0 yes 5700.0000 800.0000 1093.296
3 0 0 1 4:4:1:0 yes 5700.0000 800.0000 800.000
4 0 0 2 8:8:2:0 yes 5700.0000 800.0000 800.000
5 0 0 2 8:8:2:0 yes 5700.0000 800.0000 800.833
6 0 0 3 12:12:3:0 yes 5700.0000 800.0000 800.000
7 0 0 3 12:12:3:0 yes 5700.0000 800.0000 1079.310
8 0 0 4 16:16:4:0 yes 6000.0000 800.0000 800.000
9 0 0 4 16:16:4:0 yes 6000.0000 800.0000 800.000
10 0 0 5 20:20:5:0 yes 6000.0000 800.0000 800.000
11 0 0 5 20:20:5:0 yes 6000.0000 800.0000 800.000
12 0 0 6 24:24:6:0 yes 5700.0000 800.0000 1100.021
13 0 0 6 24:24:6:0 yes 5700.0000 800.0000 800.000
14 0 0 7 28:28:7:0 yes 5700.0000 800.0000 800.000
15 0 0 7 28:28:7:0 yes 5700.0000 800.0000 800.000
16 0 0 8 32:32:8:0 yes 4400.0000 800.0000 800.000
17 0 0 9 33:33:8:0 yes 4400.0000 800.0000 1081.694
18 0 0 10 34:34:8:0 yes 4400.0000 800.0000 800.000
19 0 0 11 35:35:8:0 yes 4400.0000 800.0000 800.000
20 0 0 12 36:36:9:0 yes 4400.0000 800.0000 800.000
21 0 0 13 37:37:9:0 yes 4400.0000 800.0000 800.000
22 0 0 14 38:38:9:0 yes 4400.0000 800.0000 1054.172
23 0 0 15 39:39:9:0 yes 4400.0000 800.0000 800.000
24 0 0 16 40:40:10:0 yes 4400.0000 800.0000 800.000
25 0 0 17 41:41:10:0 yes 4400.0000 800.0000 800.000
26 0 0 18 42:42:10:0 yes 4400.0000 800.0000 800.000
27 0 0 19 43:43:10:0 yes 4400.0000 800.0000 800.000
28 0 0 20 44:44:11:0 yes 4400.0000 800.0000 800.000
29 0 0 21 45:45:11:0 yes 4400.0000 800.0000 1062.136
30 0 0 22 46:46:11:0 yes 4400.0000 800.0000 800.000
31 0 0 23 47:47:11:0 yes 4400.0000 800.0000 800.000
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i9-14900K
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
Stepping: 1
CPU max MHz: 6000.0000
CPU min MHz: 800.0000
BogoMIPS: 6374.40
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 896 KiB (24 instances)
L1i: 1.3 MiB (24 instances)
L2: 32 MiB (12 instances)
L3: 36 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Not affected
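On the performance/efficiency core question, the split can be read off the CORE column of the `lscpu --all --extended` table. The sketch below counts how many physical cores expose two logical CPUs and how many expose one. Treating the two-thread cores as performance cores and the single-thread cores as efficiency cores is an assumption about this hybrid CPU, not something lscpu reports directly, and `count_core_types` is a hypothetical helper name.

```shell
# Hypothetical helper: read `lscpu --all --extended` output on stdin and count
# physical cores by how many logical CPUs share each CORE id.
count_core_types() {
    awk '
        NR == 1 {
            # Find the CORE column from the header row.
            for (i = 1; i <= NF; i++) if ($i == "CORE") col = i
            next
        }
        # Only rows that start with a numeric CPU id are table rows.
        col && $1 ~ /^[0-9]+$/ { threads[$col]++ }
        END {
            for (c in threads) {
                if (threads[c] > 1) smt++; else single++
            }
            printf "smt_cores=%d single_thread_cores=%d\n", smt, single
        }
    '
}

# Usage (not run here):
#   lscpu --all --extended | count_core_types
```

For the table posted above, CORE ids 0 to 7 each appear twice and ids 8 to 23 each appear once, so this should report 8 two-thread cores and 16 single-thread cores, which adds up to the 24 cores and 32 logical CPUs that lscpu lists.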


Answers (1)

Animesh 2024-8-1
You can try disabling Just-In-Time (JIT) compilation in the Java startup options. To do this, create a "java.opts" file in the "matlabroot/bin/glnxa64" folder (where "matlabroot" is the MATLAB installation folder), if there isn't already a "java.opts" file present. Then, include the following option in the "java.opts" file:
-Xint
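If you prefer to do this from a terminal, a minimal sketch follows. It creates "java.opts" under the given MATLAB folder if it is missing and appends the option only when it is not already there. `add_java_opt` is a hypothetical helper name, the `bin/glnxa64` path is the one given above, and writing into the installation folder may need elevated permissions.

```shell
# Hypothetical helper: add one JVM option to MATLAB's java.opts if missing.
# $1 = MATLAB installation folder (matlabroot), $2 = option, e.g. -Xint
add_java_opt() {
    opts_file="$1/bin/glnxa64/java.opts"
    # Create an empty file if java.opts does not exist yet.
    [ -f "$opts_file" ] || : > "$opts_file" || return 1
    # Append the option only if no existing line matches it exactly.
    if ! grep -qxF -e "$2" "$opts_file"; then
        printf '%s\n' "$2" >> "$opts_file"
    fi
}

# Usage (not run here, the path is a placeholder for your matlabroot):
#   add_java_opt /path/to/matlabroot -Xint
```

Calling it twice leaves a single "-Xint" line, so it is safe to rerun. To undo the change, delete that line from "java.opts" and restart MATLAB.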
You can refer to the following MathWorks documentation for more information on Java Startup options:

Release

R2024a
