Matlab sometimes won't thread mldivide on Intel Core i9-10980XE

When I solve a (sparse) linear system K*x = f (originating from a system of PDEs), I use mldivide (x = K\f). On a PC workstation with an Intel Core i9-9900K, Matlab (2019b) automatically threads the solve over all 8 available physical cores, as expected. However, when I solve the exact same problem on a (custom-built) workstation with an Intel Core i9-10980XE (18 physical cores), Matlab (2020a) sometimes chooses not to thread and uses only one core.
When I say 'sometimes', it really seems to be random. I can't figure out when it threads as expected, and when not.
However, with some help from the MathWorks Support, I have found out that I can force Matlab to thread, by doing one of two different things.
1) I can set the environment variable KMP_AFFINITY to 'norespect', which seems to make the system ignore the pre-selected "OS proc set", which by default seems to contain only CPU 2. More specifically, if I also set the environment variable KMP_SETTINGS = 1 and add 'verbose' to KMP_AFFINITY, the output reads:
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #155: KMP_AFFINITY: Initial OS proc set not respected: 2
OMP: Info #156: KMP_AFFINITY: 36 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 18 cores/pkg x 2 threads/core (18 total cores)
OMP: Info #250: KMP_AFFINITY: pid 16320 tid 18420 thread 0 bound to OS proc set 0-35
OMP: Info #250: KMP_AFFINITY: pid 16320 tid 18420 thread 1 bound to OS proc set 0-35
... and repeat the last line (Info #250) for every thread...
What's interesting is Info #155, which tells me that, for some reason, the system is initially bound to CPU 2 only. However, I can override this by setting KMP_AFFINITY = 'norespect'. If I don't choose 'norespect', the default value 'respect' is used and the output is:
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 2
OMP: Info #156: KMP_AFFINITY: 1 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #159: KMP_AFFINITY: 1 packages x 1 cores/pkg x 1 threads/core (1 total cores)
OMP: Info #250: KMP_AFFINITY: pid 14388 tid 1560 thread 0 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 14388 tid 1560 thread 1 bound to OS proc set 2
... and repeat the last line (Info #250) for every thread...
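The two runs above differ only in the environment that Matlab starts with. A minimal sketch of one way to reproduce both configurations from a script is shown here, assuming that a `matlab` executable is on the PATH and that the variables are picked up when set before launch. The helper names `openmp_debug_env` and `launch_matlab` are hypothetical. The variable names and values are the ones quoted in this post.

```python
import os
import subprocess


def openmp_debug_env(base_env=None, respect=False):
    """Return a copy of the environment with the KMP_* settings from this post.

    respect=False gives 'norespect' (ignore the initial OS proc set);
    respect=True gives 'respect', which the post describes as the default.
    """
    env = dict(os.environ if base_env is None else base_env)
    modifier = "respect" if respect else "norespect"
    # 'verbose' makes the runtime print the "OMP: Info" lines shown above.
    env["KMP_AFFINITY"] = f"verbose,{modifier}"
    env["KMP_SETTINGS"] = "1"
    return env


def launch_matlab(executable="matlab", respect=False):
    """Start Matlab with the modified environment.

    The executable name is an assumption; adjust it to the local installation.
    """
    return subprocess.Popen([executable], env=openmp_debug_env(respect=respect))
```

Calling `launch_matlab()` would correspond to the 'norespect' run and `launch_matlab(respect=True)` to the 'respect' run. The calling process's own environment is left untouched.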
As a reference, when I do the same thing on the workstation with i9-9900K, and choose 'respect' (that is, I run it with default settings), I get the following output:
OMP: Info #211: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #209: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
OMP: Info #156: KMP_AFFINITY: 16 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 8 cores/pkg x 2 threads/core (8 total cores)
OMP: Info #249: KMP_AFFINITY: pid 8536 tid 7140 thread 0 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
OMP: Info #249: KMP_AFFINITY: pid 8536 tid 7140 thread 1 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
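Because the threading behaviour seems random, it can help to record the initial OS proc set across many runs. The sketch below pulls the relevant fields out of verbose output like the three listings above. It assumes the log lines have exactly the wording shown in this post (other MKL or OpenMP versions may word them differently). The function names are hypothetical.

```python
import re

# Patterns match the wording of the "OMP: Info" lines quoted in this post.
_PROC_SET = re.compile(r"Initial OS proc set (not respected|respected): (.+)$")
_AVAILABLE = re.compile(r"KMP_AFFINITY: (\d+) available OS procs")


def parse_proc_ids(text):
    """Parse '2', '0-35' or '{0,1,2}' into a sorted list of integers."""
    ids = set()
    for part in text.strip().strip("{}").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


def summarize(log_text):
    """Extract the initial proc set and the available proc count from a log."""
    summary = {"respected": None, "initial_procs": None, "available": None}
    for line in log_text.splitlines():
        match = _PROC_SET.search(line)
        if match:
            summary["respected"] = match.group(1) == "respected"
            summary["initial_procs"] = parse_proc_ids(match.group(2))
        match = _AVAILABLE.search(line)
        if match:
            summary["available"] = int(match.group(1))
    return summary
```

A run in which `initial_procs` has a single entry while the machine has many logical processors would correspond to the single-core case described above.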
2) Another way to get all physical cores working is to open Task Manager > Details, right-click matlab.exe, and select 'Set affinity'. All CPUs are already pre-selected, even when only one CPU is working. Nevertheless, simply clicking 'OK' makes all CPUs start working immediately. This also works in the middle of an mldivide call that is already running, which speeds up the solve partway through.
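On Windows, process affinity is a bitmask in which bit i stands for logical processor i. Assuming that the "OS proc" numbers in the OpenMP output use the same numbering (not confirmed here), an initial OS proc set of 2 would correspond to a mask with only bit 2 set. The sketch below only converts between masks and CPU lists. It does not read or change the affinity of a running process, and the function names are hypothetical.

```python
def cpus_in_mask(mask):
    """Return the logical processor indices whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def mask_for_cpus(cpus):
    """Build an affinity bitmask from an iterable of processor indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask
```

For example, `cpus_in_mask(0b100)` gives the single-CPU case, and `mask_for_cpus(range(36))` gives the mask with all 36 logical processors of the i9-10980XE enabled.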
On a side note, the CNR branch is displayed as 'unknown' on the i9-10980XE when I query the BLAS version from the Matlab command window:
version -blas
ans =
'Intel(R) Math Kernel Library Version 2019.0.3 Product Build 20190125 for Intel(R) 64 architecture applications, CNR branch unknown
'
As I understand it, the workstation should be able to use AVX512. By changing some environment variables, I can force the system to use e.g. AVX2 or AVX512 (or at least 'AVX2' or 'AVX512' shows up in Matlab when calling 'version -blas'), but this does nothing to help with the threading issue.
As a reference, the i9-9900K workstation tells me the following:
version -blas
ans =
'Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for Intel(R) 64 architecture applications, CNR branch AVX2
'
Now for my questions, which are perhaps not so direct... Has anyone experienced something similar, and do you have any input on this matter? It seems this could be related to MKL (?), so if anyone knows what happens under the hood, maybe you could help explain what is really going on here? Any input is helpful! The specific K and f that I use for the tests are too big (K is 1815087-by-1815087) to attach to this message, but I can make them available in other ways if you are interested.
1 Comment
ock 2025-1-9
Hi Jonas. Reviving a very old thread here, but did you figure out what was causing the discrepancy between the two processors, and if the inconsistent threading of the 10980XE during mldivide is an issue linked to Intel MKL?


Answers (0)

Release

R2020a
