MATLAB internal parallelization not working (?) under Linux

Hello,
I have a piece of code that makes heavy use of the QR decomposition function, which supports some internal parallelization. However, on the computation server at my workplace this parallelization does not seem to work. On my Windows laptop the code runs about 3 times faster than on the computation server under Linux.
I concluded that the difference comes from the parallelization: looking at the resource monitor during the computation on Windows, all 4 physical cores of my laptop are about 80% busy, while on the server, running the same data, only 1 core (out of 28 physical cores) is working, at 100%.
My question is twofold:
1°) Is there a way in MATLAB to check whether the parallelization of internal functions such as qr() is working properly?
2°) How can I make MATLAB use this parallelization if it is not working?
For the second part, my guess is that the BLAS parallelization, which is mostly based on OpenMP, is not detecting the necessary environment variables, but I am really not sure of that, nor of how to solve it if so.
Thanks for any help!

Answers (2)

Christine Tobler 2018-5-14
You can use the function maxNumCompThreads to see the maximum number of threads MATLAB uses.
  1 comment
MAGERAND Ludovic 2018-5-21
On the computation server, I get this:
>> maxNumCompThreads
ans =
28
>> feature('NumCores')
MATLAB detected: 28 physical cores.
MATLAB detected: 56 logical cores.
MATLAB was assigned: 56 logical cores by the OS.
MATLAB is using: 28 logical cores.
MATLAB is not using all logical cores because hyper-threading is enabled.
While on my laptop, I get this:
>> maxNumCompThreads
ans =
4
>> feature('NumCores')
MATLAB detected: 4 physical cores.
MATLAB detected: 8 logical cores.
MATLAB was assigned: 8 logical cores by the OS.
MATLAB is using: 4 logical cores.
MATLAB is not using all logical cores because hyper-threading is enabled.
Yet it still uses only one CPU on the computation server. However, I am wondering whether it is in fact using only one CPU on my laptop too: with another process manager that shows a more detailed CPU load graph, it seems that although all cores are used, they are not all used at the same time, and each one is only at about 30%, which combined would not be much more than 100%.
So, to test this in more depth, I ran the following code on both:
>> A = rand(1000); start_time = tic(); for i = 1:1000; [Q,R] = qr(A); end; toc(start_time)
On the laptop, I get: Elapsed time is 49.009822 seconds, with all 4 CPUs working at ~60%.
On the computation server, I get: Elapsed time is 24.956325 seconds, with ~20 cores now working at 100%, which makes more sense.
I guess the QR decompositions in my main code are not big enough for the parallelization to kick in on either machine (the matrices there have only hundreds of rows by ~10 columns). But then I am still wondering why the laptop is so much faster than the computation server.
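One way to probe the guess above is to time qr() separately on a small tall matrix and on a large square one, on each machine. The following is only a sketch: the sizes (300 x 10 and 1000 x 1000) and the repetition counts are illustrative assumptions, not values taken from the actual code.

```matlab
% Sketch: per-call time of qr() for a small tall matrix vs. a large square one.
% Sizes and repetition counts are assumptions chosen for illustration.
sizes = [300 10; 1000 1000];   % [rows cols] for each test case
reps  = [2000; 20];            % more repetitions for the cheap case
for k = 1:size(sizes, 1)
    A = rand(sizes(k, 1), sizes(k, 2));
    t = tic();
    for i = 1:reps(k)
        [Q, R] = qr(A);
    end
    perCall = toc(t) / reps(k);   % average time of one decomposition
    fprintf('%4d x %4d: %.3e s per qr call\n', sizes(k, 1), sizes(k, 2), perCall);
end
```

If the small case takes about the same time per call regardless of how many cores are available, the difference between the two machines is more likely due to single-core speed than to threading.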



Christine Tobler 2018-5-30
You can also use maxNumCompThreads to set the maximum number of computational threads you want MATLAB to use. So for example, you could call
maxNumCompThreads(1)
on both machines, and see how their performance compares if only one thread is used. Note that this is just the maximum number of threads; any specific function may use fewer threads if it is not multithreaded, or if the matrix is so small that using all threads is not worthwhile.
I'm not sure how easily the QR decomposition can be threaded, so using many cores may not give a very noticeable speedup.
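The comparison suggested above could be sketched as follows. This is a minimal example, assuming a 1000 x 1000 test matrix and 20 repetitions (both arbitrary choices); it relies on maxNumCompThreads(N) returning the previous limit so that it can be restored afterwards.

```matlab
% Sketch: time qr() with a single computational thread, then with the default limit.
A = rand(1000);                     % test matrix; the size is an assumption
nDefault = maxNumCompThreads(1);    % limit to 1 thread, remember the previous limit
t = tic();
for i = 1:20
    [Q, R] = qr(A);
end
tSingle = toc(t);
maxNumCompThreads(nDefault);        % restore the previous limit
t = tic();
for i = 1:20
    [Q, R] = qr(A);
end
tMulti = toc(t);
fprintf('1 thread: %.2f s, up to %d threads: %.2f s\n', tSingle, nDefault, tMulti);
```

Running this on both machines gives the single-thread timings to compare directly, and the ratio tSingle / tMulti indicates how much qr() gains from threading at this matrix size.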
