Matrix multiplication (GEMM) is one of the most heavily optimized kernels around, and the larger the inputs, the more optimizations such as blocking and cache reuse can be applied.
The two extremes of this are the BLAS level 2 approach, where you multiply one column at a time (GEMV, matrix-vector multiply), versus the BLAS level 3 approach, GEMM (matrix-matrix multiply), which works on the whole matrix at once.
A naive GEMM (three nested for loops) usually reaches around 3-5% of the processor's peak performance. A blocked GEMM without any further optimization (six nested for loops) gets around 20% of peak. The matrix multiply MATLAB uses is Intel MKL's GEMM, which is tuned for different processors and can reach around 80-90% of the processor's peak.
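For reference, here is a minimal sketch of the naive version, assuming square row-major matrices of doubles. Note that with this loop order it is effectively one GEMV per column of B, which is why it behaves like a level 2 routine: every column of C re-reads all of A, with no reuse.

```c
#include <stddef.h>

/* Naive GEMM: C = A*B for n-by-n row-major matrices.
 * Three nested loops, no blocking: the innermost loop walks B with
 * stride n, so cache reuse is poor and memory traffic dominates. */
void gemm_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i*n + k] * B[k*n + j];
            C[i*n + j] = sum;
        }
}
```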
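And a sketch of the blocked version with six loops. The block size of 64 here is a made-up placeholder that would need tuning so that three tiles fit in cache, and the caller must zero C first since the tiles accumulate into it:

```c
#include <stddef.h>

#define BS 64  /* assumed block size; tune so three BS-by-BS tiles fit in cache */

/* Blocked GEMM: same C = A*B, but iterating over BS-by-BS tiles so each
 * tile of A and B is reused from cache many times before eviction.
 * C must be zero-initialized by the caller. */
void gemm_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS)
                /* multiply the (ii,kk) tile of A by the (kk,jj) tile of B */
                for (size_t i = ii; i < ii + BS && i < n; i++)
                    for (size_t k = kk; k < kk + BS && k < n; k++) {
                        double a = A[i*n + k];
                        for (size_t j = jj; j < jj + BS && j < n; j++)
                            C[i*n + j] += a * B[k*n + j];
                    }
}
```

The extra bound checks (`i < n`, etc.) just handle sizes that are not a multiple of BS. Tuned libraries like MKL go further: packed tile copies, SIMD micro-kernels, and per-architecture block sizes are where the remaining 60-70% of peak comes from.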
Now, all of the numbers above are for large matrix sizes, since cache reuse and SIMD need larger inputs to overcome their overheads. This is the reason for the differences you are seeing: the larger the matrices get, the more optimizations MKL is able to squeeze out of the input.
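If you want to see the size dependence yourself, a rough timing harness along these lines (assuming POSIX `clock_gettime` and the `gemm_blocked` sketch above) prints achieved GFLOP/s for growing sizes; the numbers climb toward the kernel's plateau as n grows and the overheads are amortized:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void gemm_blocked(size_t n, const double *A, const double *B, double *C);

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    for (size_t n = 128; n <= 2048; n *= 2) {
        double *A = malloc(n * n * sizeof *A);
        double *B = malloc(n * n * sizeof *B);
        double *C = calloc(n * n, sizeof *C); /* blocked kernel accumulates into C */
        for (size_t i = 0; i < n * n; i++) {
            A[i] = 1.0 / (double)(i + 1);
            B[i] = 2.0 / (double)(i + 1);
        }
        double t0 = seconds();
        gemm_blocked(n, A, B, C);
        double t1 = seconds();
        /* a matrix multiply performs 2*n^3 floating point operations */
        printf("n = %5zu: %6.2f GFLOP/s\n",
               n, 2.0 * (double)n * n * n / (t1 - t0) / 1e9);
        free(A); free(B); free(C);
    }
    return 0;
}
```

Swapping in `gemm_naive` or a BLAS `dgemm` call at the same spot shows the gap between them widening with n, which is exactly the effect you are observing in MATLAB.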