MATLAB Coder: Matrix-Scalar-Multiplication slower in generated code?

Question


We are generating DLLs and MEX files from MATLAB code and noticed that Matrix*Scalar operations take 3-10 times longer in the generated code than in the original MATLAB code. Can anyone explain this slowdown?

I wrote two toy functions, one for Matrix*Scalar and one for Matrix*Vector. For the latter, execution time was the same in the original and the generated code. The matrix size was [1000x1000] in both cases.

Interestingly, the MEX file calls the BLAS library for Matrix*Vector but not for Matrix*Scalar. Could this be the reason?

Toy function for Matrix*Scalar:

function [MatrixOut] = MatrixScalar_Function(MatrixIn,ScalarIn)
MatrixOut = MatrixIn;
for index = 1:1000
  MatrixOut = MatrixOut*ScalarIn;
end
end

Generated C-Code for Matrix*Scalar:

/*
 * MatrixScalar_Function.cpp
 *
 * Code generation for function 'MatrixScalar_Function'
 *
 */
/* Include files */
#include "rt_nonfinite.h"
#include "MatrixScalar_Function.h"
#include "MatrixScalar_Function_data.h"
/* Function Definitions */
void MatrixScalar_Function(const emlrtStack *sp, const real_T MatrixIn[1000000],
  real_T ScalarIn, real_T MatrixOut[1000000])
{
  int32_T b_index;
  int32_T i0;
  memcpy(&MatrixOut[0], &MatrixIn[0], 1000000U * sizeof(real_T));
  b_index = 0;
  while (b_index < 1000) {
    for (i0 = 0; i0 < 1000000; i0++) {
      MatrixOut[i0] *= ScalarIn;
    }
    b_index++;
    if (*emlrtBreakCheckR2012bFlagVar != 0) {
      emlrtBreakCheckR2012b(sp);
    }
  }
}
/* End of code generation (MatrixScalar_Function.cpp) */
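To time this kernel outside MATLAB, a minimal standalone C sketch of the same arithmetic might look like the following. It drops the MEX-specific `emlrtStack` argument and the break check; the names `matrix_scalar` and `time_matrix_scalar` are hypothetical, and this is not MATLAB Coder output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Same arithmetic as the generated MatrixScalar_Function, minus the
   MEX-specific emlrtStack argument and the Ctrl+C break check:
   copy the input once, then scale every element `iters` times. */
void matrix_scalar(const double *in, double scalar, double *out,
                   size_t n, int iters)
{
    assert(in != NULL && out != NULL && iters >= 0);
    memcpy(out, in, n * sizeof(double));
    for (int it = 0; it < iters; it++) {
        for (size_t i = 0; i < n; i++) {
            out[i] *= scalar;
        }
    }
}

/* Returns the processor time, in seconds, spent in one call to
   matrix_scalar. clock() measures CPU time, which approximates
   wall-clock time only for single-threaded, CPU-bound code like this. */
double time_matrix_scalar(const double *in, double scalar, double *out,
                          size_t n, int iters)
{
    clock_t start = clock();
    matrix_scalar(in, scalar, out, n, iters);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

For the sizes in the question you would call it with n = 1000000 and iters = 1000 on heap-allocated buffers (8 MB each), compiled with optimizations enabled, so the comparison with MATLAB reflects the loop itself rather than MEX overhead.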


Answer 1

James Tursa 2018-10-11

Edited: James Tursa 2018-10-11


I'm only seeing about a 5% timing difference between a call to the BLAS dscal function and an explicit loop on my R2017b Win64 system, certainly not the 3-10 times difference that you are seeing. You might try replacing that loop with a dscal call and see what you get in your case. E.g., replace this

for (i0 = 0; i0 < 1000000; i0++) {
  MatrixOut[i0] *= ScalarIn;
}

with something like this

#include "blas.h"
/* ... */
int64_T n, incx;  /* or maybe int32_T in your case */
/* ... */
incx = 1;
n = 1000000;
dscal(&n, &ScalarIn, MatrixOut, &incx);  /* MatrixOut = ScalarIn * MatrixOut */
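For reference, the sketch below spells out what dscal computes. `scale_strided` is a hypothetical helper modeled on the reference BLAS dscal semantics, not a binding to any BLAS library. With incx = 1 and n = 1000000 it reduces to exactly the loop it would replace, which is consistent with the small timing difference reported above, at least for a single-threaded BLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring what the reference BLAS dscal computes:
   x[k*incx] = alpha * x[k*incx] for k = 0..n-1. Like the reference
   implementation, it does nothing when n <= 0 or incx <= 0. */
void scale_strided(ptrdiff_t n, double alpha, double *x, ptrdiff_t incx)
{
    assert(n <= 0 || x != NULL);
    if (n <= 0 || incx <= 0) {
        return;
    }
    for (ptrdiff_t k = 0; k < n; k++) {
        x[k * incx] *= alpha;
    }
}
```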

But I do see a big difference in timing compared with the M-code. My guess is that the BLAS dscal routine is not multithreaded, which is why its timing is nearly the same as the manual loop, whereas MATLAB uses a multithreaded scalar multiply routine behind the scenes for the M-code.


Ryan Livingston 2018-10-23

Edited: Ryan Livingston 2018-10-23

MATLAB Coder supports generating OpenMP code for parfor loops:

https://www.mathworks.com/help/coder/optimize-speed-of-generated-code.html

When I change your for loop to a parfor loop and regenerate the MEX file, I see a performance improvement. MATLAB execution is generally still faster, but the numbers are closer.
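Ryan's change parallelizes the outer loop with parfor. As a rough C illustration of the kind of multithreading involved, the sketch below instead splits the element-wise loop across threads with an OpenMP `parallel for` pragma. It is hand-written, not what MATLAB Coder generates, and `scale_parallel` is a hypothetical name. Compile with OpenMP enabled (for example `-fopenmp` with GCC or Clang); otherwise the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Scales x[0..n-1] by alpha, splitting the iterations across threads
   when compiled with OpenMP support. Each element is independent, so
   no synchronization is needed inside the loop. Without OpenMP the
   pragma is ignored and the loop runs serially. */
void scale_parallel(double *x, ptrdiff_t n, double alpha)
{
    assert(n <= 0 || x != NULL);
#pragma omp parallel for schedule(static)
    for (ptrdiff_t i = 0; i < n; i++) {
        x[i] *= alpha;
    }
}
```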

When generating standalone code (e.g., a DLL), make sure you set the build configuration to Faster Runs to enable C/C++ compiler optimizations:

https://www.mathworks.com/help/coder/ref/coder.codeconfig.html?searchHighlight=codeconfig&s_tid=doc_srchtitle

More info on optimizing for performance with MATLAB Coder is available in the documentation:

https://www.mathworks.com/help/coder/optimize-speed-of-generated-code.html

Stefan 2018-10-29

Thank you very much for your suggestions, Ryan!

I will check whether our target hardware supports OpenMP.

I had already set "Faster Runs".

To keep you updated, here is the answer I got from MathWorks Support, which points in the same direction as James Tursa's answer:

"First of all, nothing about MATLAB's scalar times Matrix multiplication is "interpreted". The only role the MATLAB interpreter plays is dispatching to the appropriate compiled and optimized routine to carry out the work. The principal reason MATLAB is faster is because it is multithreaded. Setting maxNumCompThreads(1) brings the execution time within 10% or so using dynamic arrays. The BLAS DSCAL function is not multi-threaded. It does not improve performance much here."
