How to speed up MEX function?
21 次查看(过去 30 天)
显示 更早的评论
following mex code is running too slow, but I don't know why it is and how to make it faster. Any help is greatly appreciated!
calculate_my_way.cpp
#include "mex.hpp"
#include "mexAdapter.hpp"
#include <cmath>
class MexFunction : public matlab::mex::Function {
public:
void operator()(matlab::mex::ArgumentList outputs, matlab::mex::ArgumentList inputs) {
matlab::data::TypedArray<double> var0 = inputs[0];
matlab::data::TypedArray<double> var1 = inputs[1];
matlab::data::TypedArray<double> var2 = inputs[2];
matlab::data::TypedArray<double> var3 = inputs[3];
auto var0Iter = var0.begin();
auto var1Iter = var1.begin();
auto var2Iter = var2.begin();
auto var3Iter = var3.begin();
const int numOfElements = var0.getNumberOfElements();
double buffer = 0;
for (int x = 0; x<numOfElements; x++)
{
buffer = std::sin(*var0Iter) + std::sin(*var1Iter) + std::sin(*var2Iter) + std::cos(*var3Iter);
*var0Iter = buffer;
buffer = std::sin(*var1Iter + *var2Iter) + std::cos(*var3Iter);
*var1Iter = buffer;
var0Iter++;
var1Iter++;
var2Iter++;
var3Iter++;
}
outputs[0] = std::move(var0);
outputs[1] = std::move(var1);
}
};
It's just simple calculation, but this code runs even slower than native distance function which performs a lot more complicated calculation than just a few sin+cos.
I'm using compiler that came with Visual Studio 2017. below is how I run mex and the compiler setup info.
mex -v calculate_my_way.cpp
...
Compiler location: C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\
...
OPTIMFLAGS : /O2 /Oy- /DNDEBUG
and this is how I am seeing performance issues.
clear
size_test = 1e7;
var1 = zeros(size_test, 1);
var2 = zeros(size_test, 1);
var3 = zeros(size_test, 1);
var4 = zeros(size_test, 1);
cant_beat_me = @() distance(var1,var2,var3,var4);
elapsed_time = timeit(cant_beat_me);
mex_slow = @() calculate_my_way(var1,var2,var3,var4);
elapsed_time = timeit(mex_slow);
15 个评论
Bruno Luong
2022-11-3
By curiosity I code the same calculation in C. Time is 0.24 sec; twice faster than C++ (0.5 sec) but 60% slower than MATLAB (0.147 sec).
/* mex -g -R2018a calculate_C_way.c */
#include "mex.h"
#include <math.h>
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int i, n;
double *var0Iter, *var1Iter, *var2Iter, *var3Iter, *out0Iter, *out1Iter;
n = mxGetNumberOfElements(prhs[0]);
plhs[0] = mxCreateNumericMatrix(1, n, mxDOUBLE_CLASS, mxREAL);
plhs[1] = mxCreateNumericMatrix(1, n, mxDOUBLE_CLASS, mxREAL);
var0Iter = mxGetDoubles(prhs[0]);
var1Iter = mxGetDoubles(prhs[1]);
var2Iter = mxGetDoubles(prhs[2]);
var3Iter = mxGetDoubles(prhs[3]);
out0Iter = mxGetDoubles(plhs[0]);
out1Iter = mxGetDoubles(plhs[1]);
for (i = 0; i < n; i++) {
*out0Iter = sin(*var0Iter) + sin(*var1Iter) + sin(*var2Iter) + cos(*var3Iter);
*out1Iter = sin(*var1Iter + *var2Iter) + cos(*var3Iter);
out0Iter++;
out1Iter++;
var0Iter++;
var1Iter++;
var2Iter++;
var3Iter++;
}
}
采纳的回答
Bruno Luong
2022-11-3
编辑:Bruno Luong
2022-11-3
Last experience, Time with C OpenMP, Intel Parallel Studio XE 2022
CIntel_elapsed_time = 0.0574 [sec]
2.5 faster than MATLAB (finally I beat MATLAB).
To have fast mex: Use C-API (not Cpp), Make it multi-thread, Select a decent compiler.
/* Compile with intel compiler
mex -O COMPFLAGS="$COMPFLAGS /MD /Qopenmp" -R2018a calculate_C_way.c */
#include "mex.h"
#include <math.h>
/* Set to 1 to Enable OPENMP
to 0 to disable it */
#define OPENMP_FLAG 1
#if OPENMP_FLAG == 1
#include <omp.h>
#endif
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int i, n;
double *var0Iter, *var1Iter, *var2Iter, *var3Iter, *out0Iter, *out1Iter;
n = mxGetNumberOfElements(prhs[0]);
plhs[0] = mxCreateNumericMatrix(1, n, mxDOUBLE_CLASS, mxREAL);
plhs[1] = mxCreateNumericMatrix(1, n, mxDOUBLE_CLASS, mxREAL);
var0Iter = mxGetDoubles(prhs[0]);
var1Iter = mxGetDoubles(prhs[1]);
var2Iter = mxGetDoubles(prhs[2]);
var3Iter = mxGetDoubles(prhs[3]);
out0Iter = mxGetDoubles(plhs[0]);
out1Iter = mxGetDoubles(plhs[1]);
#if OPENMP_FLAG==1
#pragma omp parallel for default(none) private(i) \
schedule(static) \
shared(n, out0Iter, out1Iter, var0Iter, var1Iter, var2Iter, var3Iter)
#endif
for (i = 0; i < n; i++) {
out0Iter[i] = sin(var0Iter[i]) + sin(var1Iter[i]) + sin(var2Iter[i]) + cos(var3Iter[i]);
out1Iter[i] = sin(var1Iter[i] + var2Iter[i]) + cos(var3Iter[i]);
}
}
2 个评论
James Tursa
2022-11-7
Typically, instead of this
#define OPENMP_FLAG 1
#if OPENMP_FLAG == 1
#include <omp.h>
#endif
you can use this:
#ifdef _OPENMP
#include <omp.h>
#endif
The _OPENMP macro is defined by the compiling environment when OpenMP is available.
更多回答(1 个)
Bruno Luong
2022-11-2
编辑:Bruno Luong
2022-11-2
I don't know well C++, but I have practiced quite a lot mex C.
It looks like this statement just move a bunch of data
outputs[0] = std::move(var0);
outputs[1] = std::move(var1);
ALso I wonder if your input "0, and 1 would change
*var0Iter = buffer;
...
*var1Iter = buffer;
after calling the mex, which is NOT allowed.
2 个评论
Bruno Luong
2022-11-2
" Another one of your answer here helped me tremendously a few years back! thank you! "
Oh... realy glad to read that...
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!