When to use MEX in a MATLAB progam (Example of an Ulam Map)
1 次查看(过去 30 天)
显示 更早的评论
I am attempting to learn how to write MEX programs to use in my MATLAB programs when there is need. As far as I know, one does this when one encounters a bottleneck that cannot be suitably vectorized - i.e. when one needs to repat a loop a large number of times, and this loop in unavoidable. However, in the program (See below) that I wrote to test this, I am not finding a significant speed up between MEX and Matlab, and am wondering if a) I have done something wrong b) the example I chose wasn't a good one, or c) MEX does not offer that substantial of a speed up. Any advice in this matter would be greatly appreciated.
This is the MEX program I wrote to iterate the Ulam Map.
#include <math.h>
#include "mex.h"
/*A program to iterate the Ulam Map in MEX
*Call from MATLAB using UlamMap(seed, iter_num */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
#define B_OUT plhs[0]
#define S_IN prhs[0]
#define I_IN prhs[1]
double *B, S;
int I, N, n;
if (nrhs < 1 || nrhs >2)
mexErrMsgTxt("Wrong number of input arguments.");
else if(nlhs > 1)
mexErrMsgTxt("Too many output arguments.");
if(nrhs == 1)
I = 4096;
else
I = mxGetScalar(I_IN);
S = mxGetScalar(S_IN);
B_OUT = mxCreateDoubleMatrix(1, I, mxREAL);
B = mxGetPr(B_OUT);
B[0] = S;
for (n = 1; n<I; n++)
{
B[n] = 1.0-2.0*pow(B[n-1], 2);
}
return;
}
This is the wrapper script which executes this program, as well as a MATLAB version.
format long
x0 = 0.1;
iter =4000096;
tic
vals = zeros(1, iter);
vals(1) = 0.1;
for i = 2:iter
vals(i) = 1-2*vals(i-1)^2;
end
toc
tic
b = UlamMap(x0, iter);
toc
sum((vals-b).^2)
With this large number of iterations used, I see almost no speed up; with a smaller number, it is around a 1.5 speed up. Is this expected? What can I do to maximize the speed up from MEX? Is this an appropriate example?
Thanks,
Danny
0 个评论
采纳的回答
Jan
2012-7-15
编辑:Jan
2012-7-15
POW(X,2) is much slower than X*X, because it is a general algorithm, which has the power to calculate POW(X,2.1) also. Therefore James suggestion is much faster than the original method.
A further improvement:
B[0] = S;
for (n = 1; n<I; n++) {
S = 1.0 - 2.0 * S * S;
B[n] = S;
}
0.044 sec.
3 个评论
Jan
2012-7-15
Even mxMalloc replies zeros - to my surprise. At least I did test this 1e10 times with different array sizes and did not get any junk from formerly used memory. But what is the purpose of mxCalloc?
Anyhow, mxCreateDoubleMatrix has a small overhead for creating the header of the variable, e.g. the dimensions, the type, a lot of flags, etc. But look at James FEX: UNINIT. This creates an uninitialized array and is faster under some circumstances. But let me mention that my investigations of when UNINIT is faster for my programs took hours, while it saves less than a second run-time finally. But if you need to squeeze the last microseconds out of your code, it is worth to try.
James Tursa
2012-7-16
编辑:James Tursa
2012-7-16
@Jan: My experience with mxMalloc & mxCalloc matches yours. They both always returned 0's for data. Even when I allocated a very large memory block with mxMalloc, filled it with non-zero data, deallocated it, then immediately allocated again with mxMalloc & got the same address returned, the data had been wiped clean to 0's again. The only way I have been able to avoid this performance hit is with the undocumented API functions such as used in the UNINIT routine. And even here, as you point out, it is only worth doing if for some reason the overall application requires very many large memory block allocations from scratch. But when you do need this capability it is nice to have.
更多回答(1 个)
James Tursa
2012-7-15
For starters, get rid of the function call overhead:
B[n] = 1.0-2.0*(B[n-1]*B[n-1]);
You can also play games with *B vs B[etc] and can sometimes also get slight speed improvements ... highly dependent on compiler.
1 个评论
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!