Optimize/speed up a big and slow matrix operation with addition and bsxfun?

Hi. I have a fairly long line of code in my script that takes about 7 seconds to run. It was originally three separate calculations, but they are now combined into one line that looks like this:
X = bsxfun(@times,reshape(bsxfun(@times,A,B),[1440,1,2251]),(1-C)./2)...
+bsxfun(@times,reshape(E,1440,1,2251),D)...
+bsxfun(@times,reshape(F,1440,1,[]),((1+C)/2));
Since I need to run it several tens of thousands of times, it is causing a fair amount of anxiety in my life at the moment. I don't know how I would go about speeding it up further, or whether it is even possible, so I would really appreciate it if any of you optimization gurus here could give me some advice on how (and if) I can make this faster.
The input variables are of size:
A = 2251x1
B = 1440x2251
C = 1x181
D = 1440x181
E = 1440x2251
F = 1440x2251
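For readability, here is the same one-liner with the repeated scaling terms pulled out into named intermediates (the names Cm, Cp, AB, E3 and F3 are just illustrative; the result is identical to the expression above):
Cm = (1 - C) ./ 2;                                  % 1 x 181
Cp = (1 + C) ./ 2;                                  % 1 x 181
AB = reshape(bsxfun(@times, A, B), 1440, 1, []);    % 1440 x 1 x 2251
E3 = reshape(E, 1440, 1, []);
F3 = reshape(F, 1440, 1, []);
X  = bsxfun(@times, AB, Cm) ...                     % 1440 x 181 x 2251
   + bsxfun(@times, E3, D) ...
   + bsxfun(@times, F3, Cp);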
Thanks!

Accepted Answer

James Tursa
James Tursa 2015-4-20
Edited: James Tursa 2015-4-20
Assuming you have a C compiler available, here is a simple C mex routine to do the calculation. It could possibly be sped up even more with multi-threading (e.g., OpenMP), but I did not include that code below. Let me know if you have an OpenMP-compliant compiler, since that addition would be fairly simple.
As is (with no C-mex multi-threading) here is an example run on my machine:
Elapsed time is 6.655025 seconds. % M-code with bsxfun
Elapsed time is 1.700077 seconds. % C-mex code (not multi-threaded)
CAUTION: Code below is bare bones with no argument checking.
// X = bsxfun(@times,reshape(bsxfun(@times,A,B),[1440,1,2251]),(1-C)./2)...
// + bsxfun(@times,reshape(E,1440,1,2251),D)...
// + bsxfun(@times,reshape(F,1440,1,[]),((1+C)/2));
//
// The input variables are of size:
//
// A = 1 x 2251
// B = 1440 x 2251
// C = 1 x 181
// D = 1440 x 181
// E = 1440 x 2251
// F = 1440 x 2251
//
// Output:
//
// X = 1440 x 181 x 2251
//
// Calling sequence:
//
// X = this_file_name(A,B,C,D,E,F);
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
mwSize i, j, k, m, n, p, km, ikm, jm;
mwSize dims[3];
double *A, *B, *C, *D, *E, *F, *X, *Cm, *Cp;
double a, cp, cm;
m = mxGetM(prhs[3]);
n = mxGetN(prhs[3]);
p = mxGetN(prhs[4]);
A = mxGetPr(prhs[0]);
B = mxGetPr(prhs[1]);
C = mxGetPr(prhs[2]);
D = mxGetPr(prhs[3]);
E = mxGetPr(prhs[4]);
F = mxGetPr(prhs[5]);
dims[0] = m;
dims[1] = n;
dims[2] = p;
plhs[0] = mxCreateNumericArray(3, dims, mxDOUBLE_CLASS, mxREAL);
X = mxGetPr(plhs[0]);
Cm = (double *) mxMalloc(n*sizeof(*Cm));
Cp = (double *) mxMalloc(n*sizeof(*Cp));
for( j=0; j<n; j++ ) {
Cm[j] = (1.0 - C[j]) / 2.0;
Cp[j] = (1.0 + C[j]) / 2.0;
}
for( k=0; k<p; k++ ) {
a = A[k];
km = k * m;
for( j=0; j<n; j++ ) {
jm = j * m;
cm = Cm[j] * a;
cp = Cp[j];
for( i=0; i<m; i++ ) {
ikm = i+km;
*X++ = B[ikm] * cm + E[ikm] * D[i+jm] + F[ikm] * cp;
}
}
}
mxFree(Cp);
mxFree(Cm);
}
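To build and sanity-check it from MATLAB, something like the following should work (the file name bsxfun_combo_mex.c is just a placeholder for whatever you save the code above as):
mex bsxfun_combo_mex.c
X_mex = bsxfun_combo_mex(A, B, C, D, E, F);
X_ref = bsxfun(@times, reshape(bsxfun(@times, A, B), [1440, 1, 2251]), (1 - C) ./ 2) ...
      + bsxfun(@times, reshape(E, 1440, 1, 2251), D) ...
      + bsxfun(@times, reshape(F, 1440, 1, []), (1 + C) / 2);
max(abs(X_mex(:) - X_ref(:)))   % should be exactly (or very nearly) zero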
  6 Comments
PetterS
PetterS 2015-4-21
Switching to singles brought it down to 1.3 seconds, so not quite half, but still: since I started at 7 seconds I'm really pleased with 1.3. I will run it continuously for around two weeks now, and after that I don't know if or when I will use it again, so you don't have to put any effort into argument checking.
Thank you so much for your help on this, it really saved me a lot of time!
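For anyone trying the same single-precision switch, a rough sketch of the MATLAB side might look like the following; note that the mex routine itself also has to be rebuilt for single data (mxSINGLE_CLASS for the output, mxGetData/float* instead of mxGetPr/double*), and the mex name below is just a placeholder:
% Hypothetical single-precision workflow; needs a single-precision build of the mex.
As = single(A);  Bs = single(B);  Cs = single(C);
Ds = single(D);  Es = single(E);  Fs = single(F);
Xs = bsxfun_combo_mex_single(As, Bs, Cs, Ds, Es, Fs);   % placeholder name
% Spot-check the precision loss against the double-precision result X:
max(abs(double(Xs(:)) - X(:))) / max(abs(X(:)))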
James Tursa
James Tursa 2015-4-23
Another potential speed-up I should mention is to replace mxCreateNumericArray with mxCreateUninitNumericArray, which skips zero-initializing the (very large) output array.


More Answers (1)

John D'Errico
John D'Errico 2015-4-19
You are trying to do billions of multiplies here, resulting in an array that is of size
1440*181*2251
ans =
586700640
So roughly 600 million elements. Since they are double precision numbers, the array requires 8 bytes per element, so the final result will need nearly 5 gigabytes of RAM just to store.
Oh, and by the way, there are three parts to that computation, so your code may generate three such temporary arrays that are then summed together. The internal storage requirements might therefore be as large as 15 gigabytes of RAM, depending on how that code is processed internally.
That it ran in 7 seconds is a testament to the speed of your system, not the slowness of the code.
If you want it to run faster, get more memory. A good idea would be to use a flash-based drive (SSD) so that your CPU can access virtual memory much faster.
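A quick back-of-the-envelope check of those numbers:
1440*181*2251*8/1e9       % ~4.69e9 bytes, i.e. nearly 5 GB for one such array
3*1440*181*2251*8/1e9     % ~14 GB if three such temporaries exist at once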
  5 Comments
John D'Errico
John D'Errico 2015-4-20
The funny thing is, much computation in the old days was done using single precision. Double precision is more robust, though: you can get away with things with doubles that might hurt you with singles.
When you start swapping to disk, things suddenly get much slower. So my argument is that IF the time drops by 50%, then you were not doing any serious amount of swapping with the doubles.
Eps tells you about the potential error in those numbers.
eps('double')
ans =
2.22044604925031e-16
eps('single')
ans =
1.192093e-07
Eps is essentially the smallest number you can add to 1 and get a different result (strictly, the spacing between 1 and the next larger representable number). So think of eps as the granularity for that precision, the least significant bit.
Much will depend on what you are doing with those numbers. Is there noise in them anyway, so that trash down around 1 part in 1e7 won't bother you? What are you doing with this result afterwards? On some computations that noise won't matter a bit.
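A quick way to see that granularity in action:
1 + 1e-17 == 1                    % true:  1e-17 is lost below double precision
single(1) + 1e-8 == single(1)     % true:  1e-8 is lost below single precision
1 + 1e-15 == 1                    % false: 1e-15 is well above eps('double')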
