Vectorizing a simple accumulation?

I have something like a sparse array, whose first member is always nonzero, and I want to replace each zero element with the nearest non-zero element right before it. For example:
matrixData = [1.3; 0; 0; 0; 4.2; 0; 0; 1.5; 0; 0; 0; 0];
should become
matrixData = [1.3; 1.3; 1.3; 1.3; 4.2; 4.2; 4.2; 1.5; 1.5; 1.5; 1.5; 1.5];
I am currently using a loop:
emptyRows = (matrixData == 0);
for i = 2:length(matrixData)
    if emptyRows(i)
        matrixData(i) = matrixData(i-1);
    end
end
This loop is the performance bottleneck in my function. It becomes very slow on extremely long arrays, and I can't think of a way to speed it up. (I can't parallelize it because each element depends on the one before it.) Is there a way to vectorize this using accumarray or anything similar?
Thanks!

Accepted Answer

Sean de Wolski 2012-9-20
Edited: Sean de Wolski 2012-9-20
matrixData = [1.3; 0; 0; 0; 4.2; 0; 0; 1.5; 0; 0; 0; 0];
idxk = find(matrixData);
idxr = cumsum(logical(matrixData));
matrixData = matrixData(idxk(idxr));
One of many ways...
1 Comment
Ephedyn 2012-9-20
Thanks a lot! This solved my problem and is amazingly powerful. I wish I could accept both yours and Jan's answers for credit, as I had use for both.


More Answers (1)

Jan 2012-9-20
Edited: Jan 2012-9-20
If your vectors are really large, try a Mex function:
#include "mex.h"

/* Y = f(X): replace each zero in X with the last non-zero value before it. */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    mwSize n, i;
    double *X, q, *Y;

    n = mxGetNumberOfElements(prhs[0]);
    plhs[0] = mxCreateDoubleMatrix(n, 1, mxREAL);
    X = mxGetPr(prhs[0]);
    Y = mxGetPr(plhs[0]);
    q = mxGetNaN();  /* leading zeros, if any, become NaN */
    for (i = 0; i < n; i++) {
        if (X[i] != 0.0) {
            q = X[i];  /* remember the latest non-zero value */
        }
        Y[i] = q;
    }
    return;
}
The M-version needs some large temporary arrays:
1. t1 = find(matrixData)
2. t2 = logical(matrixData)
3. t3 = cumsum(t2)
4. t4 = t1(t3)
Therefore the C-method should have a great advantage.
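To make the memory argument concrete, here is a rough plain-C sketch (no MEX interface) that spells out the temporaries of the M-version next to the single-pass loop. The function names `fill_with_temporaries` and `fill_single_pass` are made up for illustration, and the first one assumes X[0] is non-zero, as stated in the question.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Mirrors the M-version by building its temporaries explicitly.
   Assumes X[0] != 0 (as in the question). Returns 0 on success, -1 if
   the temporaries cannot be allocated. */
int fill_with_temporaries(const double *X, double *Y, size_t n)
{
    size_t *idxk, *idxr, i, count = 0;

    if (n == 0) {
        return 0;
    }
    idxk = malloc(n * sizeof *idxk);  /* like find(X): positions of non-zeros */
    idxr = malloc(n * sizeof *idxr);  /* like cumsum(logical(X)): running count */
    if (idxk == NULL || idxr == NULL) {
        free(idxk);
        free(idxr);
        return -1;
    }
    for (i = 0; i < n; i++) {
        if (X[i] != 0.0) {
            idxk[count++] = i;
        }
        idxr[i] = count;  /* 1-based rank of the latest non-zero so far */
    }
    for (i = 0; i < n; i++) {
        Y[i] = X[idxk[idxr[i] - 1]];  /* X(idxk(idxr)) with 0-based indexing */
    }
    free(idxk);
    free(idxr);
    return 0;
}

/* Single pass: one scalar of state, no temporary arrays. */
void fill_single_pass(const double *X, double *Y, size_t n)
{
    double q = NAN;  /* leading zeros, if any, become NaN */
    size_t i;

    for (i = 0; i < n; i++) {
        if (X[i] != 0.0) {
            q = X[i];
        }
        Y[i] = q;
    }
}
```

Both functions should give the same result on the example from the question. The first allocates two index arrays as long as the input, while the second only keeps the scalar q.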
This function can be parallelized: pass two additional inputs as the initial and final index of each chunk. Instead of inserting NaNs, skip the initial phase until the first non-zero element is found. After the final index, keep going until the next non-zero element, as long as the vector length is not exceeded. This should scale very well with the number of cores.
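The chunking idea could look roughly like the plain-C sketch below. The name `fill_chunk` and the half-open range [first, last) are my own choices for illustration, and the sketch assumes X[0] is non-zero as in the question. No threading code is shown; the point is only that each call writes a disjoint set of elements.

```c
#include <stddef.h>

/* Fills Y for the chunk [first, last) of the n-element vector X.
   Assumes X[0] != 0, so every zero has a non-zero somewhere before it. */
void fill_chunk(const double *X, double *Y, size_t n, size_t first, size_t last)
{
    size_t i = first;
    double q;

    /* Leading zeros continue a run that started in an earlier chunk;
       that chunk fills them, so skip them here. */
    while (i < last && X[i] == 0.0) {
        i++;
    }
    if (i == last) {
        return;  /* no non-zero in this chunk, so nothing to carry */
    }

    /* Main phase: ordinary forward fill inside the chunk. */
    q = X[i];
    for (; i < last; i++) {
        if (X[i] != 0.0) {
            q = X[i];
        }
        Y[i] = q;
    }

    /* Run past the final index until the next non-zero or the vector end. */
    for (; i < n && X[i] == 0.0; i++) {
        Y[i] = q;
    }
}
```

Under that assumption no element of Y is written by two different calls, so the chunks can be processed in any order, for example one call per thread with chunk borders at multiples of n divided by the number of cores.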
Depending on the processor, this could be faster than the IF method:
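int m;
q = 0.0;  /* start finite: NaN * 0 is NaN (likewise Inf * 0) and would spread to every output */
for (i = 0; i < n; i++) {
    m = (X[i] != 0.0);           /* 1 for a non-zero element, 0 for a zero */
    q = X[i] * m + q * (1 - m);  /* take X[i] if non-zero, otherwise keep q */
    Y[i] = q;
}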
[EDITED] No, avoiding the IF is a few percent slower. This version is a few percent faster:
/* Relies on X[0] != 0, as in the question; otherwise Y[i - 1] is read at i = 0. */
for (i = 0; i < n; i++) {
    if (X[i] == 0) {
        Y[i] = Y[i - 1];
    } else {
        Y[i] = X[i];
    }
}
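If the first element cannot be guaranteed to be non-zero, one possible variant handles it before the loop so that Y[i - 1] is never read at i = 0. This is only a plain-C sketch: the name `fill_from_previous` is hypothetical, and marking leading zeros as NaN is an assumption that mirrors the first version above.

```c
#include <math.h>
#include <stddef.h>

/* Forward fill that copies from the previous output element.
   Leading zeros (no non-zero before them) are set to NaN. */
void fill_from_previous(const double *X, double *Y, size_t n)
{
    size_t i;

    if (n == 0) {
        return;
    }
    /* Treat the first element separately: it has no predecessor. */
    Y[0] = (X[0] != 0.0) ? X[0] : NAN;
    for (i = 1; i < n; i++) {
        if (X[i] == 0.0) {
            Y[i] = Y[i - 1];
        } else {
            Y[i] = X[i];
        }
    }
}
```

The extra work happens once outside the loop, so the loop body stays the same as in the version above.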
1 Comment
Ephedyn 2012-9-20
As above, I ended up implementing your solution in the production code, though I had to debug in the command window using Sean's response (the actual function is a bit more complicated). I'd really like to give my deepest gratitude to both of you, and I wish I could give both credit for answering my question. Thanks aplenty!
