Faster alternative to find or logical indexing for specific problem

I have a massively iterative algorithm in which this code sits, so I profile it whenever I add something to the algorithm. Currently the find (or logical indexing) in this piece of code is taking quite a long time, so I am wondering if I am missing some simpler approach, given the simple nature of my initial array.
I have a predefined array of contiguous indices starting from 1. I also have an array of indices from these to ignore (note, mine are not random, they are predefined, but they change each iteration) and an array of evaluation values corresponding to each of the original indices.
I then choose a random 10% sample from those indices that are not on the ignore list, find the one giving the best evaluation, and then wish to return the original index corresponding to this value. So at the moment I do something similar to this (albeit with predefined originalIndices, evaluations and indicesToIgnore - they are random here for testing purposes):
function chosenLinearIdx = chooseSample( )
originalIndices = 1:300000;
evaluations = rand( size( originalIndices ) );
rng( 77 ) % Just in test code for repeatability
indicesToIgnore = randperm( 300000, 100 );
ignoreLogicalIdx = ismember( originalIndices, indicesToIgnore );
validIdx = find( ~ignoreLogicalIdx ); % Slow!
% validIdx = originalIndices( ~ignoreLogicalIdx ); % Alternative approach, but twice as slow as find
n = numel( validIdx );
k = round( 0.1 * n );
subIndices = ceil( rand( k, 1 ) * n );
[~, bestIdx] = max( evaluations( validIdx( subIndices ) ) );
chosenLinearIdx = validIdx( subIndices( bestIdx ) );
end
It is the calculation of validIdx in the middle that is taking a long time, but I am not seeing a way to map back to my original indices, having chosen an index from the smaller list, without doing this.
It feels as though there should be some quicker method, but either there isn't or I am just missing it. Obviously there are options such as mex or some other C++-style optimisation if I really need it, but I want to make sure I have the best MATLAB implementation before I consider those.
Note, my originalIndices are defined just once and stored in a class, so they are not being reallocated every time this function is called, which can be of the order of a million times.
Also, my original version of this code used setdiff, but that was far slower, so it is not an option I plan to explore again. I have also found cumsum to glow red in the profiler whenever I use it in this massively iterative code, so I didn't bother trying a solution using that, which would probably still need a find anyway.
2 Comments
James Tursa 2017-7-25
Seems like you should be using randperm for your subIndices generation to avoid potentially repeated indices (if you care about that).
Adam 2017-7-25
I was originally, but it was considerably slower, and I didn't really care about repetitions since I am taking a sample anyway, so speed was more important there.


Accepted Answer

James Tursa 2017-7-25
Edited: James Tursa 2017-7-25
In case you go the mex route, here is a routine for you for the validIdx calculation:
/* validIdx_mex.c
*
* B = validIdx_mex(n,A)
*
* Creates a vector of indexes 1:n but excluding those indexes in A.
*
* Programmer: James Tursa
*/
/* Includes ----------------------------------------------------------- */
#include "mex.h"
/* Gateway ------------------------------------------------------------ */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize i, j, n, m;
    double *Bpr, *Apr, *bpr;
    /* Argument checks */
    if( nrhs != 2 || !mxIsNumeric(prhs[0]) || !mxIsDouble(prhs[1]) ||
        mxGetNumberOfElements(prhs[0]) != 1 || mxIsSparse(prhs[1]) ||
        mxIsComplex(prhs[0]) || mxIsComplex(prhs[1]) ) {
        mexErrMsgTxt("Invalid inputs");
    }
    n = mxGetScalar(prhs[0]);
    if( n <= 0 ) {
        mexErrMsgTxt("Invalid 1st argument");
    }
    m = mxGetNumberOfElements(prhs[1]);
    Apr = mxGetPr(prhs[1]);
    /* Create the output */
    plhs[0] = mxCreateDoubleMatrix( 1, n, mxREAL );
    Bpr = mxGetPr(plhs[0]);
    /* Fill the output with default indexing */
    bpr = Bpr;
    for( i=1; i<=n; i++ ) {
        *bpr++ = i;
    }
    /* Put 0's in the ignore spots */
    for( i=0; i<m; i++ ) {
        j = *Apr;
        if( j != *Apr || j < 1 ) {
            mexErrMsgTxt("Invalid 2nd argument value");
        } else if( j <= n ) {
            Bpr[j-1] = 0;
        }
        Apr++;
    }
    /* Shift all the non-zero indexes to the front */
    bpr = Bpr;
    j = 0;
    for( i=0; i<n; i++ ) {
        if( *bpr ) {
            *Bpr++ = *bpr;
            j++;
        }
        bpr++;
    }
    /* Set the new size based on the number of actual non-zeros */
    mxSetN(plhs[0],j);
}
3 Comments
Adam 2017-7-26
It looks to be running ~3.5 times faster using this, so it is definitely a big improvement. I may still need to reconsider my algorithm, though, as absolute speed is still questionable, but it is workable for now, thanks.
James Tursa 2017-7-28
It may not make much difference for the sizes you are using, but those first couple of loops could be parallelized, e.g. if you had an OpenMP-compliant compiler.

