Faster alternative to find or logical indexing for specific problem

I have a massively iterative algorithm in which this code sits, so I profile it whenever I add something to the algorithm. Currently the find (or logical indexing) in this piece of code is taking quite a long time, so I am wondering if I am missing some simpler approach, given the simple nature of my initial array.
I have a predefined array of contiguous indices starting from 1. I also have an array of indices from these to ignore (note, mine are not random, they are predefined, but they change each iteration) and an array of evaluation values corresponding to each of the original indices.
I then choose a random 10% sample from those indices that are not on the ignore list, find the one giving the best evaluation, and then wish to return the original index corresponding to this value. So at the moment I do something similar to this (albeit with predefined originalIndices, evaluations and indicesToIgnore - they are random here for testing purposes):
function chosenLinearIdx = chooseSample( )
originalIndices = 1:300000;
evaluations = rand( size( originalIndices ) );
rng( 77 ) % Just in test code for repeatability
indicesToIgnore = randperm( 300000, 100 );
ignoreLogicalIdx = ismember( originalIndices, indicesToIgnore );
validIdx = find( ~ignoreLogicalIdx ); % Slow!
% validIdx = originalIndices( ~ignoreLogicalIdx ); % Alternative approach, but twice as slow as find
n = numel( validIdx );
k = round( 0.1 * n );
subIndices = ceil( rand( k, 1 ) * n );
[~, bestIdx] = max( evaluations( validIdx( subIndices ) ) );
chosenLinearIdx = validIdx( subIndices( bestIdx ) );
end
It is the calculation of validIdx in the middle that is taking a long time, but I am not seeing a way to map back to my original indices, having chosen an index from the smaller list, without doing this.
It feels as though there should be some quicker method, but either there isn't or I am just missing it. Obviously there are options such as mex or some other C++-style optimisation if I really need it, but I want to make sure I have the best MATLAB implementation before I consider those.
Note, my originalIndices are defined just once and stored in a class, so they are not being reallocated every time this function is called, which can be of the order of a million times.
Also, my original version of this code used setdiff, but that was far slower, so it is not an option I plan to explore again. I have also found cumsum to glow red in the profiler whenever I use it in this massively iterative code, so I didn't bother trying a solution using that, which would probably still need a find anyway.
2 Comments
James Tursa 2017-7-25
Seems like you should be using randperm for your subIndices generation to avoid potentially repeated indices (if you care about that).
Adam 2017-7-25
I was originally, but it was considerably slower, and I didn't really care about repetitions since I am taking a sample anyway, so speed was more important there.


Accepted Answer

James Tursa 2017-7-25
Edited: James Tursa 2017-7-25
In case you go the mex route, here is a routine for you for the validIdx calculation:
/* validIdx_mex.c
*
* B = validIdx_mex(n,A)
*
* Creates a vector of indexes 1:n but excluding those indexes in A.
*
* Programmer: James Tursa
*/
/* Includes ----------------------------------------------------------- */
#include "mex.h"
/* Gateway ------------------------------------------------------------ */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize i, j, n, m;
    double *Bpr, *Apr, *bpr;
    /* Argument checks */
    if( nrhs != 2 || !mxIsNumeric(prhs[0]) || !mxIsDouble(prhs[1]) ||
        mxGetNumberOfElements(prhs[0]) != 1 || mxIsSparse(prhs[1]) ||
        mxIsComplex(prhs[0]) || mxIsComplex(prhs[1]) ) {
        mexErrMsgTxt("Invalid inputs");
    }
    n = mxGetScalar(prhs[0]);
    if( n <= 0 ) {
        mexErrMsgTxt("Invalid 1st argument");
    }
    m = mxGetNumberOfElements(prhs[1]);
    Apr = mxGetPr(prhs[1]);
    /* Create the output */
    plhs[0] = mxCreateDoubleMatrix( 1, n, mxREAL );
    Bpr = mxGetPr(plhs[0]);
    /* Fill the output with default indexing */
    bpr = Bpr;
    for( i=1; i<=n; i++ ) {
        *bpr++ = i;
    }
    /* Put 0's in the ignore spots */
    for( i=0; i<m; i++ ) {
        j = *Apr;
        if( j != *Apr || j < 1 ) {
            mexErrMsgTxt("Invalid 2nd argument value");
        } else if( j <= n ) {
            Bpr[j-1] = 0;
        }
        Apr++;
    }
    /* Shift all the non-zero indexes to the front */
    bpr = Bpr;
    j = 0;
    for( i=0; i<n; i++ ) {
        if( *bpr ) {
            *Bpr++ = *bpr;
            j++;
        }
        bpr++;
    }
    /* Set the new size based on the number of actual non-zeros */
    mxSetN(plhs[0],j);
}
3 Comments
Adam 2017-7-26
It looks to be running ~3.5 times faster using this, so it is definitely a big improvement. I may still need to reconsider my algorithm, though, as absolute speed is still questionable, but it is workable for now, thanks.
James Tursa 2017-7-28
It may not make much difference for the sizes you are using, but those first couple of loops could be parallelized, e.g. if you had an OpenMP-compliant compiler.

