Find row in matrix, fast and with special conditions
    8 次查看(过去 30 天)
  
       显示 更早的评论
    
I have a matrix (S) and one row out of that matrix (s). I want to get the rownumber of (s) in (S).
(S) has unique rows. The row (s) is allways member of (S)-rows.
So far I calculate the row number with:
I = find(all(bsxfun(@eq,S,s),2));
I am looking for a faster method. I call this line alot, it costs me hours of calculation time.
Here's a test example:
% create example, N and n could change
n = 4;  % n < 10
N = 25; % range(N) ~ [20 100]
S = nchoosek(1:N,n);
for i = 1:10
    s = S(randi(size(S,1)),:);
    tic
    I = find(all(bsxfun(@eq,S,s),2));
    toc
end
[Edit2], poorly expressed "(S) has unique rows". It should express that I is allways scalar.
[Edit], dimensional details.
My example shows the general direction of the sizes.
[d1 d2] = size(S); % its [12650x4] in my example
First dimension will be much larger then the second one. (d1 >> d2)
Dimension two (d2=n) will be smaller then 10. Numel(S) will be limited by Memory.
1 个评论
  Jan
      
      
 2013-5-25
				What is the usual dimension of S? It matters if you are talking about 10x4, 10'000x20 or 20x10'000 matrices.
采纳的回答
  Jan
      
      
 2013-5-25
        
      编辑:Jan
      
      
 2013-5-25
  
      If the first dimension of S is large, the BSXFUN approach checks of a lot of numbers although they are rejected in the former columns already. A small Matlab function can avoid testing already rejected rows:
function Index = FindRow(S, s)
nCol  = size(S, 2);
match = S(:, 1) == s(1);
for col = 2:nCol
   match(match) = S(match, col) == s(col);
end
Index = find(match);
For Image Analyst's example under Matlab 2009a/64/Win7:
S = randi(3, 200, 3);
s = [2 3 1];
tic, for k=1:1000; m = find(all(bsxfun(@eq,S,s),2)); end; toc
tic; for k=1:1000; m = find(ismember(S, s, 'rows')); end; toc
tic; for k=1:1000; m = FindRow(S, s); end; toc
% 0.032688 sec
% 0.433527 sec
% 0.021643 sec
But for S = randi(30, 2000, 20), s=S(1000, :):
% 0.124566 sec
% 3.175940 sec
% 0.194101 sec
[EDITED] It is easier in C to stop the search after the first matching row is found. This saves 50% processing time in the average. In addition no temporary arrays are required:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
  double *S, *s;
  mwSize n1, n2, i1, i2, k;
    S  = mxGetPr(prhs[0]);
    n1 = mxGetM(prhs[0]);
    n2 = mxGetN(prhs[0]);
    s  = mxGetPr(prhs[1]);  
    for (i1 = 0; i1 < n1; i1++) {
       k = i1;
       for (i2 = 0; i2 < n2; i2++) {
          if (S[k] != s[i2]) {
             break;
          }
          k += n1;
       }
       if (i2 == n2) {  // Matching row found:
          plhs[0] = mxCreateDoubleScalar((double) (i1 + 1));
          return;
       }
    }
    // No success:
    plhs[0] = mxCreateDoubleScalar(mxGetNaN());
  }
For S = randi(30, 2000, 20) I get (compiled by MSVC2008):
BSXFUN:      0.123305 sec
FindRow.m:   0.194914 sec
ISMEMBER:    3.172425 sec
FindRow.mex: 0.011098 sec (s is last row)
             0.007880 sec (s is 1000th row)
             0.004753 sec (s is first row)
The MEX has an average speedup of factor 15. For a [20000 x 200] matrix we get a factor of 260 already.
3 个评论
  Jan
      
      
 2013-12-27
				Dear Jacob L,
The request you've sent me by email would have been a perfect question for this forum. Therefore I answer here, how to modify this function to work for UINT8 arrays:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
  uint8_T *M, *v;
  mwSize  m, n, i1, i2, k;
    if (nrhs != 2 || !mxIsUint8(prhs[0]) || !mxIsUint8(prhs[1])) {
       mexErrMsgIdAndTxt("MEX:FindRow:BadInput", 
                         "2 UINT8 arrays required as input.");
    }
    M = (uint8_T *) mxGetData(prhs[0]);
    m = mxGetM(prhs[0]);
    n = mxGetN(prhs[0]);
    v = (uint8_T *) mxGetData(prhs[1]);  
    if (mxGetNumberOfElements(prhs[1]) != n) {
       mexErrMsgIdAndTxt("MEX:FindRow:BadSize", 
                         "Sizes of inputs do not match.");      
    }
    for (i1 = 0; i1 < m; i1++) {
       k = i1;
       for (i2 = 0; i2 < n; i2++) {
          if (M[k] != v[i2]) {
             break;
          }
          k += m;
       }
       if (i2 == n) {  // Matching row found:
          plhs[0] = mxCreateDoubleScalar((double) (i1 + 1));
          return;
       }
    }
    // No success:
    plhs[0] = mxCreateDoubleMatrix(0, 0, mxREAL);
  }
  Mark
 2015-3-13
				Hi,
I used your your original mex file in matlab and tried to change it. I'm also searching for fast ways to find a row matrix s in a large matrix S. However the row matrix s appears a number of times in large matrix S.
if true
  #include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[], mwSize, lastfound)
{
double *S, *s *lastfound;
mwSize n1, n2, i1, i2, k;
  S  = mxGetPr(prhs[0]);
  n1 = mxGetM(prhs[0]);
  // n2 = mxGetN(prhs[0]);
  n2 = 3;
  s  = mxGetPr(prhs[1]);  
  for (i1 = lastfound; i1 < n1; i1++) {
     k = i1;
     for (i2 = 0; i2 < n2; i2++) {
        if (S[k] != s[i2]) {
           break;
        }
        k += n1;
     }
     if (i2 == n2) {  // Matching row found:
        plhs[0] = mxCreateDoubleScalar((double) (i1 + 1));
        return;
     }
  }
  // No success:
  plhs[0] = mxCreateDoubleScalar(mxGetNaN());
}
end
So i tried to use another input 'lastfound' which would enable the code to start at lastfound + 1 (the +1 is added when the function is called), so that the previous row s is found, and the code start searching the second row s. I do this until the mex file return NaN. It however doesn't work, the code doesn't start the at the lastfound + 1 step. I am a C noob, so i think it is a very small error, but i would immensely appreciate it if somebody could take a look at it.
Thanks in advance :)
更多回答(2 个)
  Image Analyst
      
      
 2013-5-25
        Try this:
% Generate some sample data:
m = randi(3, 200, 3) 
% Specify what we're looking for:
lookingFor = [2 3 1]; % Let's say we're looking for this
% Find out what rows that pattern occurs in:
rowsItOccursIn = find(ismember(m, lookingFor, 'rows'))
5 个评论
  Image Analyst
      
      
 2013-5-25
				What are the actual sizes of n, N, and S? For 4 and 25, it DEFINITELY should not take "hours of calculation time" like you said.
  Teja Muppirala
    
 2013-5-26
        Not sure if this is faster, but might be worth a try.
d1 = 12650;
d2 = 4;
S = randn(d1,d2);
s = S(randi(d1),:);
c = 1;
r = 1;
while c <= d2
    if S(r,c) == s(c)
        c = c+1;
    else
        r = r+1;
        c = 1;
    end
end
isequal(S(r,:),s)
If the big S is the same every time, you could make a better algorithm by sorting it in advance.
另请参阅
类别
				在 Help Center 和 File Exchange 中查找有关 Matrix Indexing 的更多信息
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!





