how to search for multiple words anywhere in the sentence ?

5 次查看(过去 30 天)
I want to search for three words "Battery , power , failure" the three must exist in the sentence in any order to copy the cell .
I try :
j=1;
k=1;
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:); %save rows which didn't contain
but it search for any cell contains for one of the three.
how i can search for the cells contains the three words in any order?

回答(3 个)

the cyclist
the cyclist 2015-9-19
The most straightforward way, it seems to me, is to do the regexp search three times, once for each word, and then copy the cells where all three match. I am not sure there is a way to do an "and" match in the same way one can do an "or" match like you have done.
  2 个评论
Amr Hashem
Amr Hashem 2015-9-20
thanks to you all...
I take your advice "to do the regexp search three times, once for each word"
and try this:
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);
%2nd word
D2=data(:,126:130);
idx2 = cellfun('isclass',D2,'char');
idx2(idx2)=~cellfun('isempty',regexpi(D2(idx2),'power')) ;
data2 = data(any(idx2,2),:);
Notdata2 = data(~any(idx2,2),:);
%3rd word
D3=data2(:,126:130);
idx3 = cellfun('isclass',D3,'char');
idx3(idx3)=~cellfun('isempty',regexpi(D3(idx3),'failure')) ;
data3 = data2(any(idx3,2),:);
Notdata3 = data2(~any(idx3,2),:);
NotdataALL=[Notdata;Notdata2;Notdata3];
but I am still thinking, may be the three words not exist in the same cell.
I mean 126= battery 127: power 128= failure
but overall the code now sounds good :)

请先登录,再进行评论。


per isakson
per isakson 2015-9-19
编辑:per isakson 2015-9-20
Try this
sentence_1 = 'abc battery def power ghi failure';
typo_str_1 = 'abc battery def power ghi faiXure';
sentence_2 = 'Battery def power ghi failure.';
typo_str_2 = 'abc Xbattery def power ghi failure';
words = {'battery','power','failure'};
is1 = cellfun( @(str) not(isempty(regexpi( sentence_1, ['\<',str,'\>'] ))), words );
is2 = cellfun( @(str) not(isempty(regexpi( typo_str_1, ['\<',str,'\>'] ))), words );
is3 = cellfun( @(str) not(isempty(regexpi( sentence_2, ['\<',str,'\>'] ))), words );
is4 = cellfun( @(str) not(isempty(regexpi( typo_str_2, ['\<',str,'\>'] ))), words );
&nbsp
A different approach
>> cssm(1)
Elapsed time is 0.001078 seconds.
ans =
1 0 0 1 0 0
>> cssm(1e3);
Elapsed time is 0.791887 seconds.
where
function has_all_three = cssm( N )
sentence_1 = 'Abc battery def power ghi failure.';
typo_str_1 = 'Abc battery def power ghi faiXure.';
multistr_1 = 'Abc battery def power ghi battery.';
sentence_2 = 'Battery def failure ghi power jkl.';
typo_str_2 = 'Abc Xbattery def power ghi failure';
multistr_2 = 'Abc power def power ghi power jkl.';
%
test_sentences = {sentence_1,typo_str_1,multistr_1,sentence_2,typo_str_2,multistr_2};
%
text_corp = repmat( test_sentences, [N,1] );
tic
cac = regexpi( text_corp, ['\<(battery)|(power)|(failure)\>'], 'match' );
has_all_three = cellfun( @(c) length(unique(lower(c)))==3, cac );
toc
end
  12 个评论

请先登录,再进行评论。


Amr Hashem
Amr Hashem 2015-9-20
that's work:
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);
%2nd word
D2=data(:,126:130);
idx2 = cellfun('isclass',D2,'char');
idx2(idx2)=~cellfun('isempty',regexpi(D2(idx2),'power')) ;
data2 = data(any(idx2,2),:);
Notdata2 = data(~any(idx2,2),:);
%3rd word
D3=data2(:,126:130);
idx3 = cellfun('isclass',D3,'char');
idx3(idx3)=~cellfun('isempty',regexpi(D3(idx3),'failure')) ;
data3 = data2(any(idx3,2),:);
Notdata3 = data2(~any(idx3,2),:);
NotdataALL=[Notdata;Notdata2;Notdata3];
  1 个评论
Cedric
Cedric 2015-9-22
This can be simplified as developed in my answer. I move it below as a comment:
Here is an alternate solution:
keywords = {'battery', 'power', 'failure'} ;
allCells = {'V_batterypowerfailure', 'I_batterypwerfailure'; ...
'V_batterypowerfailure', 'I_atterypowerfailure'; ...
'I_batterypowerfailre', 'V_batterypowerfailure'} ;
ids = 1 : numel( allCells ) ;
for k = 1 : numel( keywords )
isFound = ~cellfun( 'isempty', strfind( allCells(ids), keywords{k} )) ;
ids = ids(isFound) ;
end
validCells = allCells(ids) ;
You'll notice that it works on a pool of cells which reduces with the keyword index (as when a keyword is not found, there is no point in testing the others). I started valid entries of the dummy data set with V_ and invalid entries with I_ to simplify the final check.
If you need a case-insensitive solution, replace
strfind( allCells(ids), keywords{k} )
with
regexpi( allCells(ids), keywords{k}, 'once' )

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Characters and Strings 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by