How to search a substring in a list of strings?

60 次查看(过去 30 天)
I have {'xx', 'abc1', 'abc2', 'yy', 'abc100'} and I would like to search 'abc' and get back {'abc1', 'abc2', 'abc100'}. Is it possible to do this in a simple way without a for cycle?

回答(4 个)

Jos (10584)
Jos (10584) 2018-1-29
In recent releases you can use startsWith
A = {'xx', 'abc1', 'abc2', 'yy', 'abc100'}
tf = startsWith(A,'abc')
B = A(tf)
See the documentation on string functions for many other utilities that may be useful for you.

Stephen23
Stephen23 2018-1-29
Much faster than using cellfun or any string functions:
>> C = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};
>> Z = C(strncmp(C,'abc',3));
>> Z{:}
ans = abc1
ans = abc2
ans = abc100
  1 个评论
Jan
Jan 2018-1-29
编辑:Jan 2018-1-29
+1. 2 minutes faster than me. See my answers for timings: for cell strings, strcnmp is 65 times faster than startsWith.

请先登录,再进行评论。


Jan
Jan 2018-1-29
编辑:Jan 2018-1-29
If you want to search at the start of the strings only, this is efficient:
A = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};
B = s(strncmp(s, 'abc', 3));
Some timings:
% Some larger test data:
A = repmat({'xx', 'abc1', 'abc2', 'yy', 'abc100'}, 1, 1000);
S = string(A);
tic;
for k = 1:1000
tf = startsWith(A, 'abc');
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = startsWith(S, 'abc');
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = strncmp(s, 'abc', 3);
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = cellfun(@any,strfind(A, 'abc'));
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = ~cellfun('isempty', strfind(A, 'abc'));
B = A(tf);
end
toc
Elapsed time is 1.492006 seconds. % startsWith(cell string)
Elapsed time is 0.308345 seconds. % startsWith(string)
Elapsed time is 0.018157 seconds. % strncmp
Elapsed time is 8.095714 seconds. % cellfun(@any, strfind)
Elapsed time is 1.706694 seconds. % cellfun('isempty', strfind)
Note that cellfun method searches for the substring anywhere in the strings, while the two other methods search at the start only. With modern string methods this would be:
tf = contains(A, 'abc');
This has an equivalent speed as startsWith.
@MathWorks: strncmp is 17 times faster for cell strings than startsWith for strings. The conversion from cell strings to strings inside startsWith let it run 65 times slower than strncmp. There is a great potential for improvements.

Fangjun Jiang
Fangjun Jiang 2018-1-29
s={'xx', 'abc1', 'abc2', 'yy', 'abc100'};
index=cellfun(@any,strfind(s,'abc'));
s(index)
  2 个评论
Matt J
Matt J 2018-1-29
编辑:Matt J 2018-1-29
This solution has a for-loop embedded within cellfun.
Jan
Jan 2018-1-29
编辑:Jan 2018-1-29
@Matt J: But cellfun is at least a fast C-mex function. Every function to process a cell string must contain a loop anywhere. The problem is using cellfun with an expensive anonymous function. About 4 times faster (but still slower than startsWith, timings see my answer):
index = ~cellfun('isempty', strfind(s, 'abc'));

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Data Type Conversion 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by