Matlab. Find the indices of a cell array of strings with characters all contained in a given string (without repetition)

I have one string and a cell array of strings.
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};
I want to obtain:
idx = [2, 3, 6];
I have written a very long piece of code that:
1. finds the elements with length not greater than length(str);
2. removes the elements with characters not included in str;
3. finally, for each remaining element, checks the characters one by one
Essentially, it's an almost brute force code and runs very slowly. I wonder if there is a simple way to do it fast.
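For reference, the three steps above might look roughly like the following. The asker's actual code is not shown in the thread, so this is only an illustrative sketch of the described approach; the variable names pool, ok and w are made up for the example.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};

idx = [];
for k = 1:numel(dic)
    w = dic{k};
    % Step 1: skip words longer than str.
    if numel(w) > numel(str), continue; end
    % Step 2: skip words containing characters absent from str.
    if ~all(ismember(w, str)), continue; end
    % Step 3: consume matching characters of str one by one.
    pool = str;
    ok = true;
    for c = w
        p = find(pool == c, 1);
        if isempty(p), ok = false; break; end
        pool(p) = [];   % each character of str can be used only once
    end
    if ok, idx(end+1) = k; end
end
```

With the example data this gives idx = [2 3 6]. The nested loop is what makes the approach slow on a large cell array.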

Accepted Answer

Mohsen Nosratinia 2013-10-13
You can sort the strings and then match them using a regular expression. For your example the pattern will be ^a{0,2}c{0,1}t{0,1}z{0,1}$:
u = unique(str);
t = ['^' sprintf('%c{0,%d}', [u; histc(str,u)]) '$'];
s = cellfun(@sort, dic, 'uni', 0);
idx = find(~cellfun('isempty', regexp(s, t)));
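To see what the pattern looks like and why sorting matters, here is a small sketch that builds the same pattern step by step. It replaces histc with an explicit per-character count purely for readability; that substitution is mine and not part of the answer above.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};

u = unique(str);                        % 'actz', distinct characters in sorted order
n = arrayfun(@(c) sum(str == c), u);    % [2 1 1 1], how often each one occurs in str
% sprintf walks the matrix column by column: character code, then its count.
t = ['^' sprintf('%c{0,%d}', [double(u); n]) '$'];

% Sorting each word puts its characters in the same order as the pattern.
s = cellfun(@sort, dic, 'UniformOutput', false);
idx = find(~cellfun('isempty', regexp(s, t)));
```

Here t is '^a{0,2}c{0,1}t{0,1}z{0,1}$' and idx is [2 3 6]. For instance 'zac' sorts to 'acz', which matches, while 'zaz' sorts to 'azz', which has one z too many. Note that if str can contain regular expression metacharacters such as '.' or '*', they would need to be escaped before being placed in the pattern.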

More Answers (3)

Cedric 2013-10-13
Edited: Cedric 2013-10-13
Here is another solution, for the fun of it ..
>> spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]) ;
>> str_spec = spectrum(str) ;
Then
>> find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
ans =
2 3 6
>> dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz', 'aaaz'} ;
>> find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
ans =
2 3 6 8
where we see that 'aaz' was taken into account but not 'aaaz', as it has more a's than str.
2 Comments
N/A 2013-10-13
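To make the idea of the spectrum more concrete, the following sketch inspects it for the example string. It uses ones(numel(s), 1) so that the values are a column vector like the subscripts; that is a small variation of mine, and the names bins, letters and counts are only for illustration.

```matlab
% Character histogram: code 65 ('A') goes to bin 1, code 122 ('z') to bin 58.
spectrum = @(s) accumarray(s.' - 64, ones(numel(s), 1), [58, 1]);

spec = spectrum('actaz');
bins = find(spec).';          % non-empty bins
letters = char(bins + 64);    % characters those bins stand for
counts = spec(bins).';        % occurrences of each character
```

This gives bins = [33 35 52 58], letters = 'actz' and counts = [2 1 1 1]. A word fits when none of its bins exceeds the corresponding bin of str, which is what all(spectrum(s) <= str_spec) tests.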
Cedric, when I try your code it returns only 2.
When I use my real cell of strings, it returns this error:
??? Error using ==> accumarray
First input SUBS must contain positive integer subscripts.
Error in ==> @(s)accumarray(s.'-64,ones(size(s)),[58,1])
Error in ==> @(s)all(spectrum(s)<=str_spec)
I am trying to understand why this is happening.
Cedric 2013-10-13
Edited: Cedric 2013-10-13
A copy/paste with the following returns only 2?
str = 'actaz' ;
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'} ;
spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]) ;
str_spec = spectrum(str) ;
find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
It shouldn't.
In your real cell of strings, do you have special characters, numbers or spaces? I wrote this solution thinking that there would be only letters (lower or upper case). If there are spaces and digits, please use the updated version of spectrum:
spectrum = @(s) accumarray(s.'-31, ones(size(s)), [91,1]) ;
If there can be any special character, just use
spectrum = @(s) accumarray(s.'-0, ones(size(s)), [256,1]) ;
PS: s.'-0 could be replaced by double(s).', but I wanted to keep the previous structure for it not to be even more confusing ;-)
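A quick way to check whether the accumarray error comes from such characters is to look for entries with character codes outside the range 'A' to 'z'. This is only a diagnostic sketch; the word list below is hypothetical, since the real data is not shown in the thread.

```matlab
% Hypothetical word list; the asker's real data is not shown.
dic = {'ac', 'a c', 'z9', 'Zt'};

% Codes below 'A' (65) give non-positive subscripts with the -64 offset,
% and codes above 'z' (122) fall outside the 58 bins.
outOfRange = @(s) any(s < 'A' | s > 'z');
bad = find(cellfun(outOfRange, dic));
```

Here bad is [2 3], because 'a c' contains a space and 'z9' contains a digit. Any index returned this way points to an entry that needs one of the wider versions of spectrum.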



Jos (10584) 2013-10-13
Take a look at my function MATCHROW:
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};
indices = find(cellfun(@(x) matchrow(str,x),dic))
The warning it issues (when numel(str) > numel(dic{k})) can be ignored. The function can be found here: http://www.mathworks.com/matlabcentral/fileexchange/14520

Azzi Abdelmalek 2013-10-13
Edited: Azzi Abdelmalek 2013-10-13
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};
idx1=find(cellfun(@numel,dic)<=numel(str))
dic1=dic(idx1)
idx2=cellfun(@(x) numel(unique(x))==numel(x),dic1)
idx3=idx1(idx2)
dic3=dic1(idx2)
idx4= cellfun(@(x) all(ismember(x,str)),dic3)
indices=idx3(idx4)
2 Comments
N/A 2013-10-13
Dear Azzi,
your solution is amazing and very elegant, although apparently it is also a bit slow (but faster than the code that I am currently using).
There is only one problem. Suppose that
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};
the solution should be
indices = [2 3 6 8];
because 'a' appears twice in str. I realize I should have made clear that a character can appear at most n times in an element of dic if it appears n times in str, i.e., characters cannot be repeated more times than they are repeated in str.
Do you think your code can be adjusted to this condition?
Anyway, thank you a lot for your prompt answer!
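One possible adjustment, sketched here rather than taken from the thread, is to keep the length pre-filter and replace the uniqueness and ismember tests with a per-character count comparison. The helper name fits is made up for the example.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};

% Keep only the words that are not longer than str.
idx1 = find(cellfun(@numel, dic) <= numel(str));
dic1 = dic(idx1);

% A word fits when each of its distinct characters occurs
% no more often in the word than it does in str.
fits = @(x) all(arrayfun(@(c) sum(x == c) <= sum(str == c), unique(x)));
indices = idx1(cellfun(fits, dic1));
```

With this dic the result is indices = [2 3 6 8]: 'aaz' is accepted because str contains two a's, while 'zaz' is still rejected because str contains only one z.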
