Optimizing code to avoid repeating the same expression

Hi! Is there a way to make these expressions more compact?
matchstarts_00(k) = regexp(TrajCompact(k,1), '0.+?0'); %for '00'
matchcounts_00(k) = cellfun(@numel, matchstarts_00(k));
matchstarts_11(k) = regexp(TrajCompact(k,1), '1.+?1'); %for '11'
matchcounts_11(k) = cellfun(@numel, matchstarts_11(k));
matchstarts_22(k) = regexp(TrajCompact(k,1), '2.+?2'); %for '22'
matchcounts_22(k) = cellfun(@numel, matchstarts_22(k));
matchstarts_33(k) = regexp(TrajCompact(k,1), '3.+?3'); %for '33'
matchcounts_33(k) = cellfun(@numel, matchstarts_33(k));
matchstarts_44(k) = regexp(TrajCompact(k,1), '4.+?4'); %for '44'
matchcounts_44(k) = cellfun(@numel, matchstarts_44(k));
I have a great number of these expressions. I think there is a smart way to represent them, but I can't find it. Can you help me? Thanks!

Answers (2)

This is easy if we don't use numbered variables (see link below to know why numbered variables usually occur in bad code). The idea is simple: use a cell array and a new loop to iterate over each of 0, 1, etc. Make sure that the cell arrays are preallocated before the loops!
V = 0:4;
C_start = cell(numel(V),maxK); % preallocate before the loops
C_count = cell(numel(V),maxK);
for k = 1:maxK
    for n = 1:numel(V)
        % index V with n (the digit loop), not k (the string loop)
        rgx = sprintf('%d.+?%d',V(n),V(n));
        C_start{n,k} = regexp(TrajCompact(k,1),rgx);
        C_count{n,k} = cellfun(@numel, C_start{n,k});
    end
end
Note that this code is untested, because you did not give any data for us to test code on. Depending on the values that are returned and saved there may be simpler structures.
You should avoid numbered variables: if something is an index, then make it an actual index rather than just part of the variable name. This is the point of the cell array. And do not be tempted to define those variable names dynamically; here are explanations of why this is a bad idea:

12 Comments

Thanks for your answer! It's very helpful for my work... I don't understand what 'C' and 'maxK' are. Is C = TrajCompact? Is maxK = k?
I made a mistake with variable names: C has now been changed to V.
Your code references some variable k, which implies that this is in a loop: matchstarts_00(k). The variable maxK is simply the maximum value of k. Note I had to guess all of this, because you did not give us a complete explanation of your code: what is k and what values does TrajCompact have?
I have an array of 106 structs. From each struct I need only one value (a string), semanticTraj(1,k).semanticTraj, with k up to 106. I collect these 106 strings in TrajCompact so that I can analyse them easily. In these strings I have to find all the occurrences of '00', '11', etc. and count how many times they appear across all strings. The two digits of '00', '11', etc. need not be consecutive, so the expression '0.+?0' is right. So TrajCompact is 106x1 and k goes up to 106.
These are examples of strings in TrajCompact
'2132132(43)1(43)2324234'
'0(23)1(32)1321(32)3(14)(34)30431(34)1431(43)134(31)(32)3'
'0(43)(23)12432(43)41(23)241241(23)(43)13(21)4212342(43)2(43)1(34)31323'
'(32)31(23)21(23)21321(23)21(23)21(23)12321(23)1(23)21(23)21(32)3'
I wrote code to do this, but it is not optimized. I posted part of it yesterday, and I want to find a way to optimize it. I hope the problem is clear now.
I am sure that there are more efficient ways to program this, but it is difficult to write code for imaginary data. Please upload your complete code, and enough data so that we can run the code.
OK, I have attached TrajCompact.mat
My code is very inefficient:
for k=1:106
    matchstarts_00(k) = regexp(TrajCompact(k,1), '0.+?0'); %for '00'
    matchcounts_00(k) = cellfun(@numel, matchstarts_00(k));
    matchstarts_11(k) = regexp(TrajCompact(k,1), '1.+?1'); %for '11'
    matchcounts_11(k) = cellfun(@numel, matchstarts_11(k));
    matchstarts_22(k) = regexp(TrajCompact(k,1), '2.+?2'); %for '22'
    matchcounts_22(k) = cellfun(@numel, matchstarts_22(k));
    matchstarts_33(k) = regexp(TrajCompact(k,1), '3.+?3'); %for '33'
    matchcounts_33(k) = cellfun(@numel, matchstarts_33(k));
    matchstarts_44(k) = regexp(TrajCompact(k,1), '4.+?4'); %for '44'
    matchcounts_44(k) = cellfun(@numel, matchstarts_44(k));
end
count_00=sum(matchcounts_00);
count_11=sum(matchcounts_11);
count_22=sum(matchcounts_22);
count_33=sum(matchcounts_33);
count_44=sum(matchcounts_44);
I found a solution:
[digits{1:2}] = ndgrid(0:4);
matchcounts = cell(size(digits{1}));
for k = 1:numel(matchcounts)
    matchstarts = regexp(TrajCompact, sprintf('%d.+?%d', digits{1}(k), digits{2}(k)));
    matchcounts{k} = cellfun(@numel,matchstarts);
end
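As a minimal sketch of how the per-pattern totals (the count_00, count_11, ... of the original code) could be read off the all-pairs result, assuming TrajCompact is a cell array that contains only character vectors. The second string and the names a, b, totals and sameDigit are made up for illustration:

```matlab
% First string is copied from the thread; the second is a hypothetical example.
TrajCompact = {'2132132(43)1(43)2324234'; '01230'};

[a, b] = ndgrid(0:4);                 % a(i,j) = i-1, b(i,j) = j-1
matchcounts = cell(size(a));
for k = 1:numel(matchcounts)
    rgx = sprintf('%d.+?%d', a(k), b(k));
    % one count per string for this (start digit, end digit) pair
    matchcounts{k} = cellfun(@numel, regexp(TrajCompact, rgx));
end

totals = cellfun(@sum, matchcounts);  % 5x5 matrix: counts summed over all strings
sameDigit = diag(totals).';           % totals for '0.+?0', '1.+?1', ..., '4.+?4'
```

Here totals(i,j) holds the count for the pattern that starts with digit i-1 and ends with digit j-1, so the diagonal corresponds to count_00 through count_44.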
If you don't need all of those intermediate values, then you can reduce how much memory you are using:
idx = cellfun('isclass',TrajCompact,'char');
vec = 0:4;
for k = numel(vec):-1:1
    rgx = sprintf('%d.+?%d',vec(k),vec(k));
    tmp = regexp(TrajCompact(idx),rgx);
    cnt(k) = sum(cellfun('prodofsize',tmp));
end
Thanks! I will try it now! I also want to count the same elements inside brackets, but I don't know how to do it. I tried modifying the pattern in sprintf by adding brackets, sprintf('(%d.+?%d)',...), but the code doesn't give me the right answer!
That is because parentheses () have a special meaning in regular expressions. To treat them as being literal parentheses you need to escape them with a backslash (note that the backslash itself needs to be escaped inside the sprintf):
sprintf('\\(%d.+?%d\\)',...)
I solved the problem with '\\(%d%d\\)'. ( and ) are special characters in regular expressions. To match them as literal characters they need to be escaped with a preceding \ in the regular expression. But since \ is also a special character in sprintf format strings, I need to double it so that sprintf outputs a single backslash and regexp sees it. Thanks so much for the help!
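A small check of the double escaping, offered as a sketch: the string is one of the samples posted above, and the digit pair (4,3) is an arbitrary choice for illustration.

```matlab
% Sample string copied from the thread.
str = '2132132(43)1(43)2324234';

% sprintf turns each \\ into a single \, so regexp receives \(43\)
rgx = sprintf('\\(%d%d\\)', 4, 3);

starts = regexp(str, rgx);   % start index of every literal "(43)"
count  = numel(starts);      % number of bracketed pairs found
```

Without the backslashes, regexp would treat the parentheses as a grouping construct and match a plain 43 anywhere in the string, inside brackets or not.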
I have a feeling that your original regular expression is not actually matching what you think it is. Do you want it to include parentheses in the matched substrings?
>> rgx = sprintf('%d.+?%d',V(k),V(k));
>> X = regexp(TrajCompact(idx),rgx,'match');
creates a cell array of the matched substrings for 0, and the sixth cell contains this:
'021(23)(12)32131231(32)(13)(23)1321(32)30'
Do you want it to include parentheses, or not?
Note: the general search pattern is '(\d).+?\1'
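To see what the general pattern actually captures, here is a hedged sketch that uses the 'match' option on one of the sample strings from this thread (the variable names are illustrative only):

```matlab
% Sample string copied from the thread.
str = '2132132(43)1(43)2324234';

% (\d) captures one digit, .+? lazily consumes at least one character,
% and \1 requires the same digit again.
m = regexp(str, '(\d).+?\1', 'match');

nMatches = numel(m);   % matches are found left to right and never overlap
```

Because a single pattern is scanned once from left to right, matches for different digits compete for the same characters. The counts can therefore differ from running a separate pattern per digit, and the matched substrings may contain parentheses.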

This question is closed.

Closed:

2021-8-20
