Combine Multiple Tokens to Match Using regexp
Hello, I need to match two single quotes in a string using regexp(); unless I missed it in the official documentation, I have only found this mentioned on an older non-MathWorks website, where the single detail given is this:
\(\)\(CA\)+combines multiple tokens into one
This isn't massively helpful and doesn't provide the results I need; I had assumed it would work like this:
string2 = string1(regexp(string1, '[otherstufftomatch\(''\)]'));
therefore causing the function to try to match two successive quotes. All of the other characters are matched correctly and assigned to string2, but I still get only a single quote, whereas I need both. It doesn't flag a syntax error or anything, so my assumption is that this doesn't do what I think it does, and I'm just matching each of the characters '\()' individually. For context, here is my current code, which works but doesn't return both quotes. I have tried some of the additional features of regexp, like grouping them together using parentheses, surrounding the quote with non-word escape sequences ('\W''\W'), and using * to indicate matching it multiple times. The difficulty is that putting one ' into the characters to match terminates the character vector, so I've had to put two in there, but I don't think this is doing what I intend:
rawString = 'I just can''t seem to get this working correctly.';
matchThis = '[AEIOUaeiou., (\W''\W)]';
vowelsOnly = rawString(regexp(rawString, '[AEIOUaeiou., (\W''*\W)]'));
Any chance anybody knows how to do what I need here? Thanks in advance!
1 Comment
Stephen23
2020-2-2
Edited: Stephen23
2020-2-2
"I need to match two single quotes in a string using regexp(); unless I missed it in the official documentation..."
The single quote character has no special meaning at all for regular expressions, so you won't find it mentioned in the regular expression documentation (just like you won't find every other non-special character listed by name). But because the single quote is used to define a character vector in MATLAB, it needs to be escaped/doubled within a character vector in order to define one single quote character, as the documentation explains: "If the text includes single quotes, use two single quotes within the definition."
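To make the doubling rule concrete, here is a minimal sketch. The variable names are only illustrative, and the double-quoted string scalar assumes R2017a or later:

```matlab
% Doubling the quote inside a character vector yields ONE quote character.
cv = 'can''t';
n = numel(cv);               % 5 characters: c, a, n, ', t
nQuotes = sum(cv == '''');   % '''' is a character vector holding a single quote

% A double-quoted string scalar needs no doubling for single quotes.
s = "can't";
sameText = strcmp(cv, s);    % compares the text of both, regardless of type
```

So the pattern in the question contains only one quote character, even though two are typed in the source code.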
Also note that your code:
string2 = string1(regexp(string1,'...'));
will return at most one character from each match, because the default first output of regexp is "startIndex", which by definition is one index (a subvector starts in one location). I suspect that you might find the "match" output more useful. E.g. here is a simple example of a regular expression that matches multiple digits:
>> str = 'abc456xyz';
>> str(regexp(str,'\d+')) % what you are doing
ans = 4
>> regexp(str,'\d+','match','once') % all matched characters
ans = 456
Answers (1)
Walter Roberson
2020-2-2
' normally terminates a character vector but '' encodes a single ' inside a character vector, not two of them in a row.
I suggest either switching to string or using {2} after the ''
Caution: inside [] you cannot construct patterns. () have no special meaning inside []
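A small sketch of that caution, using a made-up input: inside square brackets the parentheses are ordinary characters to match, while outside them they form a group.

```matlab
txt = 'a(b)c';

% Inside [], the parentheses are literal members of the character set.
inClass = regexp(txt, '[(b)]', 'match');   % {'(', 'b', ')'}

% Outside [], the parentheses group the pattern and match only the b.
asGroup = regexp(txt, '(b)', 'match');     % {'b'}
```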
You could potentially code
"([AEIOUaeiou]|'{2})"
though it does seem odd to me that '' would be considered a vowel?
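As a rough illustration of that pattern, here it is written as a character vector (so the quote has to be doubled) and applied to a hypothetical input that really does contain two consecutive quote characters. The input text is an assumption, not the original poster's data:

```matlab
% Four quotes in the source code produce two quote characters in the text.
txt = 'it''''s an example';

% Alternation outside []: either one vowel, or exactly two quotes in a row.
pat = '([AEIOUaeiou]|''{2})';

% 'match' returns the full text of every match, not just the start index.
parts = regexp(txt, pat, 'match');   % 1x6 cell array of character vectors
joined = [parts{:}];                 % concatenate the matches into one char vector
```

Here the second element of parts holds both quote characters as one match, which indexing with the start indices alone could not return.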
2 Comments
Stephen23
2020-2-2
@Rowan Lawrence : please upload a .mat file containing some of the strings/character vectors that you are trying to parse, together with the expected output.