regexp - match regular expression question

Hi all,
In the MATLAB 'help' text for the function regexp, I'm trying to understand what the vertical line (i.e. | ) means in the pattern below. The example comes directly from MATLAB's help text, shown after typing 'help regexp'.
The help documentation indicates:
"|" means Match subexpression before or after the "|"
What I would like to ask is: what does the above mean exactly? At the moment, I'm thinking 'which is it?' ... I was expecting that a match would be either 'before' or 'after', but not both before OR after. But even if it really means 'match before OR after', what does that mean exactly? For example, what does "|" actually represent?
Thanks in advance.
str = 'John Davis; Rogers, James';
pat = '(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)';
n = regexp(str, pat, 'names')
2 Comments
Stephen23 2016-9-30
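The | behaves like an exclusive or, in the sense that each match comes from exactly one of the alternatives: the engine tries them from left to right and uses the first one that succeeds. Here is an example of how it works, tested on a string with four slightly different "words":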
>> regexp('a123z a%%%z a1%3z a__z','a(\d+|%+)z','match')
ans =
'a123z' 'a%%%z'
The pattern matches all sequences starting with a, ending with z, and containing XOR(digits,%-symbols) in between. The third "word" in the string does not match because it contains both digits and %-symbols; the fourth contains only underscores, so it does not match the regex either. Now let's alter the regex and use two |, to give XOR(digits,%-symbols,underscores):
>> regexp('a123z,a%%%z,a1%3z,a__z','a(\d+|%+|_+)z','match')
ans =
'a123z' 'a%%%z' 'a__z'
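A small sketch of one caveat to the XOR description, assuming MATLAB's documented behavior that alternatives are tried from left to right and that later ones are ignored once an earlier one matches. The variable names m1, m2 and m3 are made up for this illustration:

```matlab
% Both alternatives could match at the start of 'ab'.
% The leftmost alternative that succeeds is the one that is used.
m1 = regexp('ab', 'a|ab', 'match');   % {'a'}
m2 = regexp('ab', 'ab|a', 'match');   % {'ab'}

% The "either digits or %-symbols, never mixed" behavior comes from where
% the + sits. With the + outside the group, the choice is made again for
% every character, so the mixed word a1%3z matches as well.
str = 'a123z a%%%z a1%3z a__z';
m3 = regexp(str, 'a(\d|%)+z', 'match');   % {'a123z', 'a%%%z', 'a1%3z'}
```

So | itself only means "try this alternative, otherwise try the next one"; whether the overall result looks exclusive depends on how the alternatives and quantifiers are arranged.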
Bonus: if you want a convenient way to test and experiment with regular expressions, you can try my FEX submission:
Kenny 2016-9-30
Edited: Kenny 2016-10-1
Hi Stephen !! Thanks for going out of your way to help me as well. The example that you gave is truly excellent. Thanks very much for showing this. The regexp function is so powerful, but it helps a great deal when you and S.S. add great understandable examples. When I first looked at those 'code' patterns from inbuilt examples, it didn't have the nice explanations that allowed followers to follow through, and understand. Thanks for mentioning XOR, and the bonus link too! Best regards! Thanks a lot again. Kenny


Accepted Answer

Star Strider 2016-9-30
Edited: Star Strider 2016-9-30
When I’ve used the ‘|’ (‘or’) operator, I’ve used it to match either of the two (or more) sub-expressions in the expression string. In this instance, if it detects a comma it labels the first word as the last name and the second word as the first name. If it does not detect a comma, it does the reverse. The presence or absence of a comma in the target text determines which sub-expression returns the result: a name written with a comma gives an empty result for the sub-expression without a comma, and the reverse is true for the other sub-expression.
If you want to see how this works in practice, try it with only one sub-expression (and without the ‘|’ operator). That’s the easiest (and most instructive) way to see how a particular syntax works.
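As a rough sketch of that experiment, each alternative from the help example can be run on its own and then compared with the combined pattern. The variable names patA, patB, nA and nB are introduced here only for illustration and are not part of the original example:

```matlab
str = 'John Davis; Rogers, James';

% The two alternatives from the help example, each used on its own
patA = '(?<first>\w+)\s+(?<last>\w+)';    % "First Last" (no comma)
patB = '(?<last>\w+),\s+(?<first>\w+)';   % "Last, First" (with comma)

nA = regexp(str, patA, 'names');   % one match: first = 'John',  last = 'Davis'
nB = regexp(str, patB, 'names');   % one match: first = 'James', last = 'Rogers'

% Joining them with | gives the original pattern, which finds both names
n = regexp(str, [patA '|' patB], 'names');

for k = 1:numel(n)
    fprintf('%d: first = %s, last = %s\n', k, n(k).first, n(k).last);
end
```

Each sub-expression alone finds only the name written in its own format; the combined pattern returns a 1-by-2 struct array in which both entries have first and last filled in correctly.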
EDIT Clarified an ambiguity in the original.
2 Comments
Kenny 2016-9-30
Thanks so much for your help and time, S.S.! That helped me tremendously. Genuinely appreciated, S.S.
