How to extract strings between two newline characters using regexp
I have a string:
S = sprintf('\n1 2\n3 4\n')
And I want to extract '1 2' into a cell and '3 4' into a cell, using the following code:
a=regexp(S, '\n\d+[^\n]+\n','match')
But '3 4' was not extracted.
Did I do something wrong?
Accepted Answer
Walter Roberson
2017-7-12
When you used
a=regexp(S, '\n\d+[^\n]+\n','match')
then the \n at the end of the pattern "eats" the \n after '1 2', so the remaining string is '3 4\n'. That remainder cannot match, because the pattern must begin with \n.
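To see this behaviour directly, here is a minimal sketch that reruns the original pattern on the same S and inspects what comes back. It only uses the string and pattern from the question.

```matlab
% Original pattern: every match consumes its trailing newline,
% so that newline is no longer available to start the next match.
S = sprintf('\n1 2\n3 4\n');
a = regexp(S, '\n\d+[^\n]+\n', 'match');

nMatches = numel(a);                               % how many matches were found
includesNewlines = isequal(a{1}, sprintf('\n1 2\n'));  % the match keeps both newlines
disp(nMatches)
disp(includesNewlines)
```

Only one match is returned, and it still contains the leading and trailing newline characters.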
Consider
a = regexp(S, '(?<=\n)\d[^\n]*', 'match')
This leaves the \n in place: the lookbehind (?<=\n) requires that a \n come immediately before the digit, but it does not include that \n in the output.
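As a quick check, the lookbehind pattern can be run on the S from the question. This is just a sketch of the suggestion above, with the result stored in a so it can be compared against the expected cell array.

```matlab
% Lookbehind version: the newline before each line is required
% but is not consumed, so it stays available for the next match.
S = sprintf('\n1 2\n3 4\n');
a = regexp(S, '(?<=\n)\d[^\n]*', 'match');

% Each line lands in its own cell, without newline characters.
disp(a)
```

Here a should be a 1-by-2 cell array holding '1 2' and '3 4'.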
3 Comments
Walter Roberson
2017-7-13
Overlapping matches in regular expressions typically require zero-width assertions.
Some programming languages support an "overlapped" switch, for example through the newer third-party Python regex module.
In some programming languages, such as Perl, there are tricks that essentially evaluate code in the middle of a match, but that code has to get a bit complicated to handle backtracking properly. See for example http://www.perlmonks.org/?node_id=463461
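For the case in this thread, one way to apply zero-width assertions is to wrap both delimiters in lookarounds, so that adjacent matches can share the same newline. The sketch below is an illustration rather than part of the original answer; the second string, S2, is a hypothetical input without a trailing newline, added to show that the closing newline is still required.

```matlab
% Both newlines are asserted but not consumed, so the newline that
% ends one line can also serve as the start delimiter of the next.
S = sprintf('\n1 2\n3 4\n');
a = regexp(S, '(?<=\n)\d[^\n]*(?=\n)', 'match');

% Hypothetical input with no trailing newline: the last line is not
% enclosed by two newlines, so it is not returned.
S2 = sprintf('\n1 2\n3 4');
b = regexp(S2, '(?<=\n)\d[^\n]*(?=\n)', 'match');

disp(a)
disp(b)
```

With S, both '1 2' and '3 4' are returned; with S2, only '1 2' is.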
More Answers (2)
Sayam Ganguly
2017-7-12
From your question I understand that you have a string '\n1 2\n3 4\n' and you want to extract '1 2' and '3 4' into a 1-by-2 cell array. I would like to suggest a different regexp that should help you achieve this.
a=regexp(S, '.*','match','dotexceptnewline')
Here '.*' would normally match every character, but with the 'dotexceptnewline' option the dot does not match '\n' characters, so the string is split at the newlines and you get a 1-by-2 cell array with your desired result. With your approach, the entire pattern matched only once and could not match again.
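A short sketch of this suggestion on the same S follows. The '.+' variant is an addition for illustration: it requires at least one character per match, so it does not depend on how zero-length matches are handled (regexp ignores them by default).

```matlab
% With 'dotexceptnewline', the dot stops at each newline character.
S = sprintf('\n1 2\n3 4\n');
a = regexp(S, '.*', 'match', 'dotexceptnewline');

% Variant that never produces an empty match in the first place.
b = regexp(S, '.+', 'match', 'dotexceptnewline');

disp(a)
disp(b)
```

Note that, unlike the lookbehind pattern, this returns every non-empty line, whether or not it starts with a digit.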
4 Comments
Walter Roberson
2017-7-13
I programmed in Perl for a few years. The rules are not easy to remember. There are multiple books explaining Perl regular expressions, for example O'Reilly's "Mastering Regular Expressions" http://shop.oreilly.com/product/9780596528126.do which is over 500 pages.
Most authors of Perl regular expressions make mistakes even on comparatively simple tasks such as matching valid floating-point numbers. Hardly anyone gets tasks such as balancing brackets right (a task that is not possible with true regular expressions or with Perl basic regular expressions; it requires Perl extended regular expressions).