Problems with a regex

Thomas, 2013-7-9
Hi.
I'm trying to create a regular expression to match and extract some information. Two examples of the source string:
example one: 10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv
example two: 10/2/leaf.nr.2 is a Projection error - 3D points.csv
I want to extract the string between "is a " and either " - touches edge" OR " - 3D". In both example strings this would be "Projection error", but it can be something else.
Currently I have the pattern:
'.*is\sa\s(?<type>.*)(?:\s\-\stouches\sedge)?(?:\s\-\s3D).*.csv'
For example one, this returns (not what I expected):
'Projection error - touches edge'
but for example two it returns (as expected):
'Projection error'
If I change the pattern to:
'.*is\sa\s(?<type>.*)(?:\s\-\stouches\sedge)(?:\s\-\s3D).*.csv'
so that (?:\s\-\stouches\sedge) must be matched, it returns (correctly):
'Projection error'
for example one, but now example two (which doesn't have the "touches edge" part) does not match, of course.
I don't understand why, with the first pattern, the result for example one still contains " - touches edge" when I ask for that part to be matched 0 or 1 times.
Any help will be highly appreciated.
Best regards, Thomas
Thomas, 2013-7-9
My current solution is to use this pattern instead:
'.*is\sa\s(?<type>[\w\s]*)(?:\s\-\s)?.*'
It returns the needed information, except that an extra trailing space is included. So the result for both example one and example two is now:
"Projection error "
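A minimal sketch of two ways to get rid of that trailing space, assuming a MATLAB release whose regexp supports named tokens and the 'names' output (both existed in 2013). The variable names (workaround, t1, t2, trimmed) are illustrative only. The first keeps the workaround pattern and trims afterwards with strtrim; the second makes the token lazy and requires the " - " separator, so the space before the hyphen is never captured.

```matlab
% Example one from the question
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';

% Option 1: keep the workaround pattern, then trim whitespace from the token
workaround = '.*is\sa\s(?<type>[\w\s]*)(?:\s\-\s)?.*';
t1 = regexp(ex1, workaround, 'names');  % t1.type is 'Projection error ' (trailing space)
trimmed = strtrim(t1.type);             % 'Projection error'

% Option 2: lazy token followed by a required " - " separator;
% [\w\s]*? grows one character at a time until " - " can follow it
t2 = regexp(ex1, 'is a (?<type>[\w\s]*?) - ', 'names');  % t2.type is 'Projection error'
```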


Answers (2)

Muthu Annamalai, 2013-7-9
A simple way to parse the string with the rule
"is a " followed by ( " - touches edge" OR " - 3D" )
is to call regexp() sequentially.
That way the "is a" part of your source string is split off first, and you can then search for which of the two alternatives is present.
Also see the 'NOT' (exclusion) character class operators in regexp, and the 'split' mode of regexp.
http://www.mathworks.com/help/matlab/ref/regexp.html
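A minimal sketch of the sequential approach using regexp's 'split' mode; this is one possible reading of the suggestion, not necessarily what was meant. The variable names (parts, pieces, errType, hasTouchesEdge) are hypothetical, and errType is used instead of type to avoid shadowing the built-in type function.

```matlab
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';

% Step 1: split off everything up to and including "is a "
parts = regexp(ex1, 'is a ', 'split');      % {'10/0/leaf.nr.0 ', 'Projection error - touches edge - 3D points.csv'}

% Step 2: split the remainder on " - "; the first piece is the error type
pieces = regexp(parts{2}, ' - ', 'split');  % {'Projection error', 'touches edge', '3D points.csv'}
errType = pieces{1};                        % 'Projection error'

% Step 3: check which of the two alternatives follows
hasTouchesEdge = any(strcmp(pieces, 'touches edge'));    % true for ex1

parts2 = regexp(ex2, 'is a ', 'split');
pieces2 = regexp(parts2{2}, ' - ', 'split');             % {'Projection error', '3D points.csv'}
hasTouchesEdge2 = any(strcmp(pieces2, 'touches edge'));  % false for ex2
```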
Thomas, 2013-7-9
Thanks for your response.
My task is not to match either of the two cases; it's simply to extract the string between "is a " and the first " - " (this is a new, shorter formulation of my problem that I just realized).
Splitting would be one way to go, but I would like to know whether it's possible to do it with a single regex.



per isakson, 2013-7-9 (edited by per isakson, 2013-7-9)
You wrote: to extract the string between "is a " and the first " - ". That formulation is already close to pseudocode for the expression we are looking for.
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';
regexp( ex1, '(?<=is a )[^\-]+(?= \- )', 'match' )
regexp( ex2, '(?<=is a )[^\-]+(?= \- )', 'match' )
returns
ans =
'Projection error'
ans =
'Projection error'
Search the documentation for "Lookaround Assertions" or just "Lookaround" (see the page "Lookahead Assertions in Regular Expressions").
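A small sketch showing that the same lookaround pattern can process both file names in one call: regexp accepts a cell array of strings, and with 'once' each element comes back as a char vector rather than a nested cell. The names files and types are illustrative.

```matlab
files = {'10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv', ...
         '10/2/leaf.nr.2 is a Projection error - 3D points.csv'};

% One call over the whole cell array; 'once' returns the first match per element
types = regexp(files, '(?<=is a )[^\-]+(?= \- )', 'match', 'once');
% types is {'Projection error', 'Projection error'}
```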
PS. '\-' or just '-': one backslash (escape) too many seldom hurts, and I have trouble remembering when it's needed.
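A quick sketch to check the escape question for the lookahead part: outside a character class '-' is an ordinary character, so the escaped and unescaped lookaheads should give the same result. Inside the class the backslash is kept as in the answer, so the hyphen cannot be mistaken for a range operator. The variable names a and b are illustrative.

```matlab
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';

a = regexp(ex1, '(?<=is a )[^\-]+(?= \- )', 'match', 'once');  % escaped '-' in lookahead
b = regexp(ex1, '(?<=is a )[^\-]+(?= - )',  'match', 'once');  % unescaped '-' in lookahead
same = isequal(a, b);  % true: both give 'Projection error'
```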
Or, following the OP's original requirement (the text ends at " - touches edge" or " - 3D"):
regexp( ex1, '(?<=is a ).+?(?= ((\- touches edge)|(\- 3D)))', 'match' )
regexp( ex2, '(?<=is a ).+?(?= ((\- touches edge)|(\- 3D)))', 'match' )
The extra parentheses, (), make the expression more readable, imo.
The "?" in ".+?" makes it a lazy expression: match as few characters as necessary.
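The same laziness also explains the behavior in the original question. A minimal sketch, reusing the OP's first pattern: the greedy (?<type>.*) grabs as much as it can and only backtracks until " - 3D" follows, so the optional "touches edge" group is satisfied by matching zero times. Changing only that token to .*? (the one-character fix is this editor's illustration, not from the thread) makes it stop at the first point where the rest of the pattern can match. Variable names are illustrative.

```matlab
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';

greedy = '.*is\sa\s(?<type>.*)(?:\s\-\stouches\sedge)?(?:\s\-\s3D).*.csv';   % OP's first pattern
lazy   = '.*is\sa\s(?<type>.*?)(?:\s\-\stouches\sedge)?(?:\s\-\s3D).*.csv';  % only ".*" -> ".*?"

g1 = regexp(ex1, greedy, 'names');  % g1.type: 'Projection error - touches edge'
l1 = regexp(ex1, lazy, 'names');    % l1.type: 'Projection error'
l2 = regexp(ex2, lazy, 'names');    % l2.type: 'Projection error'
```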
