Hello,
I understand you need a regular expression that matches two distinct forms. Each form has word characters at the start and end separated by an ampersand, and the first word may be followed by an optional parenthesized negative number. The challenge lies in excluding matches where the minus sign and digits appear without the parentheses.
The following regular expression meets these requirements:
expr = '(?<=^|\s)(?<before>\w+)(?:\(-(?<digit>\d+)\))?&(?<after>\w+)(?=\s|$)';
It enforces an "all or nothing" presence of the parentheses and the minus sign around the digits. In other words, the pattern matches only when the entire structure is present, not when only parts of it are found.
Explanation:
- (?<=^|\s): Positive lookbehind to ensure the match starts at the beginning of the string or immediately after whitespace.
- (?<before>\w+): Captures the initial sequence of word characters (letters, digits, or underscores).
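- (?:\(-(?<digit>\d+)\))?: An optional non-capturing group that matches "(-", captures the digits in the digit token, and then matches ")". Because the group is optional as a whole, either the entire "(-digits)" sequence is present or none of it is.
- &: Matches the literal ampersand separator.
- (?<after>\w+): Captures the sequence of word characters after the ampersand.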
- (?=\s|$): Positive lookahead to ensure the match ends before whitespace or at the end of the string.
With the aforementioned expression, the call
regexp('vvv&mp abvg(-5)&ads abvg-5&ads', expr, 'names')
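returns a 1x2 struct array with the fields before, digit and after:
fields   before   digit   after
1.       'vvv'    ''      'mp'
2.       'abvg'   '5'     'ads'
which is the expected outcome: abvg-5&ads is not matched, because its minus sign and digit are not enclosed in parentheses.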
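There is no need for conditional sub-patterns in this case. The optional group (?:...)? is either matched or skipped as a single unit, and the lookbehind/lookahead anchoring prevents a fragment of a partial form (such as 5&ads inside abvg-5&ads) from matching on its own.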
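If you want to check the all-or-nothing behavior on individual strings, here is a minimal sketch that runs the same pattern over a few test inputs and records whether each one matches. The candidate strings, the isMatch variable and the printed format are illustrative assumptions, not part of your original data. The first two inputs should match and the last three (missing parentheses, missing minus sign, missing closing parenthesis) should not.

```matlab
% Same pattern as above, unchanged
expr = '(?<=^|\s)(?<before>\w+)(?:\(-(?<digit>\d+)\))?&(?<after>\w+)(?=\s|$)';

% Hypothetical test inputs: two complete forms, then three partial forms
candidates = {'vvv&mp', 'abvg(-5)&ads', 'abvg-5&ads', 'abvg(5)&ads', 'abvg(-5&ads'};
isMatch = false(size(candidates));

for k = 1:numel(candidates)
    tok = regexp(candidates{k}, expr, 'names');   % empty struct when nothing matches
    isMatch(k) = ~isempty(tok);
    if isMatch(k)
        % The optional digit token is empty when the "(-digits)" group is absent;
        % substitute a placeholder, since fprintf skips empty arguments
        digit = tok.digit;
        if isempty(digit)
            digit = '(none)';
        end
        fprintf('%-13s -> before=%s, digit=%s, after=%s\n', ...
            candidates{k}, tok.before, digit, tok.after);
    else
        fprintf('%-13s -> no match\n', candidates{k});
    end
end
```

The placeholder for an empty digit is there because fprintf drops empty arguments, which would otherwise shift the remaining values into the wrong %s slots.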
Hope this helped.