Need help with regexpi expression for multiple variants of the same phrase
I have a question regarding the use of regexpi to determine whether certain phrases appear in a text file. The text files were created by multiple individuals, who used slightly different phrasing for the same variable. For example, in a text file containing a gait evaluation, a slow cadence may be recorded as either 'slow cadence' or 'slow stepping'. My original code was as follows:
data=fileread('Test.txt');
A=isempty(regexpi(data,{'slow cadence','slow stepping'}));
However, this version can return a false positive, as it seems to mix and match the strings within the {}. For example, the following code for the same file makes isempty return 0, even though neither phrase matches completely:
data=fileread('Test.txt');
A=isempty(regexpi(data,{'fast cadence','slow stepping'}));
I feel like I am missing a simple command to indicate that A can be 'slow cadence' OR 'slow stepping'. Any help is much appreciated.
Answers (2)
Stephen23
2022-12-14
Edited: Stephen23
2022-12-14
You will probably find the 'ONCE' option also very very very useful (here I inverted the logical output, because true=contains is usually much simpler to work with than messing-with-your-head true=doesnotcontain):
str = fileread('Test.txt');
idx = ~isempty(regexpi(str, 'slow (cadence|stepping)','once'))
Using regular expressions requires reading the documentation again and again and again and again and again... it takes quite a while to get proficient and comfortable using them. Also, make sure you read the documentation.
You might also find my interactive tool useful for helping to develop regular expressions:
I should also mention, that if you want to use regular expressions then you need to read the documentation. A lot.
PS: Another approach using the newer CONTAINS and patterns:
pat = regexpPattern('slow (cadence|stepping)');
idx = contains(str,pat, 'ignorecase',true)
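As a side note on why the original test always returned 0: when the expression is a cell array, regexpi returns one output cell per expression, so the result is a 1x2 cell array and isempty of that is false whether or not anything matched. A minimal sketch of the difference, where the sample text is a made-up stand-in for the contents of Test.txt and the variable names are illustrative only:

```matlab
% Hypothetical sample text standing in for the contents of Test.txt.
str = 'Patient shows fast stepping and a mild limp.';

% Cell array of expressions: one output cell per expression, so the
% result is a 1x2 cell array even though neither phrase occurs.
out = regexpi(str, {'slow cadence','slow stepping'});
wrongTest = isempty(out);                 % false: a 1x2 cell is not empty

% Checking each cell individually gives the intended answer.
anyMatch = any(~cellfun(@isempty, out));  % false: no phrase found

% Single expression with alternation, as in the answer above.
idx = ~isempty(regexpi(str, 'slow (cadence|stepping)', 'once'));
```

So nothing is being mixed and matched; isempty was simply looking at the outer cell array rather than at the match results inside it.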
Fifteen12
2022-12-14
I think you want to look at making regular expressions. Try this:
A=isempty(regexpi(data,'(slow cadence|slow stepping)'));
You'll probably want to do more case matching as well, using wild cards to substitute for white space, etc. You can find more here
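For instance, a rough sketch of tolerating variable white space between the two words, using \s+ in place of the literal space. The sample phrases below are invented for illustration and are not taken from the original files:

```matlab
% Hypothetical phrases with extra spaces, a tab, and mixed case.
samples = {'Slow   cadence noted', sprintf('slow\tstepping'), 'fast cadence'};

% \s+ matches one or more white-space characters (spaces, tabs, newlines).
expr = 'slow\s+(cadence|stepping)';

% With a cell array of texts and 'once', regexpi returns one start index
% (or an empty array) per text; regexpi itself ignores letter case.
hits = ~cellfun(@isempty, regexpi(samples, expr, 'once'));
```

Here hits should come out as [1 1 0]: the first two phrases match despite the extra spaces and the tab, and 'fast cadence' does not.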