Regexp to extract standalone numbers from string
Hello,
I'm trying to extract numbers from a txt file that contains tables whose elements are separated by varying amounts of white space.
The content might look like the example below, with a variable number of rows and columns. However, the number of "free" numbers is always the same.
To get the file into MATLAB I read it line by line using fgetl.
str{1,1} = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -'
My goal is to extract only the numbers that are not part of a text string. For the line above that would be 21, 202, 203.02, -204.001, 1, 01, so both numbers with and without a decimal point.
I've played a bit with regexp patterns, and the closest I have got is:
rxpPat = '\d+\.?\d*';
regexp(str{1,1},rxpPat,'match')
The problem is that this also catches the digits inside tokens such as X?YYx0123, which distorts my result.
Do you have an idea how I can approach the problem?
Accepted Answer
Cris LaPierre
2022-12-11
str{1,1} = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -';
regexp(str{1,1},'(?<=\s)[+-]?\d+\.?\d*(?=\s)', 'match')
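A possible follow-up step, not part of the answer as posted: the lookbehind `(?<=\s)` and lookahead `(?=\s)` require white space on both sides of each match, and `regexp` returns the matches as text. The sketch below, with the illustrative variable names `tokens` and `values`, converts them to numbers with str2double.

```matlab
% Sample line from the question
str = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -';

% Match signed numbers that have white space directly before and after them
tokens = regexp(str, '(?<=\s)[+-]?\d+\.?\d*(?=\s)', 'match');

% Convert the cell array of char vectors to a numeric row vector
values = str2double(tokens)
```

Note that the text '01' becomes the number 1 once converted, so the leading zero is only preserved in `tokens`.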
2 Comments
Walter Roberson
2022-12-12
str{1,1} = '404 X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - - 92';
christ = regexp(str{1,1},'(?<=\s)[+-]?\d+\.?\d*(?=\s)', 'match')
wdr = str2double(regexp(str{1,1}, '(?<=^|\s)[+-]?\d+(\.\d*)?(?=\s|$)', 'match'))
That is, the version Cris posted does not find the numbers if they are first or last in the string, but the version I posted in my Answer does.
More Answers (4)
Steven Lord
2022-12-11
I wouldn't use regexp here. I'd use string, strsplit, and double.
S = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -'
S = string(S);
parts = strsplit(S, ' ')
Because we converted S from a char vector into a string array above, we can use double to turn those elements of parts that are the text representation of valid numbers into those numbers while turning the other strings into NaN. If we'd left them as a char array we'd get the values of the characters that make up the text representations of those numbers, not the numbers themselves.
notWhatWeWant = double(char(parts(5))) % double('21') is not 21
D = double(parts) % double("21") is 21
Now just remove the NaN values. This does assume that NaN is not a valid numeric value in your string that you want to extract.
validparts = D(~isnan(D))
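Since the question mentions varying amounts of white space, here is a minimal sketch of the same idea for a line that mixes tabs and repeated spaces. The sample text `txt` is made up for illustration. It relies on strsplit splitting at any white space and collapsing consecutive delimiters when no delimiter is given.

```matlab
% Hypothetical line with tabs and runs of spaces (not taken from the original file)
txt = sprintf('A200.1\t21   Xx2222  202\t203.02 -204.001');

% With no delimiter argument, strsplit splits at white space and collapses runs
parts = string(strsplit(strtrim(txt)));

% Non-numeric strings convert to NaN, which is then filtered out
D = double(parts);
validparts = D(~isnan(D))
```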
Voss
2022-12-11
Edited: Voss
2022-12-11
Very similar to Steven Lord's answer, but using str2double() instead of converting to string and using double():
str{1,1} = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -'
D = str2double(strsplit(str{1,1}));
D = D(~isnan(D))
Image Analyst
2022-12-11
I don't understand what the problem is. What's wrong with getting the numbers from X?YYx0123?
By the way, here is the new way to get numbers:
str{1,1} = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -'
pat = digitsPattern
numbers = extract(str{1,1}, pat)
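If only the standalone numbers are wanted, the pattern functions can probably be combined with a split step. This is a sketch that assumes R2020b or later and is not part of the answer as posted. The names `tokens`, `numPat`, and `isNum` are illustrative. It keeps only the tokens that consist entirely of an optional sign, digits, and an optional fractional part.

```matlab
str = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -';

% Split the line into white-space-separated tokens (column string array)
tokens = split(string(str));

% Optional sign, digits, optional "." followed by digits
numPat = optionalPattern(characterListPattern("+-")) + digitsPattern + ...
         optionalPattern("." + digitsPattern);

% matches is true only when the whole token fits the pattern
isNum = matches(tokens, numPat);
numbers = double(tokens(isNum))
```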
Walter Roberson
2022-12-11
Edited: Walter Roberson
2022-12-11
format short
S = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -'
D = str2double(regexp(S, '(?<=^|\s)[+-]?\d+(\.\d*)?(?=\s|$)', 'match'))
- This supports an optional positive or negative sign
- This supports the possibility that the value is an integer with no decimal point
- This supports the possibility that the value has a decimal point but no digits after it
- This specifically checks for whitespace (or the start or end of the string) before and after the number, so A200.1 is not matched. That also means a comma directly after a number is not supported.
- This does not support exponent notation with d, D, e, or E, with an optional + or - before the exponent value
- This does not support numbers that start directly with the decimal point, without a 0 before it, such as .2
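Two of the limitations listed above could be lifted with a slightly longer pattern. The following is a sketch under assumptions, not a tested general solution: it adds an alternative for numbers that start with the decimal point and an optional e or E exponent. The d and D exponent forms are deliberately left out. The sample text `T` is made up for illustration.

```matlab
% Hypothetical test line (not from the original file)
T = 'A200.1 21 .5 1e3 -2.5E-2 x1e3 7.';

% Mantissa: digits with optional fraction, or a leading decimal point with digits;
% then an optional e/E exponent with an optional sign
pat = '(?<=^|\s)[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(?=\s|$)';

tokens = regexp(T, pat, 'match');
D = str2double(tokens)
```

Tokens such as A200.1 and x1e3 are still rejected, because the lookbehind requires white space or the start of the string before the number.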