Looking for an alternative to regexp.

I'm looking for an alternative way to parse through strings to find bits of information, or for a way to use regexp that doesn't give me nested cells. I'm tired of dealing with the nested cells.
I've got a string that contains node numbers and locations. I would like to capture all of the node numbers, and then put them into a double array. I can identify and extract the numbers with regexp, but any time I use regexp with tokens I end up with cells inside of cells for a reason that I don't entirely understand. Am I doing something to create the extra layer of cells, or is there another command that can parse and extract the information I want?
singlestring = 'nxyzs=74xyz[0]:-2.0447000e+010.0000000e+001.8288000e+00Nearestnodeis7736664atadistanceof4.6823094e-03locatedat-2.0451682e+012.2396341e-161.8288000e+00';
repeatstrings = repmat(singlestring,1,5);
nodes = regexp(repeatstrings,'Nearestnodeis(\d+)','tokens');
The nodes variable will contain a 1x5 cell array in which each cell contains a 1x1 cell, which in turn contains the node number as text.
  2 Comments
Stephen23 2021-3-24
Edited: Stephen23 2021-3-25
Tokens are always returned in a cell array whose size equals the number of tokens in the expression (scalar in your case, because you specified only one token). If multiple matches are allowed (the default), then every output is nested in an outer cell array whose size equals the number of matches made, so you get nested cell arrays of tokens.
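To make the two levels concrete, here is a minimal sketch that inspects the output of a 'tokens' call. The short input string is a made-up stand-in, not the original data.

```matlab
% Hypothetical stand-in input containing two matches
str = 'Nearestnodeis12atadistanceof1Nearestnodeis345atadistanceof2';

% One token in the expression, two matches in the text
tok = regexp(str, 'Nearestnodeis(\d+)', 'tokens');

nMatches = numel(tok);      % outer cell: one element per match
nTokens  = numel(tok{1});   % inner cell: one element per token
second   = tok{2}{1};       % text captured by the second match

fprintf('%d matches, %d token(s) per match, second = %s\n', ...
        nMatches, nTokens, second);
```

The outer index selects the match and the inner index selects the token within that match, which is why two sets of braces are needed to reach the text.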
FYI, if you only need to match the regular expression exactly once, then you can specify the 'once' option and the outputs are not nested in cell arrays. This does not apply to your example, but is useful in other cases.
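As a rough illustration of the 'once' option, the sketch below uses a made-up stand-in string and only returns the first match, so it is not a drop-in replacement for the multi-match case in the question.

```matlab
% Hypothetical stand-in input containing two matches
str = 'Nearestnodeis12atadistanceof1Nearestnodeis345atadistanceof2';

% With 'once', the token output is a single (un-nested) cell array
firstTok = regexp(str, 'Nearestnodeis(\d+)', 'tokens', 'once');
firstNum = str2double(firstTok);   % numeric value of the first node number

% With 'match' and 'once', the output is a plain character vector
firstTxt = regexp(str, '(?<=Nearestnodeis)\d+', 'match', 'once');
```

Only the first occurrence is reported in both calls; later matches in the text are ignored.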
As well as concatenating the output data or using named tokens as the answers below show, you can also use a look-behind assertion and return the matched string (no nested cell arrays), which makes post-processing much simpler:
nodes = regexp(repeatstrings,'(?<=Nearestnodeis)\d+','match')
nodes = 1×5 cell array
{'7736664'} {'7736664'} {'7736664'} {'7736664'} {'7736664'}
vec = str2double(nodes)
vec = 1×5
7736664 7736664 7736664 7736664 7736664
Bob Thompson 2021-3-24
Thanks, I definitely think this is smoother than what I usually attempt.


Answers (2)

Star Strider 2021-3-23
See if adding either:
Out = cell2mat([nodes{:}].')
or:
Out = str2num(cell2mat([nodes{:}].'))
to the posted code provides the desired result.
Note that str2num is not generally recommended (it evaluates its input with eval); however, it works when str2double produces an unacceptable result.
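One caveat worth sketching: cell2mat stacks the tokens into a character matrix, so it relies on every node number having the same number of digits. The variant below is an assumption about what may suit mixed-length numbers, using a made-up stand-in string; it skips cell2mat and passes the flattened cell array straight to str2double.

```matlab
% Hypothetical stand-in input with node numbers of different lengths
str = 'Nearestnodeis12atNearestnodeis345at';
nodes = regexp(str, 'Nearestnodeis(\d+)', 'tokens');

flat = [nodes{:}];        % remove one level of nesting: {'12','345'}

% Stacking rows of unequal length into a char matrix fails
stackFailed = false;
try
    cell2mat(flat.');
catch
    stackFailed = true;
end

% str2double accepts a cell array of character vectors directly
Out = str2double(flat);
```

When all node numbers have the same width, as in the question, both routes should give the same values.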

Walter Roberson 2021-3-23
singlestring = 'nxyzs=74xyz[0]:-2.0447000e+010.0000000e+001.8288000e+00Nearestnodeis7736664atadistanceof4.6823094e-03locatedat-2.0451682e+012.2396341e-161.8288000e+00';
repeatstrings = repmat(singlestring,1,5);
nodes = regexp(repeatstrings,'Nearestnodeis(?<NN>\d+)','names');
str2double({nodes.NN})
ans = 1×5
7736664 7736664 7736664 7736664 7736664
  3 Comments
Walter Roberson 2021-3-23
(?<WORD>PATTERN)
creates a named token: whatever is matched by PATTERN is stored, as text, in a struct field named WORD. Even though it is called a "named token", oddly enough you have to ask for "names" rather than "tokens" to get the struct back.
You get back a struct array with one element for each match of the overall pattern, in this case one for each time Nearestnodeis is followed by a sequence of digits. That gives a 5-element struct array here, each element with a field named NN as indicated. As usual with struct arrays, you can pull out all of the entries using struct expansion inside {}, which creates a cell array of character vectors, and then convert them all at once by calling str2double() on the cell array.
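The same idea extends to more than one named token per match. The sketch below is an illustration only: the input string is a made-up stand-in, and the field name dist and its character-class pattern are assumptions, not part of the original answer.

```matlab
% Hypothetical stand-in input with two matches
str = ['Nearestnodeis12atadistanceof4.5e-03locatedat' ...
       'Nearestnodeis345atadistanceof1.25e-02locatedat'];

% Two named tokens: NN (digits) and dist (hypothetical name; digits, '.', 'e', sign)
expr = 'Nearestnodeis(?<NN>\d+)atadistanceof(?<dist>[\d.eE+-]+)';
s = regexp(str, expr, 'names');

% Struct expansion inside {} gathers each field across all matches
nn   = str2double({s.NN});
dist = str2double({s.dist});
```

Each match contributes one struct element, so the fields stay paired by index: nn(k) and dist(k) come from the same match.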
Bob Thompson 2021-3-24
Thanks for the explanation. I do like structures better than cells, most of the time.

