How to parse an Nx1 string array without looping through N

Question

Leslie 2020-4-23

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/520022-how-to-parse-an-nx1-string-array-without-looping-through-n

Edited: Leslie 2020-4-23

I have an Nx1 string array, and I can't figure out how to extract 6 chunks of text out of it and into an Nx6 cell array. The text elements are numbers, but it's simplest to not treat them as numbers at this juncture.

Here is a toy version of the string array, together with code that correctly parses out the necessary elements of CCYYMMDD and hhmm from the first element of the string array:

stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];
charLaunch = textscan(stringFile(1),'%*18c %2c %2c %2c %2c %*c %2c %2c');

charLaunch =

1×6 cell array

{'20'} {'02'} {'04'} {'28'} {'18'} {'48'}

However, both

charLaunchAll = textscan(stringFile,'%*18c %2c %2c %2c %2c %*c %2c %2c');

and

charLaunchAll = cell(5,6);
charLaunchAll = textscan(stringFile(:),'%*18c %2c %2c %2c %2c %*c %2c %2c');

generate the same error message:

Error using textscan

First input must be a valid file-id or non-empty character vector.

Is there a way to extract these pieces of text from every array element without building a loop?

Answer 1

Stephen23 2020-4-23

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/520022-how-to-parse-an-nx1-string-array-without-looping-through-n#answer_427774

Edited: Stephen23 2020-4-23

Using one simple regular expression:

C = {...
    'nsasondewnpnC1.b1.20020428.184800.cdf'; ...
    'nsasondewnpnC1.b1.20020428.220500.cdf'; ...
    'nsasondewnpnC1.b1.20020428.235900.cdf'; ...
    'nsasondewnpnC1.b1.20020429.013100.cdf'; ...
    'nsasondewnpnC1.b1.20020429.182500.cdf'};
out = regexp(C,'\d{2}','match');
out = vertcat(out{:})

I used a cell array of character vectors, but it will also work for a string array.
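
Note that on these file names '\d{2}' matches seven pairs per row (the trailing seconds, 00, are included), so the result above is 5x7 rather than the 5x6 asked for. A minimal sketch of one way to get exactly six columns, assuming every name ends in CCYYMMDD.hhmmss.cdf; the pattern and the variable names expr and tok are illustrative and not part of the original answer:

```matlab
stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];

% Assumed layout: ...CCYYMMDD.hhmmss.cdf at the end of each name.
% Six capture groups: CC, YY, MM, DD, hh, mm. The seconds are matched but not captured.
expr = '(\d{2})(\d{2})(\d{2})(\d{2})\.(\d{2})(\d{2})\d{2}\.cdf$';

% One set of six tokens per name, returned in an N-by-1 cell array.
tok = regexp(stringFile, expr, 'tokens', 'once');

% Stack the rows and convert to an N-by-6 cell array of character vectors.
charLaunchAll = cellstr(vertcat(tok{:}));
```

A name that does not fit the assumed layout produces an empty entry in tok, so it is worth checking for empties before calling vertcat.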

5 comments (3 earlier comments hidden)

Stephen23 2020-4-23

Edited: Stephen23 2020-4-23

"... why textscan will work with a single element of a string array, but not with an entire array of strings?"

Because low-level string parsing functions parse one string element or one character vector, and textscan is ultimately just a fancy wrapper for low-level operations.

You might think of a string array as one thing, but really it is a container array of multiple character vectors, i.e. it contains lots of individual, separate character vectors, which are stored separately. Not so different from a cell array, really (search this forum for more accurate and detailed discussions on how string arrays are actually implemented).

Parsing a string array introduces ambiguities: e.g. what is the end-of-line character? textscan relies on identifying that character... but parsing a string array would (possibly, see below) require having no EOL character at all, and instead treating each string element as being de-facto delimited by some character (in which case you can trivially do this yourself, as I did in my last comment). You might think it is obvious that each string element should be treated as one line, but computers do not understand "obvious", they understand instructions in the form of code. Consider how this 2x1 string array should be parsed:

str = ["1"; "2" + newline + "3"] % second element contains a newline character

Which of these should textscan(str,'%f') return?

[1;2;3] : all values, identify both newline AND different string elements as having de-facto EOL.
[1;2] : newline causes parsing to finish.
[1] : second element does not parse.
{[1];[2;3]} : the output is not of the class requested, and the cell contents can have an arbitrary size.
error : second element throws an error.

If you say the first is the correct behavior, what about the next user who expects one of the other behaviors?
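
Since textscan does not make this choice for you, it can be made explicit in code. A minimal sketch of two of the behaviors listed above, assuming the second element really contains a newline character; the variable names txt, allValues and perElement are illustrative:

```matlab
str = ["1"; "2" + newline + "3"];   % second element contains a real newline character

% First behavior: treat element boundaries as line breaks and parse everything once.
txt = char(join(str, newline));     % one character vector with 1, 2 and 3 on separate lines
allValues = textscan(txt, '%f');
allValues = allValues{1};           % column vector of every number found

% Fourth behavior: parse each element on its own and keep the results separate.
perElement = cellfun(@(c) sscanf(c, '%f'), cellstr(str), 'UniformOutput', false);
```

Here allValues is a 3x1 numeric vector and perElement is a 2x1 cell array whose second cell holds two numbers.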

Note also that a text file consists of one long character vector (people think of files as having "lines", but really each is one long character vector interspersed with newline characters), and low-level file parsing functions also parse just that one character vector.
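
To illustrate that point, a string array can be turned into the same kind of single character vector and parsed in one call. This is only a sketch, and it assumes every name contributes exactly seven two-digit pairs (as the toy file names do); txt and pairs are illustrative names:

```matlab
stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];

% Join the elements with newlines: one character vector, like the contents of a file.
txt = char(join(stringFile, newline));

% A single regexp call over the whole text returns all pairs in one row.
pairs = regexp(txt, '\d{2}', 'match');

% Assumption: seven pairs per name, so reshape into one row per name.
pairs = reshape(pairs, 7, []).';
```

If the number of pairs differed between names, the reshape would misalign the rows or fail, which is the same ambiguity described above.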

Leslie 2020-4-23

Edited: Leslie 2020-4-23

OK, thanks. I'd noticed that what I was trying to do "all at once" would have worked if I'd been reading a file and could have searched for the newline character, but didn't (or couldn't) carry that all the way forward to understanding how the string array was being stored. It just never occurred to me to do something like "ignore through the 'cdf' at the end of the string", which is an analog to the documentation's example of "ignore the rest of the line".


Answer 2

Mohammad Sami 2020-4-23

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/520022-how-to-parse-an-nx1-string-array-without-looping-through-n#answer_427760


Since the pattern in your string seems to be the same, you can use the format specification to convert the string directly to datetime as follows.

stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];
fmt = "'nsasondewnpnC1.b1.'yyyyMMdd'.'HHmmss'.cdf'";
% the constant portion of your string is enclosed in 'single quotes';
d = datetime(stringFile,'InputFormat',fmt);
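
If text pieces rather than datetime values are the goal, the datetime array can be formatted back to text and split into two-digit pairs. This is a sketch of one possible round trip, not part of the original answer; stamp and pairs are illustrative names, and the output format 'yyyyMMddHHmm' is chosen here to drop the seconds:

```matlab
stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];
fmt = "'nsasondewnpnC1.b1.'yyyyMMdd'.'HHmmss'.cdf'";
d = datetime(stringFile, 'InputFormat', fmt);

% Format each datetime as 12 digits (CCYYMMDDhhmm), e.g. '200204281848'.
stamp = cellstr(d, 'yyyyMMddHHmm');

% Split every 12-digit stamp into six two-digit pieces and stack the rows.
pairs = regexp(stamp, '\d{2}', 'match');
pairs = vertcat(pairs{:});          % N-by-6 cell array of character vectors
```

The conversion also validates the dates and times along the way, at the cost of the extra step.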

1 comment

Leslie 2020-4-23

Thanks, interesting usage that I didn't know about.

But I don't really want it in datetime format; I'd like the 2-digit text chunks. If I've got to clutter up my code with sending it to datetime & back, I might as well write the stupid loop. (I'm not meaning to be cranky at you; I'm just cranky that I spent a few hours today poring over documentation and Answers to do something that it seems I ought to be able to do!)
