How do I parse mixed, dynamic binary & string files?


I'm having trouble parsing files that mix strings and numbers. The data I need is in 3 comma-delimited columns, but the lines aren't in any consistent format. A line can have 10 leading gibberish characters, it might have some letters, symbols, numbers, or it might just say "Error: hrere" (for example). I've used textscan, strread, !strings, and fgetl with strread, but I can't seem to get what I need out into 3 variables [Col1 Col2 Col3]. I've been racking my brain... How can I do this? Here's a sample of the file:

@#*%;AJ))&3#* a) 24.568, 34.1024, -0.1023

&$@!*(!( (*&Y$)@ 24.568, 34.1020, -0.0888

()(@E$@!*(!( (*&Y$)@ 23.568, 34.1020, -0.0888

$64&$@!*(!( (*&Y$)@ 24.4568, 34.0020, -0.0888

Bad Command

$64&$@!*(!( (*&Y$)@ 24.4568, 34.0020, -0.0888

$64&$@!*(!( (*&Y$)@ 24.4568, 34.0020, -0.0888

&!)*~*(ER!( (*6&Y$)@ 24.568, 34.1020, -0.0888

(*!$)^@ 23.568, 34.1020, -0.0888

etc....

The closest I got was using something like:

fid = fopen('file.txt','r');
tline = fgetl(fid);
[c1 c2 c3] = strread(tline,'%f','delimiter',',');
fclose(fid);

but I can't iterate it, and it also quits if it reads a line with a bad (non-floating-point) string.

  1 comment
dpb 2014-1-17
That's a {insert proverbial adjective here}...
It looks like the one consistent thing is that there's a blank preceding the first floating point value for the lines with valid data. I'd probably try to locate that on each line by a rear-to-front search for the second comma delimiter location and then the preceding blank prior to that, then try to convert that substring.
I suspect it'll then take quite a lot of added logic to handle all the other special cases you find.
Once, in a former life, I had the problem of processing large amounts of data returned from a power plant monitoring computer via punched paper tape that was always rife with mispunches and the like... it was a similar lot of work to write a reasonably robust processor to salvage them. From that experience, "good luck".
regexp may also be your friend here...
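As a rough illustration of the regexp idea, here is a minimal sketch that anchors on the end of each line and captures the three comma-separated numbers that follow a blank. The sample strings are copied from the question; the variable names and the number pattern (plain signed decimals, no exponents) are assumptions and would need adjusting for other formats.

```matlab
% Sample lines copied from the question (held in memory rather than read from a file).
sampleLines = { ...
    '@#*%;AJ))&3#* a) 24.568, 34.1024, -0.1023', ...
    'Bad Command', ...
    '(*!$)^@ 23.568, 34.1020, -0.0888'};

% Assumed number format: optional sign, digits, optional decimal part.
num = '[-+]?\d+\.?\d*';
% A blank, then three comma-separated numbers running to the end of the line.
pat = ['\s(' num '),\s*(' num '),\s*(' num ')\s*$'];

data = nan(numel(sampleLines), 3);   % rows stay NaN when a line does not match
for k = 1:numel(sampleLines)
    tok = regexp(sampleLines{k}, pat, 'tokens', 'once');
    if ~isempty(tok)
        data(k, :) = str2double(tok);   % convert the three captured strings
    end
end
```

Lines that do not end in three numbers, such as "Bad Command", are left as a row of NaN, so they can be dropped or inspected afterwards.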


Accepted Answer

Walter Roberson 2014-1-17
fid = fopen('file.txt','r');
datacell = textscan(fid, '%s%s%s', 'Delimiter', ',');
fclose(fid);
col1s = regexprep( datacell{1}, '^.*\s', '' );
Col1 = str2double(col1s);
Col2 = str2double(datacell{2});
Col3 = str2double(datacell{3});
There will be NaN in any entry that did not match the proper format.
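If only the fully valid rows are wanted, the NaN entries can be used as a filter. The following is a small sketch with made-up stand-in values for Col1, Col2 and Col3 (the middle row imitates a line that failed to parse); the names M and good are hypothetical.

```matlab
% Hypothetical stand-ins for the Col1..Col3 vectors produced above.
Col1 = [24.568; NaN; 24.4568];
Col2 = [34.1024; NaN; 34.0020];
Col3 = [-0.1023; NaN; -0.0888];

M = [Col1 Col2 Col3];          % one row per input line
good = ~any(isnan(M), 2);      % true where all three values converted
M = M(good, :);                % keep only the rows that parsed completely
```

Keeping the logical index `good` around also makes it easy to see which input lines were rejected.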
  2 comments
Tom W 2014-1-23
Kudos! I was able to use another method, however, your suggestion worked in much fewer lines of code than what I was using. To learn, I don't quite understand the '^.*\s' syntax, what does that interpretively say or tell the program? I'm wondering if that syntax would be of use elsewhere if I understand it better. Thanks again!
Walter Roberson 2014-1-23
'^.*\s' is a regular expression, which is a pattern that needs to be matched. The '^' means that the match must occur at the beginning of the string (each entry of datacell{1} is processed separately). The . means to match any one character. The * modifier after the . means to extend the previous specification (the dot) as far as possible to the right such that the rest of the pattern afterwards is still satisfied, that is, to gobble as many characters as you can such that the rest still works. The \s means any one whitespace character (such as a blank). Re-interpreting this, it means to start at the beginning, find the last whitespace character in the string, and take everything from the beginning up to and including that character.
This pattern is inside a regexprep() call, which says to replace the matched string with what is described in the next argument. The next argument I gave is '' which is the empty string. So the effect is to delete all characters from the beginning of the string up to and including the final space, leaving the last series of non-blank characters alone. In other words, within the first comma-delimited field it cuts out everything except the final token, which is the first numeric value.
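To see the effect on a single entry, here is a short sketch using the first comma-delimited field of one sample line from the question; the variable names are only for illustration.

```matlab
% First comma-delimited field of a sample line from the question.
field = '@#*%;AJ))&3#* a) 24.568';

% Delete everything from the start through the last whitespace character.
stripped = regexprep(field, '^.*\s', '');

% Convert what is left; text that is not a number would give NaN instead.
value = str2double(stripped);
```

Here stripped is left holding only the characters after the last blank, so str2double sees just the number.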
