Elegant way to extract data from text files with an arbitrary format?
Hi Guys,
I need to process a large number of text files to extract numerical data. The data is fairly complex, as the files have an arbitrary format and contain several different blocks of data. To illustrate:
Boys Names:
Tom Dick Harry...
Animals:
Cat Dog Squirrel Triceratops Shark...
Rectangle Properties:
x0 y0 width height angle
0 1 4 2 30
-1 2 5 1.5 0.5
7 1 4 5 22
3 9 7.5 6 0
Some more data...
The challenge is that the data I need is somewhere in the middle of each file, and I never know where the block (Rectangle Properties in this case) will show up. There could, for example, be a large number of records under the Names or Animals sections, so I first need to locate the Rectangle section of the file. To complicate things further, I don't know how many rectangles I need to read in.
The header "Rectangle Properties:" appears only once in each file. The sub-header line "x0 y0 ..." occurs in several places (e.g. for different shapes).
My current approach is:
- Scan through the file (using fgetl) until I get to the "Rectangle Properties:" header.
- Skip a line (I don't need the sub-header)
- Read 5 items of numerical data (sscanf) from each of the subsequent lines until I reach a blank line
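The three steps above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: it assumes blocks are separated by blank lines and that every data row holds exactly five numbers, and it writes the sample data to a temporary file only so the snippet runs on its own.

```matlab
% Sketch of the fgetl/sscanf approach. Assumptions (not from the post):
% blocks are separated by blank lines; each data row has exactly 5 numbers.

% Write the sample data to a temporary file so the sketch is self-contained;
% with real data you would open your own file instead.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Boys Names:\nTom Dick Harry\n\nAnimals:\nCat Dog Squirrel Triceratops Shark\n\n');
fprintf(fid, 'Rectangle Properties:\nx0 y0 width height angle\n');
fprintf(fid, '0 1 4 2 30\n-1 2 5 1.5 0.5\n7 1 4 5 22\n3 9 7.5 6 0\n\nSome more data\n');
fclose(fid);

header = 'Rectangle Properties:';
fid = fopen(fname, 'r');

% 1. Scan line by line until the unique header is found
%    (fgetl returns -1, not a char, at end of file).
tline = fgetl(fid);
while ischar(tline) && ~strcmp(strtrim(tline), header)
    tline = fgetl(fid);
end

% 2. Skip the (non-unique) sub-header line.
fgetl(fid);

% 3. Read five numbers per line until a blank line or end of file.
rects = zeros(0, 5);
tline = fgetl(fid);
while ischar(tline) && ~isempty(strtrim(tline))
    rects(end+1, :) = sscanf(tline, '%f', [1 5]); %#ok<AGROW>
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

If the header is never found, `rects` simply stays a 0-by-5 matrix.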
This works fine, but I'm wondering if there's a more elegant approach, perhaps using regular expressions or some other technique?
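One possible regular-expression alternative is sketched below. It is only one way to do it, under these assumptions: the header is unique, exactly one sub-header line follows it, and the block ends at the first blank line or at the end of the file. The file name in the comment is hypothetical; the sample text is built in memory so the snippet runs on its own.

```matlab
% Regex sketch. Assumptions: unique header, exactly one sub-header line,
% block terminated by a blank line or end of file.

% In practice: txt = fileread('yourfile.txt');  (hypothetical file name)
txt = sprintf(['Boys Names:\nTom Dick Harry\n\n' ...
               'Rectangle Properties:\nx0 y0 width height angle\n' ...
               '0 1 4 2 30\n-1 2 5 1.5 0.5\n7 1 4 5 22\n3 9 7.5 6 0\n\n' ...
               'Some more data\n']);

header = 'Rectangle Properties:';
% Match the header line and the sub-header line, then lazily capture
% everything up to the first blank line ('dotall' lets '.' match newlines).
pattern = [regexptranslate('escape', header) ...
           '[^\n]*\n[^\n]*\n(.*?)(?:\n\s*\n|$)'];
tok = regexp(txt, pattern, 'tokens', 'once', 'dotall');

if isempty(tok)
    data = [];                       % header not present in this file
else
    block = tok{1};
    % Take the column count from the first data row instead of hard-coding 5.
    ncols = numel(sscanf(strtok(block, sprintf('\n')), '%f'));
    % sscanf treats newlines as whitespace, so one call reads the whole block.
    data = reshape(sscanf(block, '%f'), ncols, []).';
end
```

The trade-off is that `fileread` loads the whole file into memory; for very large files the line-by-line scan may be lighter, while the regex version is shorter and does not need to know the row count in advance.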
The data files I'm processing are quite large, and I need to extract several different blocks of data (e.g. Rectangles, Triangles, Circles). Each block has a unique header but may have one or more sub-header lines that are not unique. The number of data items in each block varies, and there is no way to know how many items there are when I begin processing the data. This makes it difficult to write a one-size-fits-all function, and the code gets pretty messy.
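A more general reader for several blocks might look like the sketch below. It is an illustration under stated assumptions, not a tested library routine: each header is unique; a block ends at a blank line or end of file; every row in a block has the same number of columns; and a line counts as "numeric" if it contains only digits, signs, decimal points, `e`/`E` and whitespace (a heuristic). The `Circle Properties:` block and its two sub-header lines are invented for illustration.

```matlab
% Multi-block sketch. Assumptions: unique headers, blocks end at a blank line
% or end of file, equal column counts per block, heuristic numeric-line test.

% Hypothetical sample ('Circle Properties:' is invented for illustration).
% In practice: txt = fileread('yourfile.txt');
txt = sprintf(['Boys Names:\nTom Dick Harry\n\n' ...
               'Rectangle Properties:\nx0 y0 width height angle\n' ...
               '0 1 4 2 30\n-1 2 5 1.5 0.5\n7 1 4 5 22\n3 9 7.5 6 0\n\n' ...
               'Circle Properties:\nCircle data\nx0 y0 radius\n1 1 2\n0 0 0.5\n']);

txtLines = strtrim(regexp(txt, '\r?\n', 'split'));
% Heuristic: non-blank and made only of number-like characters.
isNumLine = @(s) ~isempty(s) && isempty(regexp(s, '[^-+0-9.eE\s]', 'once'));

headers = {'Rectangle Properties:', 'Circle Properties:'};
blocks = cell(size(headers));          % blocks{h} stays [] if a header is missing
for h = 1:numel(headers)
    k = find(strcmp(txtLines, headers{h}), 1);
    if isempty(k)
        continue;
    end
    k = k + 1;
    % Skip any number of non-numeric sub-header lines.
    while k <= numel(txtLines) && ~isempty(txtLines{k}) && ~isNumLine(txtLines{k})
        k = k + 1;
    end
    % Collect consecutive numeric rows; the row count is not known in advance.
    first = k;
    while k <= numel(txtLines) && isNumLine(txtLines{k})
        k = k + 1;
    end
    if k > first
        ncols = numel(sscanf(txtLines{first}, '%f'));
        vals = sscanf(sprintf('%s\n', txtLines{first:k-1}), '%f');
        blocks{h} = reshape(vals, ncols, []).';
    end
end
```

Moving the loop body into a local function (for example a hypothetical `readBlock(txtLines, header)`) would give a single reader that handles any header and any number of sub-header lines.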
Any advice would be appreciated!
B
Walter Roberson
2015-12-1
For the blocks that you need, is the order of blocks fixed?
Is the first line of the file always the same?