Reading in ugly data files
I created a simple data processing script using importdata. I am trying to process a new txt file using this script, but the structure of the data is very different, and importdata is getting tripped up somehow. I've decided to try and change the program a bit to use something more flexible, like textscan.
First, what do you recommend for reading in text file data that has both strings and numerical data? Is textscan really the best option?
Second, how do I deal with HUGE swaths of empty data cells in this particular text file?
Edit: I know it says not to do this, but it has become obvious that I should mention I am very new to MATLAB, so I don't really know what you mean by "EmptyValue" and "TreatAsEmpty." How do I properly use these parameters when calling the textscan function?
5 Comments
per isakson
2012-10-26
Edited: per isakson
2012-10-26
- What type of answer do you expect?
- Did you read the documentation on textscan?
- "How do I properly use these parameters when calling the textscan function?" The answer is in the documentation.
- Did you experiment with reading a small text file with textscan?
- Did you search for textscan here on Answers?
Matt Kindig
2012-10-27
Hi Ryan,
If you can post the first 10 or so lines of your file, that would help us to recommend solutions. Also post a portion of the file around the "HUGE swaths of empty data"-- that will help us to understand how the format changes around these lines, and how to modify your script accordingly. If you do this, we will probably be able to offer some more concrete advice.
Thanks, Matt
Ryan Egan
2012-10-29
Matt Kindig
2012-10-29
Hi Ryan,
From this line, what data do you need to extract, and what form do you expect the output to take? I'm thinking that regular expressions (the regexp() function) would work better for you. In my experience with irregularly structured text files, regexp() is more flexible and efficient than textscan() or the like.
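As a rough illustration of the regexp() approach: the line Ryan posted is not shown here, so the sample line, the field names (trial, resp, rt), and the pattern below are all made up for the example. It is only a sketch of how named tokens can pull a few values out of a line padded with empty fields.

```matlab
% Hypothetical input line (NOT the line from this thread): a few useful
% values separated by long runs of empty comma-delimited fields.
txt = 'Trial 12,,,,Response: left,,,RT=0.532,,,';

% Named tokens capture only the pieces of interest; '.*?' lazily skips
% whatever lies between them, including the empty fields.
pattern = 'Trial\s+(?<trial>\d+).*?Response:\s*(?<resp>\w+).*?RT=(?<rt>[\d.]+)';
tok = regexp(txt, pattern, 'names');

% Captured tokens are text, so convert the numeric ones explicitly.
trial = str2double(tok.trial);
resp  = tok.resp;
rt    = str2double(tok.rt);
```

If a line does not match the pattern, regexp returns an empty struct, so a real script would check isempty(tok) before using the fields.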
Ryan Egan
2012-10-30
Accepted Answer
More Answers (2)
per isakson
2012-10-26
Edited: per isakson
2012-10-26
- textscan is a good alternative for "... both strings and numerical data".
- With textscan, all data rows need to have the same format; otherwise it becomes a bit tricky.
- The options EmptyValue and TreatAsEmpty will take care of "empty data cells".
- HUGE means different things to different people. The number of empty cells shouldn't be a problem.
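For someone new to these options, here is a minimal sketch of how EmptyValue and TreatAsEmpty might be passed to textscan. The file contents, the 'NA' placeholder, and the fill value -1 are assumptions made up for the example, not taken from the original data.

```matlab
% Write a small sample file (hypothetical data) so the example runs on its own.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'A,1.5,10\n');
fprintf(fid, 'B,,NA\n');      % second field is empty, third is the text NA
fprintf(fid, 'C,3.5,30\n');
fclose(fid);

% One string column followed by two numeric columns.
% 'EmptyValue'   : number stored where a numeric field is empty (default NaN).
% 'TreatAsEmpty' : text that should count as a missing numeric field
%                  instead of stopping the read.
fid = fopen(fname, 'r');
data = textscan(fid, '%s %f %f', ...
    'Delimiter', ',', ...
    'EmptyValue', -1, ...
    'TreatAsEmpty', {'NA'});
fclose(fid);
delete(fname);

names  = data{1};   % cell array of strings
values = data{2};   % numeric column; the empty field in row B is filled with -1
counts = data{3};   % numeric column; 'NA' in row B is read as a missing value
```

Each element of data holds one column, so rows with missing fields still line up with the rest of the file.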
I have been doing something similar recently.
I think textscan should work:
% Open the file (replace datapath with your file location).
fid = fopen(datapath);
% Skip the first ten lines by reading them as text (change BufSize if it's not big enough).
% raw will contain the first ten lines; pos is the current position in the file.
[raw, pos] = textscan(fid, '%[^\n]', 10, 'delimiter', ',', 'BufSize', 100000);
% Now read the first three columns and discard the rest of each line.
% Change the order of %f %f %s to match your data types.
data = textscan(fid, '%f %f %s %*[^\n]', 'delimiter', ',', 'BufSize', 100000);
% Close the file.
fclose(fid);
data should now contain the first three columns of your data.
Hopefully I have that correct!! No doubt there is a quicker way to do this.
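One possibly shorter variant, offered only as a sketch: textscan's 'HeaderLines' option can skip the leading lines in the same call that reads the data. The sample file, its two header lines, and its column layout below are assumptions for illustration.

```matlab
% Create a small sample file (hypothetical contents) with two header lines.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'header line 1\nheader line 2\n');
fprintf(fid, '1,2.5,abc,extra,stuff\n');
fprintf(fid, '2,3.5,def,more,stuff\n');
fclose(fid);

% Skip the header lines and read the first three columns in one call;
% '%*[^\n]' discards the remainder of each line.
fid = fopen(fname, 'r');
data = textscan(fid, '%f %f %s %*[^\n]', ...
    'Delimiter', ',', ...
    'HeaderLines', 2);
fclose(fid);
delete(fname);

col1 = data{1};   % first numeric column
col2 = data{2};   % second numeric column
col3 = data{3};   % string column (cell array)
```

This drops the header text entirely; the two-step version above is still the one to use if the header lines need to be kept in raw.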