textscan does not read all rows

Hi,
I am dealing with very large .txt files and trying to use textscan to open them. I have a smaller .txt file with the same format that I was able to open with readtable. The resulting table has 47 variables and 1389712 rows.
Here is readtable code:
data=readtable('Building.txt');
Here is the textscan code:
formatSpec='%s%f%s%s%s%s%f%f%f%f%f%s%s%f%f%f%f%s%f%f%f%f%f%f%f%f%f%s%f%s%s%s%s%s%s%s%f%f%s%s%s%f%s%f%s%f%f';
fid = fopen('Building.txt','r');
data1 = textscan(fid,formatSpec,'Delimiter','|');
fclose(fid);
data1 has 47 variables, but only 36299 rows instead of 1389712 rows. I would use readtable, but it is way too slow for the large .txt files.
Please note that the formatSpec was obtained from the readtable result: using summary(data), I could see the format of each variable.
This is an example of the format of the text files I am trying to use (lots of missing data I know):
EE760424-42D5-E511-80C1-3863BB43AC67|0||RESIDENTIAL STRUCTURE||RR000|||1||||||||| |0|0||0||.00||.00||C|0|||||||||||||||99748186| |38001|7017
EF760424-42D5-E511-80C1-3863BB43AC67|0||RESIDENTIAL STRUCTURE||RR000|||1||||||||| |0|0||0||.00||.00||C|0|||||||||||||||99748257| |38001|7017
Thanks a lot!
1 comment
dpb 2020-1-19
Are you sure there aren't missing values in the readtable table? It's much more forgiving of a bad format or missing data than textscan is.
There's not much anybody can do here without a sample file to work on... it should zip up pretty compactly.


Answers (1)

Jeremy Hughes 2020-1-20
Edited: Jeremy Hughes 2020-1-20
If you pass 'ReturnOnError',false to the textscan call, you will get an error message at the point where the format cannot read your file, instead of a silently truncated result. The failure is likely due to the missing data.
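To see the difference, here is a minimal sketch on a tiny stand-in file (made-up data, not your Building.txt; the names fname, stoppedEarly and errMsg are just for illustration). It shows the default silent stop and then the same read with 'ReturnOnError',false:

```matlab
% Hypothetical three-row file; row 2 has text where the format expects a number.
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'a|1|2\nb|oops|3\nc|4|5\n');
fclose(fid);

% Default behavior: textscan stops quietly at the first field it cannot convert.
fid = fopen(fname,'r');
C = textscan(fid,'%s%f%f','Delimiter','|');
stoppedEarly = ~feof(fid);      % true if part of the file was left unread
if stoppedEarly
    restOfLine = fgetl(fid);    % text remaining on the line where reading stopped
    fprintf('Stopped early; unread text on that line: %s\n', restOfLine);
end
fclose(fid);

% With 'ReturnOnError',false the same problem raises an error instead.
fid = fopen(fname,'r');
errMsg = '';
try
    textscan(fid,'%s%f%f','Delimiter','|','ReturnOnError',false);
catch ME
    errMsg = ME.message;        % the message should point at the failing field
    disp(errMsg);
end
fclose(fid);
delete(fname);
```

On the real file, the same feof/fgetl check right after your textscan call should show you the line that stops the read at row 36299.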
readtable tries to read using a detected format, and when that fails, it updates the format and re-reads. It may be slow because it is reading the file multiple times while trying to get the format right. You could pass that same formatSpec into readtable, but it will likely error in the same way as textscan (just not silently).
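For reference, passing a format to readtable goes through the 'Format' name-value pair. A minimal sketch on a small made-up file (three columns instead of your 47, with one empty numeric field); on your data you would substitute 'Building.txt' and your full formatSpec:

```matlab
% Hypothetical stand-in file; the second row has an empty numeric field.
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'a|1|2\nb||3\nc|4|5\n');
fclose(fid);

% Read with an explicit format; no header line, so do not read variable names.
T = readtable(fname,'Format','%s%f%f','Delimiter','|', ...
              'ReadVariableNames',false);
disp(T);        % the empty numeric field is imported as NaN
delete(fname);
```

An empty field is not a problem here; what stops the read is a field whose text does not match the conversion specifier for that column.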
If you try detectImportOptions with the file, then readtable, you might have faster/better results.
file = 'Building.txt';  % the file name from the question
opts = detectImportOptions(file,'Delimiter','|','ExpectedNumVariables',47)
%% Check if this looks right
tp = preview(file,opts)
%% If the variable types look correct in tp, you don't need this step.
formatSpec='%s%f%s%s%s%s%f%f%f%f%f%s%s%f%f%f%f%s%f%f%f%f%f%f%f%f%f%s%f%s%s%s%s%s%s%s%f%f%s%s%s%f%s%f%s%f%f';
fmt = split(formatSpec(2:end),'%');
opts = setvartype(opts,strcmp(fmt,'f'),'double');
opts = setvartype(opts,strcmp(fmt,'s'),'char');
%% Read the whole file.
T = readtable(file,opts);
I can't really test this without your file, but it should work (maybe with some tweaking).
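One thing you can check without the file is that the formatSpec really splits into 47 type codes, since the two setvartype calls rely on that. A small sketch (isNum and isText are illustrative names):

```matlab
formatSpec = '%s%f%s%s%s%s%f%f%f%f%f%s%s%f%f%f%f%s%f%f%f%f%f%f%f%f%f%s%f%s%s%s%s%s%s%s%f%f%s%s%s%f%s%f%s%f%f';

% Drop the leading '%' and split on the rest: one 'f' or 's' per column.
fmt = split(formatSpec(2:end),'%');

isNum  = strcmp(fmt,'f');   % columns to import as double
isText = strcmp(fmt,'s');   % columns to import as char

fprintf('%d specifiers: %d numeric, %d text\n', ...
        numel(fmt), sum(isNum), sum(isText));
```

If the count printed here does not match the number of variables in opts, the logical masks passed to setvartype will not line up with the columns.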

Release

R2017a
