different number of delimiters error using readtable function
I use the following code for reading text file:
fileID = fopen(full_file_name);
fclose(fileID);
tCOD=readtable(full_file_name,'FileType','text', ...
'headerlines',25,'readvariablenames',0,'MultipleDelimsAsOne', true);
The above code works for most of the text files I read; I attached one of them (data_without_problem.txt). For some text files, however, I receive the following error:
Error using readtable (line 216)
Reading failed at line 121. All lines of a text file must have the same number of delimiters. Line 121 has 6 delimiters, while
preceding lines have 5.
Note: readtable detected the following parameters:
'Delimiter', '\t ', 'MultipleDelimsAsOne', true, 'Format', '%q%f%f%f%f%f'
I attached this kind of text file (data_with_problem.txt).
How can I modify the above readtable call so that it works with text files whose lines do not all have the same number of delimiters?
My MATLAB version is R2019a.
2 Comments
dpb
2021-10-15
#dP2019 9 7 0 0 0.00000000 576 u+U IGS14 FIT GFZ
## 2069 518400.00000000 300.00000000 58733 0.0000000000000
+ 95 C01C02C03C04C05C06C07C08C09C10C11C12C13C14C16E01E02
+ E03E04E05E07E08E09E11E12E13E14E15E18E19E21E24E25E26
+ E27E30E31E33E36G01G02G03G04G05G06G07G08G09G10G11G12
+ G13G14G15G16G17G18G19G20G21G22G23G24G25G26G27G28G29
+ G30G31G32J02J03J07R01R02R03R05R07R08R09R11R12R13R14
+ R15R16R17R18R19R20R21R22R23R24 00 00 00 00 00 00 00
++ 10 10 10 10 10 6 8 6 6 8 10 8 8 6 6 6 6
++ 6 6 8 6 6 6 6 6 6 6 6 6 6 6 6 6 6
++ 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
++ 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
++ 6 6 6 6 6 8 10 6 8 8 8 8 6 6 6 6 6
++ 6 6 6 6 8 8 6 6 6 6 0 0 0 0 0 0 0
%c M cc GPS ccc cccc cccc cccc cccc ccccc ccccc ccccc ccccc
%c cc cc ccc ccc cccc cccc cccc cccc ccccc ccccc ccccc ccccc
%f 1.2500000 1.025000000 0.00000000000 0.000000000000000
%f 0.0000000 0.000000000 0.00000000000 0.000000000000000
%i 0 0 0 0 0 0 0 0 0
%i 0 0 0 0 0 0 0 0 0
/* PCV:IGS14_2062 OL/AL:FES2004 NONE YN CLK:CoN ORB:CoN
/* GeoForschungsZentrum Potsdam
/*
/*
* 2019 9 7 0 0 0.00000000
PC01 -32247.666769 27128.253711 852.734449 -142.854736
PC02 4291.889841 41959.462941 -227.014442 886.812431
PC03 -14756.270737 39468.969011 529.367042 -15.526662
PC04 -39608.430012 14398.971601 684.035369 -18.784543
...
is the beginning of the so-called "problem" file -- what do you expect to be able to read from it?
It clearly has header information and different kinds of data in it; a "one size fits all" solution is unlikely to be possible unless you can just skip the header and read the regular data after the header information.
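One quick way to see which lines trigger the "different number of delimiters" error is to count the whitespace-separated fields on every line. The sketch below assumes the fields are separated by spaces or tabs; countFields is a hypothetical helper name, not a MATLAB function.

```matlab
function n = countFields(file)
% countFields  Hypothetical helper: number of whitespace-separated
% fields on each line of a text file (assumes space/tab separated data).
    txt   = fileread(file);
    lines = regexp(txt, '\r?\n', 'split');          % one cell per line
    % Count runs of non-whitespace characters; empty lines give 0.
    n = cellfun(@(s) numel(regexp(s, '\S+', 'match')), lines);
end
```

Calling n = countFields('data_with_problem.txt') and then looking at unique(n(26:end)) should show whether the data records after the header all carry the same number of fields.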
Accepted Answer
dpb
2021-10-15
Edited: dpb
2021-10-16
Use an import options object. To write fully generic import code, though, you'll have to scan each file to find its number of header lines, as detectImportOptions isn't clever enough to know what you intend about the header data on its own...
I used the explicit number of header lines here:
optW=detectImportOptions('data_with_problem.txt','NumHeaderLines',25,"CommentStyle",'*','ReadVariableNames',0);
optW.MissingRule='omitrow';
optW.SelectedVariableNames=optW.SelectedVariableNames(1:5);
tDW=readtable('data_with_problem.txt',optW);
This produces a table whose head and tail look like:
>> [head(tDW);tail(tDW)]
ans =
  16×5 table
      Var1       Var2       Var3       Var4       Var5
    ________    _______    _______    _______    _______
    {'PC01'}     -32248      27128     852.73    -142.85
    {'PC02'}     4291.9      41959    -227.01     886.81
    {'PC03'}     -14756      39469     529.37    -15.527
    {'PC04'}     -39608      14399     684.04    -18.785
    {'PC05'}      21860      36038    -443.83     39.711
    {'PC06'}     -13165      21105     -33718     427.66
    {'PC07'}     -22236      33734      11284    -113.86
    {'PC08'}    -7668.8      34363      23503    -2.0925
    {'PR17'}     -10797     3515.3      22840      258.4
    {'PR18'}     2372.7      16249      19510     6.6669
    {'PR19'}      13760      20695     5772.9    -52.585
    {'PR20'}      17905      12385     -13261    -389.79
    {'PR21'}      10304    -4036.6     -22990    -70.999
    {'PR22'}    -4738.6     -17784     -17720    -36.732
    {'PR23'}     -16617     -19348      7.969     252.43
    {'PR24'}     -17845     -10556      14834    -184.49
>>
>> whos tDW
  Name      Size      Bytes    Class    Attributes
  tDW       380x5     56568    table
>>
The same logic will work for the files without the trailing 'P' in the records; the key is to tell it to only import the field name and the four numeric variables.
That assumes you don't need those based on your above description. If you do need them, then use
optW.ExtraColumnsRule='addvars';
and don't restrict SelectedVariableNames.
Once the number of header lines has been determined externally, the above will work for either file. You'll note I used 'CommentStyle','*' to get rid of the date stamp rows; if you want to keep those to parse them separately, then remove that. With '*' in use, readtable is not flexible enough to have more than one comment character, so I set the 'MissingRule' to 'omitrow' to eliminate the last EOF record. If you keep the time stamp rows, then you could set the comment character to 'E' for that purpose instead.
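If the date stamp rows are wanted as well, one option is to pull them out of the file separately rather than through readtable. The following is only a sketch: readEpochs is a hypothetical helper name, and it assumes every such row starts with '* ' followed by six numbers (year, month, day, hour, minute, second), as in the excerpt shown in the comment above.

```matlab
function epochs = readEpochs(file)
% readEpochs  Hypothetical helper: collect the '* yyyy mm dd HH MM SS'
% rows of the file and return them as a datetime column vector.
% Assumes each such row carries exactly six numeric fields.
    lines = regexp(fileread(file), '\r?\n', 'split');
    lines = lines(startsWith(lines, '* '));          % keep epoch rows only
    % Drop the leading '*' and read the six numbers on each row.
    vals  = cellfun(@(s) sscanf(s(2:end), '%f', 6).', lines, ...
                    'UniformOutput', false);
    epochs = datetime(vertcat(vals{:}));             % N-by-6 date vectors
end
```

The position records can still be read with 'CommentStyle','*' as above, so the two pieces stay independent of each other.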
ADDENDUM:
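A little routine to return the number of header lines could look something like this:
function nHdr=getNumHeaderLines(file)
% Return the line number of the first epoch record (a line starting with '* ').
fid=fopen(file);
if fid<0, error('Cannot open file: %s',file); end
nHdr=1;
l=fgetl(fid);
while ischar(l) && ~startsWith(l,'* ')  % ischar guards against EOF (fgetl returns -1)
  nHdr=nHdr+1;
  l=fgetl(fid);
end
fclose(fid);
end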
Running the above logic at the command line for the problem data file returns:
>> fid=fopen('data_with_problem.txt');
>> nHdr=1;
>> while ~startsWith(fgetl(fid),'* '),nHdr=nHdr+1;end
>> nHdr
nHdr =
25
>> fid=fclose(fid);
to illustrate it returns the value you want/need...
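Putting the two pieces together, a wrapper along the following lines could be used for both kinds of file. This is a sketch rather than tested production code: readOrbitTable is a hypothetical name, the header scan is repeated inline so the function stands on its own, and it assumes every file contains at least one row starting with '* ' and that only the first five columns are wanted.

```matlab
function t = readOrbitTable(file)
% readOrbitTable  Hypothetical wrapper: find the header length, then
% read the record name plus four numeric columns with import options.
    % Line number of the first epoch row ('* ...'); stops at EOF if none.
    fid = fopen(file);
    if fid < 0, error('Cannot open file: %s', file); end
    nHdr = 1;
    l = fgetl(fid);
    while ischar(l) && ~startsWith(l, '* ')
        nHdr = nHdr + 1;
        l = fgetl(fid);
    end
    fclose(fid);

    % Same options as in the answer, with the header count filled in.
    opt = detectImportOptions(file, 'NumHeaderLines', nHdr, ...
        'CommentStyle', '*', 'ReadVariableNames', false);
    opt.MissingRule = 'omitrow';                     % drops the EOF record
    opt.SelectedVariableNames = opt.SelectedVariableNames(1:5);
    t = readtable(file, opt);
end
```

It would then be called as tDW = readOrbitTable('data_with_problem.txt'); and likewise for data_without_problem.txt.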