How to read a comma delimited .dat file containing mixed formats

Question

Camille Couzi 2018-10-2

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/421718-how-to-read-a-comma-delimited-dat-file-containing-mixed-formats

评论： dpb 2018-10-3

Hi, I have been struggling with this for hours: I have a .dat file containing comma separated values. this is how the first lines look like:

Cell 1,13,3-22-2018 05:30,-0.02496409
Cell 1,14,3-22-2018 05:35,-0.02496409
Cell 1,15,3-22-2018 05:40,-0.02496409
Cell 1,16,3-22-2018 05:45,-0.02496409
Cell 1,17,3-22-2018 05:50,-0.02496409

My file has more than 1,100,000 lines so I know I have to use textscan. I have tried textread,dlmread,importdata,readtable... textscan is my best bet, but I can't figure out the format.

I would like to end up with

CellNb = vector containing all my cell numbers (in this example it'll be 1 1 1 1 1)
EnsNb = vector containing all my ensemble numbers (in this example it'll be 13 14 15 16 17)
Date = vector containing my dates + times (datenum format)
Val = containing the values on the last column.

So far I am trying:

filename='SimplifiedFile_commas.dat';
fid = fopen(filename);
C=textscan(fid,'%s%d%s%f64\n','delimiter',',');
fclose(fid);
celldisp(C)
which clearly doesn't work because i get:
C{1}{1} =
ÿþC e l l   1 
C{2} =
       []
C{3} =
       {}
C{4} =
       []

Could anyone give me a hint on the format I have to use? Thanks!!!

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

dpb 2018-10-2

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/421718-how-to-read-a-comma-delimited-dat-file-containing-mixed-formats#answer_339313

编辑：dpb 2018-10-3

在 MATLAB Online 中打开

Just read the file as it's given then parse the data columns...

t=readtable('couzi.csv','ReadVariableNames',0,'Delimiter',',');  % read base file
t.Var1=str2num(char(strrep(t.Var1,'Cell ','')));                 % parse cell number
t.Var3=datetime(t.Var3,'Format','M-dd-yyyy HH:mm');              % convert to datetime
t.Properties.VariableNames={'Cell','Ens','Date','Val'}           % set desired variable names
t =
5×4 table
  Cell    Ens         Date             Val   
  ____    ___    _______________    _________
  1       13     3-22-2018 05:30    -0.024964
  1       14     3-22-2018 05:35    -0.024964
  1       15     3-22-2018 05:40    -0.024964
  1       16     3-22-2018 05:45    -0.024964
  1       17     3-22-2018 05:50    -0.024964
>>

ADDENDUM

As pedagogical note; a little more obtuse but perhaps "more MATLABy" solution to the first column could be

t.Var1=cellfun(@(s) sscanf(s,'Cell%f'),t.Var1);

to read and convert the actual cell content rather than strip/replace. Not sure if there would be any noticeable performance difference for large files or not.

Which leads to the alternate original solution to parse the file directly-- textscan can be used but readtable is more flexible in many ways altho for a very large file the resulting table may be somewhat slow in performance.

fmt='Cell%f%f%q%f';
c=textscan(fid,fmt,'delimiter',',','collectoutput',0);

will return

c =
1×4 cell array
  {5×1 double}    {5×1 double}    {5×1 cell}    {5×1 double}

where the time string is a cell array to convert to datetime.

Unfortunately, the '%{}D' time format scanning routine is broken for embedded space in the string between the date and time fields as is so common; the obvious

>> textscan(c{3},'%{M-d-yyyy HH:mm}D')
Error using textscan
Unable to read the DATETIME data with the format 'M-d-yyyy HH:mm'. If the data is not a time, use %q to get
text data.

fails miserably. One can workaround it with

>> t=textscan(strrep(c{3}(1),' ',''),'%{M-d-yyyyHH:mm}D')
t =
1×1 cell array
  {[3-22-201805:30]}

and end up with an ugly, illegible output format; this, too, is fixable:

>> t.Format='default'
t = 
datetime
 22-Mar-2018 05:30:00

4 个评论
显示 2更早的评论隐藏 2更早的评论

Walter Roberson 2018-10-3

See https://www.mathworks.com/matlabcentral/answers/421358-retrieve-data-containing-date-and-time-from-text-file#comment_615974 for more on when %D accepts spaces.

dpb 2018-10-3

AH! I had not thought of there being the interaction with 'whitespace', Walter.

The textscan doc says when sees %{}D format the field is scanned as '%q' but of course the given string is malformed for embedded blanks by not being surrounded by ".

I suppose it would be too much to ask for but I think (and believe most users would inherently expect that) textscan/%D should be smart enough to read the provided format string as given and parse the corresponding field as told, only reporting error or failing when the string doesn't actually match the data or the format string itself is malformed, embedded delimiter or no.

But, thanks for pointing out the workaround; I'll try to recall it next time it comes up... :)

PS. I don't see the ambiguity you point out in the last example there; if 'whitespace','' is given in conjunction with any format string conversion specifier, textscan does not add the superfluous blank it does with other delimiters. That example seems clear to me it should work unless I'm missing something.

请先登录，再进行评论。

How to read a comma delimited .dat file containing mixed formats

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

4 个评论
显示 2更早的评论隐藏 2更早的评论

更多回答（0 个）

另请参阅

类别

标签

Community Treasure Hunt

How to read a comma delimited .dat file containing mixed formats

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

4 个评论 显示 2更早的评论隐藏 2更早的评论

更多回答（0 个）

另请参阅

类别

标签

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

4 个评论
显示 2更早的评论隐藏 2更早的评论