reading a text file by correct date format

Question

Damith 2015-12-31

https://ww2.mathworks.cn/matlabcentral/answers/262332-reading-a-text-file-by-correct-date-format

Commented: Damith 2016-1-5

Hi,

I have a text file like the one below. How can I read the second column in MATLAB? I am having difficulty reading it with the correct format for yyyy/mm/dd hh:mm. Any help is appreciated.

Thanks in advance.

filePattern = fullfile(myFolder, '*.txt');
csvFiles = dir(filePattern);
fmt='%d %4d/%2d/%2d %4d %*[^\n]';
for i=1:length(csvFiles)
  fid = fopen(fullfile(myFolder,csvFiles(i).name));
  c=cell2mat(textscan(fid,fmt,'headerlines',18,'collectoutput',1,'delimiter','\t'));
  fid=fclose(fid);
end

dpb 2016-1-1

Again, we can't test your file unless you attach a section of it but as I said earlier, forget setting the 'delimiter' field entirely; let it default. Failing that, again, give us the actual data, not a picture of it.

Damith 2016-1-2

10802.zip

I let it default, but it failed again; the results are still the same. See the attached data.

Answer 1

dpb 2016-1-2

https://ww2.mathworks.cn/matlabcentral/answers/262332-reading-a-text-file-by-correct-date-format#answer_204860

OK, as I suspected, the file is NOT tab-delimited. It uses fixed-width columns, except that the records aren't all the same length: each record is terminated after its last data field. So, the lines are as follows:

'010802         2015/01/01 00:00                   AR'
'010802         2015/01/01 00:15                   AR'
...
'010802         2015/04/20 00:00    15.20         '
'010802         2015/04/20 00:15    15.20         '
...

where I've enclosed the two record types in single quotes to be able to see what's actually in the file. Since C (and hence Matlab) formatted input ignores whitespace except when you read counted characters (as in '%Nc'), you can't write a single format string to parse both record types at the same time for the whole file.
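
A minimal sketch of that whitespace behaviour, using sscanf on two hypothetical records built to resemble the lines above (the exact column widths are an assumption, not taken from the real file):

```matlab
% Two hypothetical records resembling the file's two line types.
% Assumed layout: 6-char id, 9 blanks, 16-char date/time, 9-char value field.
recFull  = ['010802' blanks(9) '2015/04/20 00:00' '    15.20'];
recEmpty = ['010802' blanks(9) '2015/01/01 00:00' blanks(19) 'AR'];

fmt = '%d %d/%d/%d %d:%d %f';

a = sscanf(recFull,  fmt);   % all seven conversions succeed
b = sscanf(recEmpty, fmt);   % blanks are skipped, so %f meets 'AR' and parsing stops

% Taking the value field by column position keeps the blanks,
% which is the in-memory analogue of reading counted characters.
field   = recEmpty(32:40);
isBlank = all(isspace(field));

fprintf('full record: %d values, short record: %d values, blank field: %d\n', ...
        numel(a), numel(b), isBlank);
```

The format string cannot tell "empty value field followed by a flag" from "next token", because the blanks that mark the empty field are consumed before the %f conversion is attempted.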

It is possible, however, to read one record at a time with the same format string using textscan, since it picks up again after the failed conversion of an empty field. I created a very short version of your file, with no preamble other than the header line and a few records of each type, for demonstration purposes:

>> fmt=['%d %4d/%2d/%2d %2d:%2d %f%*[^\n]'];
>> fid=fopen('test.txt');
>> fgetl(fid);   % throw away the header line
>> while ~feof(fid)  % read record at a time, echo to terminal
    textscan(fid,fmt,1,'headerlines',1)
   end
ans = 
  [10802]    [2015]    [4]    [19]    [22]    [45]    [0x1 double]
ans = 
  [10802]    [2015]    [4]    [19]    [23]    [0]    [0x1 double]
ans = 
  [10802]    [2015]    [4]    [19]    [23]    [15]    [0x1 double]
...
  [10802]    [2015]    [4]    [20]    [0]    [0]    [15.2000]
ans = 
  [10802]    [2015]    [4]    [20]    [0]    [15]    [15.2000]
ans = 
  [10802]    [2015]    [4]    [20]    [0]    [30]    [15.2000]
ans = 
...
ans = 
  [10802]    [2015]    [5]    [1]    [1]    [30]    [180.3000]
ans = 
  [10802]    [2015]    [11]    [1]    [12]    [15]    [35.7300]
ans = 
  [10802]    [2015]    [11]    [1]    [12]    [30]    [35.7300]
ans = 
  [10802]    [2015]    [11]    [1]    [12]    [45]    [35.7300]
...
ans = 
  [10802]    [2015]    [11]    [29]    [23]    [45]    [67.1200]
ans = 
  [10802]    [2015]    [11]    [30]    [0]    [0]    [0x1 double]
ans = 
  [10802]    [2015]    [11]    [30]    [0]    [15]    [0x1 double]
ans = 
  [10802]    [2015]    [11]    [30]    [0]    [30]    [0x1 double]
...
ans = 
  [10802]    [2015]    [12]    [11]    [10]    [45]    [0x1 double]
ans = 
Columns 1 through 6
  [0x1 int32]    [0x1 int32]    [0x1 int32]    [0x1 int32]    [0x1 int32]    [0x1 int32]
Column 7
  [0x1 double]
>> fid=fclose(fid);

The last record is the EOF case...

Alternatively, you can

1. read the file into a cellstring array,
2. convert to a character array, which will pad the short records,
3. convert the fixed-width substring fields in memory.
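
The three steps above might look roughly like this. It is only a sketch: the records are hypothetical stand-ins for lines already read into a cellstring array, and the column positions (1:6, 16:31, 32:40) are assumed from the sample lines rather than checked against the real file.

```matlab
% Hypothetical records; in practice these would be the lines of the file
% held in a cellstring array (step 1).
recs = { ['010802' blanks(9) '2015/01/01 00:00' blanks(19) 'AR']; ...
         ['010802' blanks(9) '2015/04/20 00:00' '    15.20']; ...
         ['010802' blanks(9) '2015/04/20 00:15' '    15.20'] };

% Step 2: char() pads the shorter records with trailing blanks
txt = char(recs);

% Step 3: convert each fixed-width substring field
id  = str2double(cellstr(txt(:, 1:6)));             % id column
sdn = datenum(txt(:, 16:31), 'yyyy/mm/dd HH:MM');   % serial date numbers
val = str2double(cellstr(txt(:, 32:40)));           % an all-blank field becomes NaN
```

Because the value field is taken by position, an empty field simply converts to NaN instead of shifting the remaining columns.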

Or, an even better choice, avoid all this hassle and create a delimited, regular file format that can be parsed easily.
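
As a rough sketch of that suggestion, a fixed-width record could be rewritten as a comma-delimited line. The record below is hypothetical, and the column positions are assumed from the sample lines above.

```matlab
% Hypothetical fixed-width record with an empty value field
rec = ['010802' blanks(9) '2015/01/01 00:00' blanks(9)];

% Cut the fields by (assumed) column position and trim the padding
fields  = strtrim({rec(1:6), rec(16:31), rec(32:40)});

% Join with commas; the empty value stays visible as an empty last field
csvLine = strjoin(fields, ',');
disp(csvLine)
```

With an explicit delimiter, an empty field is unambiguous, so a single format string can then describe every record.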

Here's another case where Fortran FORMAT wins hands down: it would read the empty data field, although it would need a file with fixed-length records.

Damith 2016-1-3

Thanks.

Answer 2

per isakson 2016-1-2

https://ww2.mathworks.cn/matlabcentral/answers/262332-reading-a-text-file-by-correct-date-format#answer_204892

Edited: per isakson 2016-1-2

It's a challenge to read files like yours with Matlab.

>> clear
>> [c1,sdn,c3,c4] = cssm( '010802_Q_1997.txt' );
>> whos
  Name          Size              Bytes  Class     Attributes
    c1        20256x1               81024  int32               
    c3        20256x1              162048  double              
    c4        20256x1             2286052  cell                
    sdn       20256x1              162048  double

where

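function    [c1,sdn,c3,c4] = cssm( filespec )
% Read fixed-width records: 6-char id, 25-char date/time, 9-char value, rest of line
fmt = '%6c%25c%9c%[^\n]';
fid = fopen( filespec );
% 'Whitespace','' keeps the blanks, so each %Nc field holds exactly N characters
cac = textscan( fid, fmt, 'Headerlines',18, 'Whitespace','' );
fclose( fid );
c1  = textscan( cac{1}', '%6d' );             % first column as int32
c1  = c1{:};
sdn = datenum( cac{2}, 'yyyy/mm/dd HH:MM' );  % serial date numbers
str = permute( cac{3}, [2,1] );               % transpose: one record per column
% flag the records whose 9-char value field is all blanks
ise = arrayfun( @(ix) all(isspace(str(:,ix))), (1:length(str)) );
% write 'nan' into the last 3 chars of those fields so that %9f yields NaN
str( 7:9, ise ) = repmat( permute( 'nan', [2,1] ), 1, sum(ise) );
c3  = textscan( str, '%9f' );                 % value column as double
c3  = c3{:};
c4  = strtrim(cac{4});                        % remainder of each line, trimmed
end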

Damith 2016-1-5

Apologies for the late reply. This seems to serve the purpose. Can you briefly explain what these 3 lines do? Thanks a lot.

str = permute( cac{3}, [2,1] );
ise = arrayfun( @(ix) all(isspace(str(:,ix))), (1:length(str)) );
str( 7:9, ise ) = repmat( permute( 'nan', [2,1] ), 1, sum(ise) );
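
A small sketch of what those three lines appear to do, run on a hypothetical three-record value column instead of cac{3}. Two things differ from the original and are assumptions made for the demo: size(str,2) replaces length(str), because length returns the larger dimension and would be 9 here rather than the record count, and str2double stands in for the final textscan(str,'%9f') call.

```matlab
% Hypothetical stand-in for cac{3}: one 9-character value field per row
col = ['    15.20'; blanks(9); '   180.30'];

% Line 1: transpose, so that each column holds one record's field
str = permute(col, [2,1]);

% Line 2: logical flag per record, true where the whole field is blank
ise = arrayfun(@(ix) all(isspace(str(:,ix))), 1:size(str,2));

% Line 3: overwrite characters 7 to 9 of every blank field with 'nan'
str(7:9, ise) = repmat(permute('nan', [2,1]), 1, sum(ise));

% Convert the patched fields; the blank record now reads as NaN
val = str2double(cellstr(str'));
```

The point of the patching is that every record ends up with something numeric to parse, so the converted column keeps one value per record.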
