Read a text file with a varying number of columns

Question

Pankaj 2015-1-11

https://ww2.mathworks.cn/matlabcentral/answers/169557-read-a-text-file-with-varying-number-of-colums

Commented: Pankaj 2015-1-12

I am trying to read a text file with a varying number of columns, such as this:

    #5:v3.0_ST:1:2631.1301,N:140.0:081.000:12.5;
    #5:v3.0_ST:1:2631.1301,N:111.4:100.000:12.5:18.7:32.3;
    #5:v3.0_ST:1:2631.1299,N:111.5:101.000:12.5:18.7:32.3;
    #5:v3.0_ST:1:2631.1315,N:136.4:082.000:12.3;
    #5:v3.0_ST:1:2631.1334,N:132.8:083.000:12.4;

The data is delimited by ":" (colon). I understand there is some way of doing this with textscan, but I do not know how to do it for a varying number of columns. Can someone give me a hint?

Thanks

4 comments (2 older comments not shown)

Pankaj 2015-1-12

Oh, the comma separates the coordinate from its hemisphere letter (N for North). The original file also contained longitude coordinates; I reduced it to a simplified form.

Pankaj 2015-1-12

Thank you all for giving your time for this question.

Answer 1

per isakson 2015-1-12

https://ww2.mathworks.cn/matlabcentral/answers/169557-read-a-text-file-with-varying-number-of-colums#answer_164626

Edited: per isakson 2015-1-12

Try

    fid = fopen( 'cssm.txt' );
    % Nine %s fields, one per field of the longest line; ':' and ';' both delimit
    cac = textscan( fid, '%s%s%s%s%s%s%s%s%s', 'CollectOutput'  ...
                ,   true, 'Delimiter', ':;'  );
    [~] = fclose( fid );

With R2013a, it returns:

    >> cac{:}
    ans = 
      Columns 1 through 8
      '#5'  'v3.0_ST'  '1'  '2631.1301,N'  '140.0'   '081.000'    '12.5'    ''    
      '#5'  'v3.0_ST'  '1'  '2631.1301,N'  '111.4'   '100.000'    '12.5'    '18.7'
      '#5'  'v3.0_ST'  '1'  '2631.1299,N'  '111.5'   '101.000'    '12.5'    '18.7'
      '#5'  'v3.0_ST'  '1'  '2631.1315,N'  '136.4'   '082.000'    '12.3'    ''    
      '#5'  'v3.0_ST'  '1'  '2631.1334,N'  '132.8'   '083.000'    '12.4'        []
      Column 9
        ''    
        '32.3'
        '32.3'
        ''    
            []
    >>
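The bottom-right cell of this output is an empty double, [], while the other missing fields are empty strings ''. A minimal sketch of one way to tidy that up (not part of the original answer): normalize every empty cell to '' and convert the numeric text with `str2double`, which returns NaN for empty strings. The cell literal below is a hand-copied stand-in for columns 7 to 9 of `cac{1}`; with the real data you would start from `c = cac{1};`.

```matlab
% Stand-in for columns 7-9 of cac{1}, copied from the output above
c = { '12.5', '',     ''
      '12.5', '18.7', '32.3'
      '12.5', '18.7', '32.3'
      '12.3', '',     ''
      '12.4', '',     [] };   % note the lone [] in the last row

% Replace every empty entry, whether '' or [], with '' so all cells are char
c(cellfun(@isempty, c)) = {''};

% Convert the text to doubles; empty strings become NaN
num = str2double(c);
disp(num)
```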

Comments:

- importdata also reads and parses this file in R2013a. However, textscan is significantly faster.
- The empty double, [], in the bottom-right corner (the other missing fields are empty strings '') must be handled separately.
- The format string should be modified to account for the numerical columns.
- I don't think textscan would have behaved this nicely a few years ago.
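If you would rather not depend on how a given release of textscan pads short lines, a different technique is to split each record yourself. The sketch below (an alternative, not the answerer's method) splits lines and fields with `regexp(..., 'split')`, converts the numeric tail with `str2double`, and pads ragged rows with NaN. It assumes the layout from the question: four leading non-numeric fields, then a variable number of numeric fields, each record ending in ';'. The sample text is embedded so the snippet runs on its own; with the real file you would use `txt = fileread('cssm.txt');` instead.

```matlab
% Sample records from the question; in practice: txt = fileread('cssm.txt');
txt = sprintf(['#5:v3.0_ST:1:2631.1301,N:140.0:081.000:12.5;\n' ...
               '#5:v3.0_ST:1:2631.1301,N:111.4:100.000:12.5:18.7:32.3;\n' ...
               '#5:v3.0_ST:1:2631.1299,N:111.5:101.000:12.5:18.7:32.3;\n' ...
               '#5:v3.0_ST:1:2631.1315,N:136.4:082.000:12.3;\n' ...
               '#5:v3.0_ST:1:2631.1334,N:132.8:083.000:12.4;\n']);

nHeader = 4;                                    % assumed: '#5', 'v3.0_ST', '1', '2631.1301,N'
recs = regexp(strtrim(txt), '\r?\n', 'split');  % one cell per record
vals = cell(numel(recs), 1);

for k = 1:numel(recs)
    rec = strtrim(recs{k});
    if ~isempty(rec) && rec(end) == ';'
        rec = rec(1:end-1);                     % drop the ';' record terminator
    end
    parts   = regexp(rec, ':', 'split');
    vals{k} = str2double(parts(nHeader+1:end)); % numeric tail as a 1-by-n row
end

% Pad the ragged rows with NaN into one matrix
nCols = max(cellfun(@numel, vals));
data  = NaN(numel(vals), nCols);
for k = 1:numel(vals)
    data(k, 1:numel(vals{k})) = vals{k};
end
disp(data)
```

This is slower than a single textscan call on large files, but the padding rule is explicit in the code rather than left to textscan.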

Addendum

This might better match what you are looking for:

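    fid = fopen( 'cssm.txt' );
    % '#5' and 'v3.0_ST' as %s, '1' and the coordinate as %f, 'N' as %s,
    % then up to five numeric fields as %f; ',' is now also a delimiter
    cac = textscan( fid, '%s%s%f%f%s%f%f%f%f%f' ...
                ,   'CollectOutput' ,   true    ...
                ,   'Delimiter'     , ':;,'     );
    [~] = fclose( fid );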

Output:

    >> cac{:}
    ans = 
        '#5'    'v3.0_ST'
        '#5'    'v3.0_ST'
        '#5'    'v3.0_ST'
        '#5'    'v3.0_ST'
        '#5'    'v3.0_ST'
    ans =
       1.0e+03 *
        0.0010    2.6311
        0.0010    2.6311
        0.0010    2.6311
        0.0010    2.6311
        0.0010    2.6311
    ans = 
        'N'
        'N'
        'N'
        'N'
        'N'
    ans =
      140.0000   81.0000   12.5000       NaN       NaN
      111.4000  100.0000   12.5000   18.7000   32.3000
      111.5000  101.0000   12.5000   18.7000   32.3000
      136.4000   82.0000   12.3000       NaN       NaN
      132.8000   83.0000   12.4000       NaN       NaN
    >>
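With this format the trailing numeric fields end up in one NaN-padded matrix (`cac{4}` above), so the number of fields each line actually had can be recovered by counting non-NaN entries. A minimal sketch, assuming NaN only ever appears as padding (a genuinely empty field such as "::" in the middle of a line would also read as NaN and be miscounted). The matrix literal is copied from the output above as a stand-in for `cac{4}`.

```matlab
% Stand-in for cac{4}, copied from the output above
vals = [ 140.0   81.0  12.5   NaN   NaN
         111.4  100.0  12.5  18.7  32.3
         111.5  101.0  12.5  18.7  32.3
         136.4   82.0  12.3   NaN   NaN
         132.8   83.0  12.4   NaN   NaN ];

% Number of values present on each line (NaN assumed to mean padding)
nPerRow = sum(~isnan(vals), 2);

% Keep only the lines that carry all five numeric fields
fullRows = vals(nPerRow == size(vals, 2), :);
disp(nPerRow.')
disp(fullRows)
```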