Skip until import data

Question

I have some questions about importing data. Here is an example of the data file to import:

!!!!!!
! text text
! stuff
0.1      2.53  2.5
0.2  2.59  2.43
0.3  2.5  2.54
0.4  2.48  2.53
0.5  2.52  2.48
1
ABC 0.123 123 
   DE
    0.456 0.456 456
0.1  2.56  2.34  2.63
0.2  2.61  2.48  2.43
0.3  2.54  2.51  2.6
0.4  2.57  2.54  2.49
0.5  2.48  2.63  2.5

Here is the code I'm using to import this data:

Test = fopen('TestData.txt'); % open the file
for n = 1
    mystruct(n).Header1 = fgetl(Test); % line 1 goes to Header1
    fgetl(Test); % skip line
    mystruct(n).Header2 = fgetl(Test);
    fgetl(Test);
    mystruct(n).Header3 = fgetl(Test);
    mystruct(n).meas = fscanf(Test, '%f', [3, 5])';
end
for n = 2
    for j = 1:6 % skips to the 6th line
        fgetl(Test);
    end
    mystruct(n).T = fscanf(Test, '%f', 1); % read the value of T
    for j = 1:2 % skips 2 empty lines
        fgetl(Test);
    end
    mystruct(n).meas = fscanf(Test, '%f', [4, 5])';
end
fclose(Test); % close the file

I want to preserve the headers at the top, and I don't care about the mid-file headers except for my T value. How can I import this so that it allows for a variable number of header lines at the top and in the middle of the file, without having to look through each data file? This would be helpful because I have multiple data files with varying contents (mainly the headers). I think I need something like a "skip until" that also skips empty lines and still allows each matrix to be treated individually, as I do now. Any help is much appreciated, thanks!

Answer 1

per isakson 2012-8-9

Edited: per isakson 2012-8-12

I have deleted a sketchy outline, which was not helpful.

--- Working code ---

Purpose:

learn to use Matlab

Approach:

the data file consists of consecutive blocks of headers and data
a data block is a number of consecutive rows containing an equal number of "numerical strings"
a header block is a number of consecutive rows, which do not belong to a data block
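
To make the data/header distinction concrete, here is a minimal sketch of the per-line classification. It is a simplified illustration and not the code posted below: the sample lines are hypothetical copies of a few rows from the example file, and the names `lines`, `float_pattern` and `n_cols` are my own. It assumes, as in the example, that data values are always written with a decimal point.

```matlab
% Hypothetical sample lines (typed in here, not read from a file).
lines = { '! stuff'
          '0.1      2.53  2.5'
          '0.2  2.59  2.43'
          '1'
          'ABC 0.123 123'
          '0.1  2.56  2.34  2.63' };

% Assumption: a "numerical string" is an optionally signed number with a decimal point.
float_pattern = '^[+-]?\d*\.\d+$';

n_cols = nan( numel( lines ), 1 );          % NaN marks a header row
for k = 1 : numel( lines )
    tokens = regexp( lines{k}, '\S+', 'match' );   % whitespace-separated tokens
    if isempty( tokens )
        continue                            % blank line, stays a header row
    end
    is_float = ~cellfun( @isempty, regexp( tokens, float_pattern, 'once' ) );
    if all( is_float )
        n_cols(k) = numel( tokens );        % data row: remember its column count
    end
end
disp( n_cols.' )
```

Rows where every token looks like a float get their column count; everything else, including the lone `1` and the mixed `ABC 0.123 123` row, is left as NaN and so counts as header.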

Implementation:

cssm, main function
getblocks, subfunction

Hopefully, the code works with more data files than the example above, cssm.txt

Example:

>> [ header_blocks, data_blocks ] = cssm()
header_blocks = 
    {6x1 cell}
    {4x1 cell}
data_blocks = 
    [5x3 double]
    [5x4 double]

.

Left as exercise:

understand the code
write comments

====

    function [ header_blocks, data_blocks ] = cssm()
        fid = fopen( 'cssm.txt' );
        cac = textscan( fid, '%s', 'Whitespace','', 'Delimiter','\n' );
        fclose( fid );
        number_of_floats = cellfun( @(c) size(c,2)          ...
            ,   regexp( cac{:}, '[+|-]?\d*\.\d+', 'match' ) ...
            ,   'uni', true                                 );
        number_of_stuff  = cellfun( @(c) size(c,2)                  ...
            ,   regexp( cac{:}, '[^([+|-]?\d*\.\d+) ]', 'match' )   ...
            ,   'uni', true                                         );
        is_data = ( number_of_floats >= 1 & number_of_stuff == 0 );
        number_of_data_columns = number_of_floats;
        number_of_data_columns( not(is_data) ) = nan;
        [ ~, ix1, ix2 ] = getblocks( number_of_data_columns, 2 );
        data_blocks = cell(0);
        for ii = 1 : numel( ix1 )
            data_blocks = cat( 1, data_blocks               ...
                ,   {str2num( char(cac{1}{ix1(ii):ix2(ii)}))} );
        end
        ix3 = cat( 2, 1, ix2+1 );
        ix4 = cat( 2, ix1-1, size( cac{1}, 1 ) );
        header_blocks = cell(0);
        for ii = 1 : numel( ix1 )
            header_blocks = cat( 1, header_blocks       ...
                            ,   {cac{:}(ix3(ii):ix4(ii))} );
        end
    end

====

    function  [ col, ix1, ix2 ] = getblocks( sequence, min_nrows )
    %   without comments
        seq     = cat( 2, nan, transpose( sequence(:) ), nan );
        change  = diff( double( diff( seq ) == 0 ) );
        ix1     = strfind( change, +1 );
        ix2     = strfind( change, -1 );    
        col     = sequence( ix1 );
        if min_nrows >= 2
            isg = ix2-ix1+1 >= min_nrows;
            col = col( isg );
            ix1 = ix1( isg );
            ix2 = ix2( isg );
        else
            ix_sngl = find( not( logical( cumsum( change )          ...
                                        + double( change==-1 ) ) ) );
            ix1     = cat( 2, ix1, ix_sngl );
            ix2     = cat( 2, ix2, ix_sngl );
            col     = cat( 2, col, sequence( ix_sngl ) );
            [~,ix]  = sort( ix1 );
            ix1     = ix1( ix );
            ix2     = ix2( ix );
            col     = col( ix );
        end
    end
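
For readers who want the idea behind the block search without working through `getblocks`, here is a minimal stand-alone sketch. It is not a replacement for the function above and only covers the case of runs with at least two rows. The vector `n_cols` is hypothetical (per-line column counts, NaN for header rows), and `same_as_prev`, `same_as_next`, `run_start` and `run_end` are names I made up.

```matlab
% Hypothetical per-line column counts; NaN marks a header row.
n_cols = [ NaN NaN 3 3 3 NaN NaN 4 4 ];

% True where a row has the same column count as the row before it.
% Comparisons involving NaN are false, so header rows never join a run.
same_as_prev = [ false, diff( n_cols ) == 0 ];
same_as_next = [ same_as_prev( 2:end ), false ];

run_start = find( ~same_as_prev &  same_as_next );   % first row of each run
run_end   = find(  same_as_prev & ~same_as_next );   % last row of each run

disp( [ run_start; run_end ] )
```

Each column of the displayed matrix is one data block, given as first and last row index. A row with no equal neighbor is never reported, which corresponds to calling `getblocks` with `min_nrows` equal to 2.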

vb 2012-8-14

I think I have something that works! Thank you! I had to comment out number_of_data_columns( not(is_data) ) = nan; and then data_blocks returned the two matrices. I'm not totally sure where the discrepancy is, since it worked as is for you. Thanks again!

per isakson 2012-8-14

Edited: per isakson 2012-8-14

I cannot guess what problems you see. However, here is what I get when I run the code above:

with "% number_of_data_columns( not(is_data) ) = nan;" commented out

>> [ header_blocks, data_blocks ] = cssm()
header_blocks = 
    {0x1 cell}
    {0x1 cell}
    {4x1 cell}
data_blocks = 
    []
    [5x3 double]
    [5x4 double]
>> header_blocks{:}
ans = 
   Empty cell array: 0-by-1
ans = 
   Empty cell array: 0-by-1
ans = 
    '1'
    'ABC 0.123 123'
    ' DE'
    '  0.456 0.456 456'
>> data_blocks{:}
ans =
     []
ans =
    0.1000    2.5300    2.5000
    0.2000    2.5900    2.4300
    0.3000    2.5000    2.5400
    0.4000    2.4800    2.5300
    0.5000    2.5200    2.4800
ans =
    0.1000    2.5600    2.3400    2.6300
    0.2000    2.6100    2.4800    2.4300
    0.3000    2.5400    2.5100    2.6000
    0.4000    2.5700    2.5400    2.4900
    0.5000    2.4800    2.6300    2.5000
>>

.

with "number_of_data_columns( not(is_data) ) = nan;" in place

>> [ header_blocks, data_blocks ] = cssm()
header_blocks = 
    {6x1 cell}
    {4x1 cell}
data_blocks = 
    [5x3 double]
    [5x4 double]
>> header_blocks{:}
ans = 
    '!!!!!!'
    ''
    '! text text'
    ''
    '! stuff'
    ''
ans = 
    '1'
    'ABC 0.123 123'
    ' DE'
    '  0.456 0.456 456'
>> data_blocks{:}
ans =
    0.1000    2.5300    2.5000
    0.2000    2.5900    2.4300
    0.3000    2.5000    2.5400
    0.4000    2.4800    2.5300
    0.5000    2.5200    2.4800
ans =
    0.1000    2.5600    2.3400    2.6300
    0.2000    2.6100    2.4800    2.4300
    0.3000    2.5400    2.5100    2.6000
    0.4000    2.5700    2.5400    2.4900
    0.5000    2.4800    2.6300    2.5000
>>

.

Comment

In the text file there should be an empty line between "ABC..." and " DE". Adding that blank line doesn't cause any problems. I get

...
ans = 
    '1'
    'ABC 0.123 123'
    ''
    ' DE'
    '  0.456 0.456 456'
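
Since the question asks for the T-value, here is one possible way to pull it out of a mid-file header block. This is only a sketch under an assumption that is not confirmed anywhere in the thread: T is taken to be the first header line that contains a single number and nothing else. The cell array `header_block` is a hand-typed copy of the block shown above.

```matlab
% Hypothetical copy of the second header block shown above.
header_block = { '1'; 'ABC 0.123 123'; ''; ' DE'; '  0.456 0.456 456' };

% Assumption: T is the first header line holding exactly one number.
T = NaN;
for k = 1 : numel( header_block )
    value = str2double( strtrim( header_block{k} ) );  % NaN unless the whole line is one number
    if ~isnan( value )
        T = value;
        break
    end
end
disp( T )
```

If the T line looks different in other files (for example a label followed by the number), the test inside the loop would have to be adapted.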
