Is it possible to change customization of textscans when importing data from files, in-line?

Question

Tolulope 2012-9-3

https://ww2.mathworks.cn/matlabcentral/answers/47287-is-it-possible-to-change-customization-of-textscans-when-importing-data-from-files-in-line

I want to import the lines from a data file (shown below) with a custom delimiter, and then change the delimiter partway through the file. I'm reading it with textscan, using something like this:

fid = fopen('Test2.txt','r');
  H = textscan(fid, '%s',21,'delimiter','=');
  F = textscan(fid, '%f %f %f %f %f %f %f');
fclose(fid);

The problem is I would like to read in the numbers under Parameter 1 and Parameter 2 as datasets/arrays. I've tried calling them from the cell array, but the result comes out as an array of characters. Is there a way of getting those parameters out as normal arrays using textscan preferences or otherwise?

 DATA
        Name=Datablock 1
        Date=12:02 03/09/2012
        Parameter 1=32, 346, 634, 5467, 4567
        Parameter 2=6.53; 7.53; 7.67; 9.01; 10.67
        Offset=0
        Configuration=10
        Noise=0.1
        Reference number=14546757
        Version number=WERGXX1.0a
        Alias=False
        EOH
        0 12341 12341234 34 7 8 446
        0 12341 12341234 34 7 8 446
        0 12341 12341234 34 7 8 446
        0 12341 12341234 34 7 8 446


Jan 2012-9-3

@Tolulope: It is not clear what you want to import, or in which format. Please post what you expect as output for the given text file.

Tolulope 2012-9-4

Edited: Tolulope 2012-9-4

@Jan: trying to answer your question helped me realize the solution to my problem :) I had been trying to scan whole lines of text with a single format. So, using the code I gave above, the first 3 lines are...

 Name=Datablock 1
 Date=12:02 03/09/2012
 Parameter 1=32, 346, 634, 5467, 4567

...6 strings separated by "=". But this turns the values 32, 346, 634, 5467, 4567 into a single character array. I hadn't realized I can simply treat the first three lines as 5 strings delimited by "=", then 5 numbers delimited by ","; the next line is then one string followed by 5 numbers separated by ";", and so on. I'm now using:

fid = fopen('Test2.txt','r');
  H = textscan(fid, '%s',5,'delimiter','=');
  F = textscan(fid, '%f %f %f %f %f','delimiter',',');
  % and so on ...
fclose(fid);
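
The "and so on" can also be written so that it does not depend on where each textscan call leaves the file position: take one header line at a time, split it at "=", and pick the delimiter for the value part. A minimal sketch, assuming the header layout of the sample file; the cell array `lines` is a hypothetical stand-in for lines read with fgetl, and the variable names are made up for illustration:

```matlab
% Header lines as they appear in the sample file
% (hypothetical stand-in for lines read one by one with fgetl).
lines = { 'Name=Datablock 1'
          'Parameter 1=32, 346, 634, 5467, 4567'
          'Parameter 2=6.53; 7.53; 7.67; 9.01; 10.67' };

% Split "key=value" at the first "="; val1 still starts with "=".
[key1, val1] = strtok(lines{2}, '=');
P1 = textscan(val1(2:end), '%f', 'Delimiter', ',');
P1 = P1{1}.';            % row vector of 5 doubles

% Same idea for the next line, but with ";" as the delimiter.
[key2, val2] = strtok(lines{3}, '=');
P2 = textscan(val2(2:end), '%f', 'Delimiter', ';');
P2 = P2{1};              % column vector of 5 doubles
```

textscan accepts a character array in place of a file identifier, so the delimiter can be changed per line without reopening or repositioning the file.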

Answer 1

per isakson 2012-9-4

https://ww2.mathworks.cn/matlabcentral/answers/47287-is-it-possible-to-change-customization-of-textscans-when-importing-data-from-files-in-line#answer_57789

Edited: per isakson 2012-9-4

If the file fits in memory, this is one way to do it.

In some cases I decided what is best for the OP ;-) and I guessed that "EOH" stands for end of header. I missed the meaning of "... I want to change the delimiter in-line."

Some reasons I do it this way:

The code may be developed one cell at a time in debug mode. I use the "Evaluate cell" button in the toolbar and check intermediate results before proceeding to the next cell.
Assigning the result to a structure makes it easy to add more fields. Name and Date are two good candidates for new fields.
With str2num the code does not depend on the number of columns of data.
Structures are easy to make somewhat self-documenting.
It is easy to insert a new cell in which problems are fixed with some find&replace, e.g. convert the decimal separator ',' to '.' and change list delimiters so that str2num can handle the strings.
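
The find&replace step in the last point could look something like this. It is only a sketch: the input string with decimal commas is a hypothetical example, not taken from the sample file, and the replacement is only safe when ',' is not also used as the list delimiter.

```matlab
% Hypothetical header value written with decimal commas.
raw = '6,53; 7,53; 7,67';

% Replace the decimal separator so that str2num can evaluate the string.
fixed = strrep(raw, ',', '.');     % '6.53; 7.53; 7.67'

% ';' separates rows in MATLAB syntax, so the result is a column vector.
vals = str2num(fixed);
```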

I often regret that I did not make the code more robust to small changes in the data file.

Test with the small data sample:

    >> S = cssm()
    S = 
        Parameter_1: [32 346 634 5467 4567]
        Parameter_2: [5x1 double]
         Data_block: [4x7 double]

where cssm.m contains

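    function S = cssm()
        % Read the whole file into a cell array, one string per line.
        fid = fopen( 'cssm.txt', 'r' );
        assert( fid ~= -1, 'cssm:FileNotFound', 'Cannot open cssm.txt' )
        cac = textscan( fid, '%s', 'Delimiter','\n' );
        fclose( fid );
        cac = strtrim( cac{:} );
        % Split at the "EOH" line: header (cah) above, data (cad) below.
        ixe = find( strcmp( cac, 'EOH' ) );
        cah = cac( 1 : ixe );
        cad = cac( ixe+1 : end );
        % Locate the two parameter lines; each must occur exactly once.
        is1 = strncmp( cah, 'Parameter 1', 11 );
        is2 = strncmp( cah, 'Parameter 2', 11 );
        assert( sum(double(is1))==1     ...
            ,   'cssm:IllegalNumber'    ...
            ,   'Expected one "Parameter 1" line, found %d'   ...
            ,   sum(double(is1))        )
        assert( sum(double(is2))==1     ...
            ,   'cssm:IllegalNumber'    ...
            ,   'Expected one "Parameter 2" line, found %d'   ...
            ,   sum(double(is2))        )
        % Take the text after "=" and let str2num convert it.
        buf = regexp( cah{is1}, '=', 'split' );
        S.Parameter_1 = str2num( buf{2} );
        buf = regexp( cah{is2}, '=', 'split' );
        S.Parameter_2 = str2num( buf{2} );
        % char() pads the data lines into a matrix, one row per line.
        S.Data_block = str2num( char( cad ) );
    end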

and where cssm.txt contains

    Name=Datablock 1
    Date=12:02 03/09/2012
    Parameter 1=32, 346, 634, 5467, 4567
    Parameter 2=6.53; 7.53; 7.67; 9.01; 10.67
    Offset=0
    Configuration=10
    Noise=0.1
    Reference number=14546757
    Version number=WERGXX1.0a
    Alias=False
    EOH
    0 12341 12341234 34 7 8 446
    0 12341 12341234 34 7 8 446
    0 12341 12341234 34 7 8 446
    0 12341 12341234 34 7 8 446

The next task is to profile the code with real data files:

str2num might not be the fastest way to convert to double.
Splitting the content of the file into header, cah, and data, cad, is a bit of a waste of memory and CPU if the data files are large.
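
If profiling shows that str2num is the bottleneck, one candidate to time against it is sscanf. The following is a sketch, not a measured result; it assumes the data block always has 7 columns, so it gives up the independence from the column count mentioned above. The cell array `cad` here is a shortened stand-in for the data lines of the sample file.

```matlab
% Two data lines from the sample file (stand-in for cad in cssm.m).
cad = { '0 12341 12341234 34 7 8 446'
        '0 12341 12341234 34 7 8 446' };

% Join the lines into one character vector, one line per row of text.
txt = sprintf('%s\n', cad{:});

% sscanf fills a 7-by-N matrix column by column; transpose to N-by-7.
D = sscanf(txt, '%f', [7, Inf]).';
```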

Tolulope 2012-9-4

Yes, quite right. Even the code I have is intolerant of any change in the original format. What I eventually did was brute force: read every unique recurrence independently. So I have like 10 lines of "textscan"! At least that way I can control how MATLAB reads the variables in. I hadn't realized, though, that one could use

S.Data_block = str2num( char( cad ) );

Up till now I thought that once a variable becomes a character array, it can't be turned back into a number of any kind. Thanks for the feedback. Very handy.

per isakson 2012-9-4

There are several functions that convert from string to numeric, e.g.

    C = textscan( str, ... )
    A = sscanf( str, format, sizeA )
    X = str2double('str')
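
A short sketch of the three calls on strings in the style of the sample header; the input strings are chosen here for illustration only:

```matlab
str = '32, 346, 634';

% textscan on a string: returns a cell array, here with one 3x1 column.
C = textscan(str, '%f', 'Delimiter', ',');

% sscanf: the literal ',' in the format skips the separators;
% the size argument [1, Inf] asks for a row vector.
A = sscanf(str, '%f,', [1, Inf]);

% str2double converts a single number and returns NaN if it cannot.
X = str2double('0.1');
Y = str2double('WERGXX1.0a');
```

Unlike str2num, none of these evaluate the string as MATLAB code, so they only accept plain numbers.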
