Convert chars into formatted numbers

Question

Hello everyone,

I am working on code that parses a .header file in order to interpret a large database stored in a .data file (HITRAN, for those familiar with it).

From the header file I can obtain where to split each line of the dataset into variables and which format each variable has. Here is an example of the data:

% ... Parse the .header file to get variable names (.Names) and their numerical
% formatting (.Values) in C. Some of them are double, some of them are integer numbers
% related to quantum states. Note that Names and Values are not in the same
% order as the columns of the .data file.
FormatBlock.Names = {'a', 'gamma_air', 'gp', 'local_iso_id', 'molec_id', 'sw', 'local_lower_quanta', 'local_upper_quanta', 'gpp', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'iref', 'line_mixing_flag', 'ierr', 'nu', 'gamma_self', 'global_lower_quanta'};
FormatBlock.Values = {'%10.3E', '%5.4f', '%7.1f', '%1d', '%2d', '%10.3E', '%15s', '%15s', '%7.1f', '%10.4f', '%4.2f', '%8.6f', '%15s', '%12s', '%1s', '%6s', '%12.6f', '%5.3f', '%15s'};
% ... Parse the .data file, dividing it into lines and separating values
% into columns keeping them as char. Here an example of one line
DataBlock.Names = {'molec_id', 'local_iso_id', 'nu', 'sw', 'a', 'gamma_air', 'gamma_self', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'global_lower_quanta', 'local_upper_quanta', 'local_lower_quanta', 'ierr', 'iref', 'line_mixing_flag', 'gp', 'gpp'}
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', '          0 2 0', '          0 1 0', ' 11  6  5      ', ' 10  1 10      ', '434233', '807294713152', ' ', '   69.0', '   63.0'};

The question is: assuming that I am able to reorganise Names and Values into the same order as the data file, how can I convert the DataBlock.Columns chars into numbers according to each FormatBlock.Values entry?

For example:

'molec_id' = ' 1' has formatting '%2d', hence: "molec_id" = 1

'global_lower_quanta' = '          0 1 0' has formatting '%15s', hence 'global_lower_quanta' = [0 1 0]

'nu' = ' 2800.033883' has formatting '%12.6f', hence 'nu' = 2.800033883e3

etc...

Thank you in advance for your help!


Answer 1

Star Strider 2025-3-21

I am not certain what result you want.

Try something like this —

% ... Parse the .header file to get variable names (.Names) and their numerical
% formatting (.Values) in C. Some of them are double, some of them are integer numbers
% related to quantum states. Note that Names and Values are not in the same
% order as the columns of the .data file.
FormatBlock.Names = {'a', 'gamma_air', 'gp', 'local_iso_id', 'molec_id', 'sw', 'local_lower_quanta', 'local_upper_quanta', 'gpp', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'iref', 'line_mixing_flag', 'ierr', 'nu', 'gamma_self', 'global_lower_quanta'};
FormatBlock.Values = {'%10.3E', '%5.4f', '%7.1f', '%1d', '%2d', '%10.3E', '%15s', '%15s', '%7.1f', '%10.4f', '%4.2f', '%8.6f', '%15s', '%12s', '%1s', '%6s', '%12.6f', '%5.3f', '%15s'};
% ... Parse the .data file, dividing it into lines and separating values
% into columns keeping them as char. Here an example of one line
DataBlock.Names = {'molec_id', 'local_iso_id', 'nu', 'sw', 'a', 'gamma_air', 'gamma_self', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'global_lower_quanta', 'local_upper_quanta', 'local_lower_quanta', 'ierr', 'iref', 'line_mixing_flag', 'gp', 'gpp'}
DataBlock = struct with fields:
    Names: {1x19 cell}
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', '          0 2 0', '          0 1 0', ' 11  6  5      ', ' 10  1 10      ', '434233', '807294713152', ' ', '   69.0', '   63.0'}
DataBlock = struct with fields:
      Names: {1x19 cell}
    Columns: {1x19 cell}
format shortG
DBC = cellfun(@(x)sscanf(x,'%g'),DataBlock.Columns,Unif=0);
disp(DBC)
  Columns 1 through 13

    {[1]}    {[1]}    {[2800]}    {[1.303e-29]}    {[0.0001003]}    {[0.0664]}    {[0.298]}    {[2705.1]}    {[0.65]}    {[0.00578]}    {3x1 double}    {3x1 double}    {3x1 double}

  Columns 14 through 19

    {3x1 double}    {[434233]}    {[8.0729e+11]}    {0x0 double}    {[69]}    {[63]}
for k = 1:numel(DBC)
    DBC{k}.'
end
ans = 
     1
ans = 
     1
ans = 
         2800
ans = 
    1.303e-29
ans = 
    0.0001003
ans = 
       0.0664
ans = 
        0.298
ans = 
       2705.1
ans = 
         0.65
ans = 
      0.00578
ans = 1×3
     0     2     0
ans = 1×3
     0     1     0
ans = 1×3
    11     6     5
ans = 1×3
    10     1    10
ans = 
      434233
ans = 
   8.0729e+11
ans =

     []
ans = 
    69
ans = 
    63

You can format them at your leisure. Use either sprintf or fprintf depending on what you want to do.
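If the goal is to let the header formats drive the conversion, one possible approach is sketched below. It is only a sketch under stated assumptions: the four-column sample is a shortened, hypothetical subset of the data above, the Record struct is an illustrative name, and treating a '%s' field that holds only numbers as a row vector is a guess at the desired result. Note that sscanf stops at the first non-numeric token, so text fields that mix letters and digits would need different handling.

```matlab
% Shortened, hypothetical subset of the header and data blocks shown above.
FormatBlock.Names  = {'nu', 'molec_id', 'global_lower_quanta', 'line_mixing_flag'};
FormatBlock.Values = {'%12.6f', '%2d', '%15s', '%1s'};
DataBlock.Names    = {'molec_id', 'nu', 'global_lower_quanta', 'line_mixing_flag'};
DataBlock.Columns  = {' 1', ' 2800.033883', '          0 1 0', ' '};

% Reorder the format specifiers so that they follow the data-file column order.
[found, idx] = ismember(DataBlock.Names, FormatBlock.Names);
assert(all(found), 'Every data column needs a matching format entry.');
fmt = FormatBlock.Values(idx);

Record = struct();                       % illustrative container, one field per column
for k = 1:numel(fmt)
    txt = DataBlock.Columns{k};
    if endsWith(fmt{k}, 's')
        % Text field: keep numeric tokens as a row vector (assumed behaviour),
        % otherwise fall back to the trimmed text.
        val = sscanf(txt, '%f').';
        if isempty(val)
            val = strtrim(txt);
        end
    else
        % Numeric field (%d, %f, %E): read as double, NaN if nothing is there.
        val = sscanf(txt, '%f');
        if isempty(val)
            val = NaN;
        end
    end
    Record.(DataBlock.Names{k}) = val;
end

% Writing a value back with its original specifier reproduces the column text.
nuText = sprintf(fmt{2}, Record.nu);
```

Here Record.molec_id is a scalar double, Record.global_lower_quanta is the row vector [0 1 0], and the blank line_mixing_flag stays an empty char. Values are stored in full double precision; format shortG only affects how they are displayed.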

4 comments

Star Strider 2025-3-21

As always, my pleasure!

The third-from-the-last element is a single space. There is nothing there to convert, so it produces an empty cell. You can assign it any value you want, including NaN.

Also, I transposed the vector elements so that they displayed as they do in the original vector.

If you leave them un-transposed and use the vertcat function, they will form individual elements of a column vector that you can then use as the numeric argument in fprintf or sprintf.

Example —

DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', '          0 2 0', '          0 1 0', ' 11  6  5      ', ' 10  1 10      ', '434233', '807294713152', ' ', '   69.0', '   63.0'}
DataBlock = struct with fields:
    Columns: {1x19 cell}
format shortG
DBC = cellfun(@(x)sscanf(x,'%g'),DataBlock.Columns,Unif=0);
DBC = DBC.';
disp(DBC)                                                       % Up To Here, My Previous Code
    {[         1]}
    {[         1]}
    {[      2800]}
    {[ 1.303e-29]}
    {[ 0.0001003]}
    {[    0.0664]}
    {[     0.298]}
    {[    2705.1]}
    {[      0.65]}
    {[   0.00578]}
    {3x1 double  }
    {3x1 double  }
    {3x1 double  }
    {3x1 double  }
    {[    434233]}
    {[8.0729e+11]}
    {0x0 double  }
    {[        69]}
    {[        63]}
Lv = cell2mat(cellfun(@isempty,DBC,Unif=0));
disp(Lv)
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   1
   0
   0
DBC(Lv) = {NaN};                                                % Assign NaN To Every Empty Element (Also Works If More Than One Cell Is Empty)
DBCv = vertcat(DBC{:});                                         % Use ‘vertcat’
disp(DBCv)                                                      % Display All The Resulting Elements
            1
            1
         2800
    1.303e-29
    0.0001003
       0.0664
        0.298
       2705.1
         0.65
      0.00578
            0
            2
            0
            0
            1
            0
           11
            6
            5
           10
            1
           10
   4.3423e+05
   8.0729e+11
          NaN
           69
           63

You can detect the NaN value by using the isnan function to create a logical vector that gives its logical index, or use the 'Lv' vector I created, and then replace it with anything you want except an empty value, since numeric arrays do not permit that. With the NaN value, it is simply considered 'missing'.
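As a minimal sketch of that last step, the snippet below locates the missing entry with isnan and swaps in a different placeholder. The shortened vector and the sentinel value -1 are hypothetical choices for illustration only.

```matlab
DBCv = [1; 2800.033883; NaN; 69];      % hypothetical shortened version of the vector above

isMissing = isnan(DBCv);               % logical index of the missing entries
missingRow = find(isMissing);          % numeric position(s) of the missing entries

DBCv(isMissing) = -1;                  % replace NaN with a hypothetical sentinel value

% The column vector can be passed directly to sprintf, one value per line.
txt = sprintf('%12.6f\n', DBCv);
```

Keeping NaN is usually the safer choice when the values feed into later arithmetic, since a sentinel such as -1 can be mistaken for real data.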

Francesco 2025-3-21

Thank you so much!

Star Strider 2025-3-21

As always, my pleasure!
