Convert chars into formatted numbers

Hello everyone,
I am working on code that parses a .header file in order to interpret a large database stored in a .data file (HITRAN, for those familiar with it).
From the header file I can obtain where to split each line of the dataset into variables and which format each variable has. Below is an example of the data:
% ... Parse the .header file to get variable names (.Names) and their numerical
% formatting (.Values) in C. Some of them are double, some of them are integer numbers
% related to quantum states. Note that Names and Values are not in the same
% order as the columns of the .data file.
FormatBlock.Names = {'a', 'gamma_air', 'gp', 'local_iso_id', 'molec_id', 'sw', 'local_lower_quanta', 'local_upper_quanta', 'gpp', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'iref', 'line_mixing_flag', 'ierr', 'nu', 'gamma_self', 'global_lower_quanta'};
FormatBlock.Values = {'%10.3E', '%5.4f', '%7.1f', '%1d', '%2d', '%10.3E', '%15s', '%15s', '%7.1f', '%10.4f', '%4.2f', '%8.6f', '%15s', '%12s', '%1s', '%6s', '%12.6f', '%5.3f', '%15s'};
% ... Parse the .data file, dividing it into lines and separating values
% into columns keeping them as char. Here an example of one line
DataBlock.Names = {'molec_id', 'local_iso_id', 'nu', 'sw', 'a', 'gamma_air', 'gamma_self', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'global_lower_quanta', 'local_upper_quanta', 'local_lower_quanta', 'ierr', 'iref', 'line_mixing_flag', 'gp', 'gpp'}
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', ' 0 2 0', ' 0 1 0', ' 11 6 5 ', ' 10 1 10 ', '434233', '807294713152', ' ', ' 69.0', ' 63.0'};
The question is: assuming that I am able to reorder Names and Values to match the column order of the data file, how can I convert the DataBlock.Columns chars into numbers according to each FormatBlock.Values entry?
For example:
'molec_id' = ' 1' has formatting '%2d', hence 'molec_id' = 1
'global_lower_quanta' = ' 0 1 0' has formatting '%15s', hence 'global_lower_quanta' = [0 1 0]
'nu' = ' 2800.033883' has formatting '%12.6f', hence 'nu' = 2.800033883e3
etc...
Thank you in advance for your help!
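A minimal sketch of the conversion asked about above, assuming the header entries are C-style format specifiers whose field widths have already been used to split the line, so only the final conversion character ('d', 'f', 'E', 's') matters. It reorders FormatBlock.Values to the data-file order with ismember, then dispatches on that character; '%s' fields are parsed as whitespace-separated numbers when they contain only numbers and are otherwise kept as trimmed text. The names fmts, values and S are illustrative, not part of the original code.

```matlab
% Sketch: convert one line of char fields using the header's C-style formats.
% Data copied from the question.
FormatBlock.Names = {'a', 'gamma_air', 'gp', 'local_iso_id', 'molec_id', 'sw', 'local_lower_quanta', 'local_upper_quanta', 'gpp', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'iref', 'line_mixing_flag', 'ierr', 'nu', 'gamma_self', 'global_lower_quanta'};
FormatBlock.Values = {'%10.3E', '%5.4f', '%7.1f', '%1d', '%2d', '%10.3E', '%15s', '%15s', '%7.1f', '%10.4f', '%4.2f', '%8.6f', '%15s', '%12s', '%1s', '%6s', '%12.6f', '%5.3f', '%15s'};
DataBlock.Names = {'molec_id', 'local_iso_id', 'nu', 'sw', 'a', 'gamma_air', 'gamma_self', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'global_lower_quanta', 'local_upper_quanta', 'local_lower_quanta', 'ierr', 'iref', 'line_mixing_flag', 'gp', 'gpp'};
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', ' 0 2 0', ' 0 1 0', ' 11 6 5 ', ' 10 1 10 ', '434233', '807294713152', ' ', ' 69.0', ' 63.0'};

% Reorder the formats so fmts{k} belongs to DataBlock.Columns{k}
[found, idx] = ismember(DataBlock.Names, FormatBlock.Names);
assert(all(found), 'Every data column needs a format in the header.');
fmts = FormatBlock.Values(idx);

values = cell(size(DataBlock.Columns));
for k = 1:numel(DataBlock.Columns)
    str  = DataBlock.Columns{k};
    conv = lower(fmts{k}(end));             % e.g. '%12.6f' -> 'f', '%10.3E' -> 'e'
    switch conv
        case 'd'                             % integer field, e.g. '%2d'
            values{k} = sscanf(str, '%d');
        case {'f', 'e', 'g'}                 % floating-point field
            values{k} = sscanf(str, '%f');
        case 's'                             % text field, e.g. '%15s'
            [v, ~, errmsg] = sscanf(strtrim(str), '%g');
            if isempty(errmsg) && ~isempty(v)
                values{k} = v.';             % numbers only: ' 0 1 0' -> [0 1 0]
            else
                values{k} = strtrim(str);    % otherwise keep the text (blank -> '')
            end
        otherwise
            error('Unsupported format specifier: %s', fmts{k});
    end
end

% Access each value by name, e.g. S.nu or S.global_lower_quanta
S = cell2struct(values, DataBlock.Names, 2);
disp(S)
```

Like the accepted answer, this turns ierr and iref into single (large) numbers; if those fields are codes in your data, you may prefer to keep every '%s' field as text.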

Accepted Answer

I am not certain what result you want.
Try something like this:
% ... Parse the .header file to get variable names (.Names) and their numerical
% formatting (.Values) in C. Some of them are double, some of them are integer numbers
% related to quantum states. Note that Names and Values are not in the same
% order as the columns of the .data file.
FormatBlock.Names = {'a', 'gamma_air', 'gp', 'local_iso_id', 'molec_id', 'sw', 'local_lower_quanta', 'local_upper_quanta', 'gpp', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'iref', 'line_mixing_flag', 'ierr', 'nu', 'gamma_self', 'global_lower_quanta'};
FormatBlock.Values = {'%10.3E', '%5.4f', '%7.1f', '%1d', '%2d', '%10.3E', '%15s', '%15s', '%7.1f', '%10.4f', '%4.2f', '%8.6f', '%15s', '%12s', '%1s', '%6s', '%12.6f', '%5.3f', '%15s'};
% ... Parse the .data file, dividing it into lines and separating values
% into columns keeping them as char. Here an example of one line
DataBlock.Names = {'molec_id', 'local_iso_id', 'nu', 'sw', 'a', 'gamma_air', 'gamma_self', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'global_lower_quanta', 'local_upper_quanta', 'local_lower_quanta', 'ierr', 'iref', 'line_mixing_flag', 'gp', 'gpp'}
DataBlock = struct with fields:
Names: {1x19 cell}
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', ' 0 2 0', ' 0 1 0', ' 11 6 5 ', ' 10 1 10 ', '434233', '807294713152', ' ', ' 69.0', ' 63.0'}
DataBlock = struct with fields:
Names: {1x19 cell}
Columns: {1x19 cell}
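format shortG   % compact numeric display
% Read every whitespace-separated number in each char field; Unif=0 (UniformOutput=false) keeps the results in a cell array
DBC = cellfun(@(x)sscanf(x,'%g'),DataBlock.Columns,Unif=0);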
disp(DBC)
Columns 1 through 13
{[1]} {[1]} {[2800]} {[1.303e-29]} {[0.0001003]} {[0.0664]} {[0.298]} {[2705.1]} {[0.65]} {[0.00578]} {3x1 double} {3x1 double} {3x1 double}
Columns 14 through 19
{3x1 double} {[434233]} {[8.0729e+11]} {0x0 double} {[69]} {[63]}
for k = 1:numel(DBC)
DBC{k}.'
end
ans =
1
ans =
1
ans =
2800
ans =
1.303e-29
ans =
0.0001003
ans =
0.0664
ans =
0.298
ans =
2705.1
ans =
0.65
ans =
0.00578
ans = 1×3
0 2 0
ans = 1×3
0 1 0
ans = 1×3
11 6 5
ans = 1×3
10 1 10
ans =
434233
ans =
8.0729e+11
ans = []
ans =
69
ans =
63
You can format them however you like: use sprintf to create formatted text as a character array, or fprintf to write it to the Command Window or to a file.
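A short sketch of that last step, assuming you want to reproduce the header's own format specifiers: the three values come from the example line, and re-applying '%2d', '%12.6f' and '%10.3E' with sprintf restores the original padding. The variable names are illustrative.

```matlab
% Sketch: write numbers back out with the header's C-style formats.
% The three values come from the example data line in the question.
molec_id = sscanf(' 1', '%g');
nu       = sscanf(' 2800.033883', '%g');
sw       = sscanf(' 1.303E-29', '%g');

% sprintf returns a char array; the field width reproduces the original padding
s_id = sprintf('%2d', molec_id);     % ' 1'
s_nu = sprintf('%12.6f', nu);        % ' 2800.033883'
s_sw = sprintf('%10.3E', sw);        % ' 1.303E-29'

% fprintf prints to the Command Window (or to a file opened with fopen)
fprintf('molec_id = %d, nu = %.6f, sw = %.3E\n', molec_id, nu, sw);
```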

4 Comments

Thank you very much for your answer. This is 99% of what I was looking for. The only variable that is not converted is the third from last, which results in an empty array. I will try to find a workaround.
As always, my pleasure!
The third-from-the-last element contains only a space. There is nothing there to convert, so sscanf returns an empty array (0x0 double) in that cell. You can assign it any value you want, including NaN.
Also, I transposed the vector elements (DBC{k}.') so that they display as rows, as they appear in the original character arrays.
If you leave them un-transposed and use the vertcat function, they will form individual elements of a column vector that you can then use as the numeric argument in fprintf or sprintf.
Example:
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', ' 0 2 0', ' 0 1 0', ' 11 6 5 ', ' 10 1 10 ', '434233', '807294713152', ' ', ' 69.0', ' 63.0'}
DataBlock = struct with fields:
Columns: {1x19 cell}
format shortG
DBC = cellfun(@(x)sscanf(x,'%g'),DataBlock.Columns,Unif=0);
DBC = DBC.';
disp(DBC) % Up To Here, My Previous Code
{[ 1]}
{[ 1]}
{[ 2800]}
{[ 1.303e-29]}
{[ 0.0001003]}
{[ 0.0664]}
{[ 0.298]}
{[ 2705.1]}
{[ 0.65]}
{[ 0.00578]}
{3x1 double }
{3x1 double }
{3x1 double }
{3x1 double }
{[ 434233]}
{[8.0729e+11]}
{0x0 double }
{[ 69]}
{[ 63]}
Lv = cellfun(@isempty,DBC); % Logical Index Of The Empty Cells
disp(Lv)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
DBC(Lv) = {NaN}; % Assign ‘NaN’ To Every Empty Element (Also Works If More Than One Is Empty)
DBCv = vertcat(DBC{:}); % Use ‘vertcat’
disp(DBCv) % Display All The Resulting Elements
1 1 2800 1.303e-29 0.0001003 0.0664 0.298 2705.1 0.65 0.00578 0 2 0 0 1 0 11 6 5 10 1 10 4.3423e+05 8.0729e+11 NaN 69 63
You can detect the NaN value with the isnan function, which returns a logical vector marking its position, or use the ‘Lv’ vector I created (note that ‘Lv’ indexes the cell array DBC, so use it before calling vertcat). Then replace it with anything you want except an empty value, since a numeric array cannot contain empty elements. Left as NaN, it is simply treated as ‘missing’.
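A minimal sketch of the isnan route described above. DBCv here is a shortened, hypothetical stand-in for the 27-element vector produced by vertcat, and the replacement value -1 is an arbitrary sentinel chosen for illustration.

```matlab
% Sketch: find the NaN placeholder in the concatenated numeric vector and
% replace it. DBCv is a shortened, hypothetical stand-in for the vertcat result.
DBCv = [1; 1; 2800.033883; 1.303e-29; NaN; 69; 63];

nanIdx = isnan(DBCv);          % logical index of the 'missing' entries
disp(find(nanIdx).')           % positions of the missing entries

fillValue = -1;                % hypothetical sentinel; choose your own
DBCv(nanIdx) = fillValue;
```

fillmissing(DBCv, 'constant', -1) does the same replacement in a single call (R2016b and later).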
As always, my pleasure!


Categories

Find more on Data Type Conversion in Help Center and File Exchange

Release

R2024a
