text file data processing

Dear Experts, I would be grateful for any help with this.
I have a text file that I want to read. It contains data from a couple of measurements. It starts with text, followed by a data set, then more text and another data set, and so on. I want to read each data set; I don't care about the text. The text file looks like this:
if true
:MSR 2 # No. of measurement in file
:SYS BDS 0 # Beam Data Scanner System
#
# RFA300 ASCII Measurement Dump ( BDS format )
#
...............
..............
!
#
# X Y Z Dose
#
= -123.9 0.0 32.0 11.8
= -123.6 0.0 32.0 11.9
= -123.2 0.0 32.0 12.1
= -122.7 0.0 32.0 12.2
= -122.2 0.0 32.0 12.5
= -121.7 0.0 32.0 12.6
= -121.4 0.0 32.0 12.6
:EOM # End of Measurement
#
# RFA300 ASCII Measurement Dump ( BDS format )
#
# Measurement number 2
#
%VNR 1.0
!
#
# X Y Z Dose
#
= 132.0 0.0 100.0 8.1
= 131.7 0.0 100.0 8.2
= 131.3 0.0 100.0 8.2
= 130.8 0.0 100.0 8.3
= 130.3 0.0 100.0 8.4
= 129.8 0.0 100.0 8.6
= 129.3 0.0 100.0 8.8
= 129.0 0.0 100.0 8.8
= 128.5 0.0 100.0 8.9
= 128.0 0.0 100.0 9.2
= 127.5 0.0 100.0 9.3
= 127.2 0.0 100.0 9.4
:EOM # End of Measurement
:EOF # End of File
end
From this file I want to read the numerical data under the column header, i.e., X Y Z Dose, for each measurement. My text file is attached.
I greatly appreciate any help. Thanks. Rafiq
2 Comments
Hikaru 2015-1-30
There's no file attached. Have you tried the function textscan?
Mohammad 2015-1-30
I am sorry, Hikaru. The file is attached.


Accepted Answer

Stephen23 2015-1-31
Edited: Stephen23 2015-1-31
This code works with your original data file (which I uploaded here too). It parses most of the file into a structure, which has size 1-by-(number of measurements). Save the code below in a script and run it:
% Read file into a string:
str = fileread('test.txt');
% Check the number of measurements:
N = sscanf(str,':MSR %d');
[S,E] = regexp(str,'(?<=^# Measurement number).*?^:EOM','lineanchors');
assert(all(N==[numel(S),numel(E)]),'This file is incomplete or corrupted.')
% Preallocate structure:
out(N) = struct();
% Loop over each measurement in the file:
for n = 1:N
    sub = str(S(n):E(n));
    out(n).num = sscanf(sub,'%d');
    % Assign fields and values:
    tkn = regexp(sub,'^%(\w{3})([^#]*?)(#\s.*?)?\s*$','lineanchors','tokens');
    for m = 1:numel(tkn)
        tmp = tkn{m};
        out(n).(tmp{1}) = strtrim(tmp{2});
        if ~isempty(tmp{3})
            out(n).([tmp{1},'_note']) = tmp{3}(3:end);
        end
    end
    % Assign header:
    tmp = regexp(sub,'^#(\s+\w+)+\s+#\s+=','tokens','lineanchors','once');
    out(n).hdr = regexp(strtrim(tmp),'\s+','split');
    % Assign data:
    tmp = regexp(sub,'^=\s+(\s+[\d\.-]+)+\s*$','match','lineanchors');
    out(n).dat = cell2mat(textscan([tmp{:}],'=%f%f%f%f'));
end
%
Explore the structure in your variable viewer. It should be fairly self-explanatory, as it uses the same fieldnames as your data file. There are only three new fields: "num" (measurement number), "hdr" (column headers of the numeric matrix), and "dat" (the numeric matrix).
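If only the numeric rows are needed and the parameter fields can be ignored, a shorter approach may be enough. The following is a minimal sketch, not the method used in the answer above: it splits the text at each :EOM marker and reads the lines that start with "=". The variable names (txt, blocks, rows, data) are illustrative, and the sample text is a shortened stand-in for the real file, assuming exactly four numeric columns per row.

```matlab
% Shortened stand-in for the real file; with a real file one would use
% txt = fileread('test.txt') instead (file name taken from the answer above).
txt = sprintf('%s\n', ...
    ':MSR 2 # No. of measurement in file', ...
    '# X Y Z Dose', ...
    '= -123.9 0.0 32.0 11.8', ...
    '= -123.6 0.0 32.0 11.9', ...
    ':EOM # End of Measurement', ...
    '# X Y Z Dose', ...
    '= 132.0 0.0 100.0 8.1', ...
    '= 131.7 0.0 100.0 8.2', ...
    '= 131.3 0.0 100.0 8.2', ...
    ':EOM # End of Measurement', ...
    ':EOF # End of File');

% Split the text at every end-of-measurement marker:
blocks = regexp(txt, ':EOM', 'split');
blocks(end) = [];   % the piece after the last :EOM holds no data

% Read the rows of each measurement into its own matrix:
data = cell(size(blocks));
for k = 1:numel(blocks)
    % Keep only the lines that start with '=':
    rows = regexp(blocks{k}, '^=[^\n]*', 'match', 'lineanchors');
    % Parse four numbers per row (assumes four columns: X Y Z Dose):
    data{k} = cell2mat(textscan([rows{:}], '=%f%f%f%f'));
end
```

Each cell of data then holds one measurement as a matrix with one row per "=" line, so data{2}(:,4) would be the Dose column of the second measurement.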
4 Comments
Stephen23 2015-2-2
Edited: Stephen23 2015-2-2
You can access any of the parameter values by using the fieldname and the out structure, e.g. to get the FSZ values:
>> out(1).FSZ % only the first measurement
>> out.FSZ % all measurements
>> Z = {out.FSZ} % all in a cell array
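The same comma-separated-list syntax also works for the numeric "dat" field. Here is a small sketch using a mock structure in place of the real parser output: the FSZ strings are made-up placeholder values, the data rows are copied from the sample in the question, and the names allRows and maxDose are illustrative.

```matlab
% Mock 1-by-2 structure standing in for the parser output.
% The FSZ strings are hypothetical; the dat rows come from the question.
out = struct('FSZ', {'100 100', '200 200'}, ...
             'dat', {[-123.9 0 32 11.8; -123.6 0 32 11.9], ...
                     [132.0 0 100 8.1; 131.7 0 100 8.2; 131.3 0 100 8.2]});

% Collect one field from all measurements into a cell array:
Z = {out.FSZ};

% Stack the data rows of all measurements into a single matrix:
allRows = vertcat(out.dat);

% Largest value in the fourth column (Dose) of each measurement:
maxDose = arrayfun(@(s) max(s.dat(:,4)), out);
```

Stacking with vertcat only works while every measurement has the same number of columns; keeping the matrices separate, as the structure does, avoids that assumption.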
Note that currently all parameter values are stored as strings. If you wish to convert all of the exclusively numeric parameters to numeric arrays, then you can try this version (I converted the date/time to a datevector too):
% Read file into a string:
str = fileread('test.txt');
% Check the number of measurements:
N = sscanf(str,':MSR %d');
[S,E] = regexp(str,'(?<=^# Measurement number).*?^:EOM','lineanchors');
assert(numel(S)==N&&numel(E)==N,'This file is incomplete or corrupted.')
% Preallocate structure:
out(N) = struct();
% Loop over each measurement in the file:
for n = 1:N
    sub = str(S(n):E(n));
    out(n).num = sscanf(sub,'%d');
    % Assign fields and values:
    tkn = regexp(sub,'^%(\w{3})([^#]*?)(#\s.*?)?\s*$','lineanchors','tokens');
    for m = 1:numel(tkn)
        if ~isempty(tkn{m}{3})
            out(n).([tkn{m}{1},'_note']) = tkn{m}{3}(3:end);
        end
        % Convert to numeric array OR keep string parameter:
        tmp = sscanf(tkn{m}{2},'%f',[1,Inf]);
        if any(strcmpi(tkn{m}{1},{'DAT','TIM'})) || isempty(tmp)
            out(n).(tkn{m}{1}) = strtrim(tkn{m}{2});
        else
            out(n).(tkn{m}{1}) = tmp;
        end
    end
    % Add timestamp:
    out(n).dtv([3,2,1]) = sscanf(out(n).DAT,'%f-%f-%f',[1,Inf]);
    out(n).dtv = [out(n).dtv,sscanf(out(n).TIM,'%f:%f:%f',[1,Inf])];
    % Assign header:
    tmp = regexp(sub,'^#(\s+\w+)+\s+#\s+=','tokens','lineanchors','once');
    out(n).hdr = regexp(strtrim(tmp),'\s+','split');
    % Assign data:
    tmp = regexp(sub,'^=\s+(\s+[\d\.-]+)+\s*$','match','lineanchors');
    out(n).dat = cell2mat(textscan([tmp{:}],'=%f%f%f%f'));
end
%
Mohammad 2015-2-3
Edited: Mohammad 2015-2-3
Thank you very much Stephen. This helps a lot !!!
