Using TEXTSCAN to import an ASCII file with a header and blank lines between different data sets
I have several text files, each representing a house, and each file contains several data sets, one per room (zone) in that house.
The text file looks similar to the following but a majority of the data has been deleted. Each zone has 1440 lines of data and each house has a different number of zones:
project: House1_1 Tue Mar 19 12:30:42 2013
description:
date time time Ozone
of day [s] [kg/kg]
level: firstfloor zone: bedroom1
Jan01 00:00:00 0 0.000e+000
Jan01 00:01:00 60 1.487e-009
Jan01 00:02:00 120 5.330e-009
Jan01 00:03:00 180 1.084e-008
Jan01 23:57:00 86220 1.575e-007
Jan01 23:58:00 86280 1.575e-007
Jan01 23:59:00 86340 1.575e-007
Jan01 24:00:00 86400 1.575e-007
level: firstfloor zone: kitchen
Jan01 00:00:00 0 0.000e+000
Jan01 00:01:00 60 1.483e-009
Jan01 00:02:00 120 5.315e-009
Jan01 00:03:00 180 1.081e-008
Jan01 23:57:00 86220 1.564e-007
Jan01 23:58:00 86280 1.564e-007
Jan01 23:59:00 86340 1.564e-007
Jan01 24:00:00 86400 1.564e-007
level: firstfloor zone: bedroom2
Jan01 00:00:00 0 0.000e+000
Jan01 00:01:00 60 1.486e-009
Jan01 00:02:00 120 5.321e-009
Jan01 00:03:00 180 1.081e-008
Jan01 23:57:00 86220 1.549e-007
Jan01 23:58:00 86280 1.549e-007
Jan01 23:59:00 86340 1.549e-007
Jan01 24:00:00 86400 1.549e-007
The final goal is to generate a graph of ozone concentration versus time for each house that contains all of the zones for that house. Presently I am having trouble importing the data. I can use the following code to open the first zone in one file. I only need the data from the fourth column. I do not need the first 9 lines (header info) or the 3 lines in between zones but I need the data for each zone to be its own data set.
fid=fopen('House1-1.txt');
temp=textscan(fid,'%*s %*s %*d %f','Headerlines',9);
fclose(fid);
I cannot figure out how to create a loop that reads to the end of each file and puts the data for each zone into its own array. I also need the loop to read every house file in the folder. Any help would be appreciated.
Accepted Answer
Cedric
2013-3-27
Edited: Cedric
2013-3-27
An alternative could be to use REGEXP to get blocks of data, e.g. in a struct array, and then post-process the content. To illustrate using the content that you gave:
>> buffer = fileread('myData.txt') ;
>> pattern = 'level:\s*(?<level>\S+)\s+zone:\s*(?<zone>\S+)\s*(?<data>.*?)(?=($|level))' ;
>> blocks = regexp(buffer, pattern, 'names' )
blocks =
1x3 struct array with fields:
level
zone
data
>> blocks(1)
ans =
level: 'firstfloor'
zone: 'bedroom1'
data: [1x282 char]
>> blocks(2)
ans =
level: 'firstfloor'
zone: 'kitchen'
data: [1x282 char]
>> blocks(3)
ans =
level: 'firstfloor'
zone: 'bedroom2'
data: [1x277 char]
So, using a simple loop, you can process all blocks already parsed:
for k = 1 : length(blocks)
    fprintf('Level = %s, zone = %s\n', blocks(k).level, blocks(k).zone) ;
    % Do something with blocks(k).data, e.g. read the 4th column with textscan
    C = textscan(blocks(k).data, '%*s %*s %*d %f') ;
    ozone{k} = C{1} ;
end
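Extending the loop above to every house file in a folder could look roughly like the sketch below. The file pattern 'House*.txt', the current folder as the search location, and the plot labels are assumptions for illustration and are not part of the original answer. The time column is read as well so that it can serve as the x-axis.

```matlab
% Sketch: parse every house file in the current folder, one figure per house.
% Assumption: the files match 'House*.txt' (hypothetical naming pattern).
files   = dir('House*.txt');
pattern = 'level:\s*(?<level>\S+)\s+zone:\s*(?<zone>\S+)\s*(?<data>.*?)(?=($|level))';
for f = 1 : numel(files)
    buffer = fileread(files(f).name);
    blocks = regexp(buffer, pattern, 'names');
    figure; hold on;
    for k = 1 : numel(blocks)
        % Columns: date, time of day, time [s], ozone [kg/kg];
        % skip the first two and keep the last two.
        C = textscan(blocks(k).data, '%*s %*s %f %f');
        plot(C{1}, C{2}, 'DisplayName', blocks(k).zone);
    end
    hold off;
    xlabel('time [s]'); ylabel('Ozone [kg/kg]');
    title(files(f).name, 'Interpreter', 'none');
    legend('show');
end
```

If no file matches the pattern, dir returns an empty struct array and the outer loop simply does not run.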
More Answers (3)
per isakson
2013-3-27
Edited: per isakson
2013-3-27
Here is one of many alternate solutions.
>> [ header, block_head, block_data ] = cssm()
header =
' project: House1_1 Tue Mar 19 12:30:42 2013'
''
' description: '
''
' date time time Ozone'
' of day [s] [kg/kg]'
''
block_head =
' level: firstfloor zone: bedroom1'
'zone: kitchen'
'zone: bedroom2'
block_data =
[8x1 double]
[8x1 double]
[8x1 double]
>>
The values of block_head are obviously corrupted.
where cssm is
function [ header, block_head, block_data ] = cssm()
    fid = fopen( 'cssm.txt' );
    % cac = textscan( fid, '%[^\n]' ); swallows empty lines
    cac = textscan( fid, '%s', 'Delimiter', '\n' );
    fclose( fid );
    ixs = find( strncmp( 'level:', cac{:}, 6 ) );
    fid = fopen( 'cssm.txt' );
    header = cell( ixs(1)-1, 1 );
    for ii = 1 : ixs(1)-1
        header{ii} = fgetl( fid );
    end
    nnblock = numel( ixs );
    ixs(end+1) = size( cac{:}, 1 );
    block_head = cell( nnblock, 1 );
    block_data = cell( nnblock, 1 );
    for iib = 1 : nnblock
        block_head{iib} = fgetl( fid );
        block_data(iib) = textscan(fid,'%*s%*s%*d%f', ixs(iib+1)-ixs(iib) );
    end
    fclose( fid );
end
and cssm.txt consists of the data lines in your question.
.
Next try without reading block_head:
>> [ header, block_head, block_data ] = cssm()
header =
' project: House1_1 Tue Mar 19 12:30:42 2013'
''
' description: '
''
' date time time Ozone'
' of day [s] [kg/kg]'
block_head =
[]
[]
[]
block_data =
[8x1 double]
[8x1 double]
[8x1 double]
where cssm is
function [ header, block_head, block_data ] = cssm()
    fid = fopen( 'cssm.txt' );
    % cac = textscan( fid, '%[^\n]' ); swallows empty lines
    cac = textscan( fid, '%s', 'Delimiter', '\n' );
    fclose( fid );
    ixs = find( strncmp( 'level:', cac{:}, 6 ) );
    fid = fopen( 'cssm.txt' );
    header = cell( ixs(1)-2, 1 );
    for ii = 1 : ixs(1)-2
        header{ii} = fgetl( fid );
    end
    nnblock = numel( ixs );
    ixs(end+1) = size( cac{:}, 1 ) + 2;
    block_head = cell( nnblock, 1 );
    block_data = cell( nnblock, 1 );
    for iib = 1 : nnblock
        block_data(iib) = textscan( fid, '%*s%*s%*d%f' ...
                                  , ixs(iib+1)-ixs(iib)-3 ...
                                  , 'Headerlines', 3 );
    end
    fclose( fid );
end
.
One more try:
>> [ header, block_head, block_data ] = cssm()
header =
' project: House1_1 Tue Mar 19 12:30:42 2013'
''
' description: '
''
' date time time Ozone'
' of day [s] [kg/kg]'
block_head =
{3x1 cell}
{3x1 cell}
{3x1 cell}
block_data =
[8x1 double]
[8x1 double]
[8x1 double]
>> block_head{1}
ans =
''
'level: firstfloor zone: bedroom1'
''
>> block_head{2}
ans =
''
''
'level: firstfloor zone: kitchen'
>> block_head{3}
ans =
''
''
'level: firstfloor zone: bedroom2'
block_head contains two successive empty "lines" in blocks 2 and 3. However, the data file nowhere shows an empty line directly after another empty line. I find this strange.
where
function [ header, block_head, block_data ] = cssm()
    fid = fopen( 'cssm.txt' );
    % cac = textscan( fid, '%[^\n]' ); swallows empty lines
    cac = textscan( fid, '%s', 'Delimiter', '\n' );
    fclose( fid );
    ixs = find( strncmp( 'level:', cac{:}, 6 ) );
    fid = fopen( 'cssm.txt' );
    header = cell( ixs(1)-2, 1 );
    for ii = 1 : ixs(1)-2
        header{ii} = fgetl( fid );
    end
    nnblock = numel( ixs );
    ixs(end+1) = size( cac{:}, 1 ) + 2;
    block_head = cell( nnblock, 1 );
    block_data = cell( nnblock, 1 );
    for iib = 1 : nnblock
        block_head(iib) = textscan( fid, '%s', 3, 'Delimiter', '\n' );
        block_data(iib) = textscan( fid, '%*s%*s%*d%f' ...
                                  , ixs(iib+1)-ixs(iib)-3 ...
                                  , 'Headerlines', 0 );
    end
    fclose( fid );
end
.
Discussion:
There must be a better way to handle empty lines.
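One possible way around the empty-line bookkeeping, offered only as a sketch and not as part of the answer above: read the whole file as text, split it on line breaks so that empty lines are kept as empty cells, and cut the result into blocks at the 'level:' rows. The inline sample text and the variable names txtlines and stops are assumptions for illustration; in practice txt would come from fileread('cssm.txt').

```matlab
% Sketch: split into lines without losing empty ones, then cut into blocks.
% In practice: txt = fileread('cssm.txt');  (a small inline sample is used here)
txt = sprintf(['project: House1_1\n\ndate time time Ozone\n\n', ...
               'level: firstfloor zone: bedroom1\n\n', ...
               'Jan01 00:00:00 0 0.000e+000\nJan01 00:01:00 60 1.487e-009\n\n', ...
               'level: firstfloor zone: kitchen\n\n', ...
               'Jan01 00:00:00 0 0.000e+000\nJan01 00:01:00 60 1.483e-009\n']);
txtlines = strtrim(regexp(txt, '\r?\n', 'split'))';  % empty lines are kept
ixs      = find(strncmp('level:', txtlines, 6));     % rows of the block headers
header     = txtlines(1 : ixs(1)-1);
block_head = txtlines(ixs);
block_data = cell(numel(ixs), 1);
stops = [ixs(2:end)-1; numel(txtlines)];             % last row of each block
for iib = 1 : numel(ixs)
    rows = txtlines(ixs(iib)+1 : stops(iib));
    rows = rows(~cellfun('isempty', rows));          % drop blank separator rows
    % Re-join the block's rows and let textscan parse the 4th column
    C = textscan(sprintf('%s\n', rows{:}), '%*s %*s %*d %f');
    block_data{iib} = C{1};
end
```

Because the blank rows are removed explicitly before parsing, the row counts passed around in the versions above are no longer needed.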
Kristia
2013-3-27
6 Comments
Cedric
2013-4-1
Edited: Cedric
2013-4-1
You're welcome! Don't forget to [ Accept ] one of the answers if it helped, and if you accept mine, don't forget to /\ vote for Per Isakson's answer as well, because he took time to write and test a quite complete answer that is indeed the standard way for processing this kind of file structure (my answer is more compact, but less standard).
Gabriel Felix
2020-5-24
I had to use \n at the end of each line. Without it I couldn't make textscan() work properly, even though "HeaderLines" was set according to the lines of the text file. This was the only solution I found after struggling with the code for an entire day.
This was the text:
!
!
! alfa (graus) = 5.0
!
! Id. x/s z/s alfai cl c*cl/cmed cdi cmc/4
! (graus)
1 .246 .050 -1.209 .255 .332 .00538 .0170
2 .292 .150 -1.098 .259 .319 .00496 .0545
3 .339 .250 -.925 .254 .297 .00410 .0944
4 .385 .350 -.741 .243 .268 .00315 .1341
5 .432 .450 -.561 .227 .235 .00223 .1714
6 .479 .550 -.393 .206 .199 .00141 .2034
7 .525 .650 -.238 .181 .163 .00075 .2266
8 .572 .750 -.101 .152 .126 .00027 .2362
9 .619 .850 .014 .116 .089 -.00003 .2236
10 .659 .938 .103 .074 .052 -.00013 .1693
!
! CL asa = .208
! CDi asa = .00258
! e (%) = 88.9
! CMc/4 asa = .1339
My code:
%! alfa (graus) = 5.0
P = textscan(fid,'! alfa (graus) = %f','Delimiter',' ','MultipleDelimsAsOne',true,'headerLines',2,'CollectOutput',1);
alpha(1) = P{1};
%! CL asa = .208
P = textscan(fid,'! CL asa = %f\n','Delimiter',' ','MultipleDelimsAsOne',true,'CollectOutput',1,'headerLines',4+n);
CL(1) = P{1};
%! CDi asa = .00258
P = textscan(fid,'! CDi asa = %f\n','Delimiter',' ','MultipleDelimsAsOne',true,'CollectOutput',1,'headerlines',0);
CDi(1) = P{1};
%! CMc/4 asa = .1339
P = textscan(fid,'! CMc/4 asa = %f','Delimiter',' ','MultipleDelimsAsOne',true,'CollectOutput',1,'HeaderLines',2);
Cmc4(1) = P{1};
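As a different technique from the textscan calls above, the scalar values could also be pulled out with regexp, which does not depend on counting header lines. This is only a sketch: the inline sample text, the helper getval, and the number pattern num are assumptions for illustration, and in practice txt would come from fileread on the actual file.

```matlab
% Sketch: extract the labelled scalar values with regexp instead of textscan.
% In practice: txt = fileread(...) on the real file (inline sample used here).
txt = sprintf(['!\n! alfa (graus) = 5.0\n!\n! CL asa = .208\n', ...
               '! CDi asa = .00258\n! e (%%) = 88.9\n! CMc/4 asa = .1339\n']);
num = '([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)';   % signed decimal, optional exponent
% getval (hypothetical helper): find "<label> = <number>" and convert to double
getval = @(label) str2double(regexp(txt, [label '\s*=\s*' num], 'tokens', 'once'));
alpha = getval('alfa \(graus\)');            % parentheses escaped for regexp
CL    = getval('CL asa');
CDi   = getval('CDi asa');
Cmc4  = getval('CMc/4 asa');
```

The labels are matched case-sensitively, so 'CL asa' does not collide with the lowercase cl column header in the table.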