Read file with non-uniform lines?

6 次查看(过去 30 天)
Hi. I'm a Matlab newbie. I would like to read in a file where the lines have different formats, as below.
% Coordinates
% Code ID X Y
C 101 0.001 0.001
C 102 1.002 0.002
C 103 1.003 1.003
C 104 0.004 1.004
% Distances
% Code ID From To Dist
D 201 101 103 1.417
D 202 102 104 1.414
If the first character is C, use...
A = textscan(fid,'%c %d %f %f')
If the first character is D, use...
A = textscan(fid,'%c %d %d %d %f')
After, I'd like to assign the data to structs (c.id, c.x, c.y, d.id, d.from, d.to, d.dist), but first I think I just need to get it scanned in. Is it possible to apply some logic to reading the file? Thank you.
  5 个评论
Walter Roberson
Walter Roberson 2020-10-26
'^\s*C.*$', 'dotexceptnewline', 'lineachors'
or
'(?<=(^|\n))\s*C[^\n]*'
with no additional options needed
bene1
bene1 2020-10-26
Great, thanks again. Now have...
C =
4×1 cell array
{' C 101 0.001 0.001←'}
{' C 102 1.002 0.002←'}
{' C 103 1.003 1.003←'}
{' C 104 0.004 1.004←'}
With C as a 4x1, I believe my next step is to extract out the columns. My first thought was
A = textscan(C,'%c %d %f %f')
but I see I can't do that. Looking into cell2struct?

请先登录,再进行评论。

采纳的回答

Walter Roberson
Walter Roberson 2020-10-26
Named tokens, I said. Do not extract the lines ahead of time.
FileText = fileread(YourFileName);
Ctokens = regexp(FileText, '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', 'names', 'lineanchors');
%Ctokens will now be a struct array with field names ID, X, and Y, each of which are character vectors.
C.ID = str2double({Ctokens.ID});
C.X = str2double({Ctokens.X});
C.Y = str2double({Ctokens.Y});
Dtokens = regexp(FileText, '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)', 'names', 'lineanchors');
%Dtokens will now be a struct array with field names ID, From, To, Dist, each of which are character vectors.
D.ID = str2double({Dtokens.ID});
D.From = str2double({Dtokens.From});
D.To = str2double({Dtokens.To});
D.Dist = str2double({Dtokens.Dist});
Amount of processing work is pretty minimial. Pretty much all of the effort is in figuring out the proper regexp patterns to use (which can be pretty tricky when there are variant lines.)

更多回答(1 个)

per isakson
per isakson 2020-10-26
>> S = cssm( 'd:\m\cssm\cssm.txt' )
S =
1×2 struct array with fields:
header
colhead
Code
data
>> S(1)
ans =
struct with fields:
header: "Coordinates"
colhead: ["Code" "ID" "X" "Y"]
Code: [4×1 string]
data: [4×3 double]
>> S(2)
ans =
struct with fields:
header: "Distances"
colhead: ["Code" "ID" "From" "To" "Dist"]
Code: [2×1 string]
data: [2×4 double]
where
function sas = cssm( ffs )
chr = fileread( ffs );
str = string( chr );
str = replace( str, char([13,10]), newline ); % get rid of the carriage return
% split the string into blocks. Use the block header as delimiter.
[blk,del] = strsplit( str, '(?m)^\x20*%\x20\w+\x20*\n' ...
, 'DelimiterType','RegularExpression' );
blk(1) = []; % remove empty block before the first delimiter
len = numel( del );
sas(1,len) = struct( 'header',"", 'colhead',"", 'Code',"", 'data',nan );
for jj = 1 : len % loop over all blocks
sas(jj).header = regexp( del(jj), '\w+', 'match','once' ); % match the name
cac = textscan( blk(jj), "%[^\n]", 1 ); % read the first row
tmp = strsplit( string(cac{1}) ); % split the row into column headers
tmp(1) = []; % remove the comment character, "%"
sas(jj).colhead = tmp;
cac = textscan( blk(jj), ['%s',repmat('%f',1,numel(tmp)-1)] ...
, 'Headerlines',1, 'CollectOutput',true );
sas(jj).Code = string(cac{1});
sas(jj).data = cac{2};
end
end
and where cssm.txt contains the data given in of your question.

类别

Help CenterFile Exchange 中查找有关 Large Files and Big Data 的更多信息

产品


版本

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by