Correction: I don't actually mean a matrix necessarily. I know that the dates can't be written into a matrix.
Read specific data columns from a text file based on header name requested by user
Hello,
I am using MATLAB R2018a.
I'm trying to extract specific columns of a text file based on the header name of the column. I have tried a couple of different methods, such as readtable and textscan, but none of them worked quite as I expected.
I have attached the text file itself. I want to make sure the code I'm writing is not slow, because there are thousands of these files that I need to process, possibly in a for-loop.
The structure never changes, but the header columns can be in different positions. That is why I want the code to find the header name no matter which position the column is in.
As the sample below shows, the same dates are repeated with different headers (information); this happens 3-4 times in the actual text file. If I know how to pick up the "WOPR-PROD1", "WOPR-PROD2", and "FOPT" columns and put them into a matrix in the order [WOPR-PROD1; WOPR-PROD2; FOPT], I believe I can figure out the rest. I would prefer not to modify the text file itself, if possible.
Here is a sample from the text file:
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
"SUMMARY OF RUN Original_1
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
"DATE ""YEARS ""FOPR ""FWPR ""FGPR ""FOPT ""FGPT ""FWPT ""FWCT ""FWIR "
" ""YEARS ""STB/DAY ""STB/DAY ""MSCF/DAY ""STB ""MSCF ""STB "" ""STB/DAY "
" "" "" "" "" "" "" "" "" "" "
" "" "" "" "" "" "" "" "" "" "
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
" 1JAN2009" 0 0 0 0 0 0 0 0 0
" 1FEB2009" 0.084873 0 0 0 0 0 0 0 0
" 1MAR2009" 0.161533 2000.000 65.16867 1360.000 56000.00 38080.00 1824.723 0.031556 0
" 1APR2009" 0.246407 2000.000 67.93040 1360.000 118000.0 80240.00 3906.001 0.032849 0
" 1MAY2009" 0.328542 2449.850 53.91752 1665.898 191495.5 130216.9 5523.527 0.021535 0
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
"SUMMARY OF RUN Original_1 "
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
"DATE ""FWIT ""FGOR ""FOIP ""FWIP ""FGIP ""FPR ""WOPR ""WOPR ""WOPR "
" ""STB ""MSCF/STB ""STB ""STB ""MSCF ""PSIA ""STB/DAY ""STB/DAY ""STB/DAY "
" "" "" ""*10**3 ""*10**3 ""*10**3 "" "" "" "" "
" "" "" "" "" "" "" ""PROD1 ""PROD2 ""PROD3 "
" "" "" "" "" "" "" "" "" "" "
"--------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------""-----------"
" 1JAN2009" 0 0 31190.54 645456.1 21209.57 6553.930 0 0 0
" 1FEB2009" 0 0 31190.54 645456.1 21209.57 6553.922 0 0 0
" 1MAR2009" 0 0.680000 31134.54 645454.2 21171.49 6473.267 0 0 0
" 1APR2009" 0 0.680000 31072.54 645452.2 21129.33 6394.598 0 0 0
" 1MAY2009" 0 0.680000 30999.18 645450.7 21079.44 6296.722 0 1675.190 0
Any help is appreciated. Thank you.
7 Comments
Stephen23
2019-6-1
Edited: Stephen23
2019-6-1
I would use textscan, something like this:
- read the very first line using fgetl
- identify the number of columns (regexp, count delimiters, or whatever).
- generate textscan format strings, for header and data.
- in a while loop import the file data using textscan.
- post-process the headers to generate unique valid fieldnames / table variable names.
- convert to structure or table.
- now the order of the columns is irrelevant: simply access the structure fields / table variables.
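The steps above can be sketched roughly as follows. This is only a sketch under assumptions read off the posted sample: every header cell is wrapped in double quotes, well names (PROD1, PROD2, ...) sit in the fourth header row, and each data row is a quoted date followed by whitespace-separated numbers. The variable raw is a made-up stand-in for the file contents, and the sketch uses regexp and sscanf on lines already in memory rather than textscan on a file identifier, so treat it as an illustration of the name-based lookup, not as the exact method described.

```matlab
% Made-up stand-in for the file: two short blocks in the sample's layout.
raw = {
    '"--------""--------""--------"'
    '"SUMMARY OF RUN Original_1"'
    '"--------""--------""--------"'
    '"DATE    ""YEARS   ""FOPT    "'
    '"        ""YEARS   ""STB     "'
    '"        ""        ""        "'
    '"        ""        ""        "'
    '"--------""--------""--------"'
    '" 1JAN2009" 0        0'
    '" 1FEB2009" 0.084873 56000.00'
    '"--------""--------""--------"'
    '"SUMMARY OF RUN Original_1"'
    '"--------""--------""--------"'
    '"DATE    ""WOPR    ""WOPR    "'
    '"        ""STB/DAY ""STB/DAY "'
    '"        ""        ""        "'
    '"        ""PROD1   ""PROD2   "'
    '"--------""--------""--------"'
    '" 1JAN2009" 0 0'
    '" 1FEB2009" 0 1675.190'
    };

S = struct();      % one field per column name, filled block by block
hdr = {};          % header rows of the current block (rows x columns)
names = {};        % column names of the current block
inHeader = false;  % true between a SUMMARY line and the end of its header

for k = 1:numel(raw)
    tline = raw{k};
    if isempty(strtrim(tline))
        continue                                   % skip blank lines
    elseif contains(tline, 'SUMMARY')
        hdr = {};  inHeader = true;                % a new block starts
    elseif startsWith(tline, '"---')
        if inHeader && ~isempty(hdr)               % separator closing the header
            names = hdr(1,:);
            if size(hdr,1) >= 4                    % assumed: well names in row 4
                for c = find(~cellfun('isempty', hdr(4,:)))
                    names{c} = [names{c} '_' hdr{4,c}];
                end
            end
            names = matlab.lang.makeValidName(names);
            S.(names{1}) = {};                     % date column (text)
            for c = 2:numel(names)
                S.(names{c}) = zeros(0,1);         % numeric columns
            end
            inHeader = false;
        end
    elseif inHeader
        cells = regexp(tline, '"([^"]*)"', 'tokens');
        hdr = [hdr; strtrim([cells{:}])];          % one row of header cells
    else
        % Data row: quoted date first, then the numeric columns
        tok = regexp(tline, '^"([^"]*)"(.*)$', 'tokens', 'once');
        vals = sscanf(tok{2}, '%f');
        S.(names{1}){end+1,1} = strtrim(tok{1});
        for c = 2:numel(names)
            S.(names{c})(end+1,1) = vals(c-1);
        end
    end
end

% Column order in the file no longer matters: pick columns by name.
M = [S.WOPR_PROD1, S.WOPR_PROD2, S.FOPT];
```

For a real file, the lines could come from fgetl in a while loop instead of the raw cell array. Because every block repeats the same dates, the date field is simply rebuilt for each block.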
You should read this (EDIT: unfortunately TMW seem to have removed the very useful example from this page showing how to use while to read blocks of data, but note how the repeated textscan calls could be within a while loop):
Accepted Answer
More Answers (1)
Bob Thompson
2019-5-31
I started to work on this, and realized that with my knowledge the code was going to be pretty ugly. If anybody knows a way to convert the text into an array that would help a lot, but I do not know how to do that.
With that in mind, I abandoned regexp, as it got too complex and ugly with the different layers of cells needed to break everything apart. Instead I just used fgetl and parsing to look through the file for the key words you want.
I am going to assume that the matrix layouts remain the same for all of your text files. For example, in the posted sample you have WOPR in the first row, with PROD1, PROD2, PROD3 in the fourth row. The sample code I wrote looks for these specific locations. They can be in any of the matrices, and don't all need to be in the same matrix, but WOPR is assumed to be in the first row, and PROD# in the fourth. You will need to adjust things if these references are not consistent.
flist = dir('ORIGINAL_*.txt');
for n = 1:numel(flist) % dir with a wildcard pattern returns only matching files, so there are no '.' and '..' entries to skip
    % Open the file
    fid = fopen(flist(n).name);
    % Initialize everything
    tline = fgetl(fid); % Named tline to avoid shadowing the built-in line function
    c = 1; % Number of lines read so far
    i = 0; % Index of the current FOPT matrix
    j = 0; % Index of the current WOPR matrix
    ifpt = [];
    iwpr = {};
    FPT = {};
    WPR = cell(2,1);
    % Loop through lines of file (fgetl returns -1 at end of file)
    while ischar(tline)
        tmp = strsplit(tline); % Split line at white space, to find proper column
        % A SUMMARY line marks the beginning of a matrix: record data from the previous one
        if any(contains(tmp,'SUMMARY'))
            if i > 0 && ~isempty(FPT) % Previous matrix contained FOPT
                FOPT(:,1,i) = str2double(FPT(5:9)); % Record FOPT data in array
                FPT = {}; % Reset FPT var
            end
            if j > 0 && ~isempty(WPR{1}) % Previous matrix contained WOPR
                WOPR(:,1,j) = str2double(WPR{1}(3:7)); % Record WOPR-PROD1 in first column
                WOPR(:,2,j) = str2double(WPR{2}(3:7)); % Record WOPR-PROD2 in second column
                WPR = cell(2,1); % Reset (keep two cells so WPR{1} and WPR{2} stay valid)
            end
            ifpt = []; % Reset
            iwpr = {}; % Reset
        end
        % Check for desired columns
        if any(contains(tmp,'FOPT')) % Look for FOPT
            ifpt = find(contains(tmp,'FOPT')); % Get index of positive result
            i = i + 1; % Advance FOPT results array index
        end
        if any(contains(tmp,'WOPR')) % Look for WOPR
            for k = 1:6 % Skip lines to find PROD (count depends on the file layout)
                tline = fgetl(fid); c = c + 1;
            end
            % NOTE: The above will ruin the indexing of FOPT array if both
            % occur in the same matrix
            tmp = strsplit(tline);
            iwpr{1} = find(contains(tmp,'PROD1')); % Check which WOPR is PROD1
            iwpr{2} = find(contains(tmp,'PROD2')); % Check which WOPR is PROD2
            j = j + 1; % Advance WOPR results array index
        end
        % Capture data
        if ~isempty(ifpt) && size(tmp,2) > 1 % FOPT exists in matrix, and isn't blank line
            FPT = vertcat(FPT, tmp(ifpt));
        end
        if ~isempty(iwpr) && size(tmp,2) > 1 % WOPR exists in matrix, and isn't blank line
            WPR{1} = vertcat(WPR{1}, tmp(iwpr{1}+1));
            WPR{2} = vertcat(WPR{2}, tmp(iwpr{2}+1));
        end
        % Advance through file
        tline = fgetl(fid);
        c = c + 1;
    end
    fclose(fid); % Release the file handle before moving to the next file
    % Record results from last matrix
    if i > 0 && ~isempty(FPT) % FOPT existed in last matrix
        FOPT(:,1,i) = str2double(FPT(5:9));
    end
    if j > 0 && ~isempty(WPR{1}) % WOPR existed in last matrix
        WOPR(:,1,j) = str2double(WPR{1}(3:7));
        WOPR(:,2,j) = str2double(WPR{2}(3:7));
    end
    % NOTE: FOPT and WOPR are overwritten for each file; save or process them here
end
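On the file loop itself: when dir is called with a wildcard pattern it lists only the matching files, so the loop can start at the first element, and each file should be closed once it has been read. The following is a minimal sketch of that loop skeleton. It creates three throwaway files in a temporary folder (the file names and contents are made up for illustration) and reads only the first line of each.

```matlab
% Build a scratch folder with three small files (hypothetical names).
tmpDir = tempname;
mkdir(tmpDir);
for k = 1:3
    fid = fopen(fullfile(tmpDir, sprintf('ORIGINAL_%d.txt', k)), 'wt');
    fprintf(fid, '"SUMMARY OF RUN Original_%d"\n', k);
    fclose(fid);
end

% A wildcard pattern returns only matching files: no '.' or '..' entries.
flist = dir(fullfile(tmpDir, 'ORIGINAL_*.txt'));
firstLines = cell(numel(flist), 1);
for n = 1:numel(flist)
    fname = fullfile(flist(n).folder, flist(n).name);
    fid = fopen(fname, 'rt');
    if fid < 0
        warning('Could not open %s', fname);   % skip unreadable files
        continue
    end
    firstLines{n} = fgetl(fid);   % the per-file parsing would go here
    fclose(fid);                  % close before moving to the next file
end

rmdir(tmpDir, 's');               % remove the scratch folder again
```

Closing each file inside the loop matters with thousands of files, since the number of simultaneously open file identifiers is limited.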