Parsing data from complicated text files
I have about 20 years of text files that contain the records of individual tests (about 8GB of plain text files, about 4,000 individual files). Each file has this format:
********************************************************************************
Test Data Report
Station ID: [Test Station ID Number]
Station Part Number: [Test Station Part Number]
Station Serial Number: [Test Station Serial Number]
Test Procedure Number: [Test Procedure Number] [Test Procedure Revision]
Operation: [colloquial test]
Serial Number of test subject: [Serial Number + plus some other info about the test]
Date: [Day, Month Date, year]
Time: [11:00:03 AM]
Operator: [Operator Name]
Number of Results: [NNNN]
Test Result: [Passed/Failed]
********************************************************************************
--------------------------------------------------------------------------------
MEASUREMENT LL READING UL UNITS STATUS
--------------------------------------------------------------------------------
Enter Testing Time: Done
--------------------------------------------------------------------------------
08:00
--------------------------------------------------------------------------------
FOE, CAL: Passed
--------------------------------------------------------------------------------
CALIBRATION IS VALID
--------------------------------------------------------------------------------
Test Start Time: Done
--------------------------------------------------------------------------------
11:00:33 AM
--------------------------------------------------------------------------------
Group Meas Init: Passed
--------------------------------------------------------------------------------
Datapoint_01 LL Measured UL Units Passed
Datapoint_02 LL Measured UL Units Passed
Datapoint_03 LL Measured UL Units Passed
Datapoint_04 LL Measured UL Units Passed
Datapoint_05 LL Measured UL Units Passed
Datapoint_06 LL Measured UL Units Passed
Datapoint_07 LL Measured UL Units Passed
Datapoint_08 LL Measured UL Units Passed
Datapoint_09 LL Measured UL Units Passed
Datapoint_10 LL Measured UL Units Passed
Datapoint_11 LL Measured UL Units Passed
Datapoint_12 LL Measured UL Units Passed
Datapoint_13 LL Measured UL Units Passed
Datapoint_14 LL Measured UL Units Passed
Datapoint_15 LL Measured UL Units Passed
Datapoint_16 LL Measured UL Units Passed
Datapoint_17 LL Measured UL Units Passed
Datapoint_18 LL Measured UL Units Passed
Datapoint_19 LL Measured UL Units Passed
Datapoint_20 LL Measured UL Units Passed
Datapoint_21 LL Measured UL Units Passed
Datapoint_22 LL Measured UL Units Passed
Datapoint_23 LL Measured UL Units Passed
Datapoint_24 LL Measured UL Units Passed
Datapoint_25 LL Measured UL Units Passed
Datapoint_26 LL Measured UL Units Passed
Datapoint_27 LL Measured UL Units Passed
Datapoint_28 Measured UL Units Passed
Datapoint_29 Measured Units Passed
--------------------------------------------------------------------------------
Group Meas Ramp: Passed
--------------------------------------------------------------------------------
Datapoint_01 LL Measured UL Units Passed
Datapoint_02 LL Measured UL Units Passed
Datapoint_03 LL Measured UL Units Passed
Datapoint_04 LL Measured UL Units Passed
Datapoint_05 LL Measured UL Units Passed
Datapoint_06 LL Measured UL Units Passed
Datapoint_07 LL Measured UL Units Passed
Datapoint_08 LL Measured UL Units Passed
Datapoint_09 LL Measured UL Units Passed
Datapoint_10 LL Measured UL Units Passed
Datapoint_11 LL Measured UL Units Passed
Datapoint_12 LL Measured UL Units Passed
Datapoint_13 LL Measured UL Units Passed
Datapoint_14 LL Measured UL Units Passed
Datapoint_15 LL Measured UL Units Passed
Datapoint_16 LL Measured UL Units Passed
Datapoint_17 LL Measured UL Units Passed
Datapoint_18 LL Measured UL Units Passed
Datapoint_19 LL Measured UL Units Passed
Datapoint_20 LL Measured UL Units Passed
Datapoint_21 LL Measured UL Units Passed
Datapoint_22 LL Measured UL Units Passed
Datapoint_23 LL Measured UL Units Passed
Datapoint_24 LL Measured UL Units Passed
Datapoint_25 LL Measured UL Units Passed
Datapoint_26 LL Measured UL Units Passed
Datapoint_27 LL Measured UL Units Passed
Datapoint_28 Measured UL Units Passed
Datapoint_29 Measured Units Passed
--------------------------------------------------------------------------------
Time (after meas): Done
--------------------------------------------------------------------------------
11:01:16 AM
--------------------------------------------------------------------------------
Group Meas Ramp: Passed
--------------------------------------------------------------------------------
Datapoint_01 LL Measured UL Units Passed
Datapoint_02 LL Measured UL Units Passed
Datapoint_03 LL Measured UL Units Passed
Datapoint_04 LL Measured UL Units Passed
Datapoint_05 LL Measured UL Units Passed
Datapoint_06 LL Measured UL Units Passed
Datapoint_07 LL Measured UL Units Passed
Datapoint_08 LL Measured UL Units Passed
Datapoint_09 LL Measured UL Units Passed
Datapoint_10 LL Measured UL Units Passed
Datapoint_11 LL Measured UL Units Passed
Datapoint_12 LL Measured UL Units Passed
Datapoint_13 LL Measured UL Units Passed
Datapoint_14 LL Measured UL Units Passed
Datapoint_15 LL Measured UL Units Passed
Datapoint_16 LL Measured UL Units Passed
Datapoint_17 LL Measured UL Units Passed
Datapoint_18 LL Measured UL Units Passed
Datapoint_19 LL Measured UL Units Passed
Datapoint_20 LL Measured UL Units Passed
Datapoint_21 LL Measured UL Units Passed
Datapoint_22 LL Measured UL Units Passed
Datapoint_23 LL Measured UL Units Passed
Datapoint_24 LL Measured UL Units Passed
Datapoint_25 LL Measured UL Units Passed
Datapoint_26 LL Measured UL Units Passed
Datapoint_27 LL Measured UL Units Passed
Datapoint_28 Measured UL Units Passed
Datapoint_29 Measured Units Passed
--------------------------------------------------------------------------------
Time (after meas): Done
--------------------------------------------------------------------------------
11:01:37 AM
--------------------------------------------------------------------------------
Now, at the moment, the only things I care about are
- Whether a failure occurred or not
- When that failure occurred
I will likely want to perform other analyses on the data in the future, but for the moment, this will suffice. I want to go through each report, determine whether a failure occurred, record when that failure occurred, and then plot all the failures as a histogram in terms of time so that I can see if there are any typical lengths of time it takes for a test to fail.
I have a pretty good amount of experience with working with data once it is in Matlab, but I am much less experienced with importing data, especially this kind of batch importing. Is there a simple way to do this, or am I essentially just using something like textscan() or fscanf() in a loop?
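For illustration only, the batch loop asked about here could look roughly like the sketch below. It assumes the header line "Test Result:" shown in the format above is enough to tell whether a file contains a failure. The folder, file names and file contents are made-up stand-ins so the snippet runs on its own, and `writelines` needs R2022a or newer.

```matlab
% Build two tiny stand-in reports in a temporary folder (contents are made up)
folder = tempname;
mkdir(folder);
writelines(["Test Data Report"; "Test Result: Passed"], fullfile(folder, "a.txt"));
writelines(["Test Data Report"; "Test Result: Failed"], fullfile(folder, "b.txt"));

% Loop over every .txt file and check the header for the overall result
files = dir(fullfile(folder, "*.txt"));
failed = false(numel(files), 1);
for i = 1:numel(files)
    txt = fileread(fullfile(files(i).folder, files(i).name));
    % \[? tolerates an optional opening bracket around the result value
    failed(i) = ~isempty(regexp(txt, 'Test Result:\s*\[?Failed', 'once'));
end

% Names of the reports whose header reports a failure
failedNames = string({files(failed).name});
```

With real data, `folder` would point at the directory holding the reports, and only the files flagged in `failed` would need a closer look to find the failure time.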
3 Comments
dpb
2021-3-23
Well, we still don't have a file to test with nor is there a case that fails in the text you posted...if you expect somebody to write code, you've got to do your part to give them the help needed from your end; otherwise you'll have the result of the other poster's wasted time/effort that doesn't work because what he was provided wasn't sufficient and his best guess of what it should be apparently wasn't correct.
In general, however, the idea would be to use readcell to import each file into a cell array, use contains or regexp to find rows with the key words/phrases wanted, and then parse those lines, taking into account where the group headers are to match which are which.
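A minimal sketch of that idea follows. It is not tested against a real report: the lines are a shortened, made-up stand-in for one file, and a string array is used in place of `readcell` (for a real file, `readlines(filename)`, available from R2020b, would give the same kind of array). It assumes the time stamp that follows a failed group is the failure time.

```matlab
% In-memory stand-in for one report; the values are made up
lines = [
    "Group Meas Init: Passed"
    "--------"
    "Datapoint_01 LL Measured UL Units Passed"
    "--------"
    "Group Meas Ramp: Failed"
    "--------"
    "Datapoint_01 LL Measured UL Units Failed"
    "--------"
    "Time (after meas): Done"
    "--------"
    "11:01:16 AM"
    "--------"
    ];
lines = strip(lines);   % guard against stray leading/trailing blanks

% Rows that are group headers reporting a failure
failedRows = find(startsWith(lines, "Group Meas") & endsWith(lines, "Failed"));

% Rows that hold only a clock time such as "11:01:16 AM"
hits = regexp(cellstr(lines), '^\d{1,2}:\d{2}:\d{2} [AP]M$', 'once');
timeRows = find(~cellfun(@isempty, hits));

% For each failed group, take the first time stamp that follows it
failTimes = strings(0, 1);
for r = failedRows.'
    nxt = timeRows(find(timeRows > r, 1));
    if ~isempty(nxt)
        failTimes(end+1, 1) = lines(nxt);
    end
end
```

Matching on the start and end of each line keeps the datapoint rows, which can also end in "Failed", out of the group-header search.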
More Answers (1)
Mathieu NOE
2021-3-23
Hello,
This is my 2 cents' worth of code to import the required data. The function gives you the time values (char array) and the number of failures. I tested it with two dummy files: one is your original data, and in the second one I changed the last section to create a Failed condition and added another failed case with a different time value, just to check that my code correctly detects the 2 failures.
Filename_in = 'data2.txt';
% Filename_out= 'dataABC_reduced.txt';
[Time_init,Time_end,fail_count] = extract_data(Filename_in);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Time_init,Time_end,fail_count] = extract_data(Filename)
    fid = fopen(Filename);
    if fid == -1
        error('Could not open file: %s', Filename);
    end
    tline = fgetl(fid);
    % initialization
    k = 0; % counter #1: index of the current line
    fail_count = 0; % counter #2: number of failures found
    Time_init = '';
    Time_end{1} = '';
    line_fail_ind = NaN; % line index of 'Time (after meas)' after a failure (NaN = none yet)
    fail_flag = 0;
    while ischar(tline)
        k = k+1; % loop over line index
        % store initial Time value (start Time) from the header line 'Time: [...]'
        % (extractBetween returns a 1x1 cell here)
        if contains(tline,'Time: [')
            Time_init = deblank(extractBetween(tline,'[',']'));
        end
        % then search for 'Failed' case in line " Group Meas Ramp "
        if (contains(tline,'Group Meas Ramp') && contains(tline,'Failed'))
            fail_flag = 1;
        end
        if fail_flag == 1 && contains(tline,'Time (after meas)')
            line_fail_ind = k;
        end
        % time of failure : capture when running index k = line_fail_ind + 2
        % (and fail_flag == 1); the comparison is false while line_fail_ind is NaN
        if fail_flag == 1 && k == line_fail_ind + 2
            fail_count = fail_count+1;
            Time_end{fail_count} = tline;
            fail_flag = 0; % reset fail_flag
            line_fail_ind = NaN; % reset for the next failure
        end
        tline = fgetl(fid); % read the next line (returns -1 at end of file)
    end
    fclose(fid);
end
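To get from these outputs to the histogram asked for in the question, one possible next step is sketched below. The start and failure times are made-up stand-ins for the `Time_init` and `Time_end` values collected over several files, and the 5-minute bin width is an arbitrary choice. It assumes every test starts and fails on the same calendar day.

```matlab
% Hypothetical values collected from several failed tests (made up)
startTimes = ["11:00:03 AM"; "08:15:00 AM"; "01:30:10 PM"];
failTimes  = ["11:01:16 AM"; "08:45:30 AM"; "01:31:10 PM"];

% Parse the clock strings; the date defaults to the current day and
% cancels out in the difference
t0 = datetime(startTimes, 'InputFormat', 'hh:mm:ss a', 'Locale', 'en_US');
t1 = datetime(failTimes,  'InputFormat', 'hh:mm:ss a', 'Locale', 'en_US');

% Elapsed time from test start to failure, as plain numbers in minutes
elapsedMin = minutes(t1 - t0);

% Distribution of time-to-failure
histogram(elapsedMin, 'BinWidth', 5);
xlabel('Minutes from test start to failure');
ylabel('Number of failures');
```

A test that runs past midnight would come out with a negative elapsed time here, so the date from the report header would have to be parsed as well in that case.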
3 Comments
Mathieu NOE
2021-3-23
Hi,
Would you be able to copy and paste the section of data that does not seem to work 100% with my code?
dpb
2021-3-24
Is this one test/file?
Is the Group Meas Init: section of interest? There is no time after it; an ending time is given only after the "Ramp" section. I presume that if the INIT fails, the rest of the test is aborted and there is consequently no file?
Need all the ground rules...