Extract text data and time data from notepad file

I have a very large text file (viewed in Notepad) and I need to extract data from it.
The data looks like this:
05/12/2011 - 14:39:38.790329 - INFO_STATUS HEX CODE 001c DECIMAL CODE 28 ERROR NAME Overflow Error DESCRIPTION Information overflow error occurred.
05/12/2011 - 14:39:39.910752 - INFO_STATUS HEX CODE 001c DECIMAL CODE 28 ERROR NAME Overflow Error DESCRIPTION Information overflow error occurred.
05/12/2011 - 14:39:41.030363 - CP_PROCESSING_STATUS HEX CODE 019c DECIMAL CODE 412 ERROR NAME Computer processing error DESCRIPTION Computer 2 experienced a commanded test error due to a processing error only
05/12/2011 - 14:39:42.150375 - INFO_STATUS HEX CODE 001c DECIMAL CODE 28 ERROR NAME Overflow Error DESCRIPTION Information overflow error occurred.
This is just a segment of the data from the actual file. There are actually many groups of lines of data like this.
What I want to do is read the file into MATLAB and write the data into columns for error name, time, description, and hex code.
Any help is much appreciated. V/R, Charles Atlas

4 Comments

Does a newline always separate the messages?
Do you have a question? What have you done so far? Installed MATLAB, read the file as a string or as a cell string, split the lines into their different parts? What does "write the data into columns" mean: do you want to create a UITABLE? Does the result contain anything that is not already available in Notepad? What do you expect us to do?
I am not sure why, but when I typed this question, after the first line of data, there is a new line of data starting after "info_status." HEX CODE is the start of a new line, DECIMAL CODE is the start of a new line, ERROR NAME is the start of a new line and DESCRIPTION is the start of a new line.
This is what the data was supposed to look like:
05/12/2011 - 14:39:38.790329 - INFO_STATUS
HEX CODE 001c
DECIMAL CODE 28
ERROR NAME Overflow Error
DESCRIPTION Information overflow error occurred.
And then it's followed directly with no breaks in between by another error message, right?
Do you want the date as a string or a numeric serial date (precision up to milliseconds only)?
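To make the trade-off in that question concrete, here is a minimal sketch of both options for one timestamp. It assumes the date is written day/month/year, which the sample data does not confirm, and the variable names are illustrative only. A serial date number is a double counting days, so near 2011 it resolves steps of roughly 10 microseconds, while datestr displays at most milliseconds.

```matlab
stamp = '05/12/2011 - 14:39:38.790329';

% Option 1: keep date and time as strings (no precision is lost).
parts   = regexp(stamp, ' - ', 'split');   % {'05/12/2011', '14:39:38.790329'}
dateStr = parts{1};
timeStr = parts{2};

% Option 2: convert to a numeric serial date.
% sscanf returns the six fields as a double column vector.
v = sscanf(stamp, '%d/%d/%d - %d:%d:%f').';

% Assumption: the fields are [day month year ...]; reorder to
% [year month day hour minute second] as datenum expects.
serial = datenum(v([3 2 1 4 5 6]));

% datestr shows milliseconds at most (format 'FFF').
shown = datestr(serial, 'dd/mm/yyyy HH:MM:SS.FFF');
```

If the file actually uses month/day/year, swap the first two indices, i.e. use `v([3 1 2 4 5 6])`.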


Answers (2)

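% Read six numeric date/time fields, then the rest of each line as text.
fid = fopen('test.txt');
out = textscan(fid,'%f/%f/%f - %f:%f:%f - %s','Delimiter','','CollectOutput',1);
% Reorder the first three fields to year, month, day (this assumes the file uses day/month/year).
time = datenum(out{1}(:,[3:-1:1 4:end]));
% Capture the hex code, error name and description from the remaining text.
info = regexpi(out{2},'CODE (\w+)[\w\s]+NAME ([\w\s]+) DESCRIPTION ([\w\s]+)','tokens');
% Unnest the token cells into one row per message.
info = cat(1,info{:}); info = cat(1,info{:});
fclose(fid);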
If you need the date and time to remain in string format, use this instead:
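% Read the date (10 characters) and time (15 characters) as strings, then the rest of the line.
fid = fopen('test.txt');
out = textscan(fid,'%10s - %15s - %s','Delimiter','','CollectOutput',1);
% Replace column 3 with the captured tokens, then expand them into columns 3 to 5.
out{1}(:,3) = regexpi(out{1}(:,3),'CODE (\w+)[\w\s]+NAME ([\w\s]+) DESCRIPTION ([\w\s]+)','tokens');
out{1}(:,3) = cat(1,out{1}{:,3}); out{1}(:,3:5) = cat(1,out{1}{:,3});
fclose(fid);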
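EDIT (for the multi-line layout described in the comments)
fid = fopen('test.txt');
fmt = '%f/%f/%f-%f:%f:%f%*[^\n]\n HEX CODE %s\r\n %*[^\n]\n ERROR NAME %s\r\n DESCRIPTION%s';
out = textscan(fid,fmt,'Delimiter','','CollectOutput',1);
% Reorder the first three fields to year, month, day before converting (this assumes day/month/year).
out{1} = datenum(out{1}(:,[3:-1:1 4:end]));
fclose(fid);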
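As an alternative to the textscan format string above, the five-line records can also be matched with regexp and named tokens. This is only a sketch under stated assumptions: records follow each other without blank lines, each field sits on its own line as shown in the comments, and the inline sample text stands in for `fileread('test.txt')`. The token names (`date`, `time`, `hex`, `name`, `desc`) are made up for this example.

```matlab
% Sample text in the five-line layout (stand-in for txt = fileread('test.txt')).
txt = sprintf([ ...
    '05/12/2011 - 14:39:38.790329 - INFO_STATUS\n' ...
    'HEX CODE 001c\n' ...
    'DECIMAL CODE 28\n' ...
    'ERROR NAME Overflow Error\n' ...
    'DESCRIPTION Information overflow error occurred.\n' ...
    '05/12/2011 - 14:39:41.030363 - CP_PROCESSING_STATUS\n' ...
    'HEX CODE 019c\n' ...
    'DECIMAL CODE 412\n' ...
    'ERROR NAME Computer processing error\n' ...
    'DESCRIPTION Computer 2 experienced a commanded test error due to a processing error only\n']);

% One pattern per record; [^\r\n]+ keeps each field on its own line,
% and \s+ absorbs either \n or \r\n line endings.
pat = ['(?<date>\d{2}/\d{2}/\d{4}) - (?<time>[\d:.]+) - \S+\s+' ...
       'HEX CODE (?<hex>\w+)\s+' ...
       'DECIMAL CODE \d+\s+' ...
       'ERROR NAME (?<name>[^\r\n]+)\s+' ...
       'DESCRIPTION (?<desc>[^\r\n]+)'];

rec = regexp(txt, pat, 'names');   % struct array, one element per record

% Columns in the order asked for: error name, time, description, hex code.
result = [{rec.name}; {rec.time}; {rec.desc}; {rec.hex}].';
```

`result` is an N-by-4 cell array of strings, which could be passed to a uitable or written out with fprintf. Because the whole file is read into memory at once, this approach may not suit extremely large files.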
If the file is very large or performance is important, consider processing it with a Perl script called through MATLAB's perl() function.
