extract values in text document

Sebastian 2012-4-12
Hi all,
I'd like to scan a text document and collect all values that follow a certain string. The document has a header section containing numeric values and character strings, which repeats at irregular intervals (not periodically after exactly N rows), and below each header the numeric values form a matrix.
To be clearer, here is an example of the text file I want to process:
ITEM: TIMESTEP
1
ITEM: NUMBER OF ATOMS
1000
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
[1......
2......
.......
999....
1000...]
ITEM: TIMESTEP
2
ITEM: NUMBER OF ATOMS
1005
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
[1......
2......
.......
1004...
1005...]
... and so on...
I'd like to extract the number of atoms at each timestep. That is, I want to create an array that stores all the values that follow the string
"ITEM: NUMBER OF ATOMS"
in the text document (in the example it's the values 1000 and 1005).
How can I do that?
Thanks very much for your help! Regards,
Sebastian

Answers (1)

Ken Atwell 2012-4-12
For a custom file type like this, I would use a regular expression (the MATLAB function regexp) to scan the file. regexp can be a little daunting to the uninitiated, so here is a little code to get you started.
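%% Read the data file
f = fopen('atomdata.txt');
t = fread(f, 'char=>char');   % whole file as a column of characters
t = t';                       % regexp expects a row vector
fclose(f);
%% Scan for atom counts
% Capture the run of digits that follows each "ITEM: NUMBER OF ATOMS" line
numAtoms = regexp(t, 'ITEM: NUMBER OF ATOMS\W+([0-9]+)', 'tokens')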
This will give you a cell array with one entry per match, each entry itself a 1-by-1 cell holding the digits as text, which you will need to convert to double using str2double or similar.
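A minimal sketch of that conversion step, assuming the text has already been read into a row vector. The sample text is built with sprintf here only so the snippet is self-contained (the data rows are left out), and the variable name tokens is chosen for illustration:

```matlab
% Sample text mirroring the first example in the question (data rows omitted)
t = sprintf(['ITEM: TIMESTEP\n1\nITEM: NUMBER OF ATOMS\n1000\n' ...
             'ITEM: BOX BOUNDS\n-1 1\n-1 1\n-0.1 2\n' ...
             'ITEM: TIMESTEP\n2\nITEM: NUMBER OF ATOMS\n1005\n' ...
             'ITEM: BOX BOUNDS\n-1 1\n-1 1\n-0.1 2\n']);

% 'tokens' returns one inner cell per match: {{'1000'}, {'1005'}}
tokens = regexp(t, 'ITEM: NUMBER OF ATOMS\W+([0-9]+)', 'tokens');

% Flatten the nested cells into {'1000', '1005'}, then convert to double
numAtoms = str2double([tokens{:}]);
```

Here numAtoms ends up as the numeric row vector [1000 1005].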
2 Comments
Sebastian 2012-4-12
Hm, that sounds very complicated.
I think there should be an easier solution.
Let me try once more to explain. My problem is not the logic of the method for processing the text file to retrieve the values after the string "ITEM: NUMBER OF ATOMS". My problem is rather how to deal with the text file format...
Here is an example text file:
ITEM: TIMESTEP
10
ITEM: NUMBER OF ATOMS
3
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
ITEM: TIMESTEP
20
ITEM: NUMBER OF ATOMS
5
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
ITEM: TIMESTEP
30
ITEM: NUMBER OF ATOMS
4
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
ITEM: TIMESTEP
40
ITEM: NUMBER OF ATOMS
7
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
ITEM: TIMESTEP
50
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
From this I want to create an array that holds the values
array = [3,5,4,7,2]
To do that, I think I need some kind of method that first picks the value in line 4 (which is 3). From that value one can calculate where the next count is: 4 lines up to the first count + 8 header lines + 3 lines of values = line 15, so the next value is 5, and so on...
I need something like the following method:
array = [0 ];
for i=1:1:5
array = [array; (value of line(4+(i-1)*8+sum(array)))]
end
OK, but how should I process that text file?
I think I could do it with a lot of dlmread commands, but that would be very costly if the files become very large...
Do you have another hint for me?
Thanks and kind regards,
Sebastian
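The line-index arithmetic described in this comment can be tried directly once the text is split into lines. This is only a sketch under assumptions: the sample text is generated in place for three blocks, every header is taken to be exactly 8 lines long with the first count on line 4, and the names headerLines, firstCountLine and idx are made up for illustration.

```matlab
% Build a three-block sample with atom counts 3, 5 and 4 (illustrative data)
counts = [3 5 4];
t = '';
for k = 1:numel(counts)
    t = [t sprintf('ITEM: TIMESTEP\n%d\nITEM: NUMBER OF ATOMS\n%d\n', 10*k, counts(k)) ...
           sprintf('ITEM: BOX BOUNDS\n-1 1\n-1 1\n-0.1 2\n') ...
           repmat(sprintf('1 2 3 4 5 6 7 8 9\n'), 1, counts(k))];
end

lines = regexp(t, '\r?\n', 'split');   % one cell per line of text
headerLines = 8;                       % assumed fixed header length per block
firstCountLine = 4;                    % assumed position of the first count

numAtoms = [];
idx = firstCountLine;
while idx <= numel(lines)
    n = str2double(lines{idx});        % atom count of the current block
    numAtoms(end+1) = n;
    idx = idx + headerLines + n;       % jump over n data rows and the next header
end
```

For this sample the loop visits lines 4, 15 and 28, matching the calculation in the comment. The approach breaks as soon as a header has a different length, which is why searching for the string itself is more robust.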
Ken Atwell 2012-4-12
You can use fgetl in a loop to read the file line by line, looking for "NUMBER OF ATOMS", knowing that the following line is the piece of data you are looking for.
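A rough sketch of such an fgetl loop. To keep it self-contained it first writes a small two-block sample to a temporary file; the file name, the sample contents and the variable names are assumptions for illustration, not part of the original data.

```matlab
% Write a small sample file (two blocks with 3 and 5 atoms)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'ITEM: TIMESTEP\n10\nITEM: NUMBER OF ATOMS\n3\nITEM: BOX BOUNDS\n-1 1\n-1 1\n-0.1 2\n');
fprintf(fid, repmat('1 2 3 4 5 6 7 8 9\n', 1, 3));
fprintf(fid, 'ITEM: TIMESTEP\n20\nITEM: NUMBER OF ATOMS\n5\nITEM: BOX BOUNDS\n-1 1\n-1 1\n-0.1 2\n');
fprintf(fid, repmat('1 2 3 4 5 6 7 8 9\n', 1, 5));
fclose(fid);

% Read line by line; the value sits on the line after the matching header
fid = fopen(fname, 'r');
numAtoms = [];
tline = fgetl(fid);
while ischar(tline)                              % fgetl returns -1 at end of file
    if ~isempty(strfind(tline, 'NUMBER OF ATOMS'))
        numAtoms(end+1) = str2double(fgetl(fid)); % consume and convert the next line
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);                                   % clean up the temporary file
```

Only one line is held in memory at a time, so this variant may suit very large files better than reading everything at once, at the cost of an explicit loop.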
I still contend that regexp will get you what you're looking for, probably in one line of code and certainly without a loop.
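As an illustration of the loop-free variant, the extraction and conversion can be folded into a single expression. The sketch below rebuilds the five-block example from the comment above as a string (the loop is only there to generate sample data), and the pattern uses \s+ and \d+ as an assumed equivalent of the one in the answer:

```matlab
% Generate sample text with the atom counts from the example: 3, 5, 4, 7, 2
expected = [3 5 4 7 2];
t = '';
for k = 1:numel(expected)
    t = [t sprintf('ITEM: TIMESTEP\n%d\nITEM: NUMBER OF ATOMS\n%d\n', 10*k, expected(k)) ...
           sprintf('ITEM: BOX BOUNDS\n-1 1\n-1 1\n-0.1 2\n') ...
           repmat(sprintf('1 2 3 4 5 6 7 8 9\n'), 1, expected(k))];
end

% One expression, no loop: match the header, capture the digits, convert each token
numAtoms = cellfun(@(c) str2double(c{1}), ...
    regexp(t, 'ITEM: NUMBER OF ATOMS\s+(\d+)', 'tokens'));
```

For this sample numAtoms is [3 5 4 7 2], the array asked for in the comment.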
