extract values in text document

Question

Sebastian 2012-4-12

Direct link to this question: https://ww2.mathworks.cn/matlabcentral/answers/35268-extract-values-in-text-document

Hi all,

I'd like to scan a text document and collect all values that follow a certain string. The document has a header section containing numeric values and character strings, with the numeric data in matrix form underneath each header. The header repeats at irregular intervals (not periodically after exactly N rows).

To be clearer, here is an example of the text file I want to process:

ITEM: TIMESTEP
1
ITEM: NUMBER OF ATOMS
1000
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
[1......
 2......
 .......
 999....
 1000...]
ITEM: TIMESTEP
2
ITEM: NUMBER OF ATOMS
1005
ITEM: BOX BOUNDS
-1 1
-1 1
-0.1 2
[1......
 2......
 .......
 1004...
 1005...]

... and so on...

I'd like to extract the number of atoms at each timestep. That is, I want to create an array that stores all the values that follow the string

"ITEM: NUMBER OF ATOMS"

in the text document (in the example, the values 1000 and 1005).

How can I do that?

Thanks very much for your help! Regards,

Sebastian


Answer 1

Ken Atwell 2012-4-12

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/35268-extract-values-in-text-document#answer_44188

For a custom file type like this, I would use a regular expression (the MATLAB function regexp) to scan the file. regexp can be a little daunting to the uninitiated, so here is a little code to get you started.

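 % Read the whole data file into one character row vector
 f = fopen('atomdata.txt', 'r');
 assert(f ~= -1, 'Could not open atomdata.txt');
 t = fread(f, '*char')';
 fclose(f);
 % Scan for the integer that follows each "ITEM: NUMBER OF ATOMS" header
 numAtoms = regexp(t, 'ITEM: NUMBER OF ATOMS\s+(\d+)', 'tokens')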

This will give you a cell array of text strings (with the 'tokens' option, each string is nested in its own cell), which you will probably want to convert to double using str2double or similar.
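As a rough illustration of that conversion step, the sketch below flattens the nested cells and passes them to str2double. The variable names tokens and numAtoms are chosen for illustration, and the sample text is a shortened stand-in for the file contents, not real data.

```matlab
% Shortened stand-in for the file contents (hypothetical sample data)
t = sprintf('ITEM: TIMESTEP\n1\nITEM: NUMBER OF ATOMS\n1000\nITEM: BOX BOUNDS\n-1 1\nITEM: TIMESTEP\n2\nITEM: NUMBER OF ATOMS\n1005\n');

% With 'tokens', each match comes back as its own 1-by-1 cell
tokens = regexp(t, 'ITEM: NUMBER OF ATOMS\s+(\d+)', 'tokens');

% Flatten the nested cells into one cell array of text, then convert
numAtoms = str2double([tokens{:}]);
```

For this sample, numAtoms ends up as a 1-by-2 numeric row vector holding 1000 and 1005.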


Sebastian 2012-4-12

Hm, that sounds very complicated.

I think there should be an easier solution.

Let me try once more to explain. My problem is not the logic of the method for retrieving the values after the string "ITEM: NUMBER OF ATOMS". My problem is more how to deal with the text file format...

Here is an example text file:

ITEM: TIMESTEP

10

ITEM: NUMBER OF ATOMS

3

ITEM: BOX BOUNDS

-1 1

-0.1 2

1 2 3 4 5 6 7 8 9

ITEM: TIMESTEP

20

ITEM: NUMBER OF ATOMS

5

ITEM: BOX BOUNDS

-1 1

-0.1 2

1 2 3 4 5 6 7 8 9

ITEM: TIMESTEP

30

ITEM: NUMBER OF ATOMS

4

ITEM: BOX BOUNDS

-1 1

-0.1 2

1 2 3 4 5 6 7 8 9

ITEM: TIMESTEP

40

ITEM: NUMBER OF ATOMS

7

ITEM: BOX BOUNDS

-1 1

-0.1 2

1 2 3 4 5 6 7 8 9

ITEM: TIMESTEP

50

ITEM: NUMBER OF ATOMS

2

ITEM: BOX BOUNDS

-1 1

-0.1 2

1 2 3 4 5 6 7 8 9

The goal is to create an array that holds the values

array = [3,5,4,7,2]

To do that, I think I need some kind of method that first picks the value in line 4 (which is 3). From that value one can calculate where the next one is: 4 lines from the beginning + 8 lines of headers + 3 lines of values = line 15, so the next value is 5, and so on...

I need some kind of method like the following:

 array = 0;
 for i = 1:5
     array = [array; (value of line(4+(i-1)*8+sum(array)))]
 end

OK, but how should I process that text file?

I think I could do it with a lot of dlmread commands, but that would be very costly if the files become very large...

Do you have another hint for me?

Thanks and kind regards,

Sebastian

Ken Atwell 2012-4-12

You can use fgetl in a loop to read the file line by line, looking for "NUMBER OF ATOMS" and knowing that the following line is the piece of data you are looking for.
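A minimal sketch of that fgetl loop might look like the following. It assumes the atom count sits on the line immediately after the header, with no blank line in between; the temporary file and the names fname, tline and numAtoms are hypothetical and only there so the snippet runs on its own.

```matlab
% Create a small sample file (hypothetical contents, for illustration only)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'ITEM: TIMESTEP\n1\nITEM: NUMBER OF ATOMS\n1000\nITEM: BOX BOUNDS\n-1 1\n');
fprintf(fid, 'ITEM: TIMESTEP\n2\nITEM: NUMBER OF ATOMS\n1005\nITEM: BOX BOUNDS\n-1 1\n');
fclose(fid);

% Read line by line; fgetl returns -1 (not text) at end of file
fid = fopen(fname, 'r');
numAtoms = [];
tline = fgetl(fid);
while ischar(tline)
    if ~isempty(strfind(tline, 'NUMBER OF ATOMS'))
        % Assumption: the very next line holds the atom count
        numAtoms(end+1) = str2double(fgetl(fid)); %#ok<AGROW>
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Because only one line is held in memory at a time, this variant may suit very large files better than reading everything at once, at the cost of an explicit loop.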

I still contend that regexp will get you what you're looking for, probably in one line of code and certainly without a loop.
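To illustrate the loop-free route, here is a sketch that captures each timestep together with its atom count in a single regexp call. The two-token pattern and the names tok and stepsAndCounts are illustrative assumptions rather than part of the original answer; the sample text imitates the second example file, blank lines included, which \s+ skips over.

```matlab
% Sample text imitating the example file, including the blank lines (hypothetical data)
t = sprintf(['ITEM: TIMESTEP\n\n10\n\nITEM: NUMBER OF ATOMS\n\n3\n\nITEM: BOX BOUNDS\n\n-1 1\n\n-0.1 2\n\n1 2 3 4 5 6 7 8 9\n\n', ...
             'ITEM: TIMESTEP\n\n20\n\nITEM: NUMBER OF ATOMS\n\n5\n\nITEM: BOX BOUNDS\n\n-1 1\n\n-0.1 2\n\n1 2 3 4 5 6 7 8 9\n']);

% Two capture groups: the timestep, then the atom count that follows it
tok = regexp(t, 'ITEM: TIMESTEP\s+(\d+)\s+ITEM: NUMBER OF ATOMS\s+(\d+)', 'tokens');

% Stack the 1-by-2 token cells into an N-by-2 cell array, then convert to numbers
stepsAndCounts = str2double(vertcat(tok{:}));
```

Each row of stepsAndCounts pairs a timestep with its atom count, so the second column is the array asked for in the question.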
