Searching for a given line in a text file

The following file is a text file in SDF format (chemical structures). It looks something like this:
7 9 1 0 0 0 0
7 14 1 0 0 0 0
8 10 1 0 0 0 0
8 15 1 0 0 0 0
9 10 2 0 0 0 0
9 16 1 0 0 0 0
10 17 1 0 0 0 0
12 13 1 0 0 0 0
13 18 1 0 0 0 0
13 19 1 0 0 0 0
13 20 1 0 0 0 0
M END
> <PUBCHEM_COMPOUND_CID>
2244
> <PUBCHEM_COMPOUND_CANONICALIZED>
1
> <PUBCHEM_CACTVS_COMPLEXITY>
212
I need to extract just the value under the CID field, and there can be multiple CID fields in a single file. How should I go about this? Any help would be appreciated.

Accepted Answer

Ram 2011-3-1
I tried something like this:
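[A,B]=uigetfile('*.sdf','sdf'); % A = file name, B = folder
C=fopen(fullfile(B,A),'r'); % use the full path so files outside the current folder also open
i=<ui>; % number of structures, to be obtained from the user
pubchem_id=[];
z=<ui>*300; % rough approximation: 300 lines for each structure
for j=1:1:z
D=fgetl(C); % returns -1 once the end of the file is reached
if strcmp('> <PUBCHEM_COMPOUND_CID>',D)
E=fgetl(C); % the CID value is on the line after the tag
E = str2double(E);
pubchem_id=[pubchem_id; E];
end
end
fclose(C);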
and it worked :)
2 Comments
David Young 2011-3-1
The for loop that looks at 300 lines only is a hostage to fortune: what if there are more than 300 lines for a structure? You could avoid this by using a while loop that kept looking until it either found a particular line, or came to the end of the file, and that would be far more robust.
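A minimal sketch of the while-loop approach described above, assuming the CID value always sits on the line immediately after the `> <PUBCHEM_COMPOUND_CID>` tag. The temporary file and its contents are invented so that the example runs on its own. `fgetl` returns -1 (a number, not text) at the end of the file, and that is what ends the loop.

```matlab
% Write a small made-up SDF-like file so the sketch is self-contained.
sample = {'M  END', '> <PUBCHEM_COMPOUND_CID>', '2244', '$$$$', ...
          'M  END', '> <PUBCHEM_COMPOUND_CID>', '1983', '$$$$'};
fname = [tempname '.sdf'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

% Read line by line until fgetl signals the end of the file.
fid = fopen(fname, 'r');
pubchem_id = [];
txt = fgetl(fid);
while ischar(txt)
    if strcmp(strtrim(txt), '> <PUBCHEM_COMPOUND_CID>')
        value = fgetl(fid);               % the CID is on the next line
        if ischar(value)
            pubchem_id(end+1, 1) = str2double(value); %#ok<AGROW>
        end
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);
```

With this structure, no estimate of the number of lines per structure is needed, and no user-supplied structure count either.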
Ram 2011-3-4
I didn't use a while loop because nothing in an SDF file marks the end of the file. For instance, $$$$ marks the end of each structure, and there can be multiple $$$$ lines depending on the number of structures. A structure has about 180 lines on average, so 300 is actually more than enough, and when a structure has more than 300 lines, that will be compensated for by the ones that have fewer than 300.
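Because `$$$$` terminates each structure, the structure count could in principle be taken from the file itself rather than from the user. A rough sketch, assuming the file is small enough to hold in memory (a real file could be loaded with `fileread`) and that a value line always follows each CID tag. The sample text is invented for illustration.

```matlab
% Invented two-record sample; a real file would be read with fileread(fname).
txt = sprintf('%s\n', 'M  END', '> <PUBCHEM_COMPOUND_CID>', '2244', '$$$$', ...
                      'M  END', '> <PUBCHEM_COMPOUND_CID>', '1983', '$$$$');

% Split the text into rows and count the '$$$$' record terminators.
rows = regexp(txt, '\r?\n', 'split');
num_structures = sum(strcmp(strtrim(rows), '$$$$'));

% Take the row following each CID tag as the CID value.
tag_idx = find(strcmp(strtrim(rows), '> <PUBCHEM_COMPOUND_CID>'));
pubchem_id = str2double(rows(tag_idx + 1)).';
```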


More Answers (1)

Walter Roberson 2011-2-28
Not much you can do except fgetl() through the file until you encounter the M END line, and do the extraction work from there. The ease of extracting after that would depend upon the regularity of the data after that and upon which fields you were interested in.
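One possible sketch of the extraction step, assuming every data field after the M END line is a `> <NAME>` tag line followed by a single value line, as in the sample in the question. Multi-line values are not handled, and the regular expression and variable names are illustrative only.

```matlab
% Invented single-record sample mirroring the layout in the question.
txt = sprintf('%s\n', 'M  END', '> <PUBCHEM_COMPOUND_CID>', '2244', ...
              '> <PUBCHEM_CACTVS_COMPLEXITY>', '212', '$$$$');

% Capture each "> <NAME>" tag together with the single line that follows it.
tok = regexp(txt, '> *<([^>]+)>[^\n]*\n([^\n]*)', 'tokens');

% Store the values in a map keyed by field name.
data = containers.Map();
for k = 1:numel(tok)
    data(tok{k}{1}) = strtrim(tok{k}{2});
end
cid = str2double(data('PUBCHEM_COMPOUND_CID'));
```

Keeping every field in a map means other fields, such as `PUBCHEM_CACTVS_COMPLEXITY`, can be looked up later without parsing the file again.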
1 Comment
Ram 2011-3-1
Thank you so much :) I built my code based on your reply :)

