How to read specific lines from a text file and store them in an array?
1 次查看(过去 30 天)
显示 更早的评论
I have a text file containing an Multiple Sequence Alignment (MSA) which has protein sequences stored in it. The contents of the file is like this:
>gi|73961569|ref|XP_547536.2| osteocalcin [C. lupus familiaris]
MRSLMVLALLAVAALCLCLAGPADAKPSSAESRKGGATFVSKREGSEVVRRLRRYLDSGL
GAPVPYPDPLEPKREVCELNPNCDELADHIGFQEAYQRFYGPV-
>gi|27806301|ref|NP_776674.1| osteocalcin preproprotein
MRTPMLLALLALAT--LCLAGRADAKPGDAESGK-GAAFVSKQEGSEVVKRLRRYLDHWL
GAPAPYPDPLEPKREVCELNPDCDELADHIGFQEAYRRFYGPV-
From this file I just want to extract the lines containing the actual sequences (ones NOT starting with '>' symbol) and store them in an array for future use. One thing to mention is that line 2 and line 3 is one single sequence, so I also need to make them a single string and store it in one single position of an array. How can I do that?
I wanted to use 'fileread' but it reads all the file at a time, so it's not helpful.
3 个评论
TastyPastry
2015-10-20
编辑:TastyPastry
2015-10-20
To clarify, are your sequences supposed to be formatted like this, where there are two separate sequences starting with >gi?
>gi|73961569|ref|XP_547536.2| osteocalcin [C. lupus familiaris]
MRSLMVLALLAVAALCLCLAGPADAKPSSAESRKGGATFVSKREGSEVVRRLRRYLDSGL
GAPVPYPDPLEPKREVCELNPNCDELADHIGFQEAYQRFYGPV-
>gi|27806301|ref|NP_776674.1| osteocalcin preproprotein
MRTPMLLALLALAT--LCLAGRADAKPGDAESGK-GAAFVSKQEGSEVVKRLRRYLDHWL
GAPAPYPDPLEPKREVCELNPDCDELADHIGFQEAYRRFYGPV-
回答(1 个)
per isakson
2015-10-20
Try
>> out = cssm
out =
[1x104 char] [1x104 char] [1x104 char] [1x104 char] [1x104 char] [1x104 char]
>> out{3}
ans =
MRSLMVLALLAVAALCLCLAGPADAKPSSAESRKGGATFVSKREGSEVVRRLRRYLDSGLGAPVPYPDPLEPKREVCELNPNCDELADHIGFQEAYQRFYGPV-
>>
where
function out = cssm
str = fileread( 'cssm.txt' );
cac = regexp( str, '(?<=>gi[^\n]+\n).+?(?=\n>gi|$)', 'match' );
out = cell(1,length(cac));
for jj = 1 : length( cac )
out{jj} = regexprep( cac{jj}, '\n', '' );
end
end
and cssm.txt contains three copies of the string of your question.
0 个评论
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Characters and Strings 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!