How to search nucleotide sequences with regexp?
Hello everyone,
I am trying to search a huge list of 23 322 DNA sequences for this sequence:
XTTATTATTATTATTATTATTATTY
Here T and A are the usual bases, and X and Y each stand for exactly one base (A, C, T, or G). I am looking for the (TTA)7TT repeat core and want to find out which bases immediately flank it.
So I am using the regular expression:
[ACTG]{1,1}TTATTATTATTATTATTATTATT[ACTG]{1,1}
This gives 30 results. But when I search for each combination of flanking bases separately and sum those results, using regular expressions like these:
ATTATTATTATTATTATTATTATTA
GTTATTATTATTATTATTATTATTG
CTTATTATTATTATTATTATTATTC
TTTATTATTATTATTATTATTATTT
and so on, I get 47 results. The first regular expression should find all of these matches in one pass, but apparently it does not, so I suspect I made an error in constructing it. If there are any regular expression masters out there, I would greatly appreciate your help.
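One possible explanation, offered as a hypothesis and not a confirmed diagnosis: most regex engines report non-overlapping matches in a single scan, so when the TTA run is longer than the core, a second hit that starts inside the first one is skipped. Separate per-flank searches each scan independently, so their sum can be larger. The sketch below illustrates this in Python instead of MATLAB (an assumption made only so the example is self-contained). The toy sequence and the function names are made up for illustration and are not taken from the 23 322 sequences. Note that `[ACTG]{1,1}` is equivalent to plain `[ACTG]`.

```python
import re
from itertools import product

CORE = "TTA" * 7 + "TT"   # the (TTA)7TT core, 23 bases
BASES = "ACGT"

def single_pass_count(seq):
    """Count matches the way one scan does: non-overlapping only."""
    return len(re.findall("[ACGT]" + CORE + "[ACGT]", seq))

def per_flank_sum(seq):
    """Sum the counts of the 16 separate searches, one per flank pair."""
    return sum(
        len(re.findall(left + CORE + right, seq))
        for left, right in product(BASES, repeat=2)
    )

def overlapping_flanks(seq):
    """List (left, right) flanks for every hit, overlapping ones included.

    The whole pattern sits inside a lookahead, so each match consumes no
    characters and the scan can report hits that overlap each other.
    """
    pattern = re.compile("(?=([ACGT])" + CORE + "([ACGT]))")
    return [(m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Hypothetical sequence: the TTA run is one repeat longer than the core.
toy = "C" + "TTA" * 8 + "TTG"

print(single_pass_count(toy))   # one scan, non-overlapping
print(per_flank_sum(toy))       # 16 independent scans, summed
print(overlapping_flanks(toy))  # every hit with its flanking bases
```

For this toy sequence the single scan reports 1 match, while the per-flank sum and the lookahead version both find 2: C...A starting at the first base and A...G starting three bases later. If the 30 versus 47 gap has the same cause, the lookahead form should recover the missing hits. It is also worth checking whether each count is per match or per sequence.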