How to search nucleotide sequences with regexp?

Hello everyone,
I am trying to search a list of 23,322 DNA sequences for this motif:
XTTATTATTATTATTATTATTATTY
Here T and A are the usual bases, and X and Y can each be any single base (A, C, T, or G). I am looking for this (TTA)7TT repeat core and want to find out which bases immediately flank it.
So I am using the regular expression:
[ACTG]{1,1}TTATTATTATTATTATTATTATT[ACTG]{1,1}
And I get 30 results. When I instead search for each pair of flanking bases separately and sum those results, using regular expressions like these:
ATTATTATTATTATTATTATTATTA
GTTATTATTATTATTATTATTATTG
CTTATTATTATTATTATTATTATTC
TTTATTATTATTATTATTATTATTT
and so on, I get 47 results. The first regular expression should find all of these in one pass, but apparently it does not, so I think I have made an error in constructing it. If there are any regular expression masters out there, I would greatly appreciate your help.
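One possible explanation for the gap between 30 and 47, which the post does not confirm, is overlapping matches. A single search returns non-overlapping matches only, so when the repeat runs longer than (TTA)7TT, the combined pattern consumes the first hit and skips the shifted ones, while separate per-flank searches each find their own hit. The sketch below illustrates this in Python rather than MATLAB, on a made-up test sequence; the function names and the sequence are hypothetical. It uses a zero-width lookahead so that overlapping occurrences are reported too.

```python
import re
from itertools import product

CORE = "TTA" * 7 + "TT"  # the (TTA)7TT repeat core, 23 bases

# Consuming pattern, equivalent to [ACTG]{1,1}<core>[ACTG]{1,1}:
# each match uses up its characters, so matches cannot overlap.
consuming = re.compile("[ACGT]" + CORE + "[ACGT]")

# Lookahead pattern: the match itself is zero-width and the text is
# captured in a group, so overlapping occurrences are all reported.
overlapping = re.compile("(?=([ACGT]" + CORE + "[ACGT]))")


def count_consuming(seq):
    """Number of non-overlapping matches of the combined pattern."""
    return len(consuming.findall(seq))


def count_summed_per_flank(seq):
    """Sum of counts from 16 separate searches, one per flank pair."""
    return sum(
        len(re.findall(x + CORE + y, seq))
        for x, y in product("ACGT", repeat=2)
    )


def flank_pairs(seq):
    """(left, right) flanking bases of every occurrence, overlaps included."""
    return [(m[0], m[-1]) for m in overlapping.findall(seq)]


# Hypothetical test sequence: a longer (TTA)9TT repeat flanked by G and C.
example = "G" + "TTA" * 9 + "TT" + "C"

print(count_consuming(example))         # combined pattern, no overlaps
print(count_summed_per_flank(example))  # separate searches, summed
print(flank_pairs(example))             # flanks of all occurrences
```

If overlap is the cause, the lookahead count should agree with the summed per-flank count whenever no two overlapping hits share the same flank pair. Note that in a longer repeat, the "flanking" bases of the shifted hits are themselves part of the repeat, so it may be worth deciding whether those should count at all.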

Answers (0)
