How to search nucleotide sequences with regexp?

Hello everyone,
I am trying to search a huge list of 23 322 DNA sequences for this sequence:
XTTATTATTATTATTATTATTATTY
Here T and A are the usual bases, and X and Y each stand for a single base (A, C, T, or G). I am looking for this (TTA)7TT repeat core and want to find out which bases immediately flank it.
So I am using the regular expression:
[ACTG]{1,1}TTATTATTATTATTATTATTATT[ACTG]{1,1}
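As an illustration of what this expression is meant to do, here is a minimal sketch in Python (not MATLAB) that wraps each flanking base in a capture group so the flanks can be read off directly. The names `CORE`, `FLANKED`, `flank_pairs`, and the example sequence are hypothetical and not from the original post. Note that `[ACGT]` already matches exactly one base, so the `{1,1}` quantifier is redundant but harmless.

```python
import re

# The (TTA)7TT core from the question, built programmatically to avoid typos.
CORE = "TTA" * 7 + "TT"

# One capture group for the base before the core and one for the base after it.
FLANKED = re.compile(f"([ACGT]){CORE}([ACGT])")

def flank_pairs(seq):
    """Return (left, right) flanking bases for each non-overlapping match."""
    return [(m.group(1), m.group(2)) for m in FLANKED.finditer(seq)]

# Hypothetical test sequence: the core flanked by C on the left and G on the right.
example = "GGC" + CORE + "GAA"
print(flank_pairs(example))
```

The scan here is non-overlapping, as with most regular expression engines, so matches that share bases with an earlier match are not reported.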
This returns 30 results. However, when I search for each pair of flanking bases separately and sum up those results, using regular expressions like these:
ATTATTATTATTATTATTATTATTA
GTTATTATTATTATTATTATTATTG
CTTATTATTATTATTATTATTATTC
TTTATTATTATTATTATTATTATTT
and so on, I get 47 results. The first regular expression should find all of these in one pass, but apparently it does not, so I suspect I made an error in constructing it. If there are any regular expression masters out there, I would greatly appreciate your help.
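One possible explanation for the discrepancy, which the post itself does not confirm, is overlapping matches. A single regex scan usually resumes after the end of the previous match, so in a repeat longer than (TTA)7TT a second hit starting three bases later is skipped. Separate searches can each pick up a different one of those overlapping hits, which would make the sum larger. The sketch below is in Python and uses a hypothetical example sequence. As far as I know MATLAB's `regexp` also reports non-overlapping matches and supports the same `(?=...)` lookahead syntax, but treat that as an assumption to check.

```python
import re

CORE = "TTA" * 7 + "TT"  # the (TTA)7TT core from the question

# Ordinary pattern: each match consumes 25 bases, so matches cannot overlap.
PLAIN = re.compile(f"[ACGT]{CORE}[ACGT]")

# Lookahead pattern: the match itself is zero-width, so the scan can report
# a hit at every start position; the captured group holds the 25-base hit.
OVERLAP = re.compile(f"(?=([ACGT]{CORE}[ACGT]))")

def count_plain(seq):
    """Number of non-overlapping hits."""
    return len(PLAIN.findall(seq))

def overlapping_hits(seq):
    """All hits, including ones that overlap an earlier hit."""
    return OVERLAP.findall(seq)

# Hypothetical sequence containing a longer (TTA)9 repeat.
example = "GC" + "TTA" * 9 + "CG"

print(count_plain(example))            # non-overlapping count
print(len(overlapping_hits(example)))  # count including overlaps
for hit in overlapping_hits(example):
    print(hit[0], hit[-1])             # left and right flanking base
```

On this example the plain pattern reports one hit and the lookahead pattern reports two, the second starting three bases after the first. If the counts in the real data still differ after allowing overlaps, other things worth checking are lowercase or ambiguous bases such as N, and whether hits are being counted per sequence or per match.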

Answers (0)
