Loop through the DNA array and record all of the locations of the triplets (codons): ‘AAA’, ‘ATC’ and ‘CGG’.

2 次查看(过去 30 天)
My code so far is functional, but I don't think that it's correct. I am supposed to loop through the cell array and record the locations of each codon, while skipping over the ones that contain a character from preceding codon. For example, if part of the sequence contains [A,T,C,C,G,G] then the section with CCG should be skipped. I'm just not entirely sure what the best way to do that would be.
Here is what I have so far:
fid = fopen('sequence_long.txt','r')
A = textscan(fid,'%3s');
DNA = A{1};
fclose(fid);
i = 1;
%loops through array and counts codon occurrences
%finds the index location of individual codons
while i < length(DNA)
i = i + 1;
if strcmp(DNA(i),'AAA')
num_AAA = nnz(strcmp(DNA,'AAA'));
loc_AAA = find(strcmp(DNA,'AAA'));
elseif strcmp(DNA(i),'ATC')
num_ATC = nnz(strcmp(DNA,'ATC'));
loc_ATC = find(strcmp(DNA,'ATC'));
elseif strcmp(DNA(i),'CGG')
num_CGG = nnz(strcmp(DNA,'CGG'));
loc_CGG = find(strcmp(DNA,'CGG'));
end
end
fprintf('The number of AAA values is: %.f',num_AAA)
fprintf('The index location of AAA values: %.f\n',loc_AAA(1:10))
fprintf('The number of ATC values is: %.f',num_ATC)
fprintf('The index location of ATC values: %.f\n',loc_ATC(1:10))
fprintf('The number of CGG values is: %.f',num_CGG)
fprintf('The index location of CGG values: %.f\n',loc_CGG(1:10))

采纳的回答

Sai Veeramachaneni
Sai Veeramachaneni 2020-11-17
编辑:Sai Veeramachaneni 2020-11-17
One workaround is to iterate over the sequence and skip the next two characters whenever we find a codon.
You can look at the below code for your reference.
DNA = 'AAATCATCGGCGGATC';%Example sequence
i = 1;
loc_AAA = [];
loc_ATC = [];
loc_CGG = [];
num_AAA = 0;
num_ATC = 0;
num_CGG = 0;
while i <= length(DNA)-2
if DNA(i)=='A' && DNA(i+1)=='A' && DNA(i+2)=='A'
loc_AAA = [loc_AAA i];
num_AAA = num_AAA + 1;
i = i + 3; %Skip the next two characters
elseif DNA(i)=='A' && DNA(i+1)=='T' && DNA(i+2)=='C'
loc_ATC = [loc_ATC i];
num_ATC = num_ATC + 1;
i = i + 3;
elseif DNA(i)=='C' && DNA(i+1)=='G' && DNA(i+2)=='G'
loc_CGG = [loc_CGG i];
num_CGG = num_CGG + 1;
i = i + 3;
else
i = i + 1;
end
end

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Genomics and Next Generation Sequencing 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by