How can I save the beginning and end positions of each sequence in a cell array?

So I am looping through codons and recording them in a .txt file. The script works, but I need each sequence to begin at a start codon position, stop at the next end codon, and then continue through the cell array, recording every following start and end codon pair. I would just like to know the best way to tweak my code here. Thanks in advance!
fid = fopen("sequence_long2.txt",'r');
C = textscan(fid,'%3s');
x = C{1}
fclose(fid);
%Start sequence
ss = 1;
% end sequence
es = 183479;
seq_id = long_codon(x(ss:es));
function seq = long_codon(v)
seq = (v);
for pos = 1:length(seq)
if strcmp(seq{pos},'TAC')
index = find(strcmp(v,seq{pos}));
StartPos = index;
elseif (strcmp(seq{pos},'ACT') || strcmp(seq{pos},'ATT') || strcmp(seq{pos},'ATC'))
index = find(strcmp(v,seq{pos}));
EndPos = index;
end
end
fid2 = fopen('report_long.txt','w+');
fprintf(fid2,'Name: OP \n');
fprintf(fid2,'Lab 13: DNA Pattern Matching\n \n');
fprintf(fid2,'Start Position of Gene is: %d \n',StartPos);
fprintf(fid2, 'End Position of Gene is: %d \n',EndPos);
fclose(fid2);
end
14 Comments
Rik 2020-11-28
I would urge you to change to strfind first. Then you can loop through all start codons, removing later start codons if they are inside the gene being read.
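A minimal sketch of what strfind gives you, using a short made-up sequence (the string `s` is hypothetical and not taken from sequence_long2.txt). It only shows how to get every candidate start and stop position in one call and how to check the reading frame; it does not pair up the genes yet.

```matlab
% Hypothetical toy sequence, for illustration only
s = 'GGTACAAACCCATTGG';
% strfind returns the index of the first base of every match
start_idx = strfind(s,'TAC');
% collect the positions of all three stop codons and sort them
stop_idx = sort([strfind(s,'ATC') strfind(s,'ACT') strfind(s,'ATT')]);
% a stop codon is in the same reading frame as a start codon
% when the distance between them is a multiple of 3
first_start = start_idx(1);
in_frame = stop_idx(stop_idx>first_start & mod(stop_idx-first_start,3)==0);
```

Because strfind works on a char vector, this assumes the sequence has already been converted with something like `char(x)` rather than kept as a cell array of 3-letter codons.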
Austin Shipley 2020-11-28
Edited: Austin Shipley 2020-11-28
So I have been trying to use strfind, but I am still having this issue where my end codon positions are not being recorded correctly. Do I need to nest another while loop or am I just not using strfind properly?
fid = fopen("sequence_long2.txt",'r');
C = textscan(fid,'%s');
x = C{1};
fclose(fid);
x_conv = char(x);
Start_loc = [];
End_loc = [];
flag = 0;
i = 1;
while i<(numel(x_conv)-2)
if (strcmp(x_conv(i+[0 1 2]),'TAC')) && flag == 0
Start_loc = strfind(x_conv,'TAC');
i = i + 3;
flag = flag + 1;
elseif ismember(x_conv(i+[0 1 2]),{'ATC','ACT','ATT'}) && flag == 1
End_loc = [End_loc i];
i = i + 3;
flag = flag - 1;
else
i = i+1;
end
end
fid2 = fopen('report_long.txt','w+');
fprintf(fid2,'Name: Austin \n');
fprintf(fid2,'Lab 13: DNA Pattern Matching\n \n');
fprintf(fid2,'Start Position of Gene is: %d End Position of Gene is: %d\n ',Start_loc,End_loc);
fclose(fid2);


Accepted Answer

Rik 2020-11-29
%Since your code for reading the file is working fine you can keep it as is.
%I just used my own readfile function to load your data.
x_conv=readfile('https://www.mathworks.com/matlabcentral/answers/uploaded_files/430218/sequence_long2.txt');
x_conv=x_conv{1};
%find all possible start codons and stop codons
Start_loc = strfind(x_conv,'TAC');
Stop_loc = cellfun(@(stopcodon)strfind(x_conv,stopcodon),{'ATC','ACT','ATT'},'UniformOutput',false);
%sort the merged list so that the first match is the nearest stop codon
Stop_loc = sort(horzcat(Stop_loc{:}));
%keep the gene ends in a separate array so the candidate list is not overwritten
End_loc = zeros(size(Start_loc));
n=0;
while n<numel(Start_loc)
n=n+1;
this_start=Start_loc(n);
%select all stop codons after this start codon
this_end=Stop_loc(Stop_loc>this_start);
%keep only the stop codons in the same reading frame (offset is a multiple of 3)
this_end=this_end(mod(this_end-this_start,3)==0);
if isempty(this_end)
%no in-frame stop codon follows this start codon, so drop it and move on
Start_loc(n)=[];
n=n-1;
continue
end
this_end=this_end(1);
%now we need to remove the elements in Start_loc that are inside the current gene
Start_loc(Start_loc>this_start & Start_loc<this_end)=[];
%store the end as well
End_loc(n)=this_end;
end
%remove extra values in End_loc
End_loc((n+1):end)=[];
genes=cell(size(End_loc));
for n=1:numel(End_loc)
%End_loc is the first base of the stop codon; use End_loc(n)+2 to include the whole stop codon
genes{n}=x_conv(Start_loc(n):End_loc(n));
end
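To get every start/end pair into the report file, one option is to pass both vectors to fprintf as a single 2-by-N matrix. The sketch below uses hypothetical positions in place of the Start_loc and End_loc vectors computed above, and report_demo.txt is an assumed file name.

```matlab
% Hypothetical positions standing in for Start_loc and End_loc
% (both 1-by-N and of equal length)
Start_loc = [1 12];
End_loc = [7 18];
% report_demo.txt is an assumed file name, not the one from the question
fid2 = fopen('report_demo.txt','w');
% fprintf walks through its data column by column and reuses the format,
% so a 2-by-N matrix prints one start/end pair per line
fprintf(fid2,'Start Position of Gene is: %d End Position of Gene is: %d\n',[Start_loc;End_loc]);
fclose(fid2);
```

Passing the two vectors as separate arguments would instead print all the start positions first and then all the end positions, which is why the pairs come out mixed up in the report.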
