How do I count the total number 'AA' repeats in the text? I managed to count how many times it appears in each cell, but don't know how to add it up.

4 次查看(过去 30 天)
clear all;
%read SeqDNA file
Seq_Out_File='./Seq_Out.txt';
fileID=fopen(Seq_Out_File);
C=textscan(fileID, '%s');
NAA=(strfind(C{1},'AA'));
x=cellfun('length', NAA);

采纳的回答

Walter Roberson
Walter Roberson 2018-5-3
S = fileread(Seq_Out_File);
no_overlap_count = length(regexp(S, 'AA'));
with_overlap_count = length(regexp(S, 'A(?=A)'));
  3 个评论

请先登录,再进行评论。

更多回答(1 个)

John BG
John BG 2018-5-4
编辑:John BG 2018-5-4
Hi Nitzan Khan
1.-
According to
AAA counts as 2x AA and AAAA would count as 3x AA.
.
2.-
Also, you asked for AA match but you may want to really count all possible outcomes, all possible pairs of the basic sequence 'ACTG'
A basic equivalent to the Stack Overflow code in MATLAB would be:
A1='CTACTGCGACTTATGCCCATAATTGGCCACAATAAGTTTCTCGGATTCGCAGGTACCCTCGAGAGTATGGTCGTGGACTCAACCTTAGAGGCAACGGAGT'
L1='ACTG'
nL=combinator(4,2) % SChwarz's legendary function available here:
L2=L1(nL)
cell1={}
for k=1:1:size(L2,1)
cell1=[cell1 L2(k,:)]
end
nRep=[];
for k=1:1:size(L2,1)
[t1,t2]=regexp(A1,L2(k,:))
nRep=[nRep numel(t1)]
end
for k=1:1:size(L2,1)
str1=['n' L2(k,:) '=' num2str(nRep(k))]
evalin('base',str1)
end
L3=[repmat('n',size(L2,1),1) L2 repmat(',',size(L2,1),1)]'
L3=L3(:)'
L3(end)=[]
str3=['T1=table(' L3 ')']
evalin('base',str3)
T1 =
1×16 table
nAA nAC nAT nAG nCA nCC nCT nCG nTA nTC nTT nTG nGA nGC nGT nGG
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
5 7 6 7 6 4 7 6 7 6 5 5 7 5 6 7
.
3.-
For a complete sequence in a text file:
clear all;clc;close all
A=fileread('Seq_Out.txt');
L1='ACTG'
nL=combinator(4,2) % SChwarz's legendary function available here:
L2=L1(nL)
cell1={}
for k=1:1:size(L2,1)
cell1=[cell1 L2(k,:)]
end
nRep=[];
for k=1:1:size(L2,1)
[t1,t2]=regexp(A,L2(k,:));
nRep=[nRep numel(t1)];
end
for k=1:1:size(L2,1)
str1=['n' L2(k,:) '=' num2str(nRep(k))]
evalin('base',str1)
end
L32=[repmat('n',size(L2,1),1) L2 repmat(',',size(L2,1),1)]'
L32=L32(:)'
L32(end)=[]
.
the count table for all pairs is:
.
str3=['T2=table(' L32 ')']
T2 =
1×16 table
nAA nAC nAT nAG nCA nCC nCT nCG nTA nTC nTT nTG nGA nGC nGT nGG
_____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____
48684 62452 62140 62799 62493 48543 62702 62323 62328 62690 48502 62482 62491 62410 62683 48427
.
Besides the sequence used, also attached the saved variables in .mat file.
You can add more sequences, one each row, as show in the link above
.
If you find this answer useful would you please be so kind to consider marking my answer as Accepted Answer?
To any other reader, if you find this answer useful please consider clicking on the thumbs-up vote link
thanks in advance for time and attention
John BG

类别

Help CenterFile Exchange 中查找有关 Creating and Concatenating Matrices 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by