What format does the MSA data need to be in order to calculate pair-wise distances with seqpdist?

I am reading a ClustalW text-format MSA with multialignread. I have tried splitting the data read from the MSA into two cell arrays, and I have tried keeping the structure intact; neither method has been successful. seqpdist does accept the sequence cell array that fastaread returns for a FASTA text file of the same sequences.
% This works...
[heads,seqs]=fastaread('fastaformat.fasta');
distancematrix=seqpdist(seqs,'method',pam(250),'squareform',1);
% This does not...
[heads,seqs]=multialignread('clustalwmsa.aln1');
distancematrix=seqpdist(seqs,'method',pam(250),'squareform',1);
This is the error message:
??? Error using ==> cell.strmatch at 21
Requires character array or cell array of strings as inputs.
Error in ==> seqpdist at 258
distMethod = strmatch(lower(pval),distMethods);
Error in ==> cscalc at 14
dmat=seqpdist(seqs,'method',pam(250),'squareform',1);

Accepted Answer

Walter Roberson 2011-6-11
What you pass for 'method' must be a string.
The reference to pam appears to be something appropriate for a 'ScoringMatrix' parameter, and the value you would pass for that would be the string 'pam250'.
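A minimal sketch of the kind of call this answer points toward, assuming the Bioinformatics Toolbox is installed. The three short amino-acid sequences are made up for illustration and stand in for the cell array returned by multialignread; choosing 'alignment-score' for 'Method' is an assumption on my part, picked because that distance method is the one that scores sequences with the scoring matrix.

```matlab
% Illustrative, equal-length amino-acid sequences (hypothetical data).
% In the question these would come from:
%   [heads, seqs] = multialignread('clustalwmsa.aln1');
seqs = {'MKTAYIAKQR', 'MKTAYLAKQR', 'MRTGYIAKHR'};

% 'Method' takes the NAME of a distance method (text), not a matrix.
% The PAM250 matrix is supplied separately through 'ScoringMatrix'.
D = seqpdist(seqs, ...
    'Method', 'alignment-score', ...   % assumed choice of distance method
    'Alphabet', 'AA', ...              % amino-acid sequences
    'ScoringMatrix', 'PAM250', ...     % scoring matrix passed by name
    'SquareForm', true);               % return a full square matrix

disp(size(D))   % one row and one column per sequence
```

Passing pam(250) as the 'Method' value hands seqpdist a numeric matrix where it expects text, which is consistent with the strmatch error quoted in the question.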
  1 Comment
Adam Quintero 2011-6-11
That is absolutely correct, thank you. I was getting the semantics of this function wrong. This sets seqpdist to find the pairwise distance matrix of the MSA using PAM250 units.
Thank you.


More Answers (1)

Adam Quintero 2011-6-11
To calculate a scoring matrix from an MSA based on PAM250 scoring, the input needs to be a cell array of the sequence strings. multialignread formats the sequences from a ClustalW MSA file (*.aln1) into a structure with headers and sequences, or into separate cell arrays for each.
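The two input shapes mentioned above can be sketched as follows, assuming the Bioinformatics Toolbox is available. The structure array is built by hand with made-up headers and sequences to mimic the Header/Sequence layout that multialignread returns; 'p-distance' is used only to keep the illustration simple.

```matlab
% Hand-built stand-in for S = multialignread('clustalwmsa.aln1');
% (hypothetical headers and sequences, all the same length).
S = struct('Header',   {'seqA', 'seqB', 'seqC'}, ...
           'Sequence', {'MKTAYIAKQR', 'MKTAYLAKQR', 'MRTGYIAKHR'});

% Pull the sequences out into a cell array of character vectors.
seqs = {S.Sequence};

% seqpdist accepts either form: a structure array with a Sequence
% field, or a cell array of sequences.
Dstruct = seqpdist(S,    'Method', 'p-distance', 'Alphabet', 'AA', 'SquareForm', true);
Dcell   = seqpdist(seqs, 'Method', 'p-distance', 'Alphabet', 'AA', 'SquareForm', true);

disp(isequal(Dstruct, Dcell))
```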
seqpdist could not read the sequences because its arguments were used incorrectly. The 'method' argument is only used if the input sequences are not already aligned. Passing the sequence data from multialignread and trying to align it again with 'method' caused the error.
The correct argument to use in this case is 'scoringmethod', where the pre-aligned sequences are re-scored using the 'scoringmethod' value.
pamdistancematrix=seqpdist(sequence,...
'scoringmethod',pam250,'squareform',1)
  2 Comments
Adam Quintero 2011-6-11
Wow, sorry. That was NOT the correct answer. Walter Roberson is correct in that I should use 'scoringmatrix' instead of 'method', so that the input is handled as MSA sequences and not raw FASTA.
My apologies, Robert.

