nt2aa

Convert nucleotide sequence to amino acid sequence

Syntax

SeqAA = nt2aa(SeqNT)
SeqAA = nt2aa(..., 'Frame', FrameValue, ...)
SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...)
SeqAA = nt2aa(..., 'AlternativeStartCodons', AlternativeStartCodonsValue, ...)
SeqAA = nt2aa(..., 'ACGTOnly', ACGTOnlyValue, ...)

Input Arguments

SeqNT

One of the following:

Note

Hyphens are valid only if the codon to which they belong represents a gap, that is, the codon consists entirely of hyphens. Example: ACT---TGA

Tip

Do not use a sequence with hyphens if you specify 'all' for FrameValue.
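
A minimal sketch of the hyphen rule in the Note above, assuming Bioinformatics Toolbox is installed. The sequence is the one given in the Note; the variable names are illustrative, and the expected result is inferred from the Standard Genetic Code table (a gap codon maps to -, and TGA maps to *).

```matlab
% Gap codon: all three positions are hyphens, so the sequence is valid input.
seqWithGap = 'ACT---TGA';        % codons: ACT | --- | TGA
aaWithGap  = nt2aa(seqWithGap);  % expected 'T-*' (threonine, gap, stop)
```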

FrameValue

Integer, character vector, or string specifying a reading frame in the nucleotide sequence. Choices are 1, 2, 3, or 'all'. Default is 1.

If FrameValue is 'all', then SeqAA is a 3-by-1 cell array.

GeneticCodeValue

Integer, character vector, or string specifying a genetic code number or code name from the table Genetic Code. Default is 1 or 'Standard'.

Tip

If you use a code name, you can truncate the name to the first two letters of the name.

AlternativeStartCodonsValue

Controls the translation of alternative start codons. Choices are true (default) or false.

ACGTOnlyValue

Controls the behavior of ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, and N) and unknown characters. ACGTOnlyValue can be true (default) or false.

  • If true, then the function errors if any of these characters are present.

  • If false, then the function tries to resolve ambiguities. If it cannot, it returns X for the affected codon.

Output Arguments

SeqAA

Amino acid sequence specified by a character vector of single-letter codes.

Description

SeqAA = nt2aa(SeqNT) converts a nucleotide sequence, specified by SeqNT, to an amino acid sequence, returned in SeqAA, using the standard genetic code.

SeqAA = nt2aa(SeqNT, ...'PropertyName', PropertyValue, ...) calls nt2aa with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

SeqAA = nt2aa(..., 'Frame', FrameValue, ...) converts a nucleotide sequence for a specific reading frame to an amino acid sequence. Choices are 1, 2, 3, or 'all'. Default is 1. If FrameValue is 'all', then output SeqAA is a 3-by-1 cell array.
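
The reading-frame options can be sketched as follows, assuming Bioinformatics Toolbox is installed. The sequence and variable names are made up for illustration; only the frame 1 translation is spelled out, because it follows directly from the Standard Genetic Code table.

```matlab
seq = 'ATGGCATTTAAACGT';               % frame 1 codons: ATG GCA TTT AAA CGT

aaFrame1 = nt2aa(seq);                 % default frame 1, expected 'MAFKR'
aaFrame2 = nt2aa(seq, 'Frame', 2);     % translation starts at the second nucleotide
aaAll    = nt2aa(seq, 'Frame', 'all'); % 3-by-1 cell array, one entry per frame

sameAsDefault = isequal(aaAll{1}, aaFrame1);  % first entry corresponds to frame 1
```

Because 'all' returns a cell array rather than a character vector, index into it with braces (aaAll{2}, aaAll{3}) to get an individual translation.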

SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...) specifies a genetic code to use when converting a nucleotide sequence to an amino acid sequence. GeneticCodeValue can be an integer, character vector, or string specifying a code number or code name from the table Genetic Code. Default is 1 or 'Standard'. The amino acid to nucleotide codon mapping for the Standard genetic code is shown in the table Standard Genetic Code.

Tip

If you use a code name, you can truncate the name to the first two letters of the name.
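
As an illustration of selecting a genetic code by number or by name, here is a small sketch that assumes Bioinformatics Toolbox is installed. The sequence is invented for the example. The expected results rely on two well-known differences between the codes: in the Vertebrate Mitochondrial code, AGA is a stop codon and TGA encodes tryptophan.

```matlab
seq = 'ATGAGATGA';                                  % codons: ATG | AGA | TGA

aaStandard   = nt2aa(seq);                          % Standard code, expected 'MR*'
aaMitoByNum  = nt2aa(seq, 'GeneticCode', 2);        % expected 'M*W'
aaMitoByName = nt2aa(seq, 'GeneticCode', 'Vertebrate Mitochondrial');
aaMitoShort  = nt2aa(seq, 'GeneticCode', 'Ve');     % name truncated to two letters
```

All three mitochondrial calls are expected to return the same translation, since they refer to the same row of the Genetic Code table.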

SeqAA = nt2aa(..., 'AlternativeStartCodons', AlternativeStartCodonsValue, ...) controls the translation of alternative start codons.

When this option is true and the first codon of a sequence corresponds to a known alternative start codon, the function translates the codon to methionine. If this option is false, the function translates an alternative start codon at the start of a sequence to its corresponding amino acid in the genetic code that you specify, which might not necessarily be methionine. For example, in the human mitochondrial genetic code, AUA and AUU are known to be alternative start codons. For more information on alternative start codons, visit https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=t#SG1.
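
A short sketch of this behavior with the Standard genetic code, assuming Bioinformatics Toolbox is installed. The sequence is invented for the example; TTG is one of the alternative start codons marked in the Standard Genetic Code table, and it otherwise encodes leucine.

```matlab
seq = 'TTGGCATTG';                                        % codons: TTG | GCA | TTG

% Option true: the leading TTG is treated as a start codon and becomes M.
% The trailing TTG is not at the start, so it is still translated to L.
aaAsStart = nt2aa(seq, 'AlternativeStartCodons', true);   % expected 'MAL'

% Option false: every TTG is translated to leucine.
aaLiteral = nt2aa(seq, 'AlternativeStartCodons', false);  % expected 'LAL'
```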

Genetic Code

Code Number    Code Name
1              Standard
2              Vertebrate Mitochondrial
3              Yeast Mitochondrial
4              Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma
5              Invertebrate Mitochondrial
6              Ciliate, Dasycladacean, and Hexamita Nuclear
9              Echinoderm Mitochondrial
10             Euplotid Nuclear
11             Bacterial and Plant Plastid
12             Alternative Yeast Nuclear
13             Ascidian Mitochondrial
14             Flatworm Mitochondrial
15             Blepharisma Nuclear
16             Chlorophycean Mitochondrial
21             Trematode Mitochondrial
22             Scenedesmus Obliquus Mitochondrial
23             Thraustochytrium Mitochondrial

Standard Genetic Code

Amino Acid Name                                           Amino Acid Code  Nucleotide Codon
Alanine                                                   A                GCT GCC GCA GCG
Arginine                                                  R                CGT CGC CGA CGG AGA AGG
Asparagine                                                N                AAT AAC
Aspartic acid (Aspartate)                                 D                GAT GAC
Cysteine                                                  C                TGT TGC
Glutamine                                                 Q                CAA CAG
Glutamic acid (Glutamate)                                 E                GAA GAG
Glycine                                                   G                GGT GGC GGA GGG
Histidine                                                 H                CAT CAC
Isoleucine                                                I                ATT ATC ATA
Leucine                                                   L                TTA TTG† CTT CTC CTA CTG†
Lysine                                                    K                AAA AAG
Methionine                                                M                ATG
Phenylalanine                                             F                TTT TTC
Proline                                                   P                CCT CCC CCA CCG
Serine                                                    S                TCT TCC TCA TCG AGT AGC
Threonine                                                 T                ACT ACC ACA ACG
Tryptophan                                                W                TGG
Tyrosine                                                  Y                TAT TAC
Valine                                                    V                GTT GTC GTA GTG
Asparagine or Aspartic acid (Aspartate)                   B                Random codon from D and N
Glutamine or Glutamic acid (Glutamate)                    Z                Random codon from E and Q
Unknown amino acid (any amino acid)                       X                Random codon
Translation stop                                          *                TAA TAG TGA
Gap of indeterminate length                               -                ---
Unknown character (any character or symbol not in table)  ?                ???

† indicates an alternative start codon for the Standard Genetic Code as defined here. If you are using nt2aa, alternative start codons are converted to methionine (M) by default when one of these codons is the first codon of a sequence. To change this default behavior, set the AlternativeStartCodons name-value argument of nt2aa to false.

SeqAA = nt2aa(..., 'ACGTOnly', ACGTOnlyValue, ...) controls the behavior of ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, and N) and unknown characters. ACGTOnlyValue can be true (default) or false. If true, then the function errors if any of these characters are present. If false, then the function tries to resolve ambiguities. If it cannot, it returns X for the affected codon.
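
The two modes can be contrasted in a brief sketch, assuming Bioinformatics Toolbox is installed. The sequence and variable names are invented for the example, and the try/catch wrapper is only one possible way to observe the error without stopping a script.

```matlab
seq = 'ATGGCNNNN';                  % codons: ATG | GCN | NNN

% Default (ACGTOnly = true): the ambiguous character N raises an error.
try
    nt2aa(seq);
    erroredByDefault = false;
catch
    erroredByDefault = true;
end

% ACGTOnly = false: GCN resolves to A because all four GCx codons encode
% alanine, while NNN cannot be resolved and is reported as X.
aaResolved = nt2aa(seq, 'ACGTOnly', false);   % expected 'MAX'
```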

Examples

Example 28. Converting the ND1 Gene
  1. Use the getgenbank function to retrieve genomic information for the human mitochondrion from the GenBank® database and store it in a MATLAB structure.

    mitochondria = getgenbank('NC_012920')
    
    mitochondria = 
    
                    LocusName: 'NC_012920'
          LocusSequenceLength: '16569'
         LocusNumberofStrands: ''
                LocusTopology: 'circular'
            LocusMoleculeType: 'DNA'
         LocusGenBankDivision: 'PRI'
        LocusModificationDate: '05-MAR-2010'
                   Definition: 'Homo sapiens mitochondrion, complete genome.'
                    Accession: 'NC_012920 AC_000021'
                      Version: 'NC_012920.1'
                           GI: '251831106'
                      Project: []
                       DBLink: 'Project:30353'
                     Keywords: []
                      Segment: []
                       Source: 'mitochondrion Homo sapiens (human)'
               SourceOrganism: [4x65 char]
                    Reference: {1x7 cell}
                      Comment: [24x67 char]
                     Features: [933x74 char]
                          CDS: [1x13 struct]
                     Sequence: [1x16569 char]
                    SearchURL: [1x70 char]
                  RetrieveURL: [1x104 char]
  2. Determine the name and location of the first gene in the human mitochondrion.

    mitochondria.CDS(1).gene
    
    ans =
    
    ND1
    
    mitochondria.CDS(1).location
    
    ans =
    
    3307..4262
  3. Extract the sequence for the ND1 gene from the nucleotide sequence.

    ND1gene = mitochondria.Sequence(3307:4262);
    
  4. Convert the ND1 gene in the human mitochondrial genome to an amino acid sequence using the Vertebrate Mitochondrial genetic code.

    protein1 = nt2aa(ND1gene,'GeneticCode', 2);
    
  5. Use the getgenpept function to retrieve the same amino acid sequence from the GenPept database.

    protein2 = getgenpept('YP_003024026', 'SequenceOnly', true);
    
  6. Use the isequal function to compare the two amino acid sequences.

    isequal(protein1, protein2)
    
    ans =
    
         1
Example 29. Converting the ND2 Gene
  1. Use the getgenbank function to retrieve the nucleotide sequence for the human mitochondrion from the GenBank database.

    mitochondria = getgenbank('NC_012920');
    
  2. Determine the name and location of the second gene in the human mitochondrion.

    mitochondria.CDS(2).gene
    
    ans =
    
    ND2
    
    mitochondria.CDS(2).location
    
    ans =
    
    4470..5511
  3. Extract the sequence for the ND2 gene from the nucleotide sequence.

    ND2gene = mitochondria.Sequence(4470:5511);
    
  4. Convert the ND2 gene in the human mitochondrial genome to an amino acid sequence using the Vertebrate Mitochondrial genetic code.

    protein1 = nt2aa(ND2gene,'GeneticCode', 2);
    

    Note

    In the ND2gene nucleotide sequence, the first codon is ATT, which is translated to M, while the subsequent ATT codons are translated to I. If you set 'AlternativeStartCodons' to false, then the first ATT codon is translated to I, the corresponding amino acid in the Vertebrate Mitochondrial genetic code.

  5. Use the getgenpept function to retrieve the same amino acid sequence from the GenPept database.

    protein2 = getgenpept('YP_003024027', 'SequenceOnly', true);
    
  6. Use the isequal function to compare the two amino acid sequences.

    isequal(protein1, protein2)
    
    ans =
    
         1
Example 30. Converting a Sequence with Ambiguous Characters

If you have a sequence with ambiguous or unknown nucleotide characters, you can set the 'ACGTOnly' property to false to have the nt2aa function try to resolve them:

nt2aa('agttgccgacgcgcncar','ACGTOnly', false)

ans =

SCRRAQ

Version History

Introduced before R2006a