basecount

Count nucleotides in sequence

Syntax

NTStruct = basecount(SeqNT)
NTStruct = basecount(SeqNT, ...'Ambiguous', AmbiguousValue, ...)
NTStruct = basecount(SeqNT, ...'Gaps', GapsValue, ...)
NTStruct = basecount(SeqNT, ...'Chart', ChartValue, ...)

Input Arguments

SeqNT

One of the following:

AmbiguousValue

Character vector or string specifying how to treat ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, or N). Choices are:

  • 'ignore' (default) — Skips ambiguous characters

  • 'bundle' — Counts ambiguous characters and reports the total count in the Ambiguous field.

  • 'prorate' — Counts ambiguous characters and distributes them proportionately in the appropriate fields. For example, the counts for the character R are distributed evenly between the A and G fields.

  • 'individual' — Counts ambiguous characters and reports them in individual fields.

  • 'warn' — Skips ambiguous characters and displays a warning.

GapsValue

Specifies whether gaps, indicated by a hyphen (-), are counted or ignored. Choices are true or false (default).

ChartValue

Character vector or string specifying a chart type. Choices are 'pie' or 'bar'.

Output Arguments

NTStruct

1-by-1 MATLAB structure containing the fields A, C, G, and T.

Description

NTStruct = basecount(SeqNT) counts the number of each type of base in SeqNT, a nucleotide sequence, and returns the counts in NTStruct, a 1-by-1 MATLAB structure containing the fields A, C, G, and T.

  • The character U is added to the T field.

  • Ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, or N), and gaps, indicated by a hyphen (-), are ignored by default.

  • Unrecognized characters are ignored and trigger the following warning:

    Warning: Unknown symbols appear in the sequence. These will be ignored.
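The U-to-T rule above can be checked with a short sketch. The RNA sequence and variable names are illustrative assumptions, not taken from the toolbox documentation, and the code requires Bioinformatics Toolbox.

```matlab
% Illustrative RNA fragment and its DNA equivalent (hypothetical inputs).
rnaSeq = 'AUGGCUUAA';
dnaSeq = 'ATGGCTTAA';

rnaCounts = basecount(rnaSeq);
dnaCounts = basecount(dnaSeq);

% Because U is added to the T field, both structures should match.
sameCounts = isequal(rnaCounts, dnaCounts)

% The three U characters should be reported under T.
rnaCounts.T
```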

NTStruct = basecount(SeqNT, ...'PropertyName', PropertyValue, ...) calls basecount with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

NTStruct = basecount(SeqNT, ...'Ambiguous', AmbiguousValue, ...) specifies how to treat ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, or N). Choices are:

  • 'ignore' (default)

  • 'bundle'

  • 'prorate'

  • 'individual'

  • 'warn'
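As a rough comparison of these choices, the following sketch runs the same sequence through three of them. The sequence is a made-up example containing two R characters, and the variable names are assumptions for illustration.

```matlab
% Hypothetical sequence: one each of A, C, G, T plus two ambiguous R codes.
seq = 'ACGTRR';

% Default behavior: the two R characters are skipped.
ignored = basecount(seq)

% 'bundle': the ambiguous characters are totaled in the Ambiguous field.
bundled = basecount(seq, 'Ambiguous', 'bundle')

% 'prorate': each R is split evenly between the A and G fields, so the
% two R characters together should add one count to A and one to G.
prorated = basecount(seq, 'Ambiguous', 'prorate')
```

With 'prorate', the field values can be fractional when the number of ambiguous characters does not divide evenly among the candidate bases.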

NTStruct = basecount(SeqNT, ...'Gaps', GapsValue, ...) specifies whether gaps, indicated by a hyphen (-), are counted or ignored. Choices are true or false (default).
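A minimal sketch of the 'Gaps' property on a made-up aligned fragment follows. This page does not name the field that holds the gap count, so the sketch lists the field names of the result rather than assuming one. The last call also combines two properties and relies on the case insensitivity of property names described above.

```matlab
% Hypothetical aligned fragment containing three gap characters.
aligned = 'ACG--TTA-G';

% Default: gaps are ignored.
noGaps = basecount(aligned)

% Count gaps as well, then inspect which fields the result contains.
withGaps = basecount(aligned, 'Gaps', true);
fieldnames(withGaps)

% Properties can be combined in any order; names are case insensitive.
combined = basecount(aligned, 'gaps', true, 'ambiguous', 'bundle')
```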

NTStruct = basecount(SeqNT, ...'Chart', ChartValue, ...) creates a chart showing the relative proportions of the nucleotides. ChartValue can be 'pie' or 'bar'.
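For instance, the two chart types could be produced as sketched below, reusing the sequence from the Examples section. Opening a new figure before each call is an assumption made here so that the second chart does not replace the first.

```matlab
seq = 'TAGCTGGCCAAGCGAGCTTG';

% Pie chart of the relative base proportions.
figure
pieCounts = basecount(seq, 'Chart', 'pie');

% Bar chart of the same counts.
figure
barCounts = basecount(seq, 'Chart', 'bar');
```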

Examples

Count the bases in a DNA sequence and return the results in a structure.

bases = basecount('TAGCTGGCCAAGCGAGCTTG')
bases = struct with fields:
    A: 4
    C: 5
    G: 7
    T: 4

Get the count for adenine (A) bases.

bases.A
ans = 4
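Because the result is an ordinary structure, its fields can be combined arithmetically. As a small sketch (the variable names are illustrative), the following computes the fraction of G and C bases from the counts returned above.

```matlab
bases = basecount('TAGCTGGCCAAGCGAGCTTG');

% Total number of counted bases across the four fields.
totalBases = bases.A + bases.C + bases.G + bases.T;

% Fraction of G and C bases: (7 + 5) / 20 for this sequence.
gcFraction = (bases.G + bases.C) / totalBases
```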

Count the bases in a DNA sequence containing ambiguous characters (R, Y, K, M, S, W, B, D, H, V, or N), listing each of them in a separate field.

basecount('ABCDGGCCAAGCGAGCTTG','Ambiguous','individual')
ans = struct with fields:
    A: 4
    C: 5
    G: 6
    T: 2
    R: 0
    Y: 0
    K: 0
    M: 0
    S: 0
    W: 0
    B: 1
    D: 1
    H: 0
    V: 0
    N: 0

Version History

Introduced before R2006a