Store and Manage Feature Annotations in Objects
Represent Feature Annotations in a GFFAnnotation or GTFAnnotation Object
The GFFAnnotation and GTFAnnotation objects represent a collection of feature annotations for one or more reference sequences. You construct these objects from GFF (General Feature Format) and GTF (Gene Transfer Format) files. Each element in the object represents a single annotation. The properties and methods associated with the objects let you investigate and filter the data by reference sequence, by feature (such as CDS or exon), or, for GTFAnnotation objects, by a specific gene or transcript.
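The idea that each element holds one annotation is easier to see with a single record. The sketch below maps one tab-separated GFF line onto a structure whose field names follow the GFFAnnotation field order. The GFF line is hypothetical, and the sketch only illustrates the data model. It is not how the GFFAnnotation constructor parses files, and it uses only base MATLAB functions.

```matlab
% Hypothetical GFF line (9 tab-separated columns); not taken from tair8_1.gff.
gffLine = sprintf('Chr1\tTAIR8\texon\t3631\t3913\t.\t+\t.\tParent=AT1G01010.1');

% GFF column order: seqid, source, type, start, end, score, strand, phase, attributes.
cols = strsplit(gffLine, sprintf('\t'));

% Map the columns onto field names in the order used by GFFAnnotObj.FieldNames.
annot = struct( ...
    'Reference',  cols{1}, ...
    'Start',      str2double(cols{4}), ...
    'Stop',       str2double(cols{5}), ...
    'Feature',    cols{3}, ...
    'Source',     cols{2}, ...
    'Score',      cols{6}, ...
    'Strand',     cols{7}, ...
    'Frame',      cols{8}, ...
    'Attributes', cols{9});
```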
Construct an Annotation Object
Use the GFFAnnotation constructor function to construct a GFFAnnotation object from either a GFF- or GTF-formatted file:
GFFAnnotObj = GFFAnnotation('tair8_1.gff')
GFFAnnotObj = 

  GFFAnnotation with properties:

    FieldNames: {1x9 cell}
    NumEntries: 3331
Use the GTFAnnotation constructor function to construct a GTFAnnotation object from a GTF-formatted file:
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf')
GTFAnnotObj = 

  GTFAnnotation with properties:

    FieldNames: {1x11 cell}
    NumEntries: 308
Retrieve General Information from an Annotation Object
Determine the field names and the number of entries in an annotation object by accessing the FieldNames and NumEntries properties. For example, to see the field names for each annotation object constructed in the previous section, query the FieldNames property:
GFFAnnotObj.FieldNames
ans = 

  Columns 1 through 6

    'Reference'    'Start'    'Stop'    'Feature'    'Source'    'Score'

  Columns 7 through 9

    'Strand'    'Frame'    'Attributes'
GTFAnnotObj.FieldNames
ans = 

  Columns 1 through 6

    'Reference'    'Start'    'Stop'    'Feature'    'Gene'    'Transcript'

  Columns 7 through 11

    'Source'    'Score'    'Strand'    'Frame'    'Attributes'
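Comparing the two lists shows which fields a GTFAnnotation object adds. This sketch copies the field names displayed above into cell arrays, rather than reading them from the objects, so it runs without the annotation files. With the objects in your workspace, you could pass GFFAnnotObj.FieldNames and GTFAnnotObj.FieldNames instead.

```matlab
% Field names as displayed above for the GFF and GTF annotation objects.
gffFields = {'Reference','Start','Stop','Feature','Source', ...
             'Score','Strand','Frame','Attributes'};
gtfFields = {'Reference','Start','Stop','Feature','Gene','Transcript', ...
             'Source','Score','Strand','Frame','Attributes'};

% Fields present in the GTF object but not in the GFF object
% (setdiff returns them sorted): {'Gene','Transcript'}
extraFields = setdiff(gtfFields, gffFields);
```

Every GFF field also appears in the GTF object, so the difference is exactly the Gene and Transcript fields that make gene- and transcript-level filtering possible.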
Determine the range of reference sequence positions covered by the feature annotations by using the getRange method with the GFFAnnotation object constructed in the previous section:
range = getRange(GFFAnnotObj)
range =

        3631      498516
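getRange summarizes the span of positions that the annotations cover. Assuming it reports the smallest Start and the largest Stop across all annotations, you can compute the same two numbers from any structure of annotation data. The toy values in this sketch are hypothetical and stand in for the output of getData.

```matlab
% Hypothetical annotation data with Start and Stop fields.
annots = struct('Start', {3631, 5928, 11649}, ...
                'Stop',  {5899, 8737, 13714});

% Smallest start and largest stop across all annotations: [3631 13714]
coveredRange = [min([annots.Start]), max([annots.Stop])];
```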
Access Data in an Annotation Object
Create a Structure of the Annotation Data
Creating a structure of the annotation data lets you access the field values. Use the getData method to create a structure containing a subset of the data in the GFFAnnotation object constructed in the previous section.
% Extract annotations for positions 1 through 10000 of the
% reference sequence
AnnotStruct = getData(GFFAnnotObj,1,10000)
AnnotStruct = 

60x1 struct array with fields:

    Reference
    Start
    Stop
    Feature
    Source
    Score
    Strand
    Frame
    Attributes
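The call getData(GFFAnnotObj,1,10000) selects annotations by position. The sketch below shows one common way to express such a selection with logical indexing: keep every annotation whose closed interval [Start, Stop] overlaps the query interval. Whether getData requires any overlap, a minimum overlap, or full containment is an assumption here, not something this page states, and the toy data are hypothetical.

```matlab
% Hypothetical annotations; the third one starts after the query window.
annots = struct('Start', {3631, 5928, 11649}, ...
                'Stop',  {5899, 10500, 13714});

% Query window, matching the getData call above.
qStart = 1;
qStop  = 10000;

starts = [annots.Start];
stops  = [annots.Stop];

% Two closed intervals overlap when each one starts before the other ends.
overlaps = starts <= qStop & stops >= qStart;
subset   = annots(overlaps);
```

Under this rule the second annotation is kept even though it extends past position 10000. A containment rule (starts >= qStart & stops <= qStop) would drop it.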
Access Field Values in the Structure
Use dot indexing to access all or specific field values in a structure.
For example, extract the start positions for all annotations:
Starts = [AnnotStruct.Start];
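Extract the start positions for annotations 12 through 17. Square brackets are required whenever you index more than one structure element, because dot indexing a struct array returns a comma-separated list, which the brackets concatenate into a vector: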
Starts_12_17 = [AnnotStruct(12:17).Start]
Starts_12_17 =

        4706        5174        5174        5439        5439        5631
Extract the start position and the feature for the 12th annotation:
Start_12 = AnnotStruct(12).Start
Start_12 = 4706
Feature_12 = AnnotStruct(12).Feature
Feature_12 = CDS
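Dot indexing a struct array produces a comma-separated list, one value per element, so the enclosing operator decides what you get back. This self-contained sketch reuses the six start positions shown above in a toy structure. The Feature values are hypothetical. It contrasts square brackets, which concatenate values, with curly braces, which collect them into a cell array. Curly braces are the safer choice for text fields such as Feature, because square brackets join the characters end to end.

```matlab
% Toy struct array: Start values from the output above, Feature values hypothetical.
s = struct('Start',   {4706, 5174, 5174, 5439, 5439, 5631}, ...
           'Feature', {'CDS', 'exon', 'CDS', 'exon', 'CDS', 'exon'});

% Square brackets concatenate the numeric values into a 1x6 row vector.
allStarts = [s.Start];

% Indexing a subset of elements first works the same way: [4706 5174]
firstTwo = [s(1:2).Start];

% Curly braces collect the text values into a 1x6 cell array.
features = {s.Feature};

% Square brackets on a text field join the characters: 'CDSexonCDSexonCDSexon'
joined = [s.Feature];
```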
Use Feature Annotations with Sequence Read Data
Investigate the results of high-throughput sequencing (HTS) experiments by using GFFAnnotation and GTFAnnotation objects with BioMap objects. For example, you can:
Determine counts of sequence reads aligned to regions of a reference sequence associated with specific annotations, such as in RNA-Seq workflows.
Find annotations within a specific range of a peak of interest in a reference sequence, such as in ChIP-Seq workflows.
Determine Annotations of Interest
Construct a GTFAnnotation object from a GTF-formatted file:
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');
Use the getReferenceNames method to return the names of the reference sequences in the annotation object:
refNames = getReferenceNames(GTFAnnotObj)
refNames = 'chr2'
Use the getFeatureNames method to retrieve the feature names from the annotation object:
featureNames = getFeatureNames(GTFAnnotObj)
featureNames = 'CDS' 'exon' 'start_codon' 'stop_codon'
Use the getGeneNames method to retrieve a list of the unique gene names from the annotation object:
geneNames = getGeneNames(GTFAnnotObj)
geneNames = 'uc002qvu.2' 'uc002qvv.2' 'uc002qvw.2' 'uc002qvx.2' 'uc002qvy.2' 'uc002qvz.2' 'uc002qwa.2' 'uc002qwb.2' 'uc002qwc.1' 'uc002qwd.2' 'uc002qwe.3' 'uc002qwf.2' 'uc002qwg.2' 'uc002qwh.2' 'uc002qwi.3' 'uc002qwk.2' 'uc002qwl.2' 'uc002qwm.1' 'uc002qwn.1' 'uc002qwo.1' 'uc002qwp.2' 'uc002qwq.2' 'uc010ewe.2' 'uc010ewf.1' 'uc010ewg.2' 'uc010ewh.1' 'uc010ewi.2' 'uc010yim.1'
The previous steps return the reference sequences, features, and genes associated with the available annotations. Use this information to determine annotations of interest. For instance, you might be interested only in annotations that are exons associated with the uc002qvv.2 gene on chromosome 2.
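Before filtering, you can confirm that the names you plan to pass to getData actually occur in the lists returned above, which catches typos such as a wrong version suffix. This sketch hard-codes a few of the displayed names so it runs on its own. In practice you would pass the geneNames and featureNames variables directly.

```matlab
% A few of the names returned by getGeneNames and getFeatureNames above.
geneNames    = {'uc002qvu.2'; 'uc002qvv.2'; 'uc002qvw.2'};
featureNames = {'CDS'; 'exon'; 'start_codon'; 'stop_codon'};

% Names you intend to filter on.
wantGene    = 'uc002qvv.2';
wantFeature = 'exon';

% ismember returns true only for an exact, case-sensitive match.
isValid = ismember(wantGene, geneNames) && ismember(wantFeature, featureNames);

% A mistyped version suffix is not found.
isTypoFound = ismember('uc002qvv.3', geneNames);
```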
Filter Annotations
Use the getData method to filter the annotations and create a structure containing only the annotations of interest: exons associated with the uc002qvv.2 gene on chromosome 2.
AnnotStruct = getData(GTFAnnotObj,'Reference','chr2',...
    'Feature','exon','Gene','uc002qvv.2')
AnnotStruct = 

12x1 struct array with fields:

    Reference
    Start
    Stop
    Feature
    Gene
    Transcript
    Source
    Score
    Strand
    Frame
    Attributes
The returned structure array contains 12 elements, indicating that 12 annotations meet the filter criteria.
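Conceptually, each name-value pair keeps only the annotations whose field matches the given value, and the pairs combine with a logical AND. The following sketch reproduces that kind of filtering with strcmp and logical indexing on a toy struct array with hypothetical values. It illustrates the idea and is not how getData is implemented.

```matlab
% Hypothetical annotations carrying a subset of the GTF fields.
annots = struct( ...
    'Reference', {'chr2', 'chr2', 'chr2', 'chr2'}, ...
    'Feature',   {'exon', 'CDS', 'exon', 'exon'}, ...
    'Gene',      {'uc002qvv.2', 'uc002qvv.2', 'uc002qvu.2', 'uc002qvv.2'});

% One logical vector per criterion; strcmp compares each cell
% of the cell array against a single character vector.
isRef  = strcmp({annots.Reference}, 'chr2');
isExon = strcmp({annots.Feature}, 'exon');
isGene = strcmp({annots.Gene}, 'uc002qvv.2');

% Keep only the annotations that satisfy every criterion (elements 1 and 4).
filtered = annots(isRef & isExon & isGene);
```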
Extract Position Ranges for Annotations of Interest
After filtering the data to include only annotations that are exons associated with the uc002qvv.2 gene on chromosome 2, use the Start and Stop fields to create vectors of the start and end positions for the ranges associated with the 12 annotations.
StartPos = [AnnotStruct.Start]; EndPos = [AnnotStruct.Stop];
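GFF and GTF coordinates are 1-based and both Start and Stop are inclusive, so the length of each range is Stop minus Start plus one. The sketch below applies that rule to hypothetical position vectors. The variable names exampleStart and exampleEnd are illustrative so they do not overwrite StartPos and EndPos. With the real vectors from the previous step, the same expression gives the length of each of the 12 exons.

```matlab
% Hypothetical start and end positions (1-based, inclusive).
exampleStart = [100 500 900];
exampleEnd   = [199 649 900];

% Inclusive coordinates: a range covering a single base has length 1.
exonLengths = exampleEnd - exampleStart + 1;   % [100 150 1]

% Total positions covered, assuming the ranges do not overlap.
totalLength = sum(exonLengths);                % 251
```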
Determine Counts of Sequence Reads Aligned to Annotations
Construct a BioMap object from a BAM-formatted file containing sequence read data aligned to chromosome 2.
BMObj3 = BioMap('ex3.bam');
Then pass the start and end positions of the annotations of interest to the getCounts method of the BioMap object. Because 'independent' is set to true, getCounts returns a separate count for each range, here the number of short reads aligned to each of the 12 exons:
counts = getCounts(BMObj3,StartPos,EndPos,'independent', true)
counts = 1399 1 54 221 97 125 0 1 0 65 9 12
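With one count per exon, ordinary vector functions summarize the result. This sketch copies the counts displayed above into a vector so it runs on its own, then finds the exon with the most aligned reads and the exons with none. A read that spans two exons might contribute to more than one count when ranges are counted independently, so treat the sum as an upper bound on distinct reads rather than an exact total.

```matlab
% Counts displayed above, one per exon of uc002qvv.2, in StartPos/EndPos order.
counts = [1399; 1; 54; 221; 97; 125; 0; 1; 0; 65; 9; 12];

% Exon with the most aligned reads: 1399 at index 1.
[maxCount, maxIdx] = max(counts);

% Exons with no aligned reads: indices 7 and 9.
emptyExons = find(counts == 0);

% Sum of the per-exon counts: 1984.
totalCount = sum(counts);
```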