
getSubset

Retrieve subset of elements from GTFAnnotation or GFFAnnotation object

Description

NewObj = getSubset(AnnotObj,StartPos,EndPos) returns NewObj, a new object containing the elements of AnnotObj that fall within the range specified by StartPos and EndPos in each reference sequence.

NewObj = getSubset(AnnotObj,Subset) returns NewObj, a new object containing the elements of AnnotObj specified by Subset, a vector of positive integer indices.


NewObj = getSubset(___,Name,Value) returns NewObj, a new object containing a subset of the elements from AnnotObj, using any of the input arguments from the previous syntaxes and additional options specified by one or more Name,Value pair arguments.


Examples


Construct a GTFAnnotation object using a GTF-formatted file that is provided with Bioinformatics Toolbox™.

GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

Retrieve the first through fifth elements of GTFAnnotObj.

subsetGTF1 = getSubset(GTFAnnotObj,1:5)
subsetGTF1 = 
  GTFAnnotation with properties:

    FieldNames: {'Reference'  'Start'  'Stop'  'Feature'  'Gene'  'Transcript'  'Source'  'Score'  'Strand'  'Frame'  'Attributes'}
    NumEntries: 5

Retrieve only the first, fifth, and eighth elements of GTFAnnotObj.

subsetGTF2 = getSubset(GTFAnnotObj,[1 5 8])
subsetGTF2 = 
  GTFAnnotation with properties:

    FieldNames: {'Reference'  'Start'  'Stop'  'Feature'  'Gene'  'Transcript'  'Source'  'Score'  'Strand'  'Frame'  'Attributes'}
    NumEntries: 3

Construct a GTFAnnotation object using a GTF-formatted file that is provided with Bioinformatics Toolbox™.

GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

Create a subset of the data containing only CDS features.

subsetGTF = getSubset(GTFAnnotObj,"Feature","CDS")
subsetGTF = 
  GTFAnnotation with properties:

    FieldNames: {'Reference'  'Start'  'Stop'  'Feature'  'Gene'  'Transcript'  'Source'  'Score'  'Strand'  'Frame'  'Attributes'}
    NumEntries: 92

Construct a GFFAnnotation object using a GFF-formatted file that is provided with Bioinformatics Toolbox™.

GFFAnnotObj = GFFAnnotation('tair8_1.gff');

Retrieve the first through fifth elements of GFFAnnotObj.

subsetGFF2 = getSubset(GFFAnnotObj,1:5)
subsetGFF2 = 
  GFFAnnotation with properties:

    FieldNames: {'Reference'  'Start'  'Stop'  'Feature'  'Source'  'Score'  'Strand'  'Frame'  'Attributes'}
    NumEntries: 5

Retrieve only the first, fifth, and eighth elements of GFFAnnotObj.

subsetGFF3 = getSubset(GFFAnnotObj,[1 5 8])
subsetGFF3 = 
  GFFAnnotation with properties:

    FieldNames: {'Reference'  'Start'  'Stop'  'Feature'  'Source'  'Score'  'Strand'  'Frame'  'Attributes'}
    NumEntries: 3

Construct a GFFAnnotation object using a GFF-formatted file that is provided with Bioinformatics Toolbox™.

GFFAnnotObj = GFFAnnotation('tair8_1.gff');

Create a subset of data containing only protein features.

subsetGFF1 = getSubset(GFFAnnotObj,"Feature","protein")
subsetGFF1 = 
  GFFAnnotation with properties:

    FieldNames: {'Reference'  'Start'  'Stop'  'Feature'  'Source'  'Score'  'Strand'  'Frame'  'Attributes'}
    NumEntries: 200

Input Arguments


AnnotObj

Feature annotations, specified as a GTFAnnotation or GFFAnnotation object.

StartPos

Start of a range in each reference sequence in AnnotObj, specified as a nonnegative integer less than or equal to EndPos.

Data Types: double

EndPos

End of a range in each reference sequence in AnnotObj, specified as a nonnegative integer greater than or equal to StartPos.

Data Types: double
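The Examples section does not show the StartPos and EndPos syntax. The following is a minimal sketch that reuses the sample file from the examples; the range 100000 to 200000 is an arbitrary choice for illustration, and no particular number of returned entries is implied.

```matlab
% Construct the annotation object from the sample GTF file.
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Arbitrary example range (in base positions). Without the Reference
% name-value argument, the range is applied to every reference sequence.
startPos = 100000;
endPos = 200000;

% Keep only the annotations that fall within [startPos, endPos].
rangeGTF = getSubset(GTFAnnotObj,startPos,endPos)
```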

Subset

Indices of the elements in AnnotObj to retrieve, specified as a vector of positive integers. Each integer must be less than or equal to the number of entries in AnnotObj.

Data Types: double
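Because Subset is a vector of indices, you can compute it from the annotation data instead of typing it by hand. The following is a minimal sketch. It assumes that getData returns a structure array with one element per annotation, in the same order as the object, and that the Strand field holds '+' or '-' as character vectors.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Read every annotation into a structure array
% (assumed: one element per entry, in object order).
data = getData(GTFAnnotObj);

% Indices of the annotations on the plus strand.
plusIdx = find(strcmp({data.Strand},'+'));

% Retrieve only those annotations by index.
plusGTF = getSubset(GTFAnnotObj,plusIdx)
```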

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: NewObj = getSubset(AnnotObj,"Feature","CDS")

Reference

One or more reference sequences in AnnotObj, specified as a character vector, string, string vector, or cell array of character vectors. Only annotations whose reference field matches one of the character vectors or strings are included in NewObj.

Data Types: char | string | cell
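If you combine a range with the Reference argument, the range applies only to the named reference sequences instead of to all of them. The following is a minimal sketch. The reference name comes from the first annotation so that no sequence name is hardcoded, which assumes that getData returns a structure array whose Reference field is a character vector. The range is arbitrary.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Pick an existing reference name from the data.
data = getData(GTFAnnotObj);
refName = data(1).Reference;

% Arbitrary range applied to all reference sequences ...
allRefs = getSubset(GTFAnnotObj,1,500000);

% ... and the same range restricted to the reference sequence refName.
oneRef = getSubset(GTFAnnotObj,1,500000,"Reference",refName)
```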

Feature

One or more features in AnnotObj, specified as a character vector, string, string vector, or cell array of character vectors. Only annotations whose feature field matches one of the character vectors or strings are included in NewObj.

Data Types: char | string | cell
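The Feature argument also accepts several feature names at once. The following is a minimal sketch that uses a string vector. "exon" is an assumed feature name that might appear in the file. If the file contains no such feature, the result contains only the CDS entries.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Annotations whose Feature field is "CDS".
cdsOnly = getSubset(GTFAnnotObj,"Feature","CDS");

% Annotations whose Feature field matches either name in the string vector.
cdsOrExon = getSubset(GTFAnnotObj,"Feature",["CDS","exon"])
```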

Gene

One or more genes, specified as a character vector, string, string vector, or cell array of character vectors. This argument applies only when AnnotObj is a GTFAnnotation object. Only annotations whose gene field matches one of the character vectors or strings are included in NewObj.

Data Types: char | string | cell
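The following is a minimal sketch of selecting annotations by gene. It assumes that this argument is named Gene, matching the Gene field of GTFAnnotation, and that getData returns a structure array with a character-vector Gene field. The gene name is read from the data rather than hardcoded.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Take the gene name of the first annotation.
data = getData(GTFAnnotObj);
geneName = data(1).Gene;

% Keep every annotation whose Gene field matches geneName.
geneGTF = getSubset(GTFAnnotObj,"Gene",geneName)
```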

Transcript

One or more transcripts, specified as a character vector, string, string vector, or cell array of character vectors. This argument applies only when AnnotObj is a GTFAnnotation object. Only annotations whose transcript field matches one of the character vectors or strings are included in NewObj.

Data Types: char | string | cell
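The following is a minimal sketch of passing several transcripts as a cell array of character vectors. It assumes that this argument is named Transcript and that getData returns a structure array with a character-vector Transcript field. The two IDs are taken from the first and last annotations and might refer to the same transcript.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

data = getData(GTFAnnotObj);

% Cell array of character vectors with two transcript IDs from the data.
txNames = {data(1).Transcript, data(end).Transcript};

% Keep annotations whose Transcript field matches either ID.
txGTF = getSubset(GTFAnnotObj,"Transcript",txNames)
```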

Overlap

Minimum number of base positions that an annotation must overlap in the range to be included in NewObj, specified as a positive integer, "full", or "start". Use "full" when an annotation must be fully contained in the range to be included. Use "start" when an annotation’s start position must lie within the range to be included.

Data Types: double | char | string
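The following is a minimal sketch that compares the overlap options on the same range. It assumes that this argument is named Overlap. By the definitions above, an annotation fully inside the range also starts inside it, and an annotation that starts inside the range overlaps it by at least one base. The counts therefore should not decrease from "full" to "start" to 1. The range is arbitrary.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');
startPos = 100000;
endPos = 200000;

% At least one base of the annotation lies in the range.
anyOverlap = getSubset(GTFAnnotObj,startPos,endPos,"Overlap",1);

% The annotation's start position lies in the range.
startInside = getSubset(GTFAnnotObj,startPos,endPos,"Overlap","start");

% The whole annotation lies in the range.
fullyInside = getSubset(GTFAnnotObj,startPos,endPos,"Overlap","full");

fprintf('overlap >= 1 base: %d, start inside: %d, fully inside: %d\n', ...
    anyOverlap.NumEntries,startInside.NumEntries,fullyInside.NumEntries);
```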

Output Arguments


Subset of feature annotations, returned as a GTFAnnotation or GFFAnnotation object.

Tips

  • The getSubset function selects annotations from the range specified by StartPos and EndPos for each reference sequence in AnnotObj unless you use the Reference name-value pair argument to limit the reference sequences.

  • After creating a subsetted object, you can access the number of entries, range of reference sequences covered by annotations, field names, and reference names. To access the values of all fields, create a structure of the data using the getData function.
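The following is a minimal sketch of inspecting a subsetted object. It assumes that getData returns a structure array with one element per annotation and one field per entry in FieldNames.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');
cdsGTF = getSubset(GTFAnnotObj,"Feature","CDS");

% Summary properties are available directly on the object.
numCDS = cdsGTF.NumEntries;
fields = cdsGTF.FieldNames;

% Extract the values of all fields into a structure array.
cdsData = getData(cdsGTF);

% Check that every extracted annotation has the CDS feature type.
allCDS = all(strcmp({cdsData.Feature},'CDS'));
```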

Version History

Introduced in R2013a