bioinfo.pipeline.block.Bowtie2

Bioinformatics pipeline block to align sequencing reads to reference sequences

Since R2023a

Description

A Bowtie2 block enables you to map sequencing reads to reference sequences.

The block requires the Bowtie 2 Support Package for Bioinformatics Toolbox™. If this support package is not installed, then a download link is provided. For details, see Bioinformatics Toolbox Software Support Packages.

Creation

Syntax

b = bioinfo.pipeline.block.Bowtie2

b = bioinfo.pipeline.block.Bowtie2(options)

b = bioinfo.pipeline.block.Bowtie2(OutFilename=fileName)

b = bioinfo.pipeline.block.Bowtie2(Name=Value)

Description

b = bioinfo.pipeline.block.Bowtie2 creates a Bowtie2 block.

b = bioinfo.pipeline.block.Bowtie2(options) also specifies additional alignment options.

b = bioinfo.pipeline.block.Bowtie2(OutFilename=fileName) also specifies the output file name.

b = bioinfo.pipeline.block.Bowtie2(Name=Value) specifies additional options as the property names and values of a Bowtie2AlignOptions object. This object is set as the value of the Options property of the block. For example, bt2Block = bioinfo.pipeline.block.Bowtie2(Trim5=10) sets the Trim5 property of the object to trim 10 residues from the 5' end.

Input Arguments

`fileName` — Output file name
string | character vector

Output file name, specified as a string or character vector. The file name must end with the .sam extension. The block saves the mapping results to this file.

Data Types: char | string

`options` — Bowtie2 options
`Bowtie2AlignOptions` | string | character vector

Bowtie2 options, specified as a Bowtie2AlignOptions object, string, or character vector.

If you are specifying a string or character vector, it must be in the native bowtie2 option syntax (prefixed by one or two dashes) [1].

Data Types: char | string
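The two ways of supplying options can be sketched as follows. This is a minimal illustration, assuming the Bowtie 2 support package is installed; the Trim5 property comes from the syntax description above, and "--trim5 10" is assumed to be the equivalent flag in native bowtie2 syntax.

```matlab
% Option 1: configure a Bowtie2AlignOptions object and pass it to the block.
opts = Bowtie2AlignOptions;
opts.Trim5 = 10;                       % trim 10 residues from the 5' end
b1 = bioinfo.pipeline.block.Bowtie2(opts);

% Option 2: pass the option as a string in native bowtie2 syntax
% (flag prefixed by one or two dashes).
b2 = bioinfo.pipeline.block.Bowtie2("--trim5 10");
```

The object form lets MATLAB validate property names and values when you set them, while the string form is convenient when you already have a working bowtie2 command line.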

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: bt2Block = bioinfo.pipeline.block.Bowtie2(Trim3=6) specifies to trim 6 residues from the 3' end.

Note

The following list of arguments is a partial list. For the complete list, refer to the properties of the Bowtie2AlignOptions object.

`AllowDovetail` — Flag to allow dovetail configurations
`false` or 0 (default) | `true` or 1

Flag to allow dovetail configurations, specified as 1 (true) or 0 (false). This property specifies whether the alignment of one mate can extend past the beginning of the alignment of the other mate and be considered concordant.

This property applies to paired-end reads only.

Data Types: double | logical

`AmbiguousPenalty` — Penalty for positions with ambiguous characters
`1` (default) | nonnegative integer

Penalty for positions with ambiguous characters on the read sequence, reference sequence, or both, specified as a nonnegative integer.

Data Types: double
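As a brief sketch of the name-value syntax, the following creates a block with the two arguments described above plus Trim3 from the earlier example. The specific values are chosen only for illustration.

```matlab
% Create a Bowtie2 block and set alignment options as name-value arguments.
% The values are stored in the Bowtie2AlignOptions object held by Options.
bt2Block = bioinfo.pipeline.block.Bowtie2( ...
    AllowDovetail=true, ...   % applies to paired-end reads only
    AmbiguousPenalty=2, ...   % penalty for ambiguous positions (default is 1)
    Trim3=6);                 % trim 6 residues from the 3' end

% Inspect the resulting options object.
disp(bt2Block.Options)
```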

Properties

`ErrorHandler` — Function to handle errors from `run` method
function handle

Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call if the run method encounters an error within a pipeline. For the pipeline to continue after a block fails, ErrorHandler must return a structure that is compatible with the output ports of the block. The error handling function is called with the following two inputs:

Structure with these fields:

Field	Description
identifier	Identifier of the error that occurred
message	Text of the error message
index	Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension.

Input structure passed to the run method when it fails

Data Types: function_handle
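A minimal sketch of an error handler is shown below. The handler name reportAndContinue and the placeholder value it returns are hypothetical. The sketch assumes that a structure whose only field is SAMFile (the output port name of this block) is enough for the pipeline to continue, and it does not guarantee that downstream blocks can use the placeholder.

```matlab
% Hypothetical handler. It receives two inputs:
%   err    - structure with the fields identifier, message, and index
%   inputs - input structure that was passed to the run method
% It returns a structure with one field per output port of the block.
reportAndContinue = @(err, inputs) struct( ...
    "SAMFile", "FAILED: " + string(err.identifier));

% Attach the handler to a Bowtie2 block.
b = bioinfo.pipeline.block.Bowtie2;
b.ErrorHandler = reportAndContinue;
```

Because the handler is an ordinary function handle, you can call it directly with a mock error structure to check its output before running a pipeline.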

`Inputs` — Input ports
structure

This property is read-only.

Input ports of the block, specified as a structure. The field names of the structure are the names of the block input ports, and the field values are bioinfo.pipeline.Input objects. These objects describe the input port behaviors. The input port names are the expected field names of the input structure that you pass to the block run method.

The Bowtie2 block Inputs structure has the following fields:

IndexBaseName — Base name of the reference index files. The index files are in the BT2 or BT2L format. For example, if you have Dmel_chr4.1.bt2 and Dmel_chr4.2.bt2 as your index files, specify IndexBaseName as "Dmel_chr4". This input is required.
Reads1Files — Names of FASTQ files for the first mate reads or single-end reads. For paired-end data, sequences in Reads1Files must correspond file-for-file and read-for-read to sequences in Reads2Files. This input is required.
Reads2Files — Names of FASTQ files for the second mate reads for paired-end data. This input is optional.

The default value for each of these inputs is a bioinfo.pipeline.datatypes.Unset object, which means that the input value is not set yet.

Data Types: struct
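The following sketch lists the port names of a new block and sets input values directly, following the same pattern as the first example on this page. The index base name and read file are the sample data shipped with the toolbox.

```matlab
b = bioinfo.pipeline.block.Bowtie2;

% List the input and output port names of the block.
inputNames  = fieldnames(b.Inputs);
outputNames = fieldnames(b.Outputs);
disp(inputNames)
disp(outputNames)

% Set the required inputs directly instead of connecting upstream blocks.
b.Inputs.IndexBaseName.Value = "Dmel_chr4";
b.Inputs.Reads1Files.Value   = which("SRR6008575_10k_1.fq");
```

Reads2Files is left unset here, which corresponds to aligning single-end reads.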

`Outputs` — Output ports
structure

This property is read-only.

Output ports of the block, specified as a structure. The field names of the structure are the names of the block output ports, and the field values are bioinfo.pipeline.Output objects. These objects describe the output port behaviors. The field names of the output structure returned by the block run method are the same as the output port names.

The Bowtie2 block Outputs structure has the field named SAMFile.

Data Types: struct

`Options` — Bowtie2 options
`Bowtie2AlignOptions` object (default)

Bowtie2 options, specified as a Bowtie2AlignOptions object. The default value is a default Bowtie2AlignOptions object.

`OutFilename` — Output file name
`"Aligned.sam"` (default) | string

Output file name, specified as a string. By default, the block saves the mapping results to a file named Aligned.sam.

Data Types: string
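You can also change these properties after creating the block. The sketch below uses a hypothetical output file name, sample1.sam, and the Trim5 option mentioned earlier on this page.

```matlab
% Create a block with default settings, then adjust its properties.
b = bioinfo.pipeline.block.Bowtie2;
b.OutFilename = "sample1.sam";   % hypothetical name; must end with .sam
b.Options.Trim5 = 10;            % trim 10 residues from the 5' end
```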

Object Functions

`compile`	Perform block-specific additional checks and validations
`copy`	Copy array of handle objects
`emptyInputs`	Create input structure for use with `run` method
`eval`	Evaluate block object
`run`	Run block object

Examples

Align Reads Using Bowtie 2

Import the pipeline and block objects needed for the example.

import bioinfo.pipeline.block.*
import bioinfo.pipeline.Pipeline

Create a FileChooser block to select a read file provided with the toolbox.

FC = FileChooser(which("SRR6008575_10k_1.fq"));

Create a Bowtie2 block and a pipeline object.

B = Bowtie2;
P = Pipeline;

Add blocks to the pipeline.

addBlock(P, [FC B]);

Set the IndexBaseName input port value to "Dmel_chr4", which is the base name of the index files for the Drosophila genome provided with the toolbox.

B.Inputs.IndexBaseName.Value = "Dmel_chr4";

Connect the blocks.

connect(P, FC, B, ["Files", "Reads1Files"]);

Run the pipeline.

run(P);
R = results(P,B)

R = struct with fields:
    SAMFile: [1×1 bioinfo.pipeline.datatype.File]

Call unwrap to see the location of the output file.

unwrap(R.SAMFile)

Fetch Parallel-Running Block Results from Bioinformatics Pipeline

Import the pipeline and block objects needed for the example.

import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

Create a pipeline.

P = Pipeline;

A FileChooser block can take in a URL of a remote file as an input and download the file to make it available for the downstream blocks. Download the file Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz that contains the human reference genome chromosome 19 in the FASTA format.

chr19url = "http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz";
fileChooserBlock1 = FileChooser(chr19url);

Create a UserFunction block to unzip the downloaded reference genome file using the gunzip function. When you create the block, you can specify the function to call and set the input and output port names that map to the input and output arguments of the corresponding function, respectively. In this example, name the input port "ZippedFilenames" and the output port "UnzippedFilenames".

gunzipUserFunctionBlock = UserFunction(@gunzip,RequiredArguments="ZippedFilenames",OutputArguments="UnzippedFilenames");

The reference genome file needs to be indexed before reads can be aligned to it. To generate the indices, create a Bowtie2Build block.

bowtie2BuildBlock = Bowtie2Build;

Add the blocks.

addBlock(P,[fileChooserBlock1,gunzipUserFunctionBlock,bowtie2BuildBlock]);

Connect the output port named "Files" of fileChooserBlock1 to the input port named "ZippedFilenames" of gunzipUserFunctionBlock. Also connect the output "UnzippedFilenames" of gunzipUserFunctionBlock to the input "ReferenceFASTAFiles" of bowtie2BuildBlock.

connect(P,fileChooserBlock1,gunzipUserFunctionBlock,["Files","ZippedFilenames"]);
connect(P,gunzipUserFunctionBlock,bowtie2BuildBlock,["UnzippedFilenames","ReferenceFASTAFiles"]);

Create blocks for downloading RNA-seq data.

adrenal_1_url = "https://usegalaxy.org/dataset/display?dataset_id=d44d2a324474d1aa&to_ext=fq";
adrenal_2_url = "https://usegalaxy.org/dataset/display?dataset_id=d08360a1c0ffdc62&to_ext=fq";
brain_1_url =   "https://usegalaxy.org/dataset/display?dataset_id=f187acb8015d6c7f&to_ext=fq";
brain_2_url =   "https://usegalaxy.org/dataset/display?dataset_id=08c45996966d7ded&to_ext=fq";
fileChooserBlock2 = FileChooser([brain_1_url;adrenal_1_url]);
fileChooserBlock3 = FileChooser([brain_2_url;adrenal_2_url]);

Create a Bowtie2 block for mapping reads.

bowtie2Block = Bowtie2;

Add blocks to the pipeline.

addBlock(P,[fileChooserBlock2,fileChooserBlock3,bowtie2Block]);

Connect the blocks.

connect(P,bowtie2BuildBlock,bowtie2Block,["IndexBaseName","IndexBaseName"]);
connect(P,fileChooserBlock2,bowtie2Block,["Files","Reads1Files"]);
connect(P,fileChooserBlock3,bowtie2Block,["Files","Reads2Files"]);

Run the pipeline in parallel.

run(P,UseParallel=true);

Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 4 workers.

If you try to get the block results while the pipeline is still running, you get an incomplete result.

bt2Results = results(P,bowtie2Block)

bt2Results = 
  Incomplete pipeline result.

Use fetchResults to wait for the blocks that are running in parallel to complete and get the results.

bt2Results = fetchResults(P,bowtie2Block)

bt2Results = struct with fields:
    SAMFile: [1×1 bioinfo.pipeline.datatype.File]

Tip: Use the unwrap method to see the location of the output file. For example, unwrap(bt2Results.SAMFile) shows the location of the output SAM file.

Alternatively, you can use the following two commands instead of fetchResults.

wait(P,bowtie2Block);
bt2Results = results(P,bowtie2Block);

References

[1] Langmead, Ben, and Steven L Salzberg. “Fast Gapped-Read Alignment with Bowtie 2.” Nature Methods 9, no. 4 (April 2012): 357–59. https://doi.org/10.1038/nmeth.1923.

Version History

Introduced in R2023a
