bioinfo.pipeline.Block

Block object for Bioinformatics pipeline

Since R2023a

Description

The bioinfo.pipeline.Block object is used in a bioinformatics pipeline to perform a unit of work, such as aligning reads to a reference, that is necessary to achieve the final goal of an analysis pipeline.

Bioinformatics Toolbox™ provides built-in blocks for some bioinformatics-specific tasks. For instance, you can use the SeqTrim block to trim genomic reads and the Bowtie2 block to align reads to a reference genome.

In addition to built-in blocks, you can also convert any existing MATLAB® function into a block by using a UserFunction block and use it in your pipeline.

You can implement most pipelines and analysis workflows by combining UserFunction blocks and built-in blocks. However, if you are a developer or advanced user who needs to customize block behavior beyond what the built-in blocks and UserFunction blocks offer, you can create your own block object as a subclass of the bioinfo.pipeline.Block class. For details, see Subclass Pipeline Block.

Creation

To create one of the built-in blocks, use bioinfo.pipeline.block.BlockName, where BlockName is the name of the built-in block. For example, to create a SamSort block, enter bioinfo.pipeline.block.SamSort. Similarly, to create a UserFunction block, use bioinfo.pipeline.block.UserFunction.
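As a brief illustration of the naming scheme described above, the following sketch creates a SamSort block and a UserFunction block by their fully qualified names, and then shows the shorter form that becomes available after importing the package. The variable names are placeholders chosen for this example.

```matlab
% Create built-in blocks by using their fully qualified names.
sorter = bioinfo.pipeline.block.SamSort;
userBlock = bioinfo.pipeline.block.UserFunction;

% After importing the package, the short block name is enough.
import bioinfo.pipeline.block.*
anotherSorter = SamSort;
```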

Tip

To see a list of built-in blocks at the MATLAB command line, enter

bioinfo.pipeline.block.
and then press the Tab key.

Properties

ErrorHandler

Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call if the run method encounters an error within a pipeline. For the pipeline to continue after a block fails, ErrorHandler must return a structure that is compatible with the output ports of the block. The error handling function is called with the following two inputs:

  • Structure with these fields:

    Field        Description
    identifier   Identifier of the error that occurred
    message      Text of the error message
    index        Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension.

  • Input structure passed to the run method when it fails

Data Types: function_handle
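The following is a minimal sketch of an error handler, not a definitive implementation. It assumes that a structure with one empty field per output port counts as "compatible with the output ports of the block"; whether downstream blocks can work with empty values depends on your pipeline. The handler ignores both of its inputs (the error structure and the run input structure), and the variable names are placeholders.

```matlab
import bioinfo.pipeline.block.*

% Block whose failure should not stop the rest of the pipeline.
sequencefilter = SeqFilter;

% Names of the output ports that the fallback structure must provide.
portNames = fieldnames(sequencefilter.Outputs);

% Handler with the two expected inputs. It returns a scalar structure
% with one empty field per output port (assumption: empty values are an
% acceptable placeholder for downstream blocks).
sequencefilter.ErrorHandler = @(errInfo,inputs) ...
    cell2struct(cell(numel(portNames),1),portNames,1);
```

A handler that needs to log the failure could instead read errInfo.identifier, errInfo.message, and errInfo.index before building the same structure.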

Inputs

This property is read-only.

Block input ports, specified as a structure. The field names are the names of the input ports and the values are bioinfo.pipeline.Input objects describing the input port behavior. These input port names are the expected fields of the input struct passed to the run method of the block.

You can set the value of an individual input port as follows: blockObj.Inputs.FieldName.Value = someValue, where FieldName is the name of the port.

Data Types: struct
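As a short sketch of the assignment pattern above, the following code creates a UserFunction block around seqqcplot and sets the value of its inputFile port. The sample file name is the one used in the example on this page; passing a file path as the port value is an assumption that holds only if the wrapped function accepts it.

```matlab
import bioinfo.pipeline.block.*

% The required argument name becomes the name of the input port.
qcplot = UserFunction("seqqcplot",RequiredArguments="inputFile",OutputArguments="figureHandle");

% List the input port names of the block.
disp(fieldnames(qcplot.Inputs))

% Set the value of the inputFile port directly on the block.
qcplot.Inputs.inputFile.Value = which("SRR005164_1_50.fastq");
```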

Outputs

This property is read-only.

Block output ports, specified as a structure. The field names are the names of the output ports and the values are bioinfo.pipeline.Output objects describing the output port behavior. These output port names are the expected fields of the output struct returned by the run method of the block.

Data Types: struct

Object Functions

compile       Perform block-specific additional checks and validations
copy          Copy array of handle objects
emptyInputs   Create input structure for use with run method
eval          Evaluate block object
run           Run block object
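The sketch below shows how emptyInputs and run might be combined to run a single block outside of a pipeline. It assumes the calling forms emptyInputs(block) and run(block,inputStruct), which follow from the descriptions above but are not spelled out on this page, and it reuses the seqqcplot block and sample file from the example that follows.

```matlab
import bioinfo.pipeline.block.*

% Block that wraps seqqcplot, with one input port and one output port.
qcplot = UserFunction("seqqcplot",RequiredArguments="inputFile",OutputArguments="figureHandle");

% Create an input structure with one field per input port.
inputs = emptyInputs(qcplot);

% Fill in the field that corresponds to the inputFile port.
inputs.inputFile = which("SRR005164_1_50.fastq");

% Run the block. The returned structure has one field per output port.
outputs = run(qcplot,inputs);
```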

Examples

Import the Pipeline and block objects needed for the example.

import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

Create a pipeline.

qcpipeline = Pipeline;

Select an input FASTQ file using a FileChooser block.

fastqfile = FileChooser(which("SRR005164_1_50.fastq"));

Create a SeqFilter block.

sequencefilter = SeqFilter;

Define the filtering threshold value. Specifically, filter out sequences with a total of more than 10 low-quality bases, where a base is considered a low-quality base if its quality score is less than 20.

sequencefilter.Options.Threshold = [10 20];

Add the blocks to the pipeline.

addBlock(qcpipeline,[fastqfile,sequencefilter]);

Connect the output of the first block to the input of the second block. To do so, you need to first check the input and output port names of the corresponding blocks.

View the Outputs property of the first block and the Inputs property of the second block.

fastqfile.Outputs
ans = struct with fields:
    Files: [1×1 bioinfo.pipeline.Output]

sequencefilter.Inputs
ans = struct with fields:
    FASTQFiles: [1×1 bioinfo.pipeline.Input]

Connect the Files output port of the fastqfile block to the FASTQFiles input port of the sequencefilter block.

connect(qcpipeline,fastqfile,sequencefilter,["Files","FASTQFiles"]);

Next, create a UserFunction block that calls the seqqcplot function to plot quality data for the filtered sequences. In this case, inputFile is the required argument for the seqqcplot function. You can choose any name for the required argument as long as it is a valid variable name.

qcplot = UserFunction("seqqcplot",RequiredArguments="inputFile",OutputArguments="figureHandle");

Alternatively, you can use dot notation to set up your UserFunction block.

qcplot = UserFunction;
qcplot.RequiredArguments = "inputFile";
qcplot.Function = "seqqcplot";
qcplot.OutputArguments = "figureHandle";

Add the block.

addBlock(qcpipeline,qcplot);

Check the port names of the sequencefilter block and the qcplot block.

sequencefilter.Outputs
ans = struct with fields:
    FilteredFASTQFiles: [1×1 bioinfo.pipeline.Output]
         NumFilteredIn: [1×1 bioinfo.pipeline.Output]
        NumFilteredOut: [1×1 bioinfo.pipeline.Output]

qcplot.Inputs
ans = struct with fields:
    inputFile: [1×1 bioinfo.pipeline.Input]

Connect the FilteredFASTQFiles port of the sequencefilter block to the inputFile port of the qcplot block.

connect(qcpipeline,sequencefilter,qcplot,["FilteredFASTQFiles","inputFile"]);

Run the pipeline to plot the sequence quality data.

run(qcpipeline);

Figure: sequence quality plots produced by seqqcplot (seqqcplot_figure.png)

Version History

Introduced in R2023a