bioinfo.pipeline.Input

Input port object for bioinformatics pipeline block

Since R2023a

Description

Each input port of a bioinfo.pipeline.Block object is a bioinfo.pipeline.Input object.

Creation

Create the object using bioinfo.pipeline.Input.
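As a minimal sketch, and assuming the constructor can be called with no arguments (this page does not list its syntaxes), an input port might be created and inspected as follows. The variable name inPort is illustrative, and the code requires Bioinformatics Toolbox.

```matlab
% Assumption: bioinfo.pipeline.Input accepts a no-argument call.
inPort = bioinfo.pipeline.Input;

% Display the class of the new object.
class(inPort)
```

In practice, input ports are typically reached through the Inputs property of a block rather than created on their own.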

Properties

SplitDimension: Instruction on how to split the block inputs across multiple runs of the block in a pipeline, specified as a vector of positive integers or "all".

Some of the blocks in a bioinformatics pipeline operate on their input data arrays as one single input while other blocks can operate on individual elements or slices of the input data array independently. The SplitDimension property of a block input controls how to split the block input data (or input array) across multiple runs of the same block in a pipeline. By default, the block input data are passed unchanged (that is, there is no dimensional splitting of the input data) to the run method of the block, which means that the block runs once for all of the input data array.

Specify a vector of integers to indicate which dimensions of the input array to split and pass to the block run method. By splitting the input array, you are specifying how many times you want to run the same block with different inputs. Use "all" to pass all elements of the input value to the run method of the block independently. If there are n elements, the block runs n times independently. For example, you can use a Bowtie2 block to align three input files to a single SAM file, or use "all" to let Bowtie2 run three times, aligning each input file to a distinct SAM file.
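The following sketch shows how the "all" option might be set on an input port. It reuses the Cufflinks block and the GenomicAlignmentFiles port name from the Examples section of this page, and it assumes that Bioinformatics Toolbox is installed.

```matlab
import bioinfo.pipeline.block.*

% Create the block whose input port is to be configured.
cufflinksBlock = Cufflinks;

% Pass each element of the input value to the run method independently,
% so that the block runs once per element.
cufflinksBlock.Inputs.GenomicAlignmentFiles.SplitDimension = "all";

% To restore the default (no splitting), set the property to empty:
% cufflinksBlock.Inputs.GenomicAlignmentFiles.SplitDimension = [];
```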

When a block has a single input with split dimensions, the input value is split in the corresponding dimensions (such as row-dimension or column-dimension) before being passed to the run method of the block. The total number of times the block runs within a pipeline is the product of the sizes of the input value in the split dimensions.
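The run-count rule above can be checked with plain MATLAB arithmetic. The sketch below does not call the pipeline API. The array inputValue is a hypothetical 3-by-2 stand-in for a block input, and the variable names are illustrative.

```matlab
% Hypothetical 3-by-2 input value (illustration only).
inputValue = ["a1" "a2"; "b1" "b2"; "c1" "c2"];

% SplitDimension = 1: split along rows, one run per row.
runsRows = prod(size(inputValue,1));

% SplitDimension = [1 2]: split along rows and columns.
runsBoth = prod(size(inputValue,[1 2]));

% SplitDimension = "all": one run per element.
runsAll = numel(inputValue);

% SplitDimension = [] (default): no splitting, so a single run.
runsNone = 1;

fprintf("rows: %d, rows and columns: %d, all: %d, none: %d\n", ...
    runsRows,runsBoth,runsAll,runsNone);
```

For this 3-by-2 input, splitting along rows gives 3 runs, while splitting along both dimensions or using "all" gives 6 runs.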

For details, see Bioinformatics Pipeline SplitDimension.

Data Types: double | char | string

Required: Flag to indicate if the input port is required for the block to run, specified as a numeric or logical 1 (true) or 0 (false).

This property is read-only.

A required input port (Required=true) must be satisfied. Otherwise, the pipeline fails to compile and does not run.

You can set the value to true or false when you define a block subclass. For details, see Subclass Pipeline Block.
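As a rough sketch, the Required flag of each input port of a block could be listed as shown below. The example assumes that the Inputs property of the block behaves like a structure with one field per input port, as the dot-indexing syntax in the Examples section suggests, and that Bioinformatics Toolbox is installed. The variable names are illustrative.

```matlab
import bioinfo.pipeline.block.*

cufflinksBlock = Cufflinks;

% Assumption: Inputs has one field per input port.
portNames = string(fieldnames(cufflinksBlock.Inputs));

% Print the read-only Required flag of each port.
for name = portNames'
    port = cufflinksBlock.Inputs.(name);
    fprintf("%s: Required = %d\n",name,port.Required);
end
```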

Data Types: double | logical

Value: Input port value. By default, the value is a bioinfo.pipeline.datatype.Unset object, which means that no value is provided and the input value comes from a connected upstream block or from the input structure passed to the run call.

If an input port with a set value is also connected to an output port of another block, the value coming from the connected block is used instead of the set value.
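One way to check whether a port still has its default, unset value is sketched below. It assumes that the port value is exposed through a property named Value, and it reuses the Cufflinks block and port name from the Examples section. The variable names port and isUnset are illustrative.

```matlab
import bioinfo.pipeline.block.*

cufflinksBlock = Cufflinks;
port = cufflinksBlock.Inputs.GenomicAlignmentFiles;

% Assumption: the port value is stored in a property named Value.
% isUnset is true when no value has been set on the port itself, in which
% case the value must come from an upstream block or from the input
% structure passed to the run call.
isUnset = isa(port.Value,"bioinfo.pipeline.datatype.Unset")
```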

Examples

Import the pipeline and block objects needed for the example.

import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

Create a pipeline.

P = Pipeline
P = 
  Pipeline with properties:

        Blocks: [0×1 bioinfo.pipeline.Block]
    BlockNames: [0×1 string]

Use a FileChooser block to select the provided SAM files. The files contain aligned reads for Mycoplasma pneumoniae from two samples.

fileChooserBlock = FileChooser([which("Myco_1_1.sam"); which("Myco_1_2.sam")]);

Create a Cufflinks block.

cufflinksBlock = Cufflinks;

Add the blocks to the pipeline.

addBlock(P,[fileChooserBlock,cufflinksBlock]);

Connect the blocks.

connect(P,fileChooserBlock,cufflinksBlock,["Files","GenomicAlignmentFiles"]);

Set SplitDimension to 1 for the GenomicAlignmentFiles input port. The value 1 corresponds to the row dimension of the input, which means that the Cufflinks block runs on each SAM file (Myco_1_1.sam and Myco_1_2.sam) individually.

cufflinksBlock.Inputs.GenomicAlignmentFiles.SplitDimension = 1;

Run the pipeline. The pipeline runs the Cufflinks block twice, independently, and generates a set of four files for each SAM file.

run(P);

Get the block results.

cufflinksResults = results(P,cufflinksBlock)
cufflinksResults = struct with fields:
           TranscriptsGTFFile: [2×1 bioinfo.pipeline.datatype.File]
             IsoformsFPKMFile: [2×1 bioinfo.pipeline.datatype.File]
                GenesFPKMFile: [2×1 bioinfo.pipeline.datatype.File]
    SkippedTranscriptsGTFFile: [2×1 bioinfo.pipeline.datatype.File]

Use the process table to check the total number of runs for each block. Cufflinks ran two times independently.

t = processTable(P,Expanded=true);

Set SplitDimension to empty [] (the default). In this case, the pipeline does not split the input files and runs Cufflinks just once for both SAM files, processing the SAM files one after another.

cufflinksBlock.Inputs.GenomicAlignmentFiles.SplitDimension = [];
deleteResults(P,IncludeFiles=true);
run(P);
cufflinksResults = results(P,cufflinksBlock)
cufflinksResults = struct with fields:
           TranscriptsGTFFile: [2×1 bioinfo.pipeline.datatype.File]
             IsoformsFPKMFile: [2×1 bioinfo.pipeline.datatype.File]
                GenesFPKMFile: [2×1 bioinfo.pipeline.datatype.File]
    SkippedTranscriptsGTFFile: [2×1 bioinfo.pipeline.datatype.File]

Check the process table, which confirms that Cufflinks ran just once.

t2 = processTable(P,Expanded=true);

Tip: You can speed up the pipeline run by setting UseParallel=true if you have Parallel Computing Toolbox™. The pipeline can then schedule independent executions of blocks on parallel pool workers.

run(P,UseParallel=true)

Version History

Introduced in R2023a