Create Simple Pipeline to Plot Sequence Quality Data Using Biopipeline Designer
This example shows how to create a bioinformatics pipeline in the Biopipeline Designer app that loads sequence read data, filters some sequences based on quality, and displays the quality statistics of the filtered data.
Open Biopipeline Designer App
Enter the following at the MATLAB® command line.
biopipelineDesigner
Select Input File Using FileChooser Block
In the Block Libraries panel of the app, scroll down to the General section. Drag the FileChooser block onto the diagram.
You can also use the Search box to look for specific built-in blocks in the Block Libraries.
Double-click the block name FileChooser_1 and rename it FASTQ.
Run the following command at the MATLAB command line to create a variable that contains the full file path to the provided sequence read data.
fastqFile = which("SRR005164_1_50.fastq");
In the app, click the FASTQ block. In the Pipeline Inspector pane, under FileChooser Properties, click the vertical three-dot menu next to the Files property. Select Assign from workspace. Select fastqFile from the list. Click OK.
Filter Sequences Based on Quality
In the Block Libraries panel, under the Sequence Utilities section, drag the SeqFilter block onto the diagram. This block filters sequences according to the criteria that you specify. The Pipeline Inspector panel shows the default values of the block properties and filtering options. In the SeqFilter Options section, change Threshold to 10,20. Keep the default values for the other options. The 10,20 threshold value filters out any sequence with more than 10 low-quality bases, where a base is considered low quality when its quality score is less than 20. For details, see SeqFilterOptions.
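The criterion behind the 10,20 threshold can be sketched in plain MATLAB for a single read. This is only an illustration of the rule described above, not the block's implementation. The quality string is made up, and the Phred+33 encoding is an assumption about how the scores are stored.

```matlab
% Hypothetical quality string: 30 high-quality bases ('I') followed by
% 11 low-quality bases ('+'). Phred+33 encoding is assumed here.
qualityString = [repmat('I', 1, 30), repmat('+', 1, 11)];
phredOffset = 33;

% Convert each character to its numeric quality score.
scores = double(qualityString) - phredOffset;

% The two elements of the Threshold value 10,20.
maxLowQualityBases = 10;   % reads with more low-quality bases than this are removed
minQualityScore = 20;      % a base scoring below this counts as low quality

% Count the low-quality bases and apply the rule.
numLowQuality = sum(scores < minQualityScore);
isFilteredOut = numLowQuality > maxLowQualityBases;

fprintf("%d low-quality bases, filtered out: %d\n", numLowQuality, isFilteredOut);
```

A read with exactly 10 low-quality bases would be kept, because the rule removes only reads with more than 10.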
Plot Sequence Quality Data
Create a custom block (bioinfo.pipeline.block.UserFunction) that calls the existing MATLAB function seqqcplot to plot the quality statistics of the filtered data.
In the Block Libraries panel, under the General section, drag and drop the UserFunction block onto the diagram.
Rename the block to SeqQCPlot.
In the Pipeline Inspector pane, under UserFunction Properties, set RequiredArguments to inputFile and Function to seqqcplot.
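For reference, the call that this block wraps can be run directly at the command line. The sketch below plots the unfiltered sample file, because no filtered file exists yet at this point in the example. The variable name figureHandle is chosen only for illustration.

```matlab
% Locate the sample FASTQ file used throughout this example.
fastqFile = which("SRR005164_1_50.fastq");

% Plot the quality statistics of the file. In the pipeline, the block
% supplies the value of its inputFile port as this first argument.
figureHandle = seqqcplot(fastqFile);
```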
Connect Blocks and Run Pipeline
After setting up the blocks, you can now connect them to complete the pipeline.
Drag an arrow from the Files output port of FASTQ to the FASTQFiles port of SeqFilter_1.
Next, connect the FilteredFASTQFiles output port of SeqFilter_1 to the inputFile port of SeqQCPlot.
On the toolstrip of the app, click Run. During the run, you can see the progress of each block at its status bar. Point to a color-coded section with a number to see its meaning.
After the run, you can click each output port name of a block to see the output value. For example, click NumFilteredOut to see the total number of reads that were filtered out by the block.
The app generates the following figure, which contains quality statistics plots of the filtered data.
If there are any errors or warnings, the app shows them in the Diagnostics tab of the Pipeline Information panel, which is at the bottom of the diagram.
Click the Results tab. In the Source column, expand SeqFilter_1 to see the block results, such as the filtered FASTQ file and the number of sequences that are selected and filtered out.
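The same pipeline can also be assembled programmatically. The following is a minimal sketch, assuming the bioinfo.pipeline.Pipeline class and the FileChooser, SeqFilter, and UserFunction block classes behave as the app blocks do. The variable names and the output argument name figureHandle are illustrative and do not come from the app.

```matlab
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

% Create an empty pipeline.
qcPipeline = Pipeline;

% Input block that points to the sample FASTQ file.
fastqBlock = FileChooser(which("SRR005164_1_50.fastq"));

% Filter block with the same threshold as in the app (10,20).
filterBlock = SeqFilter;
filterBlock.Options.Threshold = [10 20];

% Custom block that calls seqqcplot on its inputFile argument.
plotBlock = UserFunction("seqqcplot", ...
    RequiredArguments="inputFile", ...
    OutputArguments="figureHandle");

% Add the blocks, then connect output ports to input ports.
addBlock(qcPipeline, [fastqBlock, filterBlock, plotBlock]);
connect(qcPipeline, fastqBlock, filterBlock, ["Files", "FASTQFiles"]);
connect(qcPipeline, filterBlock, plotBlock, ["FilteredFASTQFiles", "inputFile"]);

% Run the pipeline and inspect the results of the filter block.
run(qcPipeline);
filterResults = results(qcPipeline, filterBlock);
disp(filterResults.NumFilteredOut)
```

Each row of the connection array pairs an output port of the source block with an input port of the target block, which mirrors the arrows drawn in the diagram.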
Rerun Pipeline with Different Filtering Threshold
You can specify a different threshold to filter sequences and rerun the pipeline. The app is aware of which blocks in the pipeline have changed and which other blocks, such as downstream blocks, are affected as a result. Hence, on subsequent runs, it reruns only those blocks that are needed, instead of every block in the pipeline. For details, see Bioinformatics Pipeline Run Mode.
Click SeqFilter_1. In the Pipeline Inspector panel, change its Threshold option to 5,20. This setting now filters out any sequence with more than 5 low-quality bases, where a base is considered low quality when its score is less than 20. Both the SeqFilter and SeqQCPlot blocks now have a warning icon to indicate that their results are out of date due to the change to the SeqFilter block.
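To see how the stricter setting changes which reads survive, consider a small worked comparison. The per-read counts below are hypothetical and are not taken from the sample file.

```matlab
% Hypothetical number of low-quality bases (score below 20) in four reads.
numLowQualityBases = [0 4 7 12];

% A read is kept when its count does not exceed the first threshold element.
keptAt10 = numLowQualityBases <= 10;   % Threshold 10,20
keptAt5  = numLowQualityBases <= 5;    % Threshold 5,20

fprintf("Reads kept at 10,20: %d\n", nnz(keptAt10));
fprintf("Reads kept at 5,20:  %d\n", nnz(keptAt5));
```

In this made-up set, the read with 7 low-quality bases passes the 10,20 filter but is removed by the 5,20 filter.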
By default, the app saves the pipeline results in the PipelineResults folder in the current directory. This folder contains the pipeline results from the previous run, before you changed the filtering threshold. If you want to save the rerun results to a different folder and avoid overwriting the previous results, you can change the directory location. Click Set Results Directory on the Home tab and set the directory to a different location, such as C:\Biopipeline_Designer\SeqQCPlot_App_Example. If you point to the button, the app shows the directory location.
Click Run. The app generates the following figure. During this run, the app does not rerun the FASTQ block because it is not needed. It only reruns the other two blocks.
Go to the Results tab of the Pipeline Information panel to check the new results.
Export Results
You can export a single output or all outputs of a block to the MATLAB workspace by selecting Export to Workspace from the context (right-click) menu of the corresponding row in the Results table. To export all outputs of a block, right-click at the block level.
See Also
Biopipeline Designer | Bioinformatics Pipeline Run Mode | SeqFilterOptions