How to pre-process Next Generation Sequencing data using MATLAB?

Question

E V 2016-9-22


Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/304007-how-to-pre-process-next-generation-sequencing-data-using-matlab

Commented: E V 2016-9-22

I am trying to use MATLAB to pre-process NGS data. Can anyone suggest comprehensive code for this procedure? I have tried the code suggested on this page, but it only covers a limited number of tasks. For example, I don't know how to filter (or mask) reads shorter than 10 nucleotides, or how to treat paired-end reads. Moreover, how can I filter reads that have more than two N nucleotides? Can anyone suggest a comprehensive reference for these tasks?


Answer 1

Luuk van Oosten 2016-9-22


Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/304007-how-to-pre-process-next-generation-sequencing-data-using-matlab#answer_235638


Dear Ehsan,

The page you refer to is a good place to start getting familiar with processing NGS data, but there are many more functions in the Bioinformatics Toolbox that will help you with preprocessing. Now to your specific questions:

(1) how to filter (or mask) reads shorter than 10 nucleotides:

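You can use the 'seqfilter' function, which writes the reads that pass the filter to a new file and returns that file's name. Note that the input file name must be passed as a quoted string. Something like the following:

filtered_files = seqfilter('yourdata.fastq', 'Method', 'MinLength', 'Threshold', 10)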
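For releases that do not ship 'seqfilter', the same length filter can be sketched with plain logical indexing. This is only a minimal sketch: the three reads are made-up example data, shaped like the struct array that 'fastqread' returns (fields Header, Sequence, Quality), and the output file name in the last comment is hypothetical.

```matlab
% Made-up example reads, shaped like the output of fastqread.
% With a real file you would instead call:
%   reads = fastqread('yourdata.fastq');
reads = struct( ...
    'Header',   {'read1', 'read2', 'read3'}, ...
    'Sequence', {'ACGTACGTACGT', 'ACGT', 'ACGTACGTAC'}, ...
    'Quality',  {'IIIIIIIIIIII', 'IIII', 'IIIIIIIIII'});

minLength   = 10;                                  % keep reads with at least this many bases
readLengths = cellfun(@numel, {reads.Sequence});   % length of every read
longReads   = reads(readLengths >= minLength);     % logical indexing drops the short reads

% Optional: save the surviving reads (hypothetical file name).
% fastqwrite('yourdata_minlen.fastq', longReads);
```

Here read2 (4 nucleotides) is dropped, while read1 and read3 are kept.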

(2) how to treat paired-end reads

You are in luck: there is a function called 'seqsplitpe', which allows you to split merged paired-end sequences into separate files (if that is something you want).
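Once the two mates are in separate files, one thing to watch when filtering is that both files stay in sync. The sketch below is an illustration only, not part of the toolbox: it assumes the mates have already been loaded into two struct arrays in the same order (as 'fastqread' would return them), uses made-up reads, and keeps a pair only if both mates pass the filter.

```matlab
% Made-up paired-end reads; mate1(k) and mate2(k) are assumed to belong together.
mate1 = struct( ...
    'Header',   {'pairA/1', 'pairB/1', 'pairC/1'}, ...
    'Sequence', {'ACGTACGTACGT', 'ACGT', 'ACGTACGTACGT'});
mate2 = struct( ...
    'Header',   {'pairA/2', 'pairB/2', 'pairC/2'}, ...
    'Sequence', {'TTGCATTGCATT', 'TTGCATTGCATT', 'TTNNNGCATTGC'});

% A read passes if it has at least 10 bases and at most two N nucleotides.
passes = @(r) cellfun(@numel, {r.Sequence}) >= 10 & ...
              cellfun(@(s) sum(upper(s) == 'N'), {r.Sequence}) <= 2;

keep  = passes(mate1) & passes(mate2);   % a pair survives only if both mates pass
mate1 = mate1(keep);
mate2 = mate2(keep);
```

Applying the same logical mask to both arrays is what keeps the mates aligned; filtering each file independently would leave orphaned reads.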

(3) how can I filter reads that have more than two N nucleotides

This is probably a combination of (a) importing your sequences and then (b) counting the N nucleotides in each read and discarding the reads that contain more than two. I believe there is no ready-made function in MATLAB for this, but the Bioinformatics Toolbox has numerous functions for analyzing sequences.
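Steps (a) and (b) could be sketched as follows. This is a minimal sketch under assumptions: the reads are made-up example data shaped like the struct array that 'fastqread' returns, and the output file name in the last comment is hypothetical.

```matlab
% (a) Import the reads. With a real file you would call:
%   reads = fastqread('yourdata.fastq');
% Made-up example data in the same shape:
reads = struct( ...
    'Header',   {'read1', 'read2', 'read3'}, ...
    'Sequence', {'ACGTNACGTN', 'ANNNCGTACG', 'ACGTACGTAC'}, ...
    'Quality',  {'IIIIIIIIII', 'IIIIIIIIII', 'IIIIIIIIII'});

% (b) Count the N nucleotides in each read (case-insensitive) and
%     keep only the reads with at most maxN of them.
maxN       = 2;
nCounts    = cellfun(@(s) sum(upper(s) == 'N'), {reads.Sequence});
cleanReads = reads(nCounts <= maxN);

% Optional: save the surviving reads (hypothetical file name).
% fastqwrite('yourdata_maxN.fastq', cleanReads);
```

In this example read2 contains three N nucleotides and is discarded; read1 (two) and read3 (none) are kept.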

Best regards


E V 2016-9-22

Thank you very much Luuk.

I use an older version of MATLAB and I don't have access to commands like "seqfilter" or "seqtrim". I think I have to upgrade my software.

Best regards
