How to perform taxonomic analysis of 16s rRNA NGS .fastq files?

I have raw files from next-generation sequencing of 16s rRNA in .fastq format, and I want to analyse them to obtain the OTUs and the relative abundance, by taxonomy, of all the microbial species present in the sample.
Thank you.

Accepted Answer

Tim DeFreitas 2019-3-29
A complete answer to this question is outside the scope of a single MATLAB Answers post. I suggest reading some published papers on the various approaches to reconstructing phylogeny with 16s rRNA. Here's one such paper, though there are many others: https://academic.oup.com/nar/article/36/18/e120/1070009.
In general, you will need to perform the following series of steps:
  1. Obtain reference sequences of the 16s gene (likely in FASTA format) for each of the microbial species you wish to test for. These can likely be obtained from public databases like the NCBI: https://www.ncbi.nlm.nih.gov/gene/?term=16s%20rrna. For particular sequences of interest, you can obtain these in MATLAB using getgenbank.
  2. Assign each of your input reads to its closest species match. There are several methods to do so; one way is to use blastlocal, with the FASTA reference sequences from step 1 as the database and your FASTQ reads as the queries. The relative abundance of each species can be inferred from the number of reads matching each of your reference sequences.
  3. To construct a taxonomy, you must then perform a multiple alignment of the 16s gene for each of your observed species (likely a subset of your references from step 1), and construct a phylogenetic tree using the distances between each pair of sequences. In MATLAB, this can be done with multialign, seqpdist, and seqlinkage. The definition of an OTU is not set in stone, but in general it is a group of very similar sequences. From the phytree created with seqlinkage, you can construct OTUs by providing a similarity threshold using cluster(phytree).
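
To make the arithmetic behind steps 2 and 3 concrete, here is a minimal Python sketch (not MATLAB, and not the Bioinformatics Toolbox workflow itself). It assumes you have already obtained a best-hit label for each read, for example by parsing BLAST output, and that you already have aligned sequences of equal length. The function names (relative_abundance, p_distance, cluster_otus) and the toy data are hypothetical. The clustering is plain single-linkage on the fraction of mismatched positions, which is a simplification and may not give the same result as cutting a tree built with seqlinkage.

```python
from collections import Counter


def relative_abundance(best_hits):
    """Map each species to its fraction of the assigned reads.

    best_hits: dict of read id -> best-matching species label, or None
    when the read had no acceptable match (such reads are left out of
    the denominator; this is an assumption of the sketch).
    """
    counts = Counter(hit for hit in best_hits.values() if hit is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.items()}


def p_distance(a, b):
    """Fraction of differing positions between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(aligned, threshold):
    """Group sequences into OTUs by single-linkage clustering.

    aligned: dict of name -> aligned sequence.
    Two sequences whose p-distance is <= threshold end up in the same
    OTU, directly or through a chain of such pairs.
    """
    names = list(aligned)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(aligned[a], aligned[b]) <= threshold:
                parent[find(a)] = find(b)

    otus = {}
    for n in names:
        otus.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in otus.values())


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    hits = {"read1": "sp_A", "read2": "sp_A", "read3": "sp_B", "read4": None}
    print(relative_abundance(hits))

    aligned = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTACGTAC"}
    print(cluster_otus(aligned, threshold=0.1))
```

With real data the threshold choice matters a great deal; a 3% distance cutoff is a commonly used convention for species-level OTUs, but it is a convention rather than a fixed rule.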
Feel free to ask more specific questions about any of these steps in a follow-up question. If you need broader help with constructing a pipeline to do this analysis, we do offer consulting.
Hope this helps,
-Tim
  1 Comment
Mattana Pongsopon
Hi Tim,
Thank you so much for your clear guidelines. I will work through them and see if I need further help.
Best,
Mattana
