Background
Ribosome
Amplicon
Amplified marker-gene sequences can be used to understand microbial community structure.
High-throughput sequencing of PCR-amplified marker genes has grown explosively over the past decade, especially as a means of taxonomically profiling microbial communities.
OTU
The analysis of marker-gene data customarily begins with the construction of molecular operational taxonomic units (OTUs): clusters of reads that differ by less than a fixed sequence dissimilarity threshold, most commonly 3% [otu_exact].
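To make the 3% threshold concrete, here is a minimal sketch in Python. It assumes the two reads are already aligned to the same length and counts only substitutions; real pipelines compute dissimilarity from pairwise alignments that also handle gaps. The function names `dissimilarity` and `same_otu` are illustrative and do not come from any particular tool.

```python
def dissimilarity(a: str, b: str) -> float:
    """Fraction of mismatched positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    if not a:
        raise ValueError("reads must be non-empty")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def same_otu(a: str, b: str, threshold: float = 0.03) -> bool:
    """True if the reads differ by less than the dissimilarity threshold (3% by default)."""
    return dissimilarity(a, b) < threshold


if __name__ == "__main__":
    reference = "A" * 100
    one_mismatch = "C" + "A" * 99        # 1% dissimilar
    five_mismatches = "C" * 5 + "A" * 95  # 5% dissimilar
    print(same_otu(reference, one_mismatch))
    print(same_otu(reference, five_mismatches))
```

Under this definition, a 100-nt read with one substitution falls within the same OTU as the reference, while a read with five substitutions does not.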
De novo vs reference-based clustering
De novo clustering methods have been reported to outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units (OTUs) [de_novo_otu].
FAQ
OTUs vs ASVs
It has been proposed that amplicon sequence variants (ASVs) replace operational taxonomic units (OTUs) in marker-gene amplicon data analysis [liuPracticalGuideAmplicon2021a].
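In the figure above, circles represent sets of identified sequences, circle size represents sequence abundance, and color marks the distinct sequences in the sample. The left panel shows the true abundances of the sample sequences; the middle panel shows the errors introduced by amplification. To guard against false-positive inferences, OTU methods typically cluster sequences at 97% similarity to generate OTUs, then select the most abundant sequence in each cluster (OTU) as its representative sequence.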
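The OTU clustering algorithm groups the red and green sequences together and the blue and grey sequences together, so the true sequence information for the green and grey sequences in the sample is lost.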
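The loss of the green and grey sequences can be reproduced with a small Python sketch. It is a simplified, abundance-sorted greedy clustering written for illustration only, not the algorithm of any specific OTU tool: it assumes equal-length, pre-aligned reads, and the function names (`identity`, `greedy_otus`) and toy sequences are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b) or not a:
        raise ValueError("reads must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, min_identity=0.97):
    """Cluster reads greedily, visiting unique sequences from most to least abundant.

    Each unique sequence joins the first OTU whose representative it matches at
    >= min_identity; otherwise it founds a new OTU and becomes its representative.
    Because sequences are visited by decreasing abundance, every representative
    is the most abundant sequence of its OTU.
    """
    counts = Counter(reads)
    otus = []
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for otu in otus:
            if identity(seq, otu["representative"]) >= min_identity:
                otu["members"][seq] = n
                otu["size"] += n
                break
        else:
            otus.append({"representative": seq, "members": {seq: n}, "size": n})
    return otus


if __name__ == "__main__":
    # Toy stand-ins for the four true sequences in the figure (hypothetical data).
    red = "A" * 100
    green = "C" + "A" * 99   # 99% identical to red
    blue = "G" * 100
    grey = "T" + "G" * 99    # 99% identical to blue
    reads = [red] * 50 + [green] * 10 + [blue] * 30 + [grey] * 5

    for otu in greedy_otus(reads):
        print(otu["size"], len(otu["members"]), otu["representative"][:10])
```

With this toy input the four true sequences collapse into two OTUs represented by red and blue; green and grey survive only as counts folded into those clusters.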
Circles represent sets of identical sequencing reads, with size scaled by abundance and color corresponding to the true error-free sequence (there are four distinct sequences in the sample: red, green, blue and grey). Errors are introduced by amplicon sequencing from the left to the middle part of the diagram. OTU methods guard against false-positive inferences by lumping similar sequences together. DADA2 uses a statistical model of amplicon errors to infer the underlying sample sequences directly, and thus tries to denoise the data from the middle to the left [otu_asv].
Lumping together similar sequences reduces the rate at which errors are misinterpreted as biological variation, but OTUs under-utilize the quality of modern sequencing by precluding the possibility of resolving fine-scale variation [otu_asv].
History
Since Pace and colleagues [ribosomal_rna] outlined the culture-independent framework for sequencing 16S rRNA genes in 1985, microbial ecologists have experienced an exponential improvement in the ability to sequence not only this primary phylogenetic marker but also numerous functional genes from diverse environments.