Differential expression analysis
Resources
RNAdiffAPP
https://uclouvain-cbio.github.io/WSBIM2122/sec-rnaseq.html
[1]
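## https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
library(DESeq2)
# Input data: RNAseqObj is assumed to be an S4 object holding a raw count
# matrix (slot `count`) and a sample table (slot `metadata`) with a `group` column
dds <- DESeqDataSetFromMatrix(
  countData = RNAseqObj@count,
  colData   = RNAseqObj@metadata,
  design    = ~ group)
# Differential expression analysis, step by step
dds <- estimateSizeFactors(dds)   # median-of-ratios normalization
dds <- estimateDispersions(dds)   # gene-wise dispersions, trend fit, shrinkage
dds <- nbinomWaldTest(dds)        # fit the negative binomial GLM and run Wald tests
# The wrapper DESeq() runs the same three steps in one call:
# dds <- DESeq(dds, parallel = TRUE)
# Coefficient names depend on the factor levels; list them with resultsNames(dds)
res <- results(dds, name = "group_AAA_vs_BBB")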
Theory behind DESeq2
Size factor estimation (median of ratios)
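The median-of-ratios idea can be sketched in base R on a made-up count matrix. The matrix `counts` and the helper name `median_of_ratios` are illustrative assumptions, not part of DESeq2. For each gene, the geometric mean across samples serves as a pseudo-reference; each sample's size factor is the median, over genes without zero counts, of its count-to-reference ratios.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers).
# Sample s2 is s1 sequenced twice as deep; s3 is s1 at half depth.
counts <- matrix(
  c( 10,  20,  5,
    100, 200, 50,
     30,  60, 15,
      8,  16,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

# Hypothetical helper illustrating the median-of-ratios calculation.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  # Per-gene log geometric mean across samples (the pseudo-reference).
  log_geo_means <- rowMeans(log_counts)
  # Genes with any zero count get a -Inf reference and are skipped.
  usable <- is.finite(log_geo_means)
  # Size factor = median over usable genes of count / reference.
  apply(log_counts[usable, , drop = FALSE], 2, function(lc) {
    exp(median(lc - log_geo_means[usable]))
  })
}

size_factors <- median_of_ratios(counts)
print(size_factors)

# Dividing each column by its size factor puts samples on a common scale.
normalized <- sweep(counts, 2, size_factors, "/")
print(normalized)
```

Because the toy samples differ only in depth, every gene gives the same ratio within a sample. On real data the median makes the estimate robust to the minority of genes that are truly differentially expressed.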
Count modeling
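Figure: count distribution for a typical RNA-seq sample (per-gene variance plotted against mean)
- Each point in the figure represents one gene.
- The mean is not equal to the variance (the points do not fall on the diagonal).
- For genes with high mean expression, the variance across replicates tends to exceed the mean (the points lie above the red line).
- For genes with low mean expression there is considerable scatter, usually referred to as heteroscedasticity: at a given low expression level, the variance takes a wide range of values. The excess of variance over the mean seen in these data is called overdispersion.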
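If the proportions of mRNA stayed exactly constant between the biological replicates of a sample group, we would expect a Poisson distribution (where mean = variance).
In practice, biological replicates always show some degree of variability.
If we kept adding replicates (i.e. n > 20), we would eventually see the scatter decrease and the highly expressed data points move closer to the red line.
So in theory, with enough replicates, we could use the Poisson distribution. With the few replicates typical of RNA-seq experiments, DESeq2 instead models counts with the negative binomial distribution, which allows the variance to exceed the mean.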
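The difference between the two count models can be illustrated with simulated data. This is a minimal sketch using only base R; the number of genes, the range of means, and the common dispersion `alpha = 0.1` are arbitrary assumptions chosen for illustration, not values from any real dataset.

```r
set.seed(1)

n_genes <- 2000
n_reps  <- 3                         # few replicates, as in a typical experiment
mu      <- 10^runif(n_genes, 0, 4)   # true means spread over four decades
alpha   <- 0.1                       # assumed common dispersion (illustrative)

# Poisson counts: variance equals the mean.
# The vector mu is recycled column-wise, so row i always uses mu[i].
pois <- matrix(rpois(n_genes * n_reps, lambda = mu), nrow = n_genes)

# Negative binomial counts: variance = mu + alpha * mu^2.
# In rnbinom's (mu, size) parameterisation, size = 1 / alpha.
nb <- matrix(rnbinom(n_genes * n_reps, mu = mu, size = 1 / alpha),
             nrow = n_genes)

# Ratio of sample variance to sample mean for highly expressed genes.
high <- mu > 1000
var_to_mean <- function(m) apply(m[high, ], 1, var) / rowMeans(m[high, ])

median_ratio_pois <- median(var_to_mean(pois))
median_ratio_nb   <- median(var_to_mean(nb))
print(c(poisson = median_ratio_pois, negbin = median_ratio_nb))
```

For the Poisson counts the variance-to-mean ratio stays near 1, while for the negative binomial counts it grows with the mean. This matches the pattern described above, where highly expressed genes lie above the variance = mean line.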
Dispersion estimation
Final dispersion estimate
DESeq2 Generalized linear model
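Following the DESeq2 vignette, the model can be written as below, using the same symbols as the surrounding text (x_{j.} denotes row j of the model matrix X):

```latex
\begin{aligned}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i) \\
\mu_{ij} &= s_j \, q_{ij} \\
\log_2(q_{ij}) &= x_{j.} \, \beta_i
\end{aligned}
```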
Counts K_{ij} for gene i and sample j are modeled using a negative binomial distribution with fitted mean \mu_{ij} and a gene-specific dispersion parameter \alpha_i. The fitted mean is composed of a sample-specific size factor s_j and a parameter q_{ij} proportional to the expected true concentration of fragments for sample j. The coefficients \beta_i give the log2 fold changes for gene i for each column of the model matrix X. Note that the model can be generalized to use sample- and gene-dependent normalization factors s_{ij}.
The dispersion parameter \alpha_i defines the relationship between the variance of the observed count and its mean value. In other words, it describes how far we expect the observed count to be from the mean value, which depends both on the size factor s_j and on the covariate-dependent part q_{ij} as defined above.
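Written out, this relationship is the negative binomial variance function given in the DESeq2 vignette. The numbers in the note that follows are an illustrative example, not values from the source.

```latex
\operatorname{Var}(K_{ij}) = \mathrm{E}\big[(K_{ij} - \mu_{ij})^2\big] = \mu_{ij} + \alpha_i \, \mu_{ij}^2
```

For example, with \mu_{ij} = 100 and \alpha_i = 0.1 the variance is 100 + 0.1 \times 100^2 = 1100, a standard deviation of about 33. A Poisson model corresponds to \alpha_i = 0 and would give a variance of 100.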
Final estimate of logarithmic fold changes
References
- https://uclouvain-cbio.github.io/WSBIM2122/sec-rnaseq.html
- Theory behind DESeq2
- Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
- http://www.mi.fu-berlin.de/wiki/pub/ABI/GenomicsLecture13Materials/rnaseq2.pdf
- http://www.mi.fu-berlin.de/wiki/pub/ABI/GenomicsLecture12Materials/rnaseq1.pdf
- https://www.pathwaycommons.org/guide/primers/data_analysis/rna_sequencing_analysis/
- https://www.biostars.org/p/308142/#384322
[1] RNA sequencing: the teenage years