RNAdiffAPP
https://uclouvain-cbio.github.io/WSBIM2122/sec-rnaseq.html
## https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
library(DESeq2)

# Input data
dds <- DESeqDataSetFromMatrix(
  countData = RNAseqObj@count,
  colData = RNAseqObj@metadata,
  design = ~ group)

# Differential expression analysis
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
# DESeq() runs the three steps above in a single call:
# dds <- DESeq(dds_filt, parallel = TRUE)

results(dds, name = "group_AAA_vs_BBB")
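If the proportions of mRNA stayed exactly constant between the biological replicates of a sample group, we could expect a Poisson distribution (where mean = variance). Across biological replicates, however, there is always some degree of variability. If we kept adding more replicates (i.e. n > 20), we would eventually see the scatter start to decrease, with the highly expressed data points moving closer to the red line. So in theory, with enough replicates, we could use the Poisson.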
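The mean-variance argument above can be checked with a small simulation. This is only a sketch in base R under stated assumptions: the gene count, the replicate number, the range of expression levels, and the squared coefficient of variation `cv2` are all made-up values, and the gamma-distributed rates are one convenient way to mimic biological variability, not something taken from the notes.

```r
set.seed(1)
n_genes <- 2000   # assumed number of genes
n_reps  <- 6      # assumed number of biological replicates

# Hypothetical true expression levels spanning four orders of magnitude
mu <- 10^runif(n_genes, min = 0, max = 4)

# Scenario 1: mRNA proportions identical in every replicate -> pure Poisson noise
pois <- matrix(rpois(n_genes * n_reps, lambda = mu), nrow = n_genes)

# Scenario 2: each replicate's underlying rate fluctuates around mu
# (gamma with mean mu and squared CV cv2), followed by Poisson sampling
cv2   <- 0.1
rates <- matrix(rgamma(n_genes * n_reps, shape = 1 / cv2, scale = mu * cv2),
                nrow = n_genes)
bio   <- matrix(rpois(n_genes * n_reps, lambda = rates), nrow = n_genes)

# Median per-gene variance-to-mean ratio (close to 1 for Poisson data)
var_to_mean <- function(counts) {
  m <- rowMeans(counts)
  v <- apply(counts, 1, var)
  keep <- m > 0
  median(v[keep] / m[keep])
}

ratio_pois <- var_to_mean(pois)
ratio_bio  <- var_to_mean(bio)
print(c(poisson = ratio_pois, overdispersed = ratio_bio))
```

In the second scenario the variance exceeds the mean, and the excess grows with expression level, which is why a model with an extra dispersion parameter is used instead of the Poisson.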
The DESeq2 vignette states the model as

K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i)
\mu_{ij} = s_j q_{ij}
\log_2(q_{ij}) = x_{j.} \beta_i

where counts K_{ij} for gene i, sample j are modeled using a negative binomial distribution with fitted mean \mu_{ij} and a gene-specific dispersion parameter \alpha_i. The fitted mean is composed of a sample-specific size factor s_j and a parameter q_{ij} proportional to the expected true concentration of fragments for sample j. The coefficients \beta_i give the log2 fold changes for gene i for each column of the model matrix X. Note that the model can be generalized to use sample- and gene-dependent normalization factors s_{ij}.
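To make the pieces of the fitted mean concrete, here is a minimal base R sketch for a single gene. The sample layout, the coefficient values in `beta_i`, and the size factors in `s` are illustrative assumptions. With `BBB` as the reference level, the `groupAAA` column of the model matrix corresponds to the coefficient that DESeq2 reports as `group_AAA_vs_BBB`.

```r
# Hypothetical design: three samples per group, BBB as the reference level
group <- factor(rep(c("BBB", "AAA"), each = 3), levels = c("BBB", "AAA"))
X <- model.matrix(~ group)   # columns: (Intercept), groupAAA

# Illustrative coefficients for one gene i:
# log2 baseline expression = 5, log2 fold change AAA vs BBB = 1
beta_i <- c(5, 1)

# log2(q_ij) = x_j. beta_i, so q_ij = 2^(X %*% beta_i)
q_i <- as.vector(2^(X %*% beta_i))

# Assumed sample-specific size factors s_j
s <- c(0.8, 1.0, 1.2, 0.9, 1.1, 1.0)

# Fitted mean mu_ij = s_j * q_ij
mu_i <- s * q_i
print(data.frame(group, s, q = q_i, mu = mu_i))
```

A log2 fold change of 1 doubles q_{ij} between the groups (32 versus 64 here), while the size factors only rescale each sample's expected count.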
The dispersion parameter \alpha_i defines the relationship between the variance of the observed count and its mean value. In other words, it describes how far we expect the observed count to be from the mean value, which depends both on the size factor s_j and the covariate-dependent part q_{ij} as defined above.
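The role of the dispersion can be illustrated numerically. The sketch below assumes the usual negative binomial parameterization, in which Var(K_{ij}) = \mu_{ij} + \alpha_i \mu_{ij}^2; that formula is not written out in these notes, and the values of `alpha` and `mu` are arbitrary. Base R's `rnbinom()` accepts `mu` and `size`, with `size` equal to 1 / \alpha.

```r
set.seed(42)
alpha <- 0.2               # assumed dispersion for one gene
mu    <- c(10, 100, 1000)  # assumed fitted means
n     <- 1e5               # number of simulated counts per mean

# Simulate negative binomial counts and measure the empirical variance
sim_var <- sapply(mu, function(m) var(rnbinom(n, mu = m, size = 1 / alpha)))

# Variance implied by the dispersion: mu + alpha * mu^2
expected <- mu + alpha * mu^2

print(cbind(mu, expected, sim_var))
```

At low means the Poisson-like term \mu dominates, while at high means the variance is governed almost entirely by \alpha \mu^2, so the coefficient of variation approaches sqrt(\alpha).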
RNA sequencing: the teenage years