Library sizes and distribution plots

Filtering to remove lowly expressed genes

Plot the library sizes as a barplot to see whether there are any major discrepancies between the samples.
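As a rough illustration of what the barplot shows, the library size of a sample is simply the sum of its column in the raw count matrix. The sketch below is plain Python rather than the R workflow discussed here; the gene names, sample names, and counts are made up, and the helper names `library_sizes` and `text_barplot` are hypothetical.

```python
# Toy raw-count matrix: rows are genes, columns are samples (made-up numbers).
counts = {
    "geneA": [120, 98, 15, 110],
    "geneB": [30, 41, 4, 35],
    "geneC": [850, 790, 90, 910],
}
samples = ["s1", "s2", "s3", "s4"]  # hypothetical sample names


def library_sizes(counts, samples):
    """Sum the counts in each column to get one library size per sample."""
    sizes = [0] * len(samples)
    for row in counts.values():
        for j, value in enumerate(row):
            sizes[j] += value
    return dict(zip(samples, sizes))


def text_barplot(sizes, width=40):
    """Render library sizes as horizontal text bars scaled to the largest library."""
    largest = max(sizes.values())
    lines = []
    for name, size in sizes.items():
        bar = "#" * max(1, round(width * size / largest))
        lines.append(f"{name:>4} {bar} {size}")
    return "\n".join(lines)


sizes = library_sizes(counts, samples)
print(text_barplot(sizes))
```

In this toy data the third sample has a much shorter bar than the others, which is the kind of discrepancy the barplot is meant to reveal.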

Examine the distributions of the raw counts

Count data are not normally distributed, so to examine the distributions of the raw counts we need to log-transform them.
We can use the vst function from DESeq2 to apply a variance-stabilising transformation.
Its effect is to remove the dependence of the variance on the mean, in particular the high variance of the logarithm of count data when the mean is low.
The transformed counts are also normalized with respect to library size or other normalization factors.
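To make the mean-variance point concrete, here is a small numerical sketch. It assumes counts follow a Poisson distribution (real RNA-seq counts are usually more dispersed) and uses log2(x + 1) as a stand-in for a plain log transform; it does not reproduce what vst does. The helper names are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of observing k, computed in log space for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def moments(mu, transform):
    """Mean and variance of transform(X) for X ~ Poisson(mu), by direct summation."""
    upper = int(mu + 12 * math.sqrt(mu) + 30)  # covers essentially all of the mass
    ks = range(upper + 1)
    mean = sum(poisson_pmf(k, mu) * transform(k) for k in ks)
    second = sum(poisson_pmf(k, mu) * transform(k) ** 2 for k in ks)
    return mean, second - mean ** 2


def identity(k):
    return k


def log2p1(k):
    """log2(x + 1), so that a zero count stays finite."""
    return math.log2(k + 1)


for mu in (5, 50, 500):
    _, var_raw = moments(mu, identity)
    _, var_log = moments(mu, log2p1)
    print(f"mean={mu:>3}  var(raw)={var_raw:8.2f}  var(log2(x+1))={var_log:.4f}")
```

Under these assumptions the variance of the raw counts grows with the mean, while the variance of the logged counts is largest for the lowest mean. A variance-stabilising transformation aims to flatten that trend.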

If a sample lies far above or below the blue horizontal line in the plot, we may need to investigate that sample further.
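The same check can be sketched numerically. The example below assumes the reference line marks the median of the transformed counts across all samples (the text does not say what the line represents), uses log2(x + 1) instead of vst, and picks an arbitrary tolerance of 1 log2 unit. All data and names are made up for illustration.

```python
import math
from statistics import median

# Toy raw counts (made-up numbers); each list holds one sample's gene counts.
raw = {
    "s1": [120, 30, 850, 5, 60],
    "s2": [98, 41, 790, 7, 55],
    "s3": [15, 4, 90, 0, 6],
    "s4": [110, 35, 910, 6, 70],
}


def log2_counts(values):
    """log2(x + 1), so that zero counts stay finite."""
    return [math.log2(v + 1) for v in values]


def flag_outlying_samples(raw, tolerance=1.0):
    """Return samples whose median log2 count is more than `tolerance` away from the overall median."""
    logged = {name: log2_counts(values) for name, values in raw.items()}
    overall = median([v for values in logged.values() for v in values])
    offsets = {name: median(values) - overall for name, values in logged.items()}
    return {name: off for name, off in offsets.items() if abs(off) > tolerance}


flagged = flag_outlying_samples(raw)
for name, off in flagged.items():
    print(f"{name}: median log2 count is {off:+.2f} relative to the overall median")
```

A flagged sample is not necessarily bad; the flag only marks it as a candidate for closer inspection.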

TMM normalization and voom transformation

GDCRNATools

Running the gdcVoomNormalization() function normalizes the raw count data with the TMM method implemented in edgeR (Robinson, McCarthy, and Smyth 2010) and then transforms it with the voom method provided in limma (Ritchie et al. 2015).
By default, low-expression genes (logCPM < 1 in more than half of the samples) are filtered out.
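The filtering rule can be sketched in a few lines. This is an illustration in plain Python, not the GDCRNATools code: it computes logCPM with the offsets from voom's formula, log2((count + 0.5) / (library size + 1) * 1e6), uses raw library sizes without TMM scaling factors, and assumes that is close enough to the logCPM the filter is applied to. The gene names, counts, and function names are hypothetical.

```python
import math


def log_cpm(counts_by_gene):
    """log2 counts per million: log2((count + 0.5) / (library size + 1) * 1e6)."""
    rows = list(counts_by_gene.values())
    n_samples = len(rows[0])
    lib = [sum(row[j] for row in rows) for j in range(n_samples)]
    return {
        gene: [math.log2((c + 0.5) / (lib[j] + 1) * 1e6) for j, c in enumerate(row)]
        for gene, row in counts_by_gene.items()
    }


def keep_expressed(counts_by_gene, threshold=1.0):
    """Drop genes whose logCPM is below `threshold` in more than half of the samples."""
    kept = {}
    for gene, values in log_cpm(counts_by_gene).items():
        n_low = sum(v < threshold for v in values)
        if n_low <= len(values) / 2:  # low in at most half of the samples: keep
            kept[gene] = counts_by_gene[gene]
    return kept


# Made-up counts for four samples with library sizes near one million.
toy_counts = {
    "geneHigh": [500000, 600000, 550000, 580000],
    "geneMid": [499990, 400000, 450000, 420000],
    "geneLow": [1, 0, 1, 5],    # low in three of four samples
    "geneEdge": [0, 1, 40, 30],  # low in exactly half of the samples
}

kept = keep_expressed(toy_counts)
print(sorted(kept))
```

The geneEdge row shows the boundary case: a gene that is low in exactly half of the samples is not low in "more than half", so this sketch keeps it.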

References