Transcriptome Quantification and Normalization
Last published: 2023-02-21 23:53:06
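Why normalize the data?
Genes differ in length, so a longer gene collects more reads at the same expression level.
Sequencing depth differs between batches (samples), so raw counts are not directly comparable.
ref: Computational methods for transcriptome annotation and quantification using RNA-seq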
Within-sample normalization
RPM/CPM
- RPM: Reads per million mapped reads
- CPM: Counts per million mapped reads
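As a minimal sketch, CPM can be written as $\mathrm{CPM}_i = \frac{c_i}{\sum_k c_k} \times 10^6$, where $c_i$ is the raw count of gene $i$ in one sample. The function name `cpm` and the toy counts are illustrative assumptions, not part of any library.

```python
def cpm(counts):
    """Scale the raw counts of ONE sample to counts per million."""
    total = sum(counts)  # library size: all mapped reads in this sample
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return [c * 1e6 / total for c in counts]


# Toy sample with three genes and 100 mapped reads in total.
print(cpm([10, 30, 60]))  # [100000.0, 300000.0, 600000.0]
```

CPM corrects only for sequencing depth; gene length is not taken into account.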
RPKM/FPKM
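- RPKM: Reads per kilobase per million mapped reads
- FPKM: Fragments per kilobase per million mapped fragments
In paired-end sequencing the two mates of a pair come from the same fragment, so FPKM counts each fragment once. FPKM = RPKM/2 holds only when RPKM counts both mates of every pair in the per-gene count while the library size is the same in both formulas; if the library size is also counted in reads, the factor of 2 cancels and the two values are equal.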
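A small sketch of the per-kilobase-per-million calculation, assuming the usual form: count × 10^9 / (gene length in bp × total mapped count). The function name `per_kb_per_million` and the example numbers are hypothetical. Passing read counts gives RPKM; passing fragment counts gives FPKM.

```python
def per_kb_per_million(counts, lengths_bp):
    """RPKM (read counts) or FPKM (fragment counts) for ONE sample.

    counts[i]     : reads or fragments assigned to gene i
    lengths_bp[i] : length of gene i in base pairs
    """
    total = sum(counts)  # library size in the same unit as counts
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


# Two genes with the same count; the second gene is twice as long.
print(per_kb_per_million([100, 100], [1000, 2000]))  # [500000.0, 250000.0]
```

The longer gene gets half the value, which is the length correction that CPM lacks.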
TPM
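- TPM: Transcripts per million
TPM forces the values in every sample to sum to the same total (10^6). The values are therefore relative: a few very highly expressed genes take up a large share of that total and shift the TPM of every other gene.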
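A minimal sketch of TPM, assuming the common definition: divide each count by the gene length first, then rescale the resulting rates so they sum to 10^6. The name `tpm` and the toy data are illustrative. The second call shows how one dominant gene changes the TPM of the other genes even though their counts are unchanged.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for ONE sample."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]  # reads per base
    total = sum(rates)
    return [r * 1e6 / total for r in rates]


lengths = [1000, 1000, 1000]
balanced = tpm([100, 100, 100], lengths)
dominated = tpm([100, 100, 10000], lengths)  # third gene is very highly expressed

# Both samples sum to one million, but gene 1 looks far lower in the second.
print(round(sum(balanced)), round(sum(dominated)))
print(balanced[0] > dominated[0])  # True
```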
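Between-sample normalization
Direct scaling by proportion
- C_j is the scaling (correction) factor of sample j
- D_j is the total sequencing depth (total read count) of sample j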
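The formula itself is missing from these notes. One common form of total-count scaling, used here purely as an assumption, is $C_j = D_j / \bar{D}$ with $\bar{D}$ the mean depth over all samples; normalized counts are then the raw counts divided by $C_j$. The function name below is hypothetical.

```python
def total_count_factors(depths):
    """Scaling factor C_j = D_j / mean(D) for each sample (assumed form)."""
    mean_depth = sum(depths) / len(depths)
    return [d / mean_depth for d in depths]


# Three samples sequenced to 1, 2 and 3 million reads.
print(total_count_factors([1_000_000, 2_000_000, 3_000_000]))  # [0.5, 1.0, 1.5]
```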
Quantile
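- C_j is the scaling (correction) factor of sample j
- D_j is the total sequencing depth (total read count) of sample j
- Q(p)_j is the p-th quantile of the counts of sample j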
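A sketch of quantile-based scaling under stated assumptions: the quantile is taken over the non-zero counts of each sample (p = 0.75 gives the upper-quartile variant), and the factors are centered on the mean quantile. The exact formula from the original notes is not recoverable, so both choices and both function names are assumptions.

```python
def quantile(values, p):
    """p-quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_factors(samples, p=0.75):
    """C_j = Q(p)_j / mean over samples of Q(p), using non-zero counts only."""
    q = [quantile([c for c in s if c > 0], p) for s in samples]
    mean_q = sum(q) / len(q)
    return [v / mean_q for v in q]


# The second sample is sequenced twice as deeply as the first.
factors = quantile_factors([[0, 10, 20, 30, 40, 50], [0, 20, 40, 60, 80, 100]])
print(factors)
```

Unlike the total count, a quantile is not dominated by a handful of extremely abundant genes.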
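RLE (Relative Log Expression)
For each sample, the median of its per-gene ratios to a reference (the geometric mean of that gene across all samples) is used as the final normalization factor.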
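The median-of-ratios idea can be sketched as follows. This is a simplified illustration and not the code of any particular package; the function name `rle_size_factors` and the rule of skipping genes that contain a zero are assumptions made for the example.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """counts[g][j] is the raw count of gene g in sample j."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue  # a zero count gives no finite log geometric mean
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median log ratio per sample, transformed back to the count scale.
    return [math.exp(median(r)) for r in log_ratios]


# Sample 2 has exactly twice the counts of sample 1 for every usable gene.
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
print(rle_size_factors(counts))  # roughly [0.707, 1.414]
```

Dividing each sample's counts by its size factor puts the two samples on the same scale.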
TMM (Trimmed Mean of M-values)
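A simplified sketch of the TMM idea: compute the log fold change (M) and mean log abundance (A) of each gene between a sample and a reference sample, trim the extremes of both, and average the remaining M values. This version uses an unweighted mean, whereas published implementations weight each gene by its estimated precision. The function name and the trimming defaults (30% on M, 5% on A) are assumptions for the example.

```python
import math


def tmm_factor(sample, ref, trim_m=0.3, trim_a=0.05):
    """Unweighted trimmed mean of M-values of `sample` against `ref`."""
    n_s, n_r = sum(sample), sum(ref)
    m_vals, a_vals = [], []
    for y_s, y_r in zip(sample, ref):
        if y_s > 0 and y_r > 0:  # logs need positive counts in both samples
            p_s, p_r = y_s / n_s, y_r / n_r
            m_vals.append(math.log2(p_s / p_r))        # log fold change
            a_vals.append(0.5 * math.log2(p_s * p_r))  # mean log abundance
    n = len(m_vals)

    def untrimmed(values, trim):
        """Indices that survive trimming `trim` of the genes from each tail."""
        order = sorted(range(n), key=lambda i: values[i])
        k = int(n * trim)
        return set(order[k:n - k])

    keep = untrimmed(m_vals, trim_m) & untrimmed(a_vals, trim_a)
    if not keep:
        raise ValueError("too few genes left after trimming")
    return 2 ** (sum(m_vals[i] for i in keep) / len(keep))


# One gene is 10x higher in the sample; the other nine are unchanged.
ref = [100] * 10
sample = [100] * 9 + [1000]
factor = tmm_factor(sample, ref)
print(factor, sum(sample) * factor)  # effective library size is close to 1000
```

Because the outlier gene is trimmed away, the effective library size (total count × factor) comes back to about 1000, the depth shared by the nine unchanged genes.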
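Basic statistical concepts
Standard Deviation
- Population standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
- Sample standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
- Standard error (of the mean): $SE = \frac{s}{\sqrt{n}}$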
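The three quantities differ only in the divisor, which a short sketch makes concrete. The function names and the toy data are illustrative; Python's `statistics.pstdev` and `statistics.stdev` return the same values as the first two functions.

```python
import math


def population_sd(xs):
    """Divide the sum of squared deviations by N."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


def sample_sd(xs):
    """Divide by n - 1 (Bessel's correction)."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))


def standard_error(xs):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return sample_sd(xs) / math.sqrt(len(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))  # 2.0
print(sample_sd(data), standard_error(data))
```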
Normalization methods available in MetaboAnalyst include:
- Sample Normalization
  - Sample-specific normalization
  - Normalization by sum
  - Normalization by median
  - Normalization by reference sample (PQN)
  - Normalization by a pooled sample from group
  - Normalization by reference feature
  - Quantile normalization
- Data transformation
  - Log transformation
  - Cube root transformation
- Data scaling
  - Mean centering (mean-centered only)
  - Auto scaling (mean-centered and divided by the standard deviation of each variable)
  - Pareto scaling (mean-centered and divided by the square root of the standard deviation of each variable)
  - Range scaling (mean-centered and divided by the range of each variable)
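The four data scaling options in the list can be sketched for a single variable (one feature measured across samples) as below. This follows the descriptions in the list and is not MetaboAnalyst's own code; using the sample standard deviation (n - 1 divisor) is an assumption.

```python
import math
from statistics import mean, stdev


def mean_center(xs):
    m = mean(xs)
    return [x - m for x in xs]


def auto_scale(xs):
    """Mean-center, then divide by the standard deviation."""
    s = stdev(xs)
    return [x / s for x in mean_center(xs)]


def pareto_scale(xs):
    """Mean-center, then divide by the square root of the standard deviation."""
    s = math.sqrt(stdev(xs))
    return [x / s for x in mean_center(xs)]


def range_scale(xs):
    """Mean-center, then divide by (max - min)."""
    r = max(xs) - min(xs)
    return [x / r for x in mean_center(xs)]


values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(mean_center(values))  # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(range_scale(values))  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```

Pareto scaling sits between no scaling and auto scaling: large-variance variables are shrunk, but less aggressively than with a full division by the standard deviation.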
Equal medians
DEqMS differential analysis of protein data
Use a boxplot to check whether the sample medians are centered. If not, apply median centering.
# Here the data is already median centered, we skip the following step.
# dat.log = equalMedianNormalization(dat.log)
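Median centering in general can be sketched as below: subtract each sample's median from its log-scale values so that every sample ends up with a median of zero. This is an illustration of the idea only and is not the DEqMS implementation; the function name, the dict layout and the use of `None` for missing values are assumptions.

```python
from statistics import median


def median_center(samples):
    """samples maps a sample name to its log-scale values (None = missing)."""
    centered = {}
    for name, values in samples.items():
        med = median(v for v in values if v is not None)  # ignore missing values
        centered[name] = [None if v is None else v - med for v in values]
    return centered


log_data = {"a": [1.0, 2.0, 3.0], "b": [2.0, None, 6.0]}
print(median_center(log_data))
# {'a': [-1.0, 0.0, 1.0], 'b': [-2.0, None, 2.0]}
```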