Transcriptome Quantification and Normalization

Last published: 2023-02-21 23:53:06

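Why normalize the data?

Genes differ in length, so longer genes collect more reads at the same expression level.
Sequencing depth differs between batches, so raw counts from different runs are not directly comparable.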

Ref: Computational methods for transcriptome annotation and quantification using RNA-seq

Within-sample normalization

RPM/CPM

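  • RPM: Reads per million mapped reads
  • CPM: Counts per million mapped reads
    $\mathrm{CPM}_i = \dfrac{n_i}{N} \times 10^{6}$, where $n_i$ is the number of reads mapped to gene $i$ and $N$ is the total number of mapped reads in the sample.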
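As a minimal sketch of the CPM calculation, assuming the raw counts of one sample are held in a plain dictionary (the function name `cpm` and the toy gene names are illustrative and not taken from any library):

```python
def cpm(counts):
    """Scale the raw read counts of one sample to counts per million."""
    total = sum(counts.values())  # total mapped reads in this sample
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: n * 1e6 / total for gene, n in counts.items()}


# Toy sample with 10,000 mapped reads in total.
sample = {"geneA": 500, "geneB": 1500, "geneC": 8000}
print(cpm(sample))
```

CPM corrects only for sequencing depth, so the values of one sample always add up to one million; gene length is not taken into account.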

RPKM/FPKM

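  • RPKM: Reads per kilobase per million mapped reads
  • FPKM: Fragments per kilobase per million mapped reads
    $\mathrm{RPKM}_i = \dfrac{n_i \times 10^{9}}{N \times L_i}$, where $n_i$ is the number of reads mapped to gene $i$, $L_i$ is the length of gene $i$ in bases, and $N$ is the total number of mapped reads. FPKM uses the same formula with fragments counted instead of reads.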

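In paired-end sequencing, the two mates of a read pair come from one fragment and are counted once, so FPKM ≈ RPKM/2 if RPKM counts each mate separately, both mates map, and the same per-million denominator is used.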
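A small sketch of the RPKM calculation, assuming counts and gene lengths (in bases) are stored in dictionaries keyed by gene name. The function name `rpkm` and the toy values are illustrative only; passing fragment counts instead of read counts gives FPKM.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million mapped reads, for one sample."""
    total = sum(counts.values())  # total mapped reads in this sample
    return {
        gene: n * 1e9 / (total * lengths_bp[gene])  # 1e9 = 1e3 (kb) * 1e6 (million)
        for gene, n in counts.items()
    }


counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
values = rpkm(counts, lengths_bp)
print(values)
```

In this toy sample geneB has three times the reads of geneA but is also three times as long, so the two genes end up with the same RPKM.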

TPM

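  • TPM: Transcripts per million
    $\mathrm{TPM}_i = \dfrac{n_i / L_i}{\sum_j n_j / L_j} \times 10^{6}$, where $n_i$ is the number of reads mapped to gene $i$ and $L_i$ is its length.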

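TPM forces the values of all genes in a sample to sum to the same total ($10^{6}$). Because the total is fixed, a few very highly expressed genes take up a large share of it and shift the values of all other genes.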
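The fixed-sum effect can be seen in a short sketch. The function name `tpm` and the toy data are assumptions made for illustration: each count is first divided by the gene length, and the resulting rates are rescaled to sum to one million.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample."""
    rate = {g: counts[g] / lengths_bp[g] for g in counts}  # reads per base
    scale = 1e6 / sum(rate.values())
    return {g: r * scale for g, r in rate.items()}


lengths_bp = {"geneA": 1000, "geneB": 1000, "geneC": 1000}
baseline = tpm({"geneA": 100, "geneB": 100, "geneC": 100}, lengths_bp)
# Same counts for geneA and geneB, but geneC is now strongly expressed.
skewed = tpm({"geneA": 100, "geneB": 100, "geneC": 800}, lengths_bp)

print(baseline["geneA"], skewed["geneA"])
```

The raw count of geneA is identical in both samples, yet its TPM falls from about 333,333 to 100,000 once geneC dominates the sample.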

Between-sample normalization

Direct scaling by proportion

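  • $C_j$: the scaling coefficient (correction factor) for sample $j$
  • $D_j$: the total sequencing depth (total read count) of sample $j$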
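One common way to turn the total depth into a scaling factor is sketched below. The exact formula of the original figure is not available here, so the choice of dividing each sample's depth by the mean depth across samples is an assumption, as are the function names `total_count_factors` and `apply_factors`.

```python
def total_count_factors(samples):
    """Assumed form: factor of sample j = depth of j / mean depth over samples."""
    depths = {name: sum(counts.values()) for name, counts in samples.items()}
    mean_depth = sum(depths.values()) / len(depths)
    return {name: d / mean_depth for name, d in depths.items()}


def apply_factors(samples, factors):
    """Divide every count in a sample by that sample's factor."""
    return {
        name: {gene: n / factors[name] for gene, n in counts.items()}
        for name, counts in samples.items()
    }


# s2 was sequenced four times as deeply as s1, with the same composition.
samples = {
    "s1": {"geneA": 10, "geneB": 90},
    "s2": {"geneA": 40, "geneB": 360},
}
factors = total_count_factors(samples)
normalized = apply_factors(samples, factors)
print(factors, normalized)
```

After scaling, both samples report the same value for each gene, which is what one expects when the only difference between them is depth.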

Quantile

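  • $C_j$: the scaling coefficient (correction factor) for sample $j$
  • $D_j$: the total sequencing depth (total read count) of sample $j$
  • $Q_j^{(p)}$: the $p$-th quantile of the counts in sample $j$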
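A hedged sketch of quantile-based scaling follows. It assumes the upper quartile (p = 0.75) of the non-zero counts is used and that each sample's quantile is divided by the mean quantile across samples; the original formula may differ in these details. The helper names `quantile` and `quantile_factors` are made up for this example.

```python
def quantile(values, p):
    """p-th quantile (0 <= p <= 1), linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def quantile_factors(samples, p=0.75):
    """Assumed form: factor of sample j = its p-th quantile / mean p-th quantile."""
    # Genes with zero counts are ignored; a sample with only zeros is not handled.
    q = {name: quantile([n for n in counts if n > 0], p)
         for name, counts in samples.items()}
    mean_q = sum(q.values()) / len(q)
    return {name: v / mean_q for name, v in q.items()}


# s1 contains one extreme gene (1000 reads) that dominates its total depth.
samples = {
    "s1": [0, 10, 20, 30, 40, 1000],
    "s2": [0, 20, 40, 60, 80, 100],
}
factors = quantile_factors(samples)
print(factors)
```

Here s1 has the larger total depth only because of a single extreme gene. The upper quartile ignores that gene, so s2 receives a factor twice that of s1, matching the two-fold difference seen in the remaining genes.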

RLE (Relative Log Expression)

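The median is used for the final normalization: each sample's size factor is the median, over genes, of the ratio between its count and that gene's geometric mean across samples (as in DESeq).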
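The median-of-ratios idea can be sketched in a few lines of plain Python. This is a simplified illustration and not the DESeq code; the function name `rle_size_factors` and the toy counts are assumptions, and genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median


def rle_size_factors(samples):
    """Size factor per sample: median ratio of its counts to per-gene geometric means."""
    names = list(samples)
    n_genes = len(samples[names[0]])
    log_ratios = {name: [] for name in names}
    for g in range(n_genes):
        gene_counts = [samples[name][g] for name in names]
        if min(gene_counts) <= 0:
            continue  # geometric mean would be zero
        log_geo_mean = sum(math.log(c) for c in gene_counts) / len(gene_counts)
        for name, c in zip(names, gene_counts):
            log_ratios[name].append(math.log(c) - log_geo_mean)
    # Median on the log scale, then back to the linear scale.
    return {name: math.exp(median(r)) for name, r in log_ratios.items()}


# s2 is s1 at twice the depth; the last gene is absent from s1.
samples = {"s1": [10, 20, 30, 0], "s2": [20, 40, 60, 500]}
factors = rle_size_factors(samples)
print(factors)
```

The two factors differ by exactly a factor of two, and the gene that is only detected in s2 has no influence on them.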

TMM (Trimmed Mean of M-values)
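As a rough sketch of the idea behind TMM: for a sample and a reference sample, compute per-gene log ratios (M) and average log abundances (A), discard genes with extreme M or A, and take a precision-weighted mean of the remaining M values. The code below is a simplified illustration, not edgeR's implementation. The function name `tmm_factor` is hypothetical, the trimming is done by simple rank cut-offs, and the default trim fractions (30% for M, 5% for A) follow the usual description of the method.

```python
import math


def tmm_factor(sample, reference, trim_m=0.3, trim_a=0.05):
    """Simplified TMM scaling factor of `sample` relative to `reference`."""
    n_s, n_r = sum(sample), sum(reference)
    genes = []  # (M, A, weight) for genes detected in both samples
    for y_s, y_r in zip(sample, reference):
        if y_s == 0 or y_r == 0:
            continue
        m = math.log2((y_s / n_s) / (y_r / n_r))
        a = 0.5 * math.log2((y_s / n_s) * (y_r / n_r))
        # Inverse of the approximate variance of M for this gene.
        w = 1.0 / ((n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r))
        genes.append((m, a, w))
    n = len(genes)

    def kept(key, trim):
        """Indices that survive trimming `trim` of the genes from each end."""
        order = sorted(range(n), key=lambda i: genes[i][key])
        cut = int(n * trim)
        return set(order[cut:n - cut])

    keep = kept(0, trim_m) & kept(1, trim_a)
    total_w = sum(genes[i][2] for i in keep)
    mean_m = sum(genes[i][2] * genes[i][0] for i in keep) / total_w
    return 2 ** mean_m


reference = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# Twice the depth of the reference, except the last gene, which is strongly induced.
sample = [20, 40, 60, 80, 100, 120, 140, 160, 180, 2000]
factor = tmm_factor(sample, reference)
print(factor, sum(sample) * factor / sum(reference))
```

Multiplying the library size of the sample by the factor gives an effective library size that is twice that of the reference, which is the true depth difference once the induced gene is trimmed away.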

Basic statistical concepts

Standard deviation

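  • Population standard deviation:
    $\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$, where $\mu$ is the population mean and $N$ is the population size.

  • Sample standard deviation:
    $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x}$ is the sample mean and $n$ is the sample size.

  • Standard error (of the mean):
    $SE = \dfrac{s}{\sqrt{n}}$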
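The three quantities can be checked quickly with Python's standard `statistics` module; the data values below are an arbitrary toy example.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_sd = statistics.pstdev(data)            # divides by N
sample_sd = statistics.stdev(data)          # divides by n - 1
std_err = sample_sd / math.sqrt(len(data))  # standard error of the mean

print(pop_sd, sample_sd, std_err)
```

The sample standard deviation is always slightly larger than the population version on the same data, and the standard error shrinks as the number of observations grows.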

Normalization methods in MetaboAnalyst include:

  • Sample Normalization
    • Sample-specific normalization
    • Normalization by sum
    • Normalization by median
    • Normalization by reference sample (PQN)
    • Normalization by a pooled sample from group
    • Normalization by reference feature
    • Quantile normalization
  • Data transformation
    • Log transformation
    • Cube root transformation
  • Data scaling
    • Mean centering (mean-centered only)
    • Auto scaling (mean-centered and divided by the standard deviation of each variable)
    • Pareto scaling (mean-centered and divided by the square root of the standard deviation of each variable)
    • Range scaling (mean-centered and divided by the range of each variable)

Equal medians

DEqMS differential analysis of protein data
Use a boxplot to check whether the sample medians are centered. If not, apply median centering.

# Here the data is already median centered, we skip the following step. 
# dat.log = equalMedianNormalization(dat.log)
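To show what median centering does, here is a minimal Python sketch on log-scale values. It is not the DEqMS function and makes no claim about how `equalMedianNormalization` works internally; the name `median_center` and the toy data are assumptions for illustration.

```python
from statistics import median


def median_center(columns):
    """Subtract each sample's median from its log-scale values."""
    centered = {}
    for name, values in columns.items():
        med = median(values)
        centered[name] = [v - med for v in values]
    return centered


# Two samples with the same spread but shifted medians.
dat_log = {"s1": [1.0, 2.0, 3.0], "s2": [2.5, 3.5, 4.5]}
centered = median_center(dat_log)
print(centered)
```

After centering, every sample has a median of zero, so the boxes in a boxplot line up on the same level.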

References