甲基化数据下游分析及可视化
multiqc_report
bismark_report
bismark_summary_report
全基因组甲基化水平分析(胞嘧啶碱基的甲基化水平)
通常CG甲基化存在于基因和重复序列中,在基因表达调控过程中起到非常重要的作用。非CG类型的序列(CHG和CHH)在基因中非常少见,主要存在于基因间区和含重复序列的区域,在沉默转座子过程中起关键作用。
全基因组甲基化数据分布趋势
甲基化 C 碱基中 CG,CHG,与 CHH 的分布比例
mCG、mCHG 和 mCHH 三种碱基类型的构成比例在不同物种中,甚至是同一物种不同样品中都存
在很大差异。因此,不同时间、空间、生理条件下的样品会表现出不同的甲基化图谱,各类 mC(mCG、
mCHG 和,mCHH)的数目,及其在全部 mC 的位点中所占的比例,在一定程度上反应了特定物种的全基
因组甲基化图谱特征。mCG、mCGH 和 mCHH 分别表示甲基化 CG、甲基化 CHG 和甲基化 CHH。三
种碱基类型占比总和为 100%。
Bismark report for: trimmed/test_R1_trim.fq.gz and trimmed/test_R2_trim.fq.gz (version: v0.22.1)
Bismark was run with Bowtie 2 against the bisulfite genome of /data/wangyang/snakemake-bisulfite/data2/index/ with the specified options: -q --score-min L,0,-0.2 -p 3 --reorder --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500
Option '--non_directional' specified: alignments to all strands were being performed (OT, OB, CTOT, CTOB)
Final Alignment report
======================
Sequence pairs analysed in total: 5000
Number of paired-end alignments with a unique best hit: 4619
Mapping efficiency: 92.4%
Sequence pairs with no alignments under any condition: 0
Sequence pairs did not map uniquely: 381
Sequence pairs which were discarded because genomic sequence could not be extracted: 0
Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT: 2323 ((converted) top strand)
GA/CT/CT: 0 (complementary to (converted) top strand)
GA/CT/GA: 0 (complementary to (converted) bottom strand)
CT/GA/GA: 2296 ((converted) bottom strand)
Final Cytosine Methylation Report
=================================
Total number of C's analysed: 231335
Total methylated C's in CpG context: 52326
Total methylated C's in CHG context: 1132
Total methylated C's in CHH context: 2039
Total methylated C's in Unknown context: 0
Total unmethylated C's in CpG context: 15754
Total unmethylated C's in CHG context: 56443
Total unmethylated C's in CHH context: 103641
Total unmethylated C's in Unknown context: 0
C methylated in CpG context: 76.9%
C methylated in CHG context: 2.0%
C methylated in CHH context: 1.9%
Can't determine percentage of methylated Cs in unknown context (CN or CHN) if value was 0
Bismark completed in 0d 0h 0m 10s
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam
Parameters used to extract methylation information:
Bismark Extractor Version: v0.22.1
Bismark result file: single-end (SAM format)
Output specified: comprehensive
Processed 9238 lines in total
Total number of methylation call strings processed: 9238
Final Cytosine Methylation Report
=================================
Total number of C's analysed: 231335
Total methylated C's in CpG context: 52326
Total methylated C's in CHG context: 1132
Total methylated C's in CHH context: 2039
Total C to T conversions in CpG context: 15754
Total C to T conversions in CHG context: 56443
Total C to T conversions in CHH context: 103641
C methylated in CpG context: 76.9%
C methylated in CHG context: 2.0%
C methylated in CHH context: 1.9%
CG、CHG 和 CHH 中的所有 C 的甲基化水平
不同类型的 C 碱基(CG、CHG 和 CHH), 其甲基化水平在不同物种间,甚至同一物种不同细胞类型不
同条件下其甲基化水平都存在差。计每种类型(CG、CHG 和 CHH)甲基化 C 的甲基化水平分布,反应了该物种 DNA 甲基化特征。
图中x轴表示甲基化水平,y轴表示特定甲基化水平的胞嘧啶碱基在所有甲基化碱基中所占的比例
test2.myCpG.txt
chrBase | chr | base | strand | coverage | freqC | freqT |
---|---|---|---|---|---|---|
chr21.9764513 | chr21 | 9764513 | R | 20 | 75 | 25 |
chr21.9764539 | chr21 | 9764539 | R | 20 | 95 | 5 |
chr21.9767701 | chr21 | 9767701 | F | 10 | 10 | 90 |
chr21.9804584 | chr21 | 9804584 | F | 10 | 90 | 10 |
chr21.9804595 | chr21 | 9804595 | F | 10 | 100 | 0 |
chr21.9802620 | chr21 | 9802620 | F | 13 | 61.54 | 38.46 |
library(methylKit)
file.list=list( system.file("extdata",
"test1.myCpG.txt", package = "methylKit"),
system.file("extdata",
"test2.myCpG.txt", package = "methylKit"),
system.file("extdata",
"control1.myCpG.txt", package = "methylKit"),
system.file("extdata",
"control2.myCpG.txt", package = "methylKit") )
myobj=methRead(file.list,
sample.id=list("test1","test2","ctrl1","ctrl2"),
assembly="hg18",
treatment=c(1,1,0,0),
context="CpG",
mincov = 10)
png(file="a.png")
getMethylationStats(myobj[[2]],plot=TRUE,both.strands=FALSE)
dev.off()
library(tidyverse)
myCpG <- read_tsv(system.file("extdata", "test2.myCpG.txt", package = "methylKit"))
png(file="a.png")
hist(myCpG$freqC)
dev.off()
CG、CHG 和 CHH 中的所有 C 的甲基化覆盖度分布
library(methylKit)
file.list=list( system.file("extdata",
"test1.myCpG.txt", package = "methylKit"),
system.file("extdata",
"test2.myCpG.txt", package = "methylKit"),
system.file("extdata",
"control1.myCpG.txt", package = "methylKit"),
system.file("extdata",
"control2.myCpG.txt", package = "methylKit") )
myobj=methRead(file.list,
sample.id=list("test1","test2","ctrl1","ctrl2"),
assembly="hg18",
treatment=c(1,1,0,0),
context="CpG",
mincov = 10)
png(file="a.png")
getCoverageStats(myobj[[2]],plot=TRUE,both.strands=F)
dev.off()
library(tidyverse)
myCpG <- read_tsv(system.file("extdata", "test2.myCpG.txt", package = "methylKit"))
png(file="a.png")
hist(log10(myCpG$coverage+1))
dev.off()
CG、CHG 和 CHH 中甲基化 C 附近的 9bp 序列的序列特征分析
过滤覆盖率过高或过低的碱基
如果我们的样本受到PCR偏倚的影响,那么丢弃具有非常高读覆盖率的碱基将是有用的。此外,我们还希望丢弃读覆盖率低的碱基,足够高的读覆盖率将增加统计测试的能力。
filtered.myobj=filterByCoverage(myobj,lo.count=10,lo.perc=NULL,
hi.count=NULL,hi.perc=99.9)
上面的代码过滤一个methylRawList并丢弃覆盖率低于10X的碱基,同时丢弃每个样本中覆盖率超过99.9%的碱基。
合并样本
为了做进一步的分析,我们需要得到所有样本中包含的碱基。以下函数将所有样本合并到一个对象中,用于所有样本中包含的基对位置。设置destrand=TRUE(默认值为FALSE)将合并CpG二核苷酸两条链上的读取。这提供了更好的覆盖范围,但仅在观察CpG甲基化时建议(对于CpH甲基化,这将导致后续分析中的错误结果)。此外,设置destrand=TRUE仅在碱基对分辨率上操作时有效,否则设置此选项TRUE将无效。unite()
函数将返回一个methylBase
对象,它将是所有比较分析的主要对象。methylBase
对象包含所有样本中包含的区域/碱基的甲基化信息。
meth=methylKit::unite(myobj, destrand=FALSE)