multiqc_reportbismark_reportbismark_summary_report
通常CG甲基化存在于基因和重复序列中,在基因表达调控过程中起到非常重要的作用。非CG类型的序列(CHG和CHH)在基因中非常少见,主要存在于基因间区和含重复序列的区域,在沉默转座子过程中起关键作用。
样本A全基因组各类调控元件范围内的甲基化水平
mCG、mCHG 和 mCHH 三种碱基类型的构成比例在不同物种中,甚至是同一物种不同样品中都存在很大差异。因此,不同时间、空间、生理条件下的样品会表现出不同的甲基化图谱,各类 mC(mCG、mCHG 和,mCHH)的数目,及其在全部 mC 的位点中所占的比例,在一定程度上反应了特定物种的全基因组甲基化图谱特征。mCG、mCGH 和 mCHH 分别表示甲基化 CG、甲基化 CHG 和甲基化 CHH。三种碱基类型占比总和为 100%。
样本A中 mCG、mCHG 和 mCHH三种类型甲基化胞嘧啶的比例
样本A中 mCG、mCHG 和 mCHH 三种类型甲基化胞嘧啶的分布
Bismark report for: trimmed/test_R1_trim.fq.gz and trimmed/test_R2_trim.fq.gz (version: v0.22.1) Bismark was run with Bowtie 2 against the bisulfite genome of /data/wangyang/snakemake-bisulfite/data2/index/ with the specified options: -q --score-min L,0,-0.2 -p 3 --reorder --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500 Option '--non_directional' specified: alignments to all strands were being performed (OT, OB, CTOT, CTOB) Final Alignment report ====================== Sequence pairs analysed in total: 5000 Number of paired-end alignments with a unique best hit: 4619 Mapping efficiency: 92.4% Sequence pairs with no alignments under any condition: 0 Sequence pairs did not map uniquely: 381 Sequence pairs which were discarded because genomic sequence could not be extracted: 0 Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 2323 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 2296 ((converted) bottom strand) Final Cytosine Methylation Report ================================= Total number of C's analysed: 231335 Total methylated C's in CpG context: 52326 Total methylated C's in CHG context: 1132 Total methylated C's in CHH context: 2039 Total methylated C's in Unknown context: 0 Total unmethylated C's in CpG context: 15754 Total unmethylated C's in CHG context: 56443 Total unmethylated C's in CHH context: 103641 Total unmethylated C's in Unknown context: 0 C methylated in CpG context: 76.9% C methylated in CHG context: 2.0% C methylated in CHH context: 1.9% Can't determine percentage of methylated Cs in unknown context (CN or CHN) if value was 0 Bismark completed in 0d 0h 0m 10s
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam Parameters used to extract methylation information: Bismark Extractor Version: v0.22.1 Bismark result file: single-end (SAM format) Output specified: comprehensive Processed 9238 lines in total Total number of methylation call strings processed: 9238 Final Cytosine Methylation Report ================================= Total number of C's analysed: 231335 Total methylated C's in CpG context: 52326 Total methylated C's in CHG context: 1132 Total methylated C's in CHH context: 2039 Total C to T conversions in CpG context: 15754 Total C to T conversions in CHG context: 56443 Total C to T conversions in CHH context: 103641 C methylated in CpG context: 76.9% C methylated in CHG context: 2.0% C methylated in CHH context: 1.9%
不同类型的 C 碱基(CG、CHG 和 CHH), 其甲基化水平在不同物种间,甚至同一物种不同细胞类型不同条件下其甲基化水平都存在差。计每种类型(CG、CHG 和 CHH)甲基化 C 的甲基化水平分布,反应了该物种 DNA 甲基化特征。
甲基化 C 的甲基化水平
图中x轴表示甲基化水平,y轴表示特定甲基化水平的胞嘧啶碱基在所有甲基化碱基中所占的比例
test2.myCpG.txt
library(methylKit) file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"), system.file("extdata", "test2.myCpG.txt", package = "methylKit"), system.file("extdata", "control1.myCpG.txt", package = "methylKit"), system.file("extdata", "control2.myCpG.txt", package = "methylKit") ) myobj=methRead(file.list, sample.id=list("test1","test2","ctrl1","ctrl2"), assembly="hg18", treatment=c(1,1,0,0), context="CpG", mincov = 10) png(file="a.png") getMethylationStats(myobj[[2]],plot=TRUE,both.strands=FALSE) dev.off()
library(tidyverse) myCpG <- read_tsv(system.file("extdata", "test2.myCpG.txt", package = "methylKit")) png(file="a.png") hist(myCpG$freqC) dev.off()
library(methylKit) file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"), system.file("extdata", "test2.myCpG.txt", package = "methylKit"), system.file("extdata", "control1.myCpG.txt", package = "methylKit"), system.file("extdata", "control2.myCpG.txt", package = "methylKit") ) myobj=methRead(file.list, sample.id=list("test1","test2","ctrl1","ctrl2"), assembly="hg18", treatment=c(1,1,0,0), context="CpG", mincov = 10) png(file="a.png") getCoverageStats(myobj[[2]],plot=TRUE,both.strands=F) dev.off()
library(tidyverse) myCpG <- read_tsv(system.file("extdata", "test2.myCpG.txt", package = "methylKit")) png(file="a.png") hist(log10(myCpG$coverage+1)) dev.off()
如果我们的样本受到PCR偏倚的影响,那么丢弃具有非常高读覆盖率的碱基将是有用的。此外,我们还希望丢弃读覆盖率低的碱基,足够高的读覆盖率将增加统计测试的能力。
filtered.myobj=filterByCoverage(myobj,lo.count=10,lo.perc=NULL, hi.count=NULL,hi.perc=99.9)
上面的代码过滤一个methylRawList并丢弃覆盖率低于10X的碱基,同时丢弃每个样本中覆盖率超过99.9%的碱基。
为了做进一步的分析,我们需要得到所有样本中包含的碱基。以下函数将所有样本合并到一个对象中,用于所有样本中包含的基对位置。设置destrand=TRUE(默认值为FALSE)将合并CpG二核苷酸两条链上的读取。这提供了更好的覆盖范围,但仅在观察CpG甲基化时建议(对于CpH甲基化,这将导致后续分析中的错误结果)。此外,设置destrand=TRUE仅在碱基对分辨率上操作时有效,否则设置此选项TRUE将无效。unite()函数将返回一个methylBase对象,它将是所有比较分析的主要对象。methylBase对象包含所有样本中包含的区域/碱基的甲基化信息。
unite()
methylBase
meth=methylKit::unite(myobj, destrand=FALSE)
DMR检测使用权威期刊发表的metilene软件