
Methylation calling/extraction

Last published: 2023-03-24 15:01:19

Bismark output discriminates between cytosines in CpG, CHG and CHH context

Strand-specific methylation output files (default):

  • OT: original top strand
  • CTOT: complementary to original top strand
  • OB: original bottom strand
  • CTOB: complementary to original bottom strand

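Methylation calls from OT and CTOT provide information on cytosine methylation positions on the original top strand; calls from OB and CTOB provide information on methylation positions on the original bottom strand.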

Note

Specifying the --directional option (the default mode) in the Bismark alignment step will not report any alignments to the CTOT or CTOB strands.

Because a cytosine can occur in any of three sequence contexts (CpG, CHG or CHH), bismark_methylation_extractor by default produces 12 separate output files per input file (CpG_OT_…, CpG_CTOT_…, CpG_OB_… and so on).

bismark_methylation_extractor  \
    --report \
    --buffer_size 8G \
    --cytosine_report \
    --gzip  \
    --bedGraph  \
    --paired-end \
    --multicore 3  \
    --genome_folder data2/index/ \
    mapped/test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam \
    --output methylation_extracted 2> logs/bismark/test_MappedOn_NC010473.methylation_extractor.log

bismark_methylation_extractor mainly produces files ending with the following suffixes:

  • .bedGraph.gz
  • .bismark.cov.gz
  • .M-bias.txt
  • _splitting_report.txt

methylation_extracted
├── CHG_CTOB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHG_CTOT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHG_OB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHG_OT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_CTOB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_CTOT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_OB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_OT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_CTOB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_CTOT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_OB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_OT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bedGraph.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bismark.cov.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.M-bias.txt
└── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted_splitting_report.txt

Context-dependent methylation output files (--comprehensive option):

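If strand-specific methylation is not of interest, all available methylation information can be pooled into context-dependent files (information from any of the four strands is pooled). By default this yields three output files (CpG context, CHG context and CHH context); if --merge_non_CpG is selected, it yields two output files (CpG context and non-CpG context). Note that this can result in very large files for the non-CpG output.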

bismark_methylation_extractor  \
    --comprehensive \
    --report \
    --buffer_size 8G \
    --cytosine_report \
    --gzip  \
    --bedGraph  \
    --paired-end \
    --multicore 3  \
    --genome_folder data2/index/ \
    mapped/test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam \
    --output methylation_extracted 2> logs/bismark/test_MappedOn_NC010473.methylation_extractor.log

methylation_extracted
├── CHG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bedGraph.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bismark.cov.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.M-bias.txt
└── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted_splitting_report.txt
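
The relationship between the output modes and the number of files can be sketched in a few lines of Python. This is only an illustration of the naming pattern seen in the listings above; the function name `expected_prefixes` is hypothetical, and the `Non_CpG` prefix for merged output is an assumption about Bismark's naming, not something shown in this post.

```python
from itertools import product

# The four strand labels used in the default (strand-specific) output.
STRANDS = ["OT", "CTOT", "OB", "CTOB"]


def expected_prefixes(comprehensive=False, merge_non_cpg=False):
    """Return the file-name prefixes expected for one input file.

    Hypothetical helper: "Non_CpG" as the merged prefix is an assumption.
    """
    contexts = ["CpG", "Non_CpG"] if merge_non_cpg else ["CpG", "CHG", "CHH"]
    if comprehensive:
        # One pooled file per context, e.g. CpG_context_<name>.txt.gz
        return [f"{context}_context_" for context in contexts]
    # One file per (context, strand) pair, e.g. CpG_OT_<name>.txt.gz
    return [f"{context}_{strand}_" for context, strand in product(contexts, STRANDS)]


for flags in [(False, False), (True, False), (True, True)]:
    print(flags, len(expected_prefixes(*flags)))
```

With the default settings the helper returns 12 prefixes (3 contexts × 4 strands), with `comprehensive=True` it returns 3, and with both flags set it returns 2.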

Cytosine Methylation after Extraction

test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted_splitting_report.txt

test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam

Parameters used to extract methylation information:
Bismark Extractor Version: v0.24.0
Bismark result file: paired-end (SAM format)
Ignoring first 2 bp of Read 2
Output specified: strand-specific (default)
No overlapping methylation calls specified


Processed 461 lines in total
Total number of methylation call strings processed: 922

Final Cytosine Methylation Report
=================================
Total number of C's analysed:   16852

Total methylated C's in CpG context:    340
Total methylated C's in CHG context:    240
Total methylated C's in CHH context:    988

Total C to T conversions in CpG context:        192
Total C to T conversions in CHG context:        2974
Total C to T conversions in CHH context:        12118

C methylated in CpG context:    63.9%
C methylated in CHG context:    7.5%
C methylated in CHH context:    7.5%
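
As a sanity check, the percentages in the report can be recomputed from the raw counts. The sketch below assumes the report wording shown above and that "C to T conversions" are the unmethylated calls; the function names `parse_counts` and `percent_methylated` are illustrative only.

```python
import re

# Counts copied from the splitting report shown above.
REPORT = """\
Total number of C's analysed:   16852

Total methylated C's in CpG context:    340
Total methylated C's in CHG context:    240
Total methylated C's in CHH context:    988

Total C to T conversions in CpG context:        192
Total C to T conversions in CHG context:        2974
Total C to T conversions in CHH context:        12118
"""


def parse_counts(report):
    """Return {context: (methylated, unmethylated)} parsed from the report text."""
    methylated = dict(re.findall(r"Total methylated C's in (\w+) context:\s+(\d+)", report))
    converted = dict(re.findall(r"Total C to T conversions in (\w+) context:\s+(\d+)", report))
    return {ctx: (int(methylated[ctx]), int(converted[ctx])) for ctx in methylated}


def percent_methylated(methylated, unmethylated):
    """Methylated calls as a percentage of all calls in that context."""
    return 100.0 * methylated / (methylated + unmethylated)


counts = parse_counts(REPORT)
for ctx, (meth, unmeth) in counts.items():
    print(f"{ctx}: {percent_methylated(meth, unmeth):.1f}%")
```

The recomputed values agree with the report (63.9%, 7.5% and 7.5%), and the six counts add up to the 16852 cytosines analysed.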


M-Bias Plot

test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.M-bias.txt

CpG context
===========
position  count methylated  count unmethylated  % methylation        coverage
1         520               150                 520/(520+150)=77.61  670
2         520               177                 74.61                697

CHG context
===========
position  count methylated  count unmethylated  % methylation  coverage
1         11                557                 1.94           568
2         11                553                 1.95           564

CHH context
===========
position  count methylated  count unmethylated  % methylation  coverage
1         19                1052                1.77           1071
2         23                1010                2.23           1033
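
To plot or inspect these values, the M-bias file can be read into per-context tables. The parser below is a minimal sketch that assumes the file is tab-separated and that each table starts with a line containing the word "context"; `parse_mbias` and `SAMPLE` are names made up for this example.

```python
# A few rows from the M-bias file above, written with explicit tabs.
SAMPLE = (
    "CpG context\n"
    "===========\n"
    "position\tcount methylated\tcount unmethylated\t% methylation\tcoverage\n"
    "1\t520\t150\t77.61\t670\n"
    "2\t520\t177\t74.61\t697\n"
    "CHG context\n"
    "===========\n"
    "position\tcount methylated\tcount unmethylated\t% methylation\tcoverage\n"
    "1\t11\t557\t1.94\t568\n"
)


def parse_mbias(text):
    """Return {table title: [(position, methylated, unmethylated, percent, coverage), ...]}."""
    tables = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if "context" in stripped:
            # A title line such as "CpG context" starts a new table.
            current = stripped
            tables[current] = []
        elif current is not None and stripped[:1].isdigit():
            # Data rows start with the read position; separator and header rows do not.
            pos, meth, unmeth, pct, cov = stripped.split("\t")
            tables[current].append((int(pos), int(meth), int(unmeth), float(pct), int(cov)))
    return tables


tables = parse_mbias(SAMPLE)
for title, rows in tables.items():
    print(title, rows)
```

In every row the coverage equals the sum of the methylated and unmethylated counts, and the percentage is the methylated count divided by the coverage.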

[Figure: M-bias plot]

Other output files

CHG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz

Bismark methylation extractor version v0.22.1
seq-ID                             methylation state  chromosome  start position (= end position)  methylation call
3941_Ecoli_K12:4686069-4686316_R1  -                  Ecoli_K12   75                               x
3941_Ecoli_K12:4686069-4686316_R1  -                  Ecoli_K12   80                               x
3941_Ecoli_K12:4686069-4686316_R1  -                  Ecoli_K12   88                               x
3941_Ecoli_K12:4686069-4686316_R1  -                  Ecoli_K12   158                              x
3941_Ecoli_K12:4686069-4686316_R1  -                  Ecoli_K12   162                              x
3941_Ecoli_K12:4686069-4686316_R1  -                  Ecoli_K12   299                              x
2463_Ecoli_K12:4684504-4684753_R1  -                  Ecoli_K12   1640                             x
3941_Ecoli_K12:4686069-4686316_R1  -                  Ecoli_K12   294                              x
3941_Ecoli_K12:4686069-4686316_R1  -                  Ecoli_K12   290                              x

  • z - C in CpG context - unmethylated
  • Z - C in CpG context - methylated
  • x - C in CHG context - unmethylated
  • X - C in CHG context - methylated
  • h - C in CHH context - unmethylated
  • H - C in CHH context - methylated
  • u - C in Unknown context (CN or CHN) - unmethylated
  • U - C in Unknown context (CN or CHN) - methylated
  • . - not a C or irrelevant position
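
The call letters above encode two things at once: the letter gives the sequence context and the case gives the methylation state. A small decoder makes this explicit. It is a sketch only; the helper names are hypothetical and the line format is assumed to be the five tab-separated columns shown for the context file.

```python
# Lower-case letter -> sequence context, following the legend above.
CALL_CONTEXT = {"z": "CpG", "x": "CHG", "h": "CHH", "u": "Unknown"}


def decode_call(call):
    """Return (context, is_methylated) for one call character, or None for '.'."""
    context = CALL_CONTEXT.get(call.lower())
    if context is None:
        return None  # '.' or any other character: not a cytosine call
    return context, call.isupper()  # upper case means methylated


def parse_context_line(line):
    """Split one tab-separated data line of a context file into named fields."""
    seq_id, state, chromosome, position, call = line.rstrip("\n").split("\t")
    return {
        "seq_id": seq_id,
        "state": state,
        "chromosome": chromosome,
        "position": int(position),
        "call": decode_call(call),
    }


record = parse_context_line("3941_Ecoli_K12:4686069-4686316_R1\t-\tEcoli_K12\t75\tx")
print(record)
```

For the first data line of the CHG file, the call `x` decodes to an unmethylated cytosine in CHG context, which matches the `-` in the methylation state column.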

test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bedGraph.gz

track type=bedGraph
chromosome  start position  end position  methylation percentage
Ecoli_K12   90              91            100
Ecoli_K12   152             153           100
Ecoli_K12   243             244           0
Ecoli_K12   248             249           100
Ecoli_K12   256             257           100
Ecoli_K12   258             259           100
Ecoli_K12   283             284           100
Ecoli_K12   297             298           100
Ecoli_K12   311             312           0

test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bismark.cov.gz

chromosome  start position  end position  methylation percentage  count methylated  count unmethylated
Ecoli_K12   153             153           100                     1                 0
Ecoli_K12   244             244           0                       0                 1
Ecoli_K12   249             249           100                     1                 0
Ecoli_K12   257             257           100                     1                 0
Ecoli_K12   259             259           100                     1                 0
Ecoli_K12   284             284           100                     1                 0
Ecoli_K12   298             298           100                     1                 0
Ecoli_K12   312             312           0                       0                 1
Ecoli_K12   1645            1645          100                     1                 0
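
Comparing the two listings shows how the files relate: the bedGraph uses a 0-based start and a 1-based end, while the coverage file uses the same 1-based position for start and end and adds the two read counts. The conversion can be sketched as follows, assuming tab-separated lines; `cov_to_bedgraph` is a hypothetical helper, not part of Bismark.

```python
def cov_to_bedgraph(cov_line):
    """Convert one coverage line (1-based start = end) into a bedGraph line (0-based start)."""
    chrom, start, end, percent, _meth, _unmeth = cov_line.rstrip("\n").split("\t")
    # Only the start coordinate shifts; the read counts are dropped.
    return "\t".join([chrom, str(int(start) - 1), end, percent])


print(cov_to_bedgraph("Ecoli_K12\t153\t153\t100\t1\t0"))
```

Applied to the first coverage line above, this gives the bedGraph interval 152 to 153 with 100% methylation, which is the second row of the bedGraph listing.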

Methylation level calculation

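The methylation level of a cytosine is calculated from the number of reads in which the C was not converted to T and the number of reads in which it was converted, that is:
Beta-value = C-reads / (C-reads + T-reads) * 100%
Here, Beta-value is the methylation level of the cytosine, C-reads is the number of reads covering the site that support methylation (reads in which the site is read as C), and T-reads is the number of reads covering the site that do not support methylation (reads in which the site is read as T). The principle of the calculation is illustrated in the following schematic: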

[Figure: schematic of the methylation level calculation]
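
The Beta-value formula translates directly into code. The function below is a minimal sketch; returning `None` for a site without coverage is a design choice made for this example, not something prescribed by Bismark.

```python
def beta_value(c_reads, t_reads):
    """Methylation level in percent: C-reads / (C-reads + T-reads) * 100.

    Returns None when no read covers the site, because the ratio is undefined.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return 100.0 * c_reads / total


print(beta_value(1, 0))  # one read, read as C
print(beta_value(0, 1))  # one read, read as T
print(beta_value(3, 1))  # hypothetical site with four reads
```

The first two calls correspond to the single-read sites in the coverage file (100% and 0%); the third uses made-up counts to show an intermediate level of 75%.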

Interpreting the bismark_methylation_extractor results

[Figures: two images explaining the bismark_methylation_extractor output]

Optional bedGraph output

Note

By default, this mode will only consider cytosines in CpG context, but it can be extended to cytosines in any sequence context by using the option --CX

bismark2bedGraph  \
    --CX methylation_extracted/CpG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz \
    -o test_MappedOn_NC010473_CpG --dir coverage 

coverage
├── test_MappedOn_NC010473_CpG
└── test_MappedOn_NC010473_CpG.gz.bismark.cov

test_MappedOn_NC010473_CpG

track type=bedGraph
chromosome  start position  end position  methylation percentage
Ecoli_K12   90              91            100
Ecoli_K12   152             153           100
Ecoli_K12   243             244           0
Ecoli_K12   248             249           100
Ecoli_K12   256             257           100
Ecoli_K12   258             259           100
Ecoli_K12   283             284           100

test_MappedOn_NC010473_CpG.gz.bismark.cov

chromosome  start position  end position  methylation percentage  count methylated  count unmethylated
Ecoli_K12   153             153           100                     1                 0
Ecoli_K12   244             244           0                       0                 1
Ecoli_K12   249             249           100                     1                 0
Ecoli_K12   257             257           100                     1                 0
Ecoli_K12   259             259           100                     1                 0
Ecoli_K12   284             284           100                     1                 0
Ecoli_K12   298             298           100                     1                 0
Ecoli_K12   312             312           0                       0                 1

Optional genome-wide cytosine report output

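The Bismark methylation extractor can also output a genome-wide cytosine methylation report. It is likewise sorted by chromosomal coordinate, but it also contains the sequence context.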

coverage2cytosine  --CX \
    --genome_folder data2/index/ \
    -o coverage/test_MappedOn_NC010473_CpG \
    --dir . coverage/test_MappedOn_NC010473_CpG 

chromosome  position  strand  count methylated  count unmethylated  C-context  trinucleotide context
Ecoli_K12   8         +       0                 0                   CHH        CAT
Ecoli_K12   12        +       0                 0                   CHG        CTG
Ecoli_K12   14        -       0                 0                   CHG        CAG
Ecoli_K12   16        +       0                 0                   CHG        CTG
Ecoli_K12   18        -       0                 0                   CHG        CAG
Ecoli_K12   19        +       0                 0                   CHH        CAA
Ecoli_K12   22        +       0                 0                   CG         CGG
Ecoli_K12   23        -       0                 0                   CG         CGT
Ecoli_K12   24        -       0                 0                   CHG        CCG
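
For downstream work the report can be loaded into typed records. The sketch below assumes seven tab-separated columns per line in the order shown above; `CytosineRecord` and `parse_cytosine_report` are names invented for this example.

```python
from collections import namedtuple

CytosineRecord = namedtuple(
    "CytosineRecord",
    "chromosome position strand methylated unmethylated context trinucleotide",
)


def parse_cytosine_report(lines):
    """Parse tab-separated cytosine report lines into CytosineRecord tuples."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 7 or not fields[1].isdigit():
            continue  # skip blank lines or a header row
        chrom, pos, strand, meth, unmeth, context, tri = fields
        records.append(
            CytosineRecord(chrom, int(pos), strand, int(meth), int(unmeth), context, tri)
        )
    return records


sample = [
    "Ecoli_K12\t8\t+\t0\t0\tCHH\tCAT",
    "Ecoli_K12\t22\t+\t0\t0\tCG\tCGG",
    "Ecoli_K12\t23\t-\t0\t0\tCG\tCGT",
]
records = parse_cytosine_report(sample)
cpg_sites = [r for r in records if r.context == "CG"]
print(len(records), len(cpg_sites))
```

Unlike the coverage file, the report lists every cytosine on both strands, including positions without any reads, so records with zero counts such as those above need to be handled before computing a methylation level.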