Methylation calling/extraction
Bismark's output discriminates between cytosines in CpG, CHG and CHH context.
Strand-specific methylation output files (default):
- OT: original top strand
- CTOT: complementary to original top strand
- OB: original bottom strand
- CTOB: complementary to original bottom strand
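Methylation calls from OT and CTOT are informative for cytosine methylation positions on the original top strand; calls from OB and CTOB are informative for methylation positions on the original bottom strand.
Note that if the --directional option (the default mode) was used in the Bismark alignment step, no alignments are reported for the CTOT or CTOB strands.
Because a cytosine can occur in any of three sequence contexts (CpG, CHG or CHH), the default output of bismark_methylation_extractor
consists of 12 individual output files per input file (CpG_OT_…, CpG_CTOT_…, CpG_OB_…, etc.).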
bismark_methylation_extractor \
--report \
--buffer_size 8G \
--cytosine_report \
--gzip \
--bedGraph \
--paired-end \
--multicore 3 \
--genome_folder data2/index/ \
mapped/test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam \
--output methylation_extracted 2> logs/bismark/test_MappedOn_NC010473.methylation_extractor.log
In addition to the per-context call files, bismark_methylation_extractor mainly produces files ending in the following strings:
- .bedGraph.gz
- .bismark.cov.gz
- .M-bias.txt
- _splitting_report.txt
methylation_extracted
├── CHG_CTOB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHG_CTOT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHG_OB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHG_OT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_CTOB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_CTOT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_OB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_OT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_CTOB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_CTOT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_OB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_OT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bedGraph.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bismark.cov.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.M-bias.txt
└── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted_splitting_report.txt
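The twelve strand-specific call files in the listing are simply every combination of three contexts and four strands. The short Python sketch below enumerates the expected names so a run can be checked for missing files; the helper name `strand_specific_names` is made up for illustration, and the `<context>_<strand>_<basename>.txt.gz` pattern is inferred from the listing rather than taken from the Bismark source.

```python
from itertools import product

CONTEXTS = ["CpG", "CHG", "CHH"]
STRANDS = ["OT", "CTOT", "OB", "CTOB"]


def strand_specific_names(basename, gzip=True):
    """Return the expected strand-specific file names for one input BAM.

    Assumption: names follow <context>_<strand>_<basename>.txt[.gz],
    as seen in the directory listing.
    """
    suffix = ".txt.gz" if gzip else ".txt"
    return [
        f"{context}_{strand}_{basename}{suffix}"
        for context, strand in product(CONTEXTS, STRANDS)
    ]


names = strand_specific_names(
    "test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted"
)
for name in names:
    print(name)
```

Comparing this list against the actual contents of the output directory is a quick way to spot an incomplete extraction.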
Context-dependent methylation output files (--comprehensive option):
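If strand-specific methylation is not of interest, all available methylation information can be pooled into one context-dependent file per context (information from any of the four strands is merged). This defaults to three output files (CpG context, CHG context and CHH context); if --merge_non_CpG is selected,
two output files are produced instead (CpG context and non-CpG context). Note that this can result in very large files for the non-CpG output.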
bismark_methylation_extractor \
--comprehensive \
--report \
--buffer_size 8G \
--cytosine_report \
--gzip \
--bedGraph \
--paired-end \
--multicore 3 \
--genome_folder data2/index/ \
mapped/test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam \
--output methylation_extracted 2> logs/bismark/test_MappedOn_NC010473.methylation_extractor.log
methylation_extracted
├── CHG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bedGraph.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bismark.cov.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.M-bias.txt
└── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted_splitting_report.txt
Cytosine Methylation after Extraction
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted_splitting_report.txt
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam
Parameters used to extract methylation information:
Bismark Extractor Version: v0.24.0
Bismark result file: paired-end (SAM format)
Ignoring first 2 bp of Read 2
Output specified: strand-specific (default)
No overlapping methylation calls specified
Processed 461 lines in total
Total number of methylation call strings processed: 922
Final Cytosine Methylation Report
=================================
Total number of C's analysed: 16852
Total methylated C's in CpG context: 340
Total methylated C's in CHG context: 240
Total methylated C's in CHH context: 988
Total C to T conversions in CpG context: 192
Total C to T conversions in CHG context: 2974
Total C to T conversions in CHH context: 12118
C methylated in CpG context: 63.9%
C methylated in CHG context: 7.5%
C methylated in CHH context: 7.5%
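The percentages in the report can be reproduced from the raw counts: for each context, methylated calls are divided by methylated calls plus C-to-T conversions. The sketch below redoes that arithmetic with the numbers from this report; the function name `percent_methylated` is a placeholder for illustration only.

```python
def percent_methylated(methylated, converted):
    """Percentage of methylated calls among all calls in one context."""
    total = methylated + converted
    if total == 0:
        raise ValueError("no cytosines analysed in this context")
    return 100.0 * methylated / total


# (methylated C's, C-to-T conversions) copied from the splitting report
counts = {
    "CpG": (340, 192),
    "CHG": (240, 2974),
    "CHH": (988, 12118),
}

for context, (meth, conv) in counts.items():
    print(f"C methylated in {context} context: {percent_methylated(meth, conv):.1f}%")

# The six counts should add up to the "Total number of C's analysed" line.
total_cs = sum(meth + conv for meth, conv in counts.values())
print("Total number of C's analysed:", total_cs)
```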
M-Bias Plot
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.M-bias.txt
CpG context

position | count methylated | count unmethylated | % methylation | coverage |
---|---|---|---|---|
1 | 520 | 150 | 520/(520+150) = 77.61 | 670 |
2 | 520 | 177 | 74.61 | 697 |

CHG context

position | count methylated | count unmethylated | % methylation | coverage |
---|---|---|---|---|
1 | 11 | 557 | 1.94 | 568 |
2 | 11 | 553 | 1.95 | 564 |

CHH context

position | count methylated | count unmethylated | % methylation | coverage |
---|---|---|---|---|
1 | 19 | 1052 | 1.77 | 1071 |
2 | 23 | 1010 | 2.23 | 1033 |
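To work with the M-bias numbers programmatically, the text file can be split into one block per context and the percentage recomputed from the two count columns. The parser below is a rough sketch, assuming each section starts with a line containing the word "context", followed by a row of "=" characters, a column header, and tab-separated data rows; the real file may carry extra labels (such as the read number) in the section title, which this sketch keeps verbatim as the dictionary key. The sample text only repeats the rows shown in this section.

```python
def parse_mbias(text):
    """Parse M-bias text into {section title: [(pos, meth, unmeth, pct, cov), ...]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, "=====" rulers and the column header row.
        if not line or set(line) == {"="} or line.startswith("position"):
            continue
        if "context" in line:
            current = line
            sections[current] = []
            continue
        pos, meth, unmeth, *_ = line.split("\t")
        meth, unmeth = int(meth), int(unmeth)
        coverage = meth + unmeth
        pct = round(100.0 * meth / coverage, 2) if coverage else 0.0
        sections[current].append((int(pos), meth, unmeth, pct, coverage))
    return sections


SAMPLE = "\n".join([
    "CpG context",
    "===========",
    "position\tcount methylated\tcount unmethylated\t% methylation\tcoverage",
    "1\t520\t150\t77.61\t670",
    "2\t520\t177\t74.61\t697",
    "",
    "CHG context",
    "===========",
    "position\tcount methylated\tcount unmethylated\t% methylation\tcoverage",
    "1\t11\t557\t1.94\t568",
    "2\t11\t553\t1.95\t564",
])

mbias = parse_mbias(SAMPLE)
for title, rows in mbias.items():
    print(title, rows)
```

A sharp rise or drop in the recomputed percentage at the first or last read positions is the usual reason to rerun the extractor with positions ignored, as in the "Ignoring first 2 bp of Read 2" line of the report.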
Other output files
CHG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
Bismark methylation extractor version v0.22.1

seq-ID | methylation state | chromosome | start position (= end position) | methylation call |
---|---|---|---|---|
3941_Ecoli_K12:4686069-4686316_R1 | - | Ecoli_K12 | 75 | x |
3941_Ecoli_K12:4686069-4686316_R1 | - | Ecoli_K12 | 80 | x |
3941_Ecoli_K12:4686069-4686316_R1 | - | Ecoli_K12 | 88 | x |
3941_Ecoli_K12:4686069-4686316_R1 | - | Ecoli_K12 | 158 | x |
3941_Ecoli_K12:4686069-4686316_R1 | - | Ecoli_K12 | 162 | x |
3941_Ecoli_K12:4686069-4686316_R1 | - | Ecoli_K12 | 299 | x |
2463_Ecoli_K12:4684504-4684753_R1 | - | Ecoli_K12 | 1640 | x |
3941_Ecoli_K12:4686069-4686316_R1 | - | Ecoli_K12 | 294 | x |
3941_Ecoli_K12:4686069-4686316_R1 | - | Ecoli_K12 | 290 | x |
- z - C in CpG context - unmethylated
- Z - C in CpG context - methylated
- x - C in CHG context - unmethylated
- X - C in CHG context - methylated
- h - C in CHH context - unmethylated
- H - C in CHH context - methylated
- u - C in Unknown context (CN or CHN) - unmethylated
- U - C in Unknown context (CN or CHN) - methylated
- . - not a C or irrelevant position
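The letter code above maps directly onto a lookup table. The following sketch tallies the calls in a methylation call string by context and state; the example string is invented for illustration and is not taken from the data set, and `summarize_calls` is a made-up helper name.

```python
from collections import Counter

# call letter -> (context, methylated?)
CALLS = {
    "z": ("CpG", False), "Z": ("CpG", True),
    "x": ("CHG", False), "X": ("CHG", True),
    "h": ("CHH", False), "H": ("CHH", True),
    "u": ("Unknown", False), "U": ("Unknown", True),
}


def summarize_calls(call_string):
    """Count calls per (context, methylated) pair, ignoring '.' positions."""
    counts = Counter()
    for letter in call_string:
        if letter == ".":
            continue  # not a C, or an irrelevant position
        if letter not in CALLS:
            raise ValueError(f"unexpected methylation call: {letter!r}")
        counts[CALLS[letter]] += 1
    return counts


example = "..Z..x.h..H.z"  # hypothetical call string
summary = summarize_calls(example)
for (context, methylated), n in sorted(summary.items()):
    print(context, "methylated" if methylated else "unmethylated", n)
```

Upper case always means methylated and lower case unmethylated, which is why the CHG table above pairs the state "-" with the call "x".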
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bedGraph.gz
track type=bedGraph

chromosome | start position | end position | methylation percentage |
---|---|---|---|
Ecoli_K12 | 90 | 91 | 100 |
Ecoli_K12 | 152 | 153 | 100 |
Ecoli_K12 | 243 | 244 | 0 |
Ecoli_K12 | 248 | 249 | 100 |
Ecoli_K12 | 256 | 257 | 100 |
Ecoli_K12 | 258 | 259 | 100 |
Ecoli_K12 | 283 | 284 | 100 |
Ecoli_K12 | 297 | 298 | 100 |
Ecoli_K12 | 311 | 312 | 0 |
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bismark.cov.gz
chromosome | start position | end position | methylation percentage | count methylated | count unmethylated |
---|---|---|---|---|---|
Ecoli_K12 | 153 | 153 | 100 | 1 | 0 |
Ecoli_K12 | 244 | 244 | 0 | 0 | 1 |
Ecoli_K12 | 249 | 249 | 100 | 1 | 0 |
Ecoli_K12 | 257 | 257 | 100 | 1 | 0 |
Ecoli_K12 | 259 | 259 | 100 | 1 | 0 |
Ecoli_K12 | 284 | 284 | 100 | 1 | 0 |
Ecoli_K12 | 298 | 298 | 100 | 1 | 0 |
Ecoli_K12 | 312 | 312 | 0 | 0 | 1 |
Ecoli_K12 | 1645 | 1645 | 100 | 1 | 0 |
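The bedGraph and coverage excerpts describe the same sites in two coordinate conventions: the bedGraph uses a 0-based start and 1-based end, while the coverage file uses 1-based start and end and adds the two read counts. The sketch below converts coverage rows to bedGraph rows under that assumption, using rows copied from the tables above; the function name is illustrative only.

```python
def coverage_to_bedgraph(row):
    """Convert (chrom, start, end, pct, meth, unmeth) to (chrom, start0, end, pct).

    Assumption: coverage coordinates are 1-based, bedGraph start is 0-based.
    """
    chrom, start, end, pct, _meth, _unmeth = row
    return (chrom, start - 1, end, pct)


coverage_rows = [
    ("Ecoli_K12", 153, 153, 100, 1, 0),
    ("Ecoli_K12", 244, 244, 0, 0, 1),
    ("Ecoli_K12", 249, 249, 100, 1, 0),
]

bedgraph_rows = [coverage_to_bedgraph(row) for row in coverage_rows]
for row in bedgraph_rows:
    print(*row, sep="\t")
```

The converted rows match the bedGraph lines for positions 152-153, 243-244 and 248-249 shown earlier.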
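Calculating the methylation level
The methylation level is calculated from the proportion of reads in which the C was not converted to T relative to all reads covering the site, i.e.:
Beta-value = C-reads / (C-reads + T-reads) * 100%
Here, Beta-value is the methylation level of the cytosine, C-reads is the number of reads covering the site that support methylation (reads showing a C at the site), and T-reads is the number of reads covering the site that do not support methylation (reads showing a T at the site). The principle of the calculation is illustrated below:
Interpreting bismark_methylation_extractor results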
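As a worked example of the Beta-value formula, the sketch below recomputes the methylation percentage of individual sites from the two count columns of the coverage file and compares it with the reported percentage. The rows are copied from the coverage excerpt in this document; returning `None` for an uncovered site is a choice made for this sketch, not something Bismark prescribes.

```python
def beta_value(c_reads, t_reads):
    """Methylation level in percent: C-reads / (C-reads + T-reads) * 100."""
    total = c_reads + t_reads
    if total == 0:
        return None  # site not covered, level undefined
    return 100.0 * c_reads / total


# (chromosome, start, end, reported %, count methylated, count unmethylated)
rows = [
    ("Ecoli_K12", 153, 153, 100.0, 1, 0),
    ("Ecoli_K12", 244, 244, 0.0, 0, 1),
    ("Ecoli_K12", 1645, 1645, 100.0, 1, 0),
]

for chrom, start, _end, reported, c_reads, t_reads in rows:
    computed = beta_value(c_reads, t_reads)
    print(chrom, start, f"computed={computed}", f"reported={reported}")
```

With a coverage of one read per site, as in this small test data set, the Beta-value can only be 0 or 100; intermediate values appear once several reads cover the same cytosine.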
Optional bedGraph output
By default, the bedGraph output only considers cytosines in CpG context; it can be extended to cytosines in any sequence context with the option --CX.
bismark2bedGraph \
--CX methylation_extracted/CpG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz \
-o test_MappedOn_NC010473_CpG --dir coverage
coverage
├── test_MappedOn_NC010473_CpG
└── test_MappedOn_NC010473_CpG.gz.bismark.cov
test_MappedOn_NC010473_CpG
track type=bedGraph

chromosome | start position | end position | methylation percentage |
---|---|---|---|
Ecoli_K12 | 90 | 91 | 100 |
Ecoli_K12 | 152 | 153 | 100 |
Ecoli_K12 | 243 | 244 | 0 |
Ecoli_K12 | 248 | 249 | 100 |
Ecoli_K12 | 256 | 257 | 100 |
Ecoli_K12 | 258 | 259 | 100 |
Ecoli_K12 | 283 | 284 | 100 |
test_MappedOn_NC010473_CpG.gz.bismark.cov
chromosome | start position | end position | methylation percentage | count methylated | count unmethylated |
---|---|---|---|---|---|
Ecoli_K12 | 153 | 153 | 100 | 1 | 0 |
Ecoli_K12 | 244 | 244 | 0 | 0 | 1 |
Ecoli_K12 | 249 | 249 | 100 | 1 | 0 |
Ecoli_K12 | 257 | 257 | 100 | 1 | 0 |
Ecoli_K12 | 259 | 259 | 100 | 1 | 0 |
Ecoli_K12 | 284 | 284 | 100 | 1 | 0 |
Ecoli_K12 | 298 | 298 | 100 | 1 | 0 |
Ecoli_K12 | 312 | 312 | 0 | 0 | 1 |
Optional genome-wide cytosine report output
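The Bismark methylation extractor can also output a genome-wide cytosine methylation report. Like the coverage output, it is sorted by chromosomal coordinates, but it additionally contains the sequence context of every cytosine.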
coverage2cytosine --CX \
--genome_folder data2/index/ \
-o coverage/test_MappedOn_NC010473_CpG \
--dir . coverage/test_MappedOn_NC010473_CpG.gz.bismark.cov
chromosome | position | strand | count methylated | count unmethylated | C-context | trinucleotide context |
---|---|---|---|---|---|---|
Ecoli_K12 | 8 | + | 0 | 0 | CHH | CAT |
Ecoli_K12 | 12 | + | 0 | 0 | CHG | CTG |
Ecoli_K12 | 14 | - | 0 | 0 | CHG | CAG |
Ecoli_K12 | 16 | + | 0 | 0 | CHG | CTG |
Ecoli_K12 | 18 | - | 0 | 0 | CHG | CAG |
Ecoli_K12 | 19 | + | 0 | 0 | CHH | CAA |
Ecoli_K12 | 22 | + | 0 | 0 | CG | CGG |
Ecoli_K12 | 23 | - | 0 | 0 | CG | CGT |
Ecoli_K12 | 24 | - | 0 | 0 | CHG | CCG |
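The last two columns of the report are redundant in a useful way: the C-context can be derived from the trinucleotide, read 5' to 3' on the strand given in column three. The sketch below shows that derivation and checks it against the rows above; `classify_context` is a made-up helper, and returning "unknown" for trinucleotides with ambiguous bases is an assumption of this sketch.

```python
def classify_context(trinucleotide):
    """Derive CG / CHG / CHH from a trinucleotide that starts with the cytosine."""
    tri = trinucleotide.upper()
    if len(tri) != 3 or tri[0] != "C":
        raise ValueError(f"expected a 3-base context starting with C: {trinucleotide!r}")
    if any(base not in "ACGT" for base in tri):
        return "unknown"  # ambiguous base, context cannot be resolved
    if tri[1] == "G":
        return "CG"
    if tri[2] == "G":
        return "CHG"
    return "CHH"


# (position, strand, C-context, trinucleotide) copied from the report excerpt
report_rows = [
    (8, "+", "CHH", "CAT"),
    (12, "+", "CHG", "CTG"),
    (14, "-", "CHG", "CAG"),
    (19, "+", "CHH", "CAA"),
    (22, "+", "CG", "CGG"),
    (23, "-", "CG", "CGT"),
    (24, "-", "CHG", "CCG"),
]

for position, strand, context, tri in report_rows:
    print(position, strand, tri, classify_context(tri), context)
```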