Bismark output discriminates between cytosines in CpG, CHG and CHH context
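Methylation calls from OT and CTOT report cytosine methylation positions on the original top strand; calls from OB and CTOB report methylation positions on the original bottom strand.
Note that specifying the --directional option (the default mode) in the Bismark alignment step will not report any alignments to the CTOT or CTOB strands.
Because a cytosine can occur in any of three sequence contexts (CpG, CHG or CHH), the default bismark_methylation_extractor output consists of 12 individual output files per input file (CpG_OT_…, CpG_CTOT_…, CpG_OB_… etc.).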
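The count of 12 is simply three contexts times four strands. A minimal sketch of that enumeration follows; the basename "sample" and the helper name are hypothetical, and the exact suffix (for example .txt.gz when --gzip is used) depends on the options chosen.

```python
from itertools import product

# Three sequence contexts and four possible strands, as described above.
CONTEXTS = ("CpG", "CHG", "CHH")
STRANDS = ("OT", "CTOT", "CTOB", "OB")


def strand_specific_names(basename):
    """Return the 12 context/strand file names expected for one input basename.

    The ".txt" suffix is an assumption for illustration; with --gzip the
    real files end in ".txt.gz".
    """
    return [
        f"{context}_{strand}_{basename}.txt"
        for context, strand in product(CONTEXTS, STRANDS)
    ]


names = strand_specific_names("sample")  # "sample" is a made-up basename
print(len(names))
for name in names:
    print(name)
```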
bismark_methylation_extractor \
    --report \
    --buffer_size 8G \
    --cytosine_report \
    --gzip \
    --bedGraph \
    --paired-end \
    --multicore 3 \
    --genome_folder data2/index/ \
    -s mapped/test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam \
    --output methylation_extracted \
    2> logs/bismark/test_MappedOn_NC010473.methylation_extractor.log
bismark_methylation_extractor mainly produces the following files, distinguished by their prefix or suffix:
methylation_extracted
├── CHG_CTOB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHG_CTOT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHG_OB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHG_OT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_CTOB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_CTOT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_OB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_OT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_CTOB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_CTOT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_OB_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_OT_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bedGraph.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bismark.cov.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.M-bias.txt
└── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted_splitting_report.txt
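If strand-specific methylation is not of interest, all available methylation information can be pooled into a single context-dependent file with --comprehensive (information from any of the four strands is pooled). This defaults to three output files (CpG context, CHG context and CHH context), or two output files (CpG context and non-CpG context) if --merge_non_CpG is selected (note that this can result in very large non-CpG output files).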
bismark_methylation_extractor \
    --comprehensive \
    --report \
    --buffer_size 8G \
    --cytosine_report \
    --gzip \
    --bedGraph \
    --paired-end \
    --multicore 3 \
    --genome_folder data2/index/ \
    -s mapped/test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam \
    --output methylation_extracted \
    2> logs/bismark/test_MappedOn_NC010473.methylation_extractor.log
methylation_extracted
├── CHG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CHH_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── CpG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bedGraph.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bismark.cov.gz
├── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.M-bias.txt
└── test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted_splitting_report.txt
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted_splitting_report.txt
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bam

Parameters used to extract methylation information:
Bismark Extractor Version: v0.24.0
Bismark result file: paired-end (SAM format)
Ignoring first 2 bp of Read 2
Output specified: strand-specific (default)
No overlapping methylation calls specified

Processed 461 lines in total
Total number of methylation call strings processed: 922

Final Cytosine Methylation Report
=================================
Total number of C's analysed: 16852

Total methylated C's in CpG context: 340
Total methylated C's in CHG context: 240
Total methylated C's in CHH context: 988

Total C to T conversions in CpG context: 192
Total C to T conversions in CHG context: 2974
Total C to T conversions in CHH context: 12118

C methylated in CpG context: 63.9%
C methylated in CHG context: 7.5%
C methylated in CHH context: 7.5%
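The percentages at the end of the report can be recomputed from the counts above them, which is a useful sanity check. The sketch below assumes the report keeps the line wording shown here; the function name and the regular expressions are illustrative, not part of Bismark.

```python
import re

# Count lines copied from the splitting report shown above.
REPORT = """\
Total methylated C's in CpG context: 340
Total methylated C's in CHG context: 240
Total methylated C's in CHH context: 988
Total C to T conversions in CpG context: 192
Total C to T conversions in CHG context: 2974
Total C to T conversions in CHH context: 12118
"""


def context_percentages(report_text):
    """Recompute % methylation per context from the raw counts in a report."""
    methylated = dict(
        re.findall(r"Total methylated C's in (\w+) context:\s*(\d+)", report_text)
    )
    converted = dict(
        re.findall(r"Total C to T conversions in (\w+) context:\s*(\d+)", report_text)
    )
    result = {}
    for context, meth in methylated.items():
        m, u = int(meth), int(converted.get(context, 0))
        # Percentage of calls that stayed C; None when there are no calls.
        result[context] = round(100.0 * m / (m + u), 1) if (m + u) else None
    return result


print(context_percentages(REPORT))
```

For the counts above this gives 63.9 for CpG and 7.5 for both CHG and CHH, matching the report, and the six counts sum to the 16852 cytosines analysed.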
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.M-bias.txt
CHG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz
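According to the Bismark documentation, each context file starts with a version header line followed by tab-separated records: read ID, methylation state (+ or -), chromosome, position and methylation call (X/x for CHG). The sketch below tallies calls per position; the example records and the chromosome name chrDemo are made up for illustration, and a real .txt.gz file would be opened with gzip.open(path, "rt").

```python
def tally_calls(lines):
    """Count (methylated, unmethylated) calls per (chromosome, position)."""
    counts = {}
    for line in lines:
        # Skip the header line written at the top of each context file.
        if line.startswith("Bismark methylation extractor"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # ignore blank or malformed lines
        _read_id, state, chrom, pos, _call = fields
        key = (chrom, int(pos))
        meth, unmeth = counts.get(key, (0, 0))
        if state == "+":
            meth += 1
        elif state == "-":
            unmeth += 1
        counts[key] = (meth, unmeth)
    return counts


# Made-up records in the documented five-column layout.
EXAMPLE_LINES = [
    "Bismark methylation extractor version v0.24.0\n",
    "read_1\t+\tchrDemo\t100\tX\n",
    "read_2\t-\tchrDemo\t100\tx\n",
    "read_3\t+\tchrDemo\t100\tX\n",
    "read_4\t-\tchrDemo\t250\tx\n",
]

for position, (meth, unmeth) in sorted(tally_calls(EXAMPLE_LINES).items()):
    print(position, meth, unmeth)
```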
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bedGraph.gz
test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.bismark.cov.gz
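The methylation level is calculated from the ratio of reads in which the C was not converted to T to all reads covering the site:
Beta-value = C-reads / (C-reads + T-reads) * 100%
Here Beta-value is the methylation level of the cytosine, C-reads is the number of reads covering the site that support methylation (reads in which the site is read as C), and T-reads is the number of reads covering the site that do not support methylation (reads in which the site is read as T). The principle of the calculation is shown in the following diagram: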
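As a sketch of the formula applied to one line of a .bismark.cov file: the Bismark documentation describes the coverage columns as chromosome, start, end, methylation percentage, count methylated and count unmethylated. The example line and the chromosome name chrDemo are made up; the function names are illustrative.

```python
def beta_value(c_reads, t_reads):
    """Methylation level in percent: C-reads / (C-reads + T-reads) * 100."""
    depth = c_reads + t_reads
    if depth == 0:
        raise ValueError("site has no coverage, methylation level is undefined")
    return 100.0 * c_reads / depth


def parse_cov_line(line):
    """Split one tab-separated coverage line into typed fields."""
    chrom, start, end, percent, meth, unmeth = line.rstrip("\n").split("\t")
    return chrom, int(start), int(end), float(percent), int(meth), int(unmeth)


# Made-up coverage line: 2 reads call C, 1 read calls T at position 100.
EXAMPLE_COV_LINE = "chrDemo\t100\t100\t66.6666666666667\t2\t1\n"

chrom, start, end, reported, meth, unmeth = parse_cov_line(EXAMPLE_COV_LINE)
recomputed = beta_value(meth, unmeth)
# The recomputed value should agree with the percentage column.
print(chrom, start, f"{recomputed:.2f}", abs(recomputed - reported) < 1e-6)
```

Raising an error for zero coverage is a design choice of this sketch; returning None would work equally well if uncovered sites need to be kept.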
By default, the bedGraph output only considers cytosines in CpG context, but it can be extended to cytosines in any sequence context with the option --CX.
bismark2bedGraph \
    --CX \
    methylation_extracted/CpG_context_test_MappedOn_NC010473_trim_bismark_pe.deduplicated.sorted.txt.gz \
    -o test_MappedOn_NC010473_CpG \
    --dir coverage
coverage
├── test_MappedOn_NC010473_CpG
└── test_MappedOn_NC010473_CpG.gz.bismark.cov
test_MappedOn_NC010473_CpG
test_MappedOn_NC010473_CpG.gz.bismark.cov
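The Bismark methylation extractor can also output a genome-wide cytosine methylation report. Like the coverage file, it is sorted by chromosomal coordinate, but it additionally contains the sequence context.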
coverage2cytosine \
    --CX \
    --genome_folder data2/index/ \
    -o coverage/test_MappedOn_NC010473_CpG \
    --dir . \
    coverage/test_MappedOn_NC010473_CpG
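The Bismark documentation describes the cytosine report as tab-separated with seven columns: chromosome, position, strand, count methylated, count unmethylated, C-context and trinucleotide context. A minimal sketch of summing counts per context under that assumption follows; the example records and the name chrDemo are made up.

```python
from collections import defaultdict


def summarise_by_context(lines):
    """Sum (methylated, unmethylated) counts per C-context (CG, CHG, CHH)."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 7:
            continue  # ignore blank or malformed lines
        _chrom, _pos, _strand, meth, unmeth, context, _trinucleotide = fields
        totals[context][0] += int(meth)
        totals[context][1] += int(unmeth)
    return {context: tuple(pair) for context, pair in totals.items()}


# Made-up records in the documented seven-column layout.
EXAMPLE_REPORT = [
    "chrDemo\t10\t+\t3\t1\tCG\tCGA\n",
    "chrDemo\t11\t-\t2\t2\tCG\tCGT\n",
    "chrDemo\t15\t+\t0\t4\tCHH\tCAT\n",
    "chrDemo\t20\t+\t1\t5\tCHG\tCAG\n",
]

for context, (meth, unmeth) in sorted(summarise_by_context(EXAMPLE_REPORT).items()):
    print(context, meth, unmeth)
```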