Somatic SNVs and Indels
Call somatic SNVs and indels and generate a BAMOUT
gatk Mutect2 \
-R $baseDir/ref/Homo_sapiens_assembly38.fasta \
-I $baseDir/bams/tumor.bam \
-I $baseDir/bams/normal.bam \
-tumor HCC1143_tumor \
-normal HCC1143_normal \
-pon $baseDir/resources/chr17_m2pon.vcf.gz \
--af-of-alleles-not-in-resource 0.0000025 \
--germline-resource $baseDir/resources/chr17_af-only-gnomad_grch38.vcf.gz \
-L $baseDir/resources/chr17plus.interval_list \
-O $baseDir/sandbox/1_somatic_m2.vcf.gz \
-bamout $baseDir/sandbox/2_tumor_normal_m2.bam
--input, -I (GATKPath): BAM/SAM/CRAM file containing reads. This argument must be specified at least once. Required.
--output, -O (GATKPath): File to which variants should be written. Required.
--reference, -R (GATKPath): Reference sequence file. Required.
--tumor-sample, -tumor: Deprecated (this feature is deprecated and will be removed in a future release). BAM sample name of the tumor, which may be URL-encoded as output by GetSampleName with the -encode argument. Default value: null.
--normal-sample, -normal: BAM sample name of the normal, which may be URL-encoded as output by GetSampleName with the -encode argument. This argument may be specified 0 or more times. Default value: null.
--panel-of-normals, -pon (FeatureInput): VCF file of sites observed in normal. Default value: null.
--germline-resource (FeatureInput): Population VCF of germline sequencing containing allele fractions. Default value: null.
--intervals, -L: One or more genomic intervals over which to operate. This argument may be specified 0 or more times. Default value: null.
--bam-output, -bamout: File to which assembled haplotypes should be written. Default value: null.
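The -tumor/-normal values are the sample names recorded in the SM tag of each BAM's @RG header lines. A minimal sketch of extracting that tag, assuming a made-up @RG line in place of real `samtools view -H` output (GetSampleName, mentioned above, is the GATK route to the same name).

```shell
#!/usr/bin/env bash
# Made-up @RG header line (tab-separated); in practice take it from
#   samtools view -H "$baseDir/bams/normal.bam" | grep '^@RG'
rg_line=$(printf '@RG\tID:HCC1143_normal.1\tSM:HCC1143_normal\tPL:ILLUMINA')

# Put each tab-separated field on its own line and keep only the SM: value.
normal_sample=$(printf '%s\n' "$rg_line" | tr '\t' '\n' | sed -n 's/^SM://p')

printf 'normal sample name: %s\n' "$normal_sample"
```

The extracted name is what goes after -normal (and after -tumor for the tumor BAM).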
Call somatic mutations using GATK4 Mutect2
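Use --germline-resource to supply population germline variant annotations. The population germline resource must contain allele-specific frequencies, i.e., an AF annotation in the INFO column of the VCF. Mutect2 uses these population allele frequencies to annotate variant alleles. When using a population germline resource, consider adjusting --af-of-alleles-not-in-resource from its default of 0.001. For example, the gnomAD resource af-only-gnomad_grch38.vcf.gz represents ~200k exomes and ~16k genomes. The data used in the tutorial above, chr17_af-only-gnomad_grch38.vcf.gz, is exome data, so we set --af-of-alleles-not-in-resource to 0.0000025, which corresponds to 1/(2 × exome samples) = 1/(2 × 200,000). The default of 0.001 suits human sample analysis without any population resource; it is based on the average human heterozygosity rate. Both the population allele frequencies (POP_AF) and the af-of-alleles-not-in-resource value factor into the probability calculation for somatic variants.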
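A small shell sketch of the two checks this paragraph implies: that the resource carries an allele-specific AF INFO field, and the 1/(2N) arithmetic behind 0.0000025. The header line is made up for illustration, and grepping for `Number=A` is an assumption about how an allele-specific AF is declared (it is the VCF convention for one value per ALT allele), not a check Mutect2 itself performs.

```shell
#!/usr/bin/env bash
# 1) Check that the germline resource declares a per-allele AF field in INFO.
#    A made-up header line stands in for the real one, which you could list with:
#    gzip -dc "$baseDir/resources/chr17_af-only-gnomad_grch38.vcf.gz" | grep '^##INFO'
header='##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">'
if printf '%s\n' "$header" | grep -q '^##INFO=<ID=AF,Number=A,'; then
  has_af=yes   # Number=A: one value per ALT allele, i.e. allele-specific
else
  has_af=no
fi

# 2) Derive --af-of-alleles-not-in-resource as 1/(2N) for a resource of N
#    samples (2N chromosomes); N=200000 is the approximate exome count quoted above.
n_samples=200000
af_not_in_resource=$(awk -v n="$n_samples" 'BEGIN { printf "%.7f", 1 / (2 * n) }')

printf 'AF field present: %s\n' "$has_af"
printf 'af-of-alleles-not-in-resource: %s\n' "$af_not_in_resource"
```

The intuition is that an allele absent from the resource is given roughly the frequency of a single observation among its 2N chromosomes; substitute your own resource's sample count for N.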
Mutect2 is based on HaplotypeCaller
- Skips:
  - Sites in the PoN
  - Sites with a high fraction of alt reads in the normal
- Allele-specific calling
- Distinguishes alleles in the germline population frequency resource and uses their AF in calculating the probability that the variant exists in the normal and the tumor
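To make the two skip rules concrete, here is a toy awk filter over made-up candidate sites. It is not how Mutect2 implements them: the plain-text "chrom pos" site format, the coordinates, and the max_normal_af cutoff of 0.2 are all hypothetical.

```shell
#!/usr/bin/env bash
# Toy illustration of the two skip rules; NOT Mutect2's implementation.
max_normal_af=0.2            # hypothetical cutoff for "high alt fraction in normal"
tmp=$(mktemp -d)

# PoN sites: chrom pos (made-up coordinates)
printf '%s\n' "chr17 7674220" "chr17 43045712" > "$tmp/pon.txt"

# Candidates: chrom pos normal_alt_reads normal_depth (made-up values)
printf '%s\n' \
  "chr17 7674220 0 40" \
  "chr17 7675088 12 30" \
  "chr17 7676154 1 50" > "$tmp/candidates.txt"

kept=$(awk -v max_af="$max_normal_af" '
  NR == FNR { in_pon[$1 ":" $2] = 1; next }   # first file: load PoN sites
  (($1 ":" $2) in in_pon) { next }            # rule 1: skip sites in the PoN
  $4 > 0 && $3 / $4 > max_af { next }         # rule 2: skip high normal alt fraction
  { print $1, $2 }                            # everything else stays a candidate
' "$tmp/pon.txt" "$tmp/candidates.txt")

rm -r "$tmp"
printf 'kept: %s\n' "$kept"
```

Here the first site is dropped by the PoN, the second because 12/30 alt reads in the normal exceeds the cutoff, and only the third remains.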
Mutect2 reassembly recovers the 120-base deletion haplotype
Somatic calls inferred from PairHMM likelihoods
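- No explicit ploidy assumption (unlike HaplotypeCaller)
- Somatic "likelihoods" use the variant allele fraction instead of ploidy
- The statistical threshold for a somatic call uses log-likelihood ratios; a log10 ratio of at least 5.3 favors the somatic-variant genotype:
  $\log_{10} \frac{P(\text{reads} \mid \text{variant with allele fraction } f)}{P(\text{reads} \mid \text{no variant})} \ge 5.3$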
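The log-likelihood ratio can be sketched with a deliberately simplified model: every read is an independent observation with a fixed base error rate e, and the variant hypothesis uses the allele fraction f = alt/depth estimated from the reads. Mutect2 instead uses per-read PairHMM likelihoods and its own allele-fraction model, so the tlod helper below (a hypothetical name) only illustrates the shape of the test against the 5.3 threshold.

```shell
#!/usr/bin/env bash
# Simplified log10 likelihood ratio for one site (NOT Mutect2's actual model):
#   tlod ALT_READS DEPTH ERROR_RATE
# prints log10[ P(reads | variant, f = ALT/DEPTH) / P(reads | no variant) ].
tlod() {
  awk -v k="$1" -v n="$2" -v e="$3" 'BEGIN {
    f = k / n                                   # allele fraction, no ploidy assumed
    p_alt_var  = f * (1 - e) + (1 - f) * e / 3  # P(alt base | variant present)
    p_ref_var  = f * e / 3 + (1 - f) * (1 - e)  # P(ref base | variant present)
    p_alt_null = e / 3                          # P(alt base | no variant): error only
    p_ref_null = 1 - e                          # P(ref base | no variant)
    ln_ratio = k * log(p_alt_var / p_alt_null) + (n - k) * log(p_ref_var / p_ref_null)
    printf "%.2f\n", ln_ratio / log(10)         # awk log() is natural log
  }'
}

threshold=5.3
for alt in 1 10; do
  lod=$(tlod "$alt" 100 0.001)
  if awk -v x="$lod" -v t="$threshold" 'BEGIN { exit !(x >= t) }'; then
    call=somatic
  else
    call=no_call
  fi
  printf 'alt=%s/100  log10 LR=%s  -> %s\n' "$alt" "$lod" "$call"
done
```

With e = 0.001 and depth 100, a single alt read stays well below 5.3 while 10 alt reads clear it comfortably; note that only the allele fraction f, not a ploidy, enters the calculation.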
Multiallelic calling in GATK4 Mutect2
Filtering is based on annotations + contamination estimate
FilterMutectCalls filters for multiple criteria
New Contamination Model in GATK4
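The filtering step can be sketched with GATK 4's contamination tools: GetPileupSummaries tabulates ref/alt counts at common germline sites, CalculateContamination turns the tumor and matched-normal tables into a contamination estimate, and FilterMutectCalls uses that estimate together with the call annotations. This is a sketch, not the tutorial's exact commands: common_sites is a hypothetical placeholder for a VCF of common biallelic germline sites, the output names are made up, and flag requirements differ between GATK 4 releases (newer ones require -L for GetPileupSummaries and -R for FilterMutectCalls), so check `gatk <Tool> --help` for your version. Commands are only printed unless RUN=1.

```shell
#!/usr/bin/env bash
set -e
baseDir=${baseDir:-.}
ref="$baseDir/ref/Homo_sapiens_assembly38.fasta"
common_sites="$baseDir/resources/common_biallelic_sites.vcf.gz"   # HYPOTHETICAL file
out="$baseDir/sandbox"
commands_run=0

# Print each command; execute it only when RUN=1.
run() {
  commands_run=$((commands_run + 1))
  echo "+ $*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# 1. Pileup summaries at common germline sites, for tumor and matched normal.
run gatk GetPileupSummaries -I "$baseDir/bams/tumor.bam" \
  -V "$common_sites" -L "$common_sites" -O "$out/tumor_pileups.table"
run gatk GetPileupSummaries -I "$baseDir/bams/normal.bam" \
  -V "$common_sites" -L "$common_sites" -O "$out/normal_pileups.table"

# 2. Estimate cross-sample contamination of the tumor, using the normal as a match.
run gatk CalculateContamination -I "$out/tumor_pileups.table" \
  -matched "$out/normal_pileups.table" -O "$out/contamination.table"

# 3. Filter the raw Mutect2 calls on annotations plus the contamination estimate.
run gatk FilterMutectCalls -R "$ref" -V "$out/1_somatic_m2.vcf.gz" \
  --contamination-table "$out/contamination.table" \
  -O "$out/3_somatic_filtered.vcf.gz"
```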
Open issues
Tumor heterogeneity
If you don't have any normals, you can still run the pipeline, but you might get lots of false positives.
If you are just trying to filter out common variants, you could use something like a panel of normals. The panel of normals is really helpful for removing sequencing artifacts.
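For the no-normal case, a minimal sketch is the paired command from the top of this section with the normal -I and -normal removed, keeping the PoN and germline resource to recover some of the lost specificity. The output file name is hypothetical, and the command is printed rather than run unless RUN=1.

```shell
#!/usr/bin/env bash
# Tumor-only variant of the Mutect2 command above: same resources, no normal.
baseDir=${baseDir:-.}
set -- gatk Mutect2 \
  -R "$baseDir/ref/Homo_sapiens_assembly38.fasta" \
  -I "$baseDir/bams/tumor.bam" \
  -tumor HCC1143_tumor \
  -pon "$baseDir/resources/chr17_m2pon.vcf.gz" \
  --af-of-alleles-not-in-resource 0.0000025 \
  --germline-resource "$baseDir/resources/chr17_af-only-gnomad_grch38.vcf.gz" \
  -L "$baseDir/resources/chr17plus.interval_list" \
  -O "$baseDir/sandbox/tumor_only_m2.vcf.gz"   # hypothetical output name

cmd_line="$*"
printf '%s\n' "$cmd_line"
# Execute only when explicitly requested; expect more false positives without a normal.
if [ "${RUN:-0}" = 1 ]; then "$@"; fi
```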
https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-