
Somatic SNV Indel

Last published: 2023-04-11 23:33:45

生信小木屋

Call somatic SNVs and indels and generate a BAMOUT

gatk Mutect2 \
     -R $baseDir/ref/Homo_sapiens_assembly38.fasta \
     -I $baseDir/bams/tumor.bam \
     -I $baseDir/bams/normal.bam \
     -tumor HCC1143_tumor \
     -normal HCC1143_normal \
     -pon $baseDir/resources/chr17_m2pon.vcf.gz \
     --af-of-alleles-not-in-resource 0.0000025 \
     --germline-resource $baseDir/resources/chr17_af-only-gnomad_grch38.vcf.gz \
     -L $baseDir/resources/chr17plus.interval_list \
     -O $baseDir/sandbox/1_somatic_m2.vcf.gz \
     -bamout $baseDir/sandbox/2_tumor_normal_m2.bam
  • --input,-I (GATKPath): BAM/SAM/CRAM file containing reads. This argument must be specified at least once. Required.
  • --output,-O (GATKPath): File to which variants should be written. Required.
  • --reference,-R (GATKPath): Reference sequence file. Required.
  • --tumor-sample,-tumor: Deprecated (this feature is deprecated and will be removed in a future release). BAM sample name of the tumor. May be URL-encoded as output by GetSampleName with the -encode argument. Default value: null.
  • --normal-sample,-normal: BAM sample name of the normal. May be URL-encoded as output by GetSampleName with the -encode argument. This argument may be specified 0 or more times. Default value: null.
  • --panel-of-normals,-pon (FeatureInput): VCF file of sites observed in normal. Default value: null.
  • --germline-resource (FeatureInput): Population VCF of germline sequencing containing allele fractions. Default value: null.
  • --intervals,-L: One or more genomic intervals over which to operate. This argument may be specified 0 or more times. Default value: null.
  • --bam-output,-bamout: File to which assembled haplotypes should be written. Default value: null.
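
The values passed to -tumor and -normal have to match the sample name stored in each BAM, which is the SM tag of the @RG header line. As a minimal sketch, the snippet below pulls that tag out of a header line. The header string is a made-up example; in practice it would come from something like `samtools view -H` (assumed to be installed) or from GetSampleName.

```shell
# Made-up @RG header line for illustration; a real one would come from the BAM header.
header=$(printf '@RG\tID:HCC1143_tumor.1\tSM:HCC1143_tumor\tPL:ILLUMINA')

# Scan the tab-separated fields of the @RG line and print the value of the SM tag.
sample=$(printf '%s\n' "$header" | awk -F'\t' '
  $1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SM:/) { print substr($i, 4); exit }
  }')

echo "$sample"
```

If the printed name differs from what is given to -tumor or -normal, Mutect2 will not be able to match the sample.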

Call somatic mutations using GATK4 Mutect2
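Use --germline-resource to supply a population germline resource for annotating variant alleles. The resource must contain allele-specific frequencies, that is, an AF annotation in the INFO column of the VCF. Mutect2 uses these population allele frequencies to annotate variant alleles. When using a population germline resource, consider adjusting --af-of-alleles-not-in-resource from its default of 0.001. For example, the gnomAD resource af-only-gnomad_grch38.vcf.gz represents ~200k exomes and ~16k genomes. The data used in the tutorial above, chr17_af-only-gnomad_grch38.vcf.gz, is exome data, so we set --af-of-alleles-not-in-resource to 0.0000025, which corresponds to 1/(2 × exome samples) = 1/(2 × 200,000). The default of 0.001 is appropriate for human sample analyses without any population resource and is based on the average human heterozygosity. Both the population allele frequencies (POP_AF) and af-of-alleles-not-in-resource factor into the probability calculation for a somatic variant.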
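
The value 0.0000025 follows directly from the sample count quoted above. A minimal sketch of the arithmetic, assuming awk is available; `exome_samples` and `af_not_in_resource` are hypothetical variable names used only for illustration.

```shell
# Approximate number of exome samples in the germline resource (~200k, per the text above).
exome_samples=200000

# Each diploid sample contributes two alleles, so the smallest observable AF is 1 / (2 * samples).
af_not_in_resource=$(awk -v n="$exome_samples" 'BEGIN { printf "%.7f", 1 / (2 * n) }')

echo "--af-of-alleles-not-in-resource $af_not_in_resource"
```

The same calculation can be repeated with the sample count of whichever germline resource is actually used.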

Mutect2 is based on HaplotypeCaller


  • Skip :
    • Sites in PoN
    • Sites with high fraction of alt reads in normal
  • Allele-specific calling
    • Distinguishes alleles in the germline population frequency resource and uses AF in calculating probability variant exists in normal and tumor

MuTect2 reassembly recovers the 120 base deletion haplotype


Somatic calls inferred from PairHMM likelihoods


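  • No explicit ploidy assumption (a different assumption from HaplotypeCaller)
    • Somatic "likelihoods" use the variant allele fraction rather than ploidy
  • The statistical threshold for a somatic call uses log-likelihood ratios
    • A log-likelihood ratio of 5.3 favors the genotype with the somatic variant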
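
To make the threshold idea concrete, here is a small sketch that compares a log-likelihood ratio against 5.3. The two likelihood values are invented for illustration, and treating them as log10 values is an assumption; real values come from Mutect2's PairHMM step, not from this calculation.

```shell
# Hypothetical log10 likelihoods for one site (illustrative numbers only).
log10_lik_variant=-12.4     # likelihood of the reads if the variant allele exists
log10_lik_reference=-19.1   # likelihood of the reads if it does not
threshold=5.3               # threshold quoted in the list above

# The log-likelihood ratio is the difference of the two log likelihoods.
lod=$(awk -v a="$log10_lik_variant" -v b="$log10_lik_reference" 'BEGIN { printf "%.1f", a - b }')

# Compare the ratio with the threshold.
call=$(awk -v lod="$lod" -v t="$threshold" 'BEGIN { if (lod >= t) print "somatic_candidate"; else print "below_threshold" }')

echo "LOD=$lod call=$call"
```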

Multiallelic calling in GATK4 Mutect2

Filtering is based on annotations + contamination estimate


FilterMutectCalls filters for multiple criteria
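
As a rough sketch of how the contamination estimate feeds into filtering, the steps could be chained as below. This is a dry run that only prints the commands unless RUN=1 is set. The common-sites resource and the output table names are hypothetical placeholders, and the exact set of required arguments can differ between GATK versions.

```shell
# Dry-run sketch: prints the commands unless RUN=1 is set in the environment.
baseDir=${baseDir:-.}
# Hypothetical resource of common biallelic sites; not part of the tutorial files above.
common_sites="$baseDir/resources/common_biallelic.vcf.gz"

run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

filter_steps() {
  # 1. Summarize read support at common germline sites in the tumor.
  run gatk GetPileupSummaries \
      -I "$baseDir/bams/tumor.bam" \
      -V "$common_sites" \
      -L "$common_sites" \
      -O "$baseDir/sandbox/tumor_pileups.table"
  # 2. Estimate cross-sample contamination from those pileups.
  run gatk CalculateContamination \
      -I "$baseDir/sandbox/tumor_pileups.table" \
      -O "$baseDir/sandbox/contamination.table"
  # 3. Filter the Mutect2 calls using annotations plus the contamination estimate.
  run gatk FilterMutectCalls \
      -R "$baseDir/ref/Homo_sapiens_assembly38.fasta" \
      -V "$baseDir/sandbox/1_somatic_m2.vcf.gz" \
      --contamination-table "$baseDir/sandbox/contamination.table" \
      -O "$baseDir/sandbox/somatic_filtered.vcf.gz"
}

filter_steps
```

Printing first makes it easy to review paths before anything is executed.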

New Contamination Model in GATK4

Open issues

Tumor heterogeneity


If you don't have any normals, you can still run the pipeline, but you might get lots of false positives.
If you are just trying to filter out common variants, you could use something like a panel of normals. A panel of normals is really helpful for removing sequencing artifacts.
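
A minimal sketch of how the same command could be assembled with or without a matched normal. `build_mutect2_cmd` is a hypothetical helper that only prints the command line; the paths reuse the ones from the command at the top of this post, and `baseDir` falls back to the current directory if unset.

```shell
baseDir=${baseDir:-.}

# Print a Mutect2 command; the normal BAM and sample name are optional (pass "" to skip).
build_mutect2_cmd() {
  tumor_bam=$1
  normal_bam=$2
  normal_sample=$3
  set -- gatk Mutect2 \
      -R "$baseDir/ref/Homo_sapiens_assembly38.fasta" \
      -I "$tumor_bam"
  # Add the matched normal only when one is available.
  if [ -n "$normal_bam" ]; then
    set -- "$@" -I "$normal_bam" -normal "$normal_sample"
  fi
  # The panel of normals and germline resource are used in both modes.
  set -- "$@" \
      -pon "$baseDir/resources/chr17_m2pon.vcf.gz" \
      --germline-resource "$baseDir/resources/chr17_af-only-gnomad_grch38.vcf.gz" \
      -O "$baseDir/sandbox/1_somatic_m2.vcf.gz"
  echo "$*"
}

# Tumor with matched normal
build_mutect2_cmd "$baseDir/bams/tumor.bam" "$baseDir/bams/normal.bam" HCC1143_normal
# Tumor only
build_mutect2_cmd "$baseDir/bams/tumor.bam" "" ""
```

In the tumor-only case the panel of normals and the germline resource carry more of the filtering burden, which is why false positives are more likely.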

https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-