A First Look at GATK4 - 1

germline somatic detail

Introduction to GATK4

References

Workflow
WDL syntax
Best Practices

Changes in GATK 4.0

  • Re-engineered for speed, scalability and versatility
  • Expanded scope of analysis to more variant types
  • Reproducible Best Practices workflows

    GATK 4.0 license

    Released under the BSD 3-Clause license

Streamlined architecture (overall efficiency)
Intel Genomics Kernel Library (speed)
Intel GenomicsDB (scalability)
Apache Spark support (robust parallelism)
Google Dataproc and GCS support (cloud execution)
Versatility of data traversal (analysis scope)

Variant calling

Variants are genetic changes in an individual relative to a reference genome

  • Germline (inherited)
  • Somatic (cancer)

Reference genome = a standardized genomic sequence

Human genome reference sequence

  • Previous standard: hg19/b37
  • New standard: hg38

Variant calling

Sources of interference: noise, contamination, purity

GATK4 tools for each type of variant calling:

                 Germline                   Somatic
SNPs & Indels    HaplotypeCaller (GVCF)     Mutect2
CNV              GATK gCNV (beta)           GATK CNV + aCNV
Structural       GATK SVDiscovery (beta)    Planned
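The table above can be written as a small lookup, which makes it easier to see which tool applies to which case. This is only a sketch: the dictionary keys and the function names `tool_for` and `haplotypecaller_gvcf_cmd` are made up for illustration, and the file names in the usage line are placeholders. The command builder assumes the usual `gatk HaplotypeCaller` invocation with `-ERC GVCF` for the germline SNP/indel case.

```python
# Tool lookup transcribed from the table above.
# Keys are (variant class, sample origin); the key names are illustrative only.
GATK4_TOOLS = {
    ("snp_indel", "germline"): "HaplotypeCaller (GVCF)",
    ("snp_indel", "somatic"): "Mutect2",
    ("cnv", "germline"): "GATK gCNV (beta)",
    ("cnv", "somatic"): "GATK CNV + aCNV",
    ("structural", "germline"): "GATK SVDiscovery (beta)",
    ("structural", "somatic"): None,  # the table lists this cell as "Planned"
}


def tool_for(variant_class, origin):
    """Return the tool name from the table, or 'planned' if none is listed yet."""
    key = (variant_class.lower(), origin.lower())
    if key not in GATK4_TOOLS:
        raise KeyError(f"no table entry for {key}")
    tool = GATK4_TOOLS[key]
    return tool if tool is not None else "planned"


def haplotypecaller_gvcf_cmd(reference, bam, output_gvcf):
    """Build (but do not run) a HaplotypeCaller command in GVCF mode."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,      # reference FASTA
        "-I", bam,            # aligned reads for one sample
        "-O", output_gvcf,    # per-sample GVCF output
        "-ERC", "GVCF",       # emit reference confidence in GVCF mode
    ]


if __name__ == "__main__":
    print(tool_for("cnv", "germline"))
    print(" ".join(haplotypecaller_gvcf_cmd("ref.fasta", "sample.bam", "sample.g.vcf.gz")))
```

The command is returned as a list so it can be passed to `subprocess.run` unchanged once real paths are filled in.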

GATK Best Practices

Single-sample variant calling

Mutect supports single-sample SNV calling (but produces many false positives)
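As a rough illustration of single-sample (tumor-only) calling, the sketch below only assembles a Mutect2 command line and does not run it. The function name `mutect2_tumor_only_cmd` and all file names are placeholders. The optional `-tumor` argument is included because, as far as I know, early 4.0.x releases required the tumor sample name while later releases do not; check the documentation for the version you have installed.

```python
import shlex


def mutect2_tumor_only_cmd(reference, tumor_bam, output_vcf, tumor_sample=None):
    """Build (but do not run) a tumor-only Mutect2 command."""
    cmd = [
        "gatk", "Mutect2",
        "-R", reference,    # reference FASTA
        "-I", tumor_bam,    # the single tumor sample, no matched normal
        "-O", output_vcf,   # unfiltered calls
    ]
    if tumor_sample is not None:
        # Assumption: only needed on older releases that require the sample name.
        cmd += ["-tumor", tumor_sample]
    return cmd


if __name__ == "__main__":
    cmd = mutect2_tumor_only_cmd("ref.fasta", "tumor.bam", "tumor.unfiltered.vcf.gz")
    print(shlex.join(cmd))
```

Because there is no matched normal to subtract germline variants and artifacts, the raw output is usually passed through FilterMutectCalls before use, which fits the note above about false positives.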

GATK workflows

GitHub
The workflow scripts are written in WDL

Variant calling process

Step 1: Identify ActiveRegions
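The idea behind this step is that the caller only does expensive work where the reads show some evidence of variation. The toy sketch below is not GATK's implementation: it assumes a per-base "activity" score has already been computed somehow, then thresholds it, pads each hit, and merges overlapping intervals. The function name, the scores, the threshold, and the padding are all invented for illustration.

```python
def find_active_regions(activity, threshold, padding=0):
    """Return half-open (start, end) intervals where activity >= threshold.

    Each interval is extended by `padding` bases on both sides, and
    intervals that overlap or touch after padding are merged.
    """
    # Pass 1: collect runs of consecutive positions at or above the threshold.
    runs = []
    start = None
    for pos, score in enumerate(activity):
        if score >= threshold:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(activity)))

    # Pass 2: pad each run and merge runs that now overlap or touch.
    merged = []
    for s, e in runs:
        s = max(0, s - padding)
        e = min(len(activity), e + padding)
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


if __name__ == "__main__":
    scores = [0, 0, 0.5, 0.9, 0, 0, 0, 0.7, 0]  # made-up activity scores
    print(find_active_regions(scores, threshold=0.4))
    print(find_active_regions(scores, threshold=0.4, padding=2))
```

With no padding the two bumps stay separate regions; with a padding of 2 they grow into each other and are reported as one region, which is the same reason nearby variant evidence ends up being processed together.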
