CNV Detection: DECoN

Original sources
PDF: Accurate clinical detection of exon copy number variants in a targeted NGS panel using DECoN

DECoN v1.0.0 Documentation

DECoN software Git repository

Background

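Targeted next-generation sequencing (NGS) panels are increasingly used in clinical genomics to raise the capacity and throughput of genetic testing and to lower its cost. Detecting whole-exon deletions and duplications in targeted capture panel data is challenging, particularly for CNVs that affect a single exon.
The paper presents DECoN, a tool for detecting exon-level copy number variants in targeted exon capture data.
A total of 2016 samples were used: an evaluation set of 96 samples (including 10 samples with exon CNVs in BRCA1, six with exon CNVs in BRCA2, and 15 samples with exon CNVs in one of eight other genes: TP53, SDHB, MLH1, MSH2, NSD1, EZH2, WT1 and FH) and a clinical test set of 1920 samples. The samples were peripheral blood and saliva from lymphoma cases.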

Method

DECoN (Detection of Exon Copy Number) is a modified version of ExomeDepth (v1.0.0) with two significant changes.

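  1. It can detect variants affecting the first exon on a chromosome (as defined in the BED file), which the earlier version did not support;
  2. The HMM transition probabilities are computed from the distance between exons, so two exons that lie too far apart on the chromosome are treated independently.
    These changes were later incorporated into ExomeDepth (v1.1.0). DECoN also adds further features, such as standardizing its package dependencies, so that results stay consistent between laboratories in clinical use.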

4.1 Reading BAM files to generate coverage metric

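DECoN takes as input a list of BAM files and a BED file describing the exon regions for which coverage is computed. It calculates the FPKM (fragments per kilobase and million base pairs) of every exon in the BED file, which gives a per-exon coverage matrix. DECoN uses this matrix for exon-level CNV detection. FPKM is calculated as:
FPKM = C/(N*L)
where C is the number of reads mapped to the target exon (a count);
N is the total number of reads in the sample that map to the genome (in millions);
L is the length of the target exon (in kilobases).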

For example, consider a sample with a total of 20 million mapped read pairs, of which 200 map to an exon that is 100 bases (0.1 kilobases) long: FPKM = 200/(20*0.1)
Thus FPKM for this exon in this sample is 100.
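
The arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch and not DECoN's own code (DECoN is written in R); the function name `fpkm` and its parameter names are made up here. It assumes the total is given as a raw fragment count and the exon length in bases, and converts them to millions and kilobases as the worked example does.

```python
def fpkm(fragments_on_exon: int, total_mapped_fragments: int, exon_length_bp: int) -> float:
    """Compute FPKM = C / (N * L) with N in millions and L in kilobases.

    Hypothetical helper for illustration; not part of DECoN.
    """
    if total_mapped_fragments <= 0 or exon_length_bp <= 0:
        raise ValueError("total fragments and exon length must be positive")
    n_millions = total_mapped_fragments / 1_000_000  # N: total mapped fragments, in millions
    length_kb = exon_length_bp / 1_000               # L: exon length, in kilobases
    return fragments_on_exon / (n_millions * length_kb)


# Worked example from the text: 200 of 20 million read pairs on a 100-base exon.
print(fpkm(200, 20_000_000, 100))
```

Passing raw counts and letting the function do the unit conversion avoids the most common slip with this formula, which is mixing bases with kilobases.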

4.2 Running quality checks

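Both exons and samples are assessed on their average coverage level. Detection accuracy suffers when coverage is low, so results should then be interpreted with caution.

Samples are also assessed on their correlation with the other samples. A sample that is not well correlated with the other samples in the set is likely to have suboptimal detection across the whole target. The suggested default thresholds behind these quality flags are given below.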

  • Minimum correlation threshold
    the minimum correlation between a test sample and any other sample for the test sample to be considered well-correlated. The default value is 0.98.
  • Minimum coverage threshold
    the minimum median coverage for any sample (measured across all exons in the target) or exon (measured across all samples) to be considered well-covered. The default value is 100.
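
To make the two thresholds concrete, here is a minimal Python sketch of how such flags could be computed from a coverage matrix. It is not DECoN's implementation: the function names, the dictionary layout (sample name mapped to a list of per-exon coverage values) and the toy numbers are all assumptions made for illustration. It also assumes that Pearson correlation is used and that a sample counts as well-correlated when its best correlation with any other sample reaches the threshold.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def qc_flags(coverage, min_corr=0.98, min_cov=100):
    """Flag low-coverage samples, low-coverage exons and poorly correlated samples.

    coverage: dict mapping sample name -> list of per-exon coverage values
    (hypothetical layout, at least two samples, same exon order in each list).
    """
    samples = list(coverage)
    n_exons = len(coverage[samples[0]])
    # Sample coverage: median across all exons in the target.
    low_cov_samples = [s for s in samples if median(coverage[s]) < min_cov]
    # Exon coverage: median across all samples.
    low_cov_exons = [
        j for j in range(n_exons)
        if median(coverage[s][j] for s in samples) < min_cov
    ]
    # Sample correlation: best correlation with any other sample.
    low_corr_samples = []
    for s in samples:
        best = max(pearson(coverage[s], coverage[o]) for o in samples if o != s)
        if best < min_corr:
            low_corr_samples.append(s)
    return {
        "low_cov_samples": low_cov_samples,
        "low_cov_exons": low_cov_exons,
        "low_corr_samples": low_corr_samples,
    }


# Toy data: S3 has a different coverage profile, and exon index 3 is shallow.
toy = {
    "S1": [200, 400, 150, 90],
    "S2": [210, 390, 160, 80],
    "S3": [400, 100, 300, 95],
}
flags = qc_flags(toy)
print(flags)
```

A flagged sample or exon is not necessarily unusable; as the text notes, the flags mark results that need cautious interpretation.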

Calling exon CNVs

The HMM transition probabilities are altered from ExomeDepth v1.0.0 to depend upon the distance between exons, so that exons adjacent in the list of targeted regions are treated independently if they are located so far apart on the chromosome that the probability of a germline variant spanning both exons is negligible. Specifically:

  • The probability of transitioning into a CNV state (from normal to deletion or from normal to duplication) is given by a constant transition probability specified by the user (set as default to .01).

  • The probability of transitioning to a normal state from a CNV state (from deletion to normal or from duplication to normal) is given by a baseline probability scaled by the distance between exons. If the distance between these exons is 0, then this scaling factor is simply 1, but as the distance increases, the scaling factor tends to 0. This is given by

    **exp(−l/E) ∗ 1/t**

    where l is the distance from the previous exon; E is the expected CNV length in basepairs; and t is the baseline probability of returning to a normal state from a deletion/duplication. These values are set as E=50000 and t=0.5.
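
To get a feel for how quickly the distance scaling decays, the short Python sketch below evaluates only the exponential scaling factor for a few distances. It is an illustration, not DECoN or ExomeDepth code: the function name is hypothetical, and it assumes the exponent reads as the distance divided by the expected CNV length (−l/E), which is the reading consistent with a factor of 1 at distance 0 and a factor that tends to 0 as the distance grows. The baseline term t is deliberately left out.

```python
from math import exp


def distance_scaling(distance_bp: float, expected_cnv_length_bp: float = 50_000) -> float:
    """Scaling factor exp(-l / E) for the distance l from the previous exon.

    Hypothetical helper; E defaults to the documented value of 50000 bp.
    """
    if distance_bp < 0 or expected_cnv_length_bp <= 0:
        raise ValueError("distance must be >= 0 and expected CNV length > 0")
    return exp(-distance_bp / expected_cnv_length_bp)


# Adjacent exons, exons one expected CNV length apart, and exons 1 Mb apart.
for distance in (0, 50_000, 1_000_000):
    print(distance, distance_scaling(distance))
```

With E=50000 the factor is 1 for touching exons, falls to about 0.37 at 50 kb and is effectively 0 at 1 Mb, which is why widely separated exons end up being treated independently.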

Installation

In testing, DECoN showed many compatibility problems under R 3.6.1, so using the officially recommended R 3.1.2 is advised.

conda create -n r3.1.2 -c conda-forge r-base=3.1.2