基因注释软件-VEP

在使用VEP软件进行功能注释阶段,其实会发现存在部分注释结果和整体注释逻辑存在偏差。由于不清楚是认识的不足还是VEP本身代码存在Bug。因此对VEP的源码进行溯源核查。

相关疑似的问题

  • 部分变异检测结果存在cHGVS,且cHGVS位于编码区,但是未输出pHGVS信息。
Chr Start End Ref Alt Gene Trans cHGVS pHGVS
chr12 122064779 122064785 ACCGCCA C ORAI1 NM_032790.3 c.132_138delinsC -
  • 从注释的氨基酸结果来看,不涉及终止密码子,但是注释得到的Function为stop_lost
Chr Start End Ref Alt Gene Trans cHGVS pHGVS
chr1 10342506 10342555 TCAGGTGGGCTTGACGTCTGTGACCAGTATTCAAGAGAGGATCATGTCTA CCAGGTGTAGACATGATCCTCTCTTGAATACTGGTCACAGACGTCAAGCC KIF1B NM_015074.3 c.1211_1260delinsCCAGGTGTAGACATGATCCTCTCTTGAATACTGGTCACAGACGTCAAGCC p.I404_L420delinsTRCRHDPLLNTGHRRQA

源码重点部分记录

  • VEP输出结果整体记录在 “ modules\Bio\EnsEMBL\VEP\OutputFactory.pm ” 中,如果是处于核查可以基于该模块进行反向溯源;

  • 预测氨基酸变化
    代码在 “modules\Bio\EnsEMBL\Variation\TranscriptVariationAllele.pm” Line:684

    1
    2
    3
    4
    5
    6
    7
    8
    =head2 peptide

    Description: Return the amino acid sequence that this allele is predicted to result in
    Returntype : string or undef if this allele is not in the CDS or is a frameshift
    Exceptions : none
    Status : Stable

    =cut
  • 对输入文件进行解析校验

modules\Bio\EnsEMBL\VEP\Parser.pm

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
=head2 validate_vf

Arg 1 : Bio::EnsEMBL::Variation::BaseVariationFeature
Example : $is_valid = $parser->validate_vf($vf);
Description: Performs various (configurable) checks on a VariationFeature
as produced by the parser:
- creates a variation_name from the location+alleles if none
- checks if chr is in user-specified list if provided
- checks start and end look like numbers
- checks start/end are valid (start <= end + 1) and corresponds to ref allele
- checks chr is in valid list
- checks ref allele vs genome if requested
Returntype : bool
Exceptions : none
Caller : next()
Status : Stable

=cut

  • HGVS信息注释
    • 获取变异最接近的转录本
      modules\Bio\EnsEMBL\VEP\AnnotationType\Transcript.pm
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
       =head2 get_nearest

      Arg 1 : Bio::EnsEMBL::Variation::BaseVariationFeature $vf
      Arg 2 : string $type (transcript, gene or symbol)
      Example : $nearest_gene = $as->get_nearest($vf, 'symbol');
      Description: Gets ID (one of transcript, gene, symbol) of the nearest
      transcript to the given variant.
      Returntype : string
      Exceptions : none
      Caller : annotate_InputBuffer()
      Status : Stable
      =cut

对应代码: modules\Bio\EnsEMBL\Variation\TranscriptVariation.pm

sub _hgvs_generic {
    my $self = shift;
    my $reference = pop;
    my $hgvs = shift;

    #The rna and mitochondrial modes have not yet been implemented, so return undef in case we get a call to these
    return undef if ($reference =~ m/rna|mitochondrial/);

    # The HGVS subroutine
    my $sub = qq{hgvs_$reference};

    # Loop over the TranscriptVariationAllele objects associated with this TranscriptVariation
    foreach my $tv_allele (@{ $self->get_all_alternate_TranscriptVariationAlleles }) {

        #If an HGVS hash was supplied and the allele exists as key, set the HGVS notation for this allele
        if (defined($hgvs)) {
            my $notation = $hgvs->{$tv_allele->variation_feature_seq()};
            $tv_allele->$sub($notation) if defined $notation;
        }
        # Else, add the HGVS notation for this allele to the HGVS hash
        else {
        $hgvs->{$tv_allele->variation_feature_seq()} = $tv_allele->$sub();
        }
    }

    return $hgvs;
}

modules\Bio\EnsEMBL\VEP\VariantRecoder.pm

-------------本文结束感谢您的阅读-------------