在使用VEP软件进行功能注释阶段,其实会发现存在部分注释结果和整体注释逻辑存在偏差。由于不清楚是认识的不足还是VEP本身代码存在Bug。因此对VEP的源码进行溯源核查。
相关疑似的问题
- 部分变异检测结果存在cHGVS,且cHGVS位于编码区,但是未输出pHGVS信息。
Chr | Start | End | Ref | Alt | Gene | Trans | cHGVS | pHGVS |
---|---|---|---|---|---|---|---|---|
chr12 | 122064779 | 122064785 | ACCGCCA | C | ORAI1 | NM_032790.3 | c.132_138delinsC | - |
- 从注释的氨基酸结果来看,不涉及终止密码子,但是注释得到的Function为stop_lost
Chr | Start | End | Ref | Alt | Gene | Trans | cHGVS | pHGVS |
---|---|---|---|---|---|---|---|---|
chr1 | 10342506 | 10342555 | TCAGGTGGGCTTGACGTCTGTGACCAGTATTCAAGAGAGGATCATGTCTA | CCAGGTGTAGACATGATCCTCTCTTGAATACTGGTCACAGACGTCAAGCC | KIF1B | NM_015074.3 | c.1211_1260delinsCCAGGTGTAGACATGATCCTCTCTTGAATACTGGTCACAGACGTCAAGCC | p.I404_L420delinsTRCRHDPLLNTGHRRQA |
源码重点部分记录
VEP输出结果整体记录在 “ modules\Bio\EnsEMBL\VEP\OutputFactory.pm ” 中,如果是处于核查可以基于该模块进行反向溯源;
预测氨基酸变化
代码在 “modules\Bio\EnsEMBL\Variation\TranscriptVariationAllele.pm” Line:6841
2
3
4
5
6
7
8=head2 peptide
Description: Return the amino acid sequence that this allele is predicted to result in
Returntype : string or undef if this allele is not in the CDS or is a frameshift
Exceptions : none
Status : Stable
=cut对输入文件进行解析校验
modules\Bio\EnsEMBL\VEP\Parser.pm1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18=head2 validate_vf
Arg 1 : Bio::EnsEMBL::Variation::BaseVariationFeature
Example : $is_valid = $parser->validate_vf($vf);
Description: Performs various (configurable) checks on a VariationFeature
as produced by the parser:
- creates a variation_name from the location+alleles if none
- checks if chr is in user-specified list if provided
- checks start and end look like numbers
- checks start/end are valid (start <= end + 1) and corresponds to ref allele
- checks chr is in valid list
- checks ref allele vs genome if requested
Returntype : bool
Exceptions : none
Caller : next()
Status : Stable
=cut
- HGVS信息注释
- 获取变异最接近的转录本
modules\Bio\EnsEMBL\VEP\AnnotationType\Transcript.pm1
2
3
4
5
6
7
8
9
10
11
12=head2 get_nearest
Arg 1 : Bio::EnsEMBL::Variation::BaseVariationFeature $vf
Arg 2 : string $type (transcript, gene or symbol)
Example : $nearest_gene = $as->get_nearest($vf, 'symbol');
Description: Gets ID (one of transcript, gene, symbol) of the nearest
transcript to the given variant.
Returntype : string
Exceptions : none
Caller : annotate_InputBuffer()
Status : Stable
=cut
- 获取变异最接近的转录本
对应代码: modules\Bio\EnsEMBL\Variation\TranscriptVariation.pm
sub _hgvs_generic {
my $self = shift;
my $reference = pop;
my $hgvs = shift;
#The rna and mitochondrial modes have not yet been implemented, so return undef in case we get a call to these
return undef if ($reference =~ m/rna|mitochondrial/);
# The HGVS subroutine
my $sub = qq{hgvs_$reference};
# Loop over the TranscriptVariationAllele objects associated with this TranscriptVariation
foreach my $tv_allele (@{ $self->get_all_alternate_TranscriptVariationAlleles }) {
#If an HGVS hash was supplied and the allele exists as key, set the HGVS notation for this allele
if (defined($hgvs)) {
my $notation = $hgvs->{$tv_allele->variation_feature_seq()};
$tv_allele->$sub($notation) if defined $notation;
}
# Else, add the HGVS notation for this allele to the HGVS hash
else {
$hgvs->{$tv_allele->variation_feature_seq()} = $tv_allele->$sub();
}
}
return $hgvs;
}
modules\Bio\EnsEMBL\VEP\VariantRecoder.pm