vcfCodingSnps v1.5

Some possible annotation results for a single SNP, with the meaning of each output format, are listed below:

  Annotation output
      Interpretation of the output

  5'UTR=A26C2[-]
      The SNP is in the 5'UTR region of gene A26C2, which is on the minus strand.

  INTRONIC=POTEG[-]
      The SNP is in an intronic region of gene POTEG, which is on the minus strand.

  SYNONYMOUS_CODING=BARD1(uc002veu.2):His506His[-]
      The SNP is a synonymous coding change at codon 506 of gene BARD1 (UCSC gene name uc002veu.2), on the minus strand; the amino acid His is unchanged.

  NON_SYNONYMOUS_CODING=BARD1(uc002veu.2):Arg658Cys[-]
      The SNP is a non-synonymous coding change at codon 658 of gene BARD1 (UCSC gene name uc002veu.2), on the minus strand; it changes the amino acid from Arg to Cys.

  SPLICE_SITE=FARP2(uc002wbi.1)[+]
      The SNP is in a splice site (within 5 bp of an exon start or end position in the coding region) of gene FARP2 (UCSC gene name uc002wbi.1), on the plus strand.

  STOP_GAINED=C2orf83(uc002vph.1):Trp141stop[-]
      The SNP is at codon 141 of gene C2orf83 (UCSC gene name uc002vph.1), on the minus strand; it changes the amino acid Trp to a stop codon.

  STOP_LOST=OR2M3(uc001ieb.1):stop313Arg[+]
      The SNP is at codon 313 of gene OR2M3 (UCSC gene name uc001ieb.1), on the plus strand; it changes a stop codon to the amino acid Arg.
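The annotation strings above follow a regular pattern: a type, "=", a gene symbol, an optional UCSC gene name in parentheses, an optional amino-acid change after ":", and the strand in square brackets. A minimal Python sketch that splits one annotation into these parts might look like the following. The regular expressions, the Annotation tuple and parse_annotation are hypothetical names inferred only from the examples in this list; they are not part of vcfCodingSnps.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern inferred from the examples: TYPE=GENE[(UCSC_ID)][:CHANGE][STRAND]
# e.g. "NON_SYNONYMOUS_CODING=BARD1(uc002veu.2):Arg658Cys[-]"
ANNOTATION_RE = re.compile(
    r"^(?P<kind>[A-Z0-9_']+)="      # annotation type, e.g. 5'UTR, STOP_GAINED
    r"(?P<gene>[^(\[:]+)"           # gene symbol, e.g. BARD1
    r"(?:\((?P<ucsc>[^)]+)\))?"     # optional UCSC gene name, e.g. uc002veu.2
    r"(?::(?P<change>[^\[]+))?"     # optional amino-acid change, e.g. Arg658Cys
    r"\[(?P<strand>[+-])\]$"        # strand: [+] or [-]
)

# Amino-acid change: reference residue, codon number, alternative residue.
# "stop" is treated like any other residue name (Trp141stop, stop313Arg).
CHANGE_RE = re.compile(r"^(?P<ref>[A-Za-z]+)(?P<codon>\d+)(?P<alt>[A-Za-z]+)$")


class Annotation(NamedTuple):
    kind: str
    gene: str
    ucsc: Optional[str]
    strand: str
    ref_aa: Optional[str] = None
    codon: Optional[int] = None
    alt_aa: Optional[str] = None


def parse_annotation(text: str) -> Annotation:
    """Split one annotation string into its parts; raise ValueError if it does not match."""
    m = ANNOTATION_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised annotation: {text!r}")
    ann = Annotation(m.group("kind"), m.group("gene"), m.group("ucsc"), m.group("strand"))
    change = m.group("change")
    if change is None:
        return ann  # e.g. 5'UTR, INTRONIC, SPLICE_SITE carry no amino-acid change
    c = CHANGE_RE.match(change)
    if c is None:
        raise ValueError(f"unrecognised amino-acid change: {change!r}")
    return ann._replace(ref_aa=c.group("ref"), codon=int(c.group("codon")), alt_aa=c.group("alt"))
```

For example, parse_annotation("STOP_LOST=OR2M3(uc001ieb.1):stop313Arg[+]") yields codon 313 with ref_aa "stop" and alt_aa "Arg". Annotation forms not shown in this list may need a looser pattern.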

The annotation result is appended to the INFO field of the input VCF SNP file and written out together with the other fields. If a SNP is annotated differently for different genes (or different isoforms of the same gene), all annotation results are appended to the INFO field, delimited by ";". If the SNP is not in any annotated gene region, the original INFO field is output unchanged. Here are the first lines of an example input and output VCF file:

First lines of the input VCF file:

  ##format=VCFv3.2
  ##NA12891=../GLF/NA12891.chrom8.SLX.SRP000032.2009_07.glf
  ##NA12892=../GLF/NA12892.chrom8.SLX.SRP000032.2009_07.glf
  ##NA12878=../merged/NA12878.chrom8.merged.glf
  ##minTotalDepth=0
  ##maxTotalDepth=1000
  ##minMapQuality=40
  ##minPosterior=0.9990
  ##program=glfTrio
  ##versionDate=Thu Aug 27 18:23:18 2009
  #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA12891 NA12892 NA12878

  8   146284   .   c    a   54   .   depth=29;duples=hets;mac=2;tdt=0/2   GT:GQ:GD   1/0:31:12   1/0:32:3   0/0:28:14

  8   146703   .   c    t   92   .   depth=41;mac=1;tdt=0/1   GT:GQ:GD   1/1:42:14   0/1:54:9   1/1:24:18

  8   151532   .   t    c   100   .   depth=131   GT:GQ:GD    0/0:8:37   1/0:100:26   1/0:100:68

  8   151573   .   g    t   72   .   depth=113;mac=1;tdt=1/1 GT:GQ:GD   0/1:48:35   0/0:39:26   0/1:100:52

  8   151638   .   a    c   100   .   depth=124;duples=hets;mac=2;tdt=1/2   GT:GQ:GD   0/1:100:55   0/1:100:58   0/1:87:11

  8   151651   .   c    g   100   .   depth=124;duples=hets;mac=2;tdt=1/2   GT:GQ:GD   0/1:87:56   0/1:100:56   0/1:24:12

  8   151763   .   t    a   100   .   depth=127;duples=hets;mac=2;tdt=1/2   GT:GQ:GD   1/0:100:49   1/0:100:54   1/0:100:24

  8   151936   .   a    g   32    .   depth=105;duples=hets;mac=2;tdt=0/2   GT:GQ:GD   0/1:42:44   0/1:23:47   0/0:39:14

  8   152578   .   c    t   87    .   depth=108   GT:GQ:GD   1/1:95:31   1/1:89:30   1/1:100:47
  ......

First lines of the output VCF file:

  ##format=VCFv3.2 
  ##NA12891=../GLF/NA12891.chrom8.SLX.SRP000032.2009_07.glf 
  ##NA12892=../GLF/NA12892.chrom8.SLX.SRP000032.2009_07.glf 
  ##NA12878=../merged/NA12878.chrom8.merged.glf 
  ##minTotalDepth=0 
  ##maxTotalDepth=1000 
  ##minMapQuality=40 
  ##minPosterior=0.9990 
  ##program=glfTrio 
  ##versionDate=Thu Aug 27 18:23:18 2009 
  #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12891 NA12892 NA12878

  8 146284 . c a 54 . depth=29;duples=hets;mac=2;tdt=0/2 GT:GQ:GD 1/0:31:12 1/0:32:3 0/0:28:14

  8 146703 . c t 92 . depth=41;mac=1;tdt=0/1 GT:GQ:GD 1/1:42:14 0/1:54:9 1/1:24:18

  8 151532 . t c 100 . depth=131;5'UTR=RPL23A_20_869(uc010lra.1)[-];5'UTR=RPL23A_20_869(uc003woq.2)[-];5'UTR=RPL23A_20_869(uc010lrb.1)[-] GT:GQ:GD 0/0:8:37 1/0:100:26 1/0:100:68

  8 151573 . g t 72 . depth=113;mac=1;tdt=1/1;5'UTR=RPL23A_20_869(uc010lra.1)[-];5'UTR=RPL23A_20_869(uc003woq.2)[-];5'UTR=RPL23A_20_869(uc010lrb.1)[-] GT:GQ:GD 0/1:48:35 0/0:39:26 0/1:100:52

  8 151638 . a c 100 . depth=124;duples=hets;mac=2;tdt=1/2;5'UTR=RPL23A_20_869(uc010lra.1)[-];5'UTR=RPL23A_20_869(uc003woq.2)[-];5'UTR=RPL23A_20_869(uc010lrb.1)[-] GT:GQ:GD 0/1:100:55 0/1:100:58 0/1:87:11

  8 151651 . c g 100 . depth=124;duples=hets;mac=2;tdt=1/2;5'UTR=RPL23A_20_869(uc010lra.1)[-];5'UTR=RPL23A_20_869(uc003woq.2)[-];5'UTR=RPL23A_20_869(uc010lrb.1)[-] GT:GQ:GD 0/1:87:56 0/1:100:56 0/1:24:12

  8 151763 . t a 100 . depth=127;duples=hets;mac=2;tdt=1/2;5'UTR=RPL23A_20_869(uc010lra.1)[-];5'UTR=RPL23A_20_869(uc003woq.2)[-];5'UTR=RPL23A_20_869(uc010lrb.1)[-] GT:GQ:GD 1/0:100:49 1/0:100:54 1/0:100:24

  8 151936 . a g 32 . depth=105;duples=hets;mac=2;tdt=0/2;5'UTR=RPL23A_20_869(uc010lra.1)[-];5'UTR=RPL23A_20_869(uc003woq.2)[-];5'UTR=RPL23A_20_869(uc010lrb.1)[-] GT:GQ:GD 0/1:42:44 0/1:23:47 0/0:39:14

  8 152578 . c t 87 . depth=108;5'UTR=RPL23A_20_869(uc010lra.1)[-];5'UTR=RPL23A_20_869(uc003woq.2)[-];5'UTR=RPL23A_20_869(uc010lrb.1)[-] GT:GQ:GD 1/1:95:31 1/1:89:30 1/1:100:47
  ......
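As a minimal sketch of the INFO handling described above (not the tool's own code), the hypothetical helpers below append annotation strings to a record's INFO field with ";" as the delimiter, leave INFO unchanged when there are no annotations, and split an annotated INFO value back into its original entries and its annotations. They assume whitespace-separated columns with INFO as the eighth column (per the #CHROM header line), write records tab-delimited as VCF files normally are, and recognise only the annotation types that appear in this document.

```python
from typing import List, Tuple

# Annotation types that appear in this document; the tool may emit others.
KNOWN_TYPES = {
    "5'UTR", "3'UTR", "INTRONIC", "SPLICE_SITE",
    "SYNONYMOUS_CODING", "NON_SYNONYMOUS_CODING",
    "STOP_GAINED", "STOP_LOST",
}

INFO_COLUMN = 7  # 0-based index of INFO in "#CHROM POS ID REF ALT QUAL FILTER INFO ..."


def append_annotations(info: str, annotations: List[str]) -> str:
    """Append annotations to an INFO value, delimited by ';'.

    With no annotations the join returns the original INFO value unchanged.
    """
    return ";".join([info] + annotations)


def annotate_record(line: str, annotations: List[str]) -> str:
    """Return a tab-delimited record with the annotations added to INFO."""
    fields = line.split()
    fields[INFO_COLUMN] = append_annotations(fields[INFO_COLUMN], annotations)
    return "\t".join(fields)


def split_info(info: str) -> Tuple[List[str], List[str]]:
    """Separate original INFO entries (depth=..., mac=...) from appended annotations."""
    original: List[str] = []
    annotations: List[str] = []
    for entry in info.split(";"):
        key = entry.split("=", 1)[0]  # text before the first '=', e.g. "5'UTR" or "depth"
        if key in KNOWN_TYPES:
            annotations.append(entry)
        else:
            original.append(entry)
    return original, annotations
```

Classifying entries by their key rather than by position keeps the split working even though the original INFO entries also use key=value form; an entry with an unlisted key is kept with the original INFO.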

The output log file gives more detailed information about each annotated SNP. Here are the first lines of an example output log file:

  ##chr pos ref alt ucsc_name genestrend genestart geneend ref_codon ref_AA alt_codon alt_AA codon_start codon_end genesymbol codonCount type
  chr2 214811129 T c uc010fuz.1 + 213857360 214814327 CTA Leu CCA Pro 214811128 214811130 SPAG16 433 NON_SYNONYMOUS_CODING
  chr2 214811129 T c uc002veq.1 + 213857360 214983470 . . . . . . SPAG16 . INTRONIC
  chr2 214811129 T c uc002ver.1 + 213857360 214983470 . . . . . . SPAG16 . INTRONIC
  chr2 214811174 T a uc010fuz.1 + 213857360 214814327 . . . . . . SPAG16 . 3'UTR
  chr2 214811174 T a uc002veq.1 + 213857360 214983470 . . . . . . SPAG16 . INTRONIC
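Because each log record is a whitespace-separated row under the header, it can be loaded column by column. The Python sketch below is a hypothetical reader, not part of vcfCodingSnps: it assumes one record per line with exactly the 17 header columns, and treats "." as a field that does not apply (as in the INTRONIC and 3'UTR rows). It also checks, for plus-strand coding records only, that the codon fields agree with one another, on the assumption that codon_start to codon_end spans the reference codon in genomic order.

```python
from typing import Dict, Iterable, Iterator, Optional

# Column names from the example log header (spelled exactly as the log spells them).
LOG_COLUMNS = [
    "chr", "pos", "ref", "alt", "ucsc_name", "genestrend", "genestart",
    "geneend", "ref_codon", "ref_AA", "alt_codon", "alt_AA", "codon_start",
    "codon_end", "genesymbol", "codonCount", "type",
]

Record = Dict[str, Optional[str]]


def parse_log(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one dict per log record, mapping '.' to None (assumed 'not applicable')."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the '##chr ...' header
        values = line.split()
        if len(values) != len(LOG_COLUMNS):
            raise ValueError(f"expected {len(LOG_COLUMNS)} columns, got {len(values)}: {line!r}")
        yield {col: (None if v == "." else v) for col, v in zip(LOG_COLUMNS, values)}


def plus_strand_codon_consistent(rec: Record) -> bool:
    """Check a plus-strand coding record: substituting alt at offset
    pos - codon_start in ref_codon should reproduce alt_codon.

    Assumption: plus-strand codons are reported 5'->3' in genomic order.
    Minus-strand records are not handled by this sketch.
    """
    if rec["genestrend"] != "+" or rec["ref_codon"] is None:
        raise ValueError("only coding records on the + strand are supported")
    offset = int(rec["pos"]) - int(rec["codon_start"])
    codon = list(rec["ref_codon"].upper())
    # The reference base at the SNP position must match the codon base at that offset.
    if not 0 <= offset < len(codon) or codon[offset] != rec["ref"].upper():
        return False
    codon[offset] = rec["alt"].upper()
    return "".join(codon) == rec["alt_codon"].upper()
```

For the first record above, the SNP at 214811129 is the second base of the codon spanning 214811128-214811130, so replacing T with C turns CTA (Leu) into CCA (Pro), which matches the reported alt_codon.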