Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Wednesday, June 28, 2017

install VEP for annotating variants

VEP is one of the most commonly used variant annotation tools, along with ANNOVAR and SnpEff, but installing and configuring it can be intimidating.

I just went through the installation process and wrote it up as a gist:


Install

The latest version of VEP is on GitHub; the installer is documented at http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html#installer

It was version 89 when this gist was written (bioinformatics tools evolve too fast!).

Check this gist as well: https://gist.github.com/ckandoth/f265ea7c59a880e28b1e533a6e935697

cd /scratch/genomic_med/apps
git clone https://github.com/Ensembl/ensembl-vep.git
cd ensembl-vep
git status
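# installs the Ensembl API (interactive: answer the prompts)
perl INSTALL.pl
# VEP_DATA holds the cache and FASTA files; VEP_PATH is the cloned repository
export VEP_DATA="/scratch/genomic_med/apps/ensembl-vep-data"
export VEP_PATH="/scratch/genomic_med/apps/ensembl-vep"
# the data directory must exist before rsync and tar write into it
mkdir -p "$VEP_DATA"

rsync -avhP rsync://ftp.ensembl.org/ensembl/pub/release-89/variation/VEP/homo_sapiens_vep_89_GRCh37.tar.gz "$VEP_DATA"/
tar -xvzf "$VEP_DATA/homo_sapiens_vep_89_GRCh37.tar.gz" -C "$VEP_DATA"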
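
Before moving on, it can help to confirm that the cache unpacked into the layout VEP expects under --dir, which is species/version_assembly. Below is a minimal sketch of such a check. The function name check_vep_cache is made up for illustration and is not part of VEP, and the test for an info.txt file at the top of the cache is an assumption about the cache layout.

```shell
# Hypothetical helper (not part of VEP): check that an unpacked cache sits
# at <cache_root>/<species>/<version>_<assembly>, where --dir will look for it.
check_vep_cache() {
    cache_dir="$1/$2/$3"
    if [ ! -d "$cache_dir" ]; then
        echo "missing: $cache_dir"
        return 1
    fi
    # Assumption: every unpacked cache carries an info.txt at its top level
    if [ ! -f "$cache_dir/info.txt" ]; then
        echo "incomplete (no info.txt): $cache_dir"
        return 1
    fi
    echo "ok: $cache_dir"
}

# Example call, using the paths from this post:
# check_vep_cache "$VEP_DATA" homo_sapiens 89_GRCh37
```

A non-zero return usually means the tarball was extracted somewhere other than $VEP_DATA, or that the download was incomplete.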

Install the reference FASTA for GRCh37:

A FASTA file, $VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz, will be downloaded.

perl INSTALL.pl --AUTO f --SPECIES homo_sapiens --ASSEMBLY GRCh37 --DESTDIR $VEP_PATH --CACHEDIR $VEP_DATA

Convert the offline cache for use with tabix, which significantly speeds up the lookup of known variants:

perl convert_cache.pl --species homo_sapiens --version 89_GRCh37 --dir $VEP_DATA
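
A quick way to see whether the conversion did anything is to count the per-chromosome variant files it leaves behind. The sketch below assumes that a converted cache contains one all_vars.gz file per chromosome directory. That file name comes from my own converted cache rather than from this post, so treat it as an assumption. The function name is hypothetical.

```shell
# Hypothetical helper: count chromosome directories that hold a
# tabix-ready variant file after convert_cache.pl has run.
# Assumption: the converted file is named all_vars.gz.
count_converted_chroms() {
    find "$1" -type f -name 'all_vars.gz' | wc -l | tr -d ' '
}

# Example call, using the paths from this post:
# count_converted_chroms "$VEP_DATA/homo_sapiens/89_GRCh37"
```

A count of 0 suggests the conversion did not run or was pointed at the wrong directory.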

Annotate

vep --species homo_sapiens --assembly GRCh37 --offline --no_stats \
  --sift b --ccds --uniprot --hgvs --symbol --numbers --domains \
  --gene_phenotype --canonical --protein --biotype --tsl --pubmed \
  --variant_class --shift_hgvs 1 --check_existing --total_length \
  --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal \
  --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length \
  --dir $VEP_DATA \
  --fasta $VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz \
  --input_file example_GRCh37.vcf --output_file example_GRCh37.vep.vcf \
  --polyphen b --af_1kg --af_esp --regulatory
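
With --vcf, VEP writes its annotations into a CSQ INFO field and describes the sub-fields in the VCF header, after the word "Format:". As a rough sanity check on the output, the sketch below lists those sub-fields one per line. The function name csq_fields is made up for illustration.

```shell
# Hypothetical helper: print the CSQ sub-field names declared in the header
# of a VEP-annotated VCF, one per line.
csq_fields() {
    grep -m1 '^##INFO=<ID=CSQ' "$1" |
        sed 's/.*Format: //; s/">$//' |
        tr '|' '\n'
}

# Example call, using the output file from this post:
# csq_fields example_GRCh37.vep.vcf
```

If nothing is printed, the file has no CSQ header line, which usually means the run failed or the output was not written as VCF.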

If you get an error message like this:

-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot index bgzipped FASTA file with Bio::DB::Fasta

Most likely, your Perl version is too old. See an issue here.
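
To see which Perl the vep script will pick up, something like the sketch below can help. Both function names are made up for illustration. The minimum version in the example call is only a placeholder, since this post does not state which version is required.

```shell
# Hypothetical helpers for inspecting the Perl found first on PATH.

# Print the interpreter version as major.minor.patch
perl_version() {
    perl -e 'printf "%vd\n", $^V'
}

# Succeed if the interpreter is at least the given numeric version,
# written in the form Perl uses for $] (e.g. 5.010 for Perl 5.10)
perl_at_least() {
    perl -e 'exit($] >= $ARGV[0] ? 0 : 1)' "$1"
}

# Example calls; 5.010 is a placeholder threshold, not a documented requirement:
# perl_version
# perl_at_least 5.010 || echo "Perl is older than 5.10"
```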

Options:

  1. Update your system's Perl to the suggested version (which was not suitable in my case).
  2. Or install a local copy of the correct Perl version; see here.
  3. As the error is caused by reading a gzipped file, you can simply unzip the reference.

Make sure you have write access to the folder where the FASTA file resides, because the index is built next to it. I am placing the FASTA in our department shared folder.
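
The third option can be sketched as follows. The helper name decompress_fasta is made up for illustration. It keeps the compressed original, writes the plain FASTA next to it, and prints the new path so it can be passed to --fasta.

```shell
# Hypothetical helper: decompress a gzipped FASTA next to the original
# and print the path of the plain-text copy.
decompress_fasta() {
    gz=$1
    fa=${gz%.gz}
    if [ ! -f "$fa" ]; then
        # write to a temporary name first so an interrupted run
        # does not leave a truncated reference behind
        gunzip -c "$gz" > "$fa.tmp" && mv "$fa.tmp" "$fa" || return 1
    fi
    echo "$fa"
}

# Example call, using the paths from this post:
# FASTA=$(decompress_fasta "$VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz")
# then pass --fasta "$FASTA" in the vep command
```

The uncompressed human reference takes a few gigabytes, so check the free space in the shared folder first.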

From the description of the --fasta option at http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html:

The first time you run the script with this parameter an index will be built which can take a few minutes. This is required if fetching HGVS annotations (--hgvs) or checking reference sequences (--check_ref) in offline mode (--offline).
