Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Saturday, June 29, 2013

VCF file annotation and manipulation tools

VCF (Variant Call Format), as specified by the 1000 Genomes Project http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
is a common file format for storing variant information from NGS data.
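
To make the format a bit more concrete, here is a minimal sketch of reading VCF text in plain Python. It assumes an uncompressed, tab-delimited file with the eight fixed columns and no per-sample genotype columns. The function name `parse_vcf` and the two example records are made up for illustration and do not come from any of the tools mentioned in this post.

```python
import io


def parse_vcf(handle):
    """Yield one dict per variant record from a VCF text stream."""
    columns = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            # The header line names the columns: CHROM, POS, ID, REF, ALT, ...
            columns = line.lstrip("#").split("\t")
            continue
        if columns is None:
            raise ValueError("data line found before the #CHROM header")
        record = dict(zip(columns, line.split("\t")))
        record["POS"] = int(record["POS"])  # positions are 1-based integers
        # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags
        info = {}
        if record["INFO"] != ".":
            for item in record["INFO"].split(";"):
                key, sep, value = item.partition("=")
                info[key] = value if sep else True
        record["INFO"] = info
        yield record


# Hypothetical two-record example, for illustration only
EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\n"
    "20\t17330\t.\tT\tA\t3\tq10\tNS=3;DP=11\n"
)

records = list(parse_vcf(io.StringIO(EXAMPLE_VCF)))
for rec in records:
    print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"].get("DP"))
```

For real data (bgzipped files, multi-sample genotypes, multi-allelic sites) one of the dedicated tools is the safer choice; the sketch only shows how the columns fit together.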

Many tools have been developed to manipulate and extract information from this format.
Some popular ones are:
tabix http://samtools.sourceforge.net/tabix.shtml for fast indexing and extraction of specific regions from a whole-genome file.
VCFtools http://vcftools.sourceforge.net/ for many more manipulations, including annotating, merging, concatenating and comparing.
bedtools http://code.google.com/p/bedtools/ for calculating overlaps between genomic regions.
I have some experience with the above three tools.
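
As a rough illustration of what "extracting a region" means, the sketch below keeps only the VCF records that fall inside a `chrom:start-end` interval. This is a plain linear scan in Python, not how tabix works internally (tabix relies on an index over a bgzip-compressed file, which is what makes it fast), and the names `in_region` and `extract_region` are hypothetical helpers, not part of any of the tools above. Coordinates are assumed to be 1-based and inclusive at both ends.

```python
def in_region(chrom, pos, region):
    """Return True if a 1-based position lies inside 'chrom:start-end' (inclusive)."""
    name, _, span = region.partition(":")
    start, _, end = span.partition("-")
    return chrom == name and int(start) <= pos <= int(end)


def extract_region(vcf_lines, region):
    """Keep header lines plus the records whose CHROM/POS fall inside the region."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header and meta-information lines are always kept
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_region(chrom, int(pos), region):
            kept.append(line)
    return kept


# Hypothetical records, for illustration only
example_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14",
    "20\t17330\t.\tT\tA\t3\tq10\tDP=11",
    "21\t14500\t.\tC\tT\t50\tPASS\tDP=20",
]

subset = extract_region(example_lines, "20:14000-15000")
print("\n".join(subset))
```

On a whole-genome file a scan like this reads every line, which is exactly the cost an indexed lookup avoids.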

By the way, I have not used the GATK tools http://www.broadinstitute.org/gatk/ from the Broad Institute yet, but I am sure I will give them a try sometime later.

After some googling, I found several more:
variationtoolkit http://code.google.com/p/variationtoolkit/
vcflib https://github.com/ekg/vcflib
varianttools http://varianttools.sourceforge.net/
VariantAnnotation http://www.bioconductor.org/packages/2.12/bioc/html/VariantAnnotation.html
PLINK/SEQ http://atgu.mgh.harvard.edu/plinkseq/

Taser is one I just came across: http://www.zhanxw.com/taser/
It is based on R and can extract variant info by gene name fairly easily.


Again, there are so many tools out there. Depending on your needs, choose the right one for you.
