VCF (Variant Call Format), as specified by the 1000 Genomes Project http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
is a common file format for storing variant information from next-generation sequencing (NGS) data.
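To make the format concrete, here is a minimal Python sketch of how one VCF data line breaks down into its eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The helper name `parse_vcf_line` and the sample record are made up for illustration. Real files also carry `##` header lines and optional genotype columns, which this sketch ignores.

```python
# Names of the eight fixed VCF columns, in file order.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Split one VCF data line into a dict (hypothetical helper, not a library API)."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # positions are 1-based integers
    record["ALT"] = record["ALT"].split(",")  # multiple ALT alleles are comma-separated
    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info = {}
    if record["INFO"] != ".":
        for item in record["INFO"].split(";"):
            key, _, value = item.partition("=")
            info[key] = value if value else True  # bare flags become True
    record["INFO"] = info
    return record


# Illustrative record only; the values are not taken from a real dataset.
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB"
rec = parse_vcf_line(example)
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"]["DP"])
```

For real work, the tools listed below are a much better choice than hand-rolled parsing; the sketch is only meant to show what is inside a record.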
Many tools have been developed to manipulate files in this format and extract information from them.
Some popular ones are:
tabix http://samtools.sourceforge.net/tabix.shtml for fast indexing and retrieval of specific genomic regions.
VCFtools http://vcftools.sourceforge.net/ for many more manipulations, including annotating, merging, concatenating and comparing.
bedtools http://code.google.com/p/bedtools/ for calculating overlaps between genomic regions.
I have some experience with the above three tools.
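As a rough picture of what region extraction means, the Python sketch below scans VCF lines and keeps the records that fall inside a chromosome interval. This is a plain linear scan, not how tabix works internally: tabix relies on a bgzip-compressed, indexed file so it can jump straight to the region. The function name `records_in_region` and the sample data are hypothetical.

```python
def records_in_region(lines, chrom, start, end):
    """Yield (chrom, pos, ref, alt) for records with start <= pos <= end.

    Coordinates are 1-based and inclusive, as in VCF itself.
    """
    for line in lines:
        # Skip header lines ("##..." and "#CHROM...") and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])
        if fields[0] == chrom and start <= pos <= end:
            yield fields[0], pos, fields[3], fields[4]


# Made-up sample data, for illustration only.
sample_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\t.\tG\tA\t29\tPASS\tDP=14",
    "20\t17330\t.\tT\tA\t3\tq10\tDP=11",
    "20\t1110696\t.\tA\tG,T\t67\tPASS\tDP=10",
    "21\t15000\t.\tC\tT\t50\tPASS\tDP=20",
]

hits = list(records_in_region(sample_vcf, "20", 14000, 20000))
for hit in hits:
    print(hit)
```

On a whole-genome file a scan like this reads every line, which is exactly the cost an index avoids.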
By the way, I have not yet used the GATK tools http://www.broadinstitute.org/gatk/ from the Broad Institute, but I am sure I will give them a try sometime.
After some googling, I found several more:
variationtoolkit http://code.google.com/p/variationtoolkit/
vcflib https://github.com/ekg/vcflib
varianttools http://varianttools.sourceforge.net/
VariantAnnotation http://www.bioconductor.org/packages/2.12/bioc/html/VariantAnnotation.html
PLINK/SEQ http://atgu.mgh.harvard.edu/plinkseq/
Taser is one I just came across: http://www.zhanxw.com/taser/
It is based on R and can extract variant information by gene name fairly easily.
Again, there are many tools out there. Choose the one that fits your needs.
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.