The GTF file contains 2244857 lines. I used grep to do it, but it took a very long time (~1 hour).
What I used:
zcat Homo_sapines.GRCh37.74gtf.gz | grep -f gene_names.txt -w > my_genes.gtf
I searched online and found several Stack Overflow posts on speeding up grep:
http://stackoverflow.com/questions/14602963/faster-grep-function-for-big-27gb-files
http://stackoverflow.com/questions/13913014/grepping-a-huge-file-80gb-any-way-to-speed-it-up
http://stackoverflow.com/questions/9066609/fastest-possible-grep
Options to speed up:
1) Prefix your grep command with LC_ALL=C to use the C locale instead of UTF-8.
2) Use fgrep, because you're searching for fixed strings, not regular expressions.
I then used:
zcat Homo_sapines.GRCh37.74gtf.gz | LC_ALL=C fgrep -f gene_names.txt -w > my_genes.gtf
It runs much faster!
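To check that the fixed-string, whole-word matching behaves as expected before running it on the full GTF, here is a minimal sketch on made-up data. The temporary file names, the three GTF-like lines, and the gene list are all hypothetical and only for illustration. It uses grep -F, which is the same thing as fgrep (recent GNU grep versions print a warning that fgrep is obsolescent), and gzip -dc in place of zcat, which is an assumption made for portability.

```shell
# Hypothetical demo data: file names and contents are invented for illustration.
tmpdir=$(mktemp -d)

# Gene list, one name per line (stands in for gene_names.txt).
printf 'TP53\nBRCA1\n' > "$tmpdir/gene_names.txt"

# Three tab-separated GTF-like lines; TP53I3 must NOT match the pattern TP53.
{
  printf '17\tdemo\tgene\t1\t100\t.\t-\t.\tgene_id "G1"; gene_name "TP53";\n'
  printf '17\tdemo\tgene\t200\t300\t.\t-\t.\tgene_id "G2"; gene_name "TP53I3";\n'
  printf '17\tdemo\tgene\t400\t500\t.\t-\t.\tgene_id "G3"; gene_name "BRCA1";\n'
} | gzip -c > "$tmpdir/demo.gtf.gz"

# -F: treat patterns as fixed strings, -w: match whole words only,
# -f: read patterns from a file. LC_ALL=C forces byte-wise matching.
gzip -dc "$tmpdir/demo.gtf.gz" \
  | LC_ALL=C grep -F -w -f "$tmpdir/gene_names.txt" > "$tmpdir/my_genes.gtf"

# Count the matched lines.
matched=$(wc -l < "$tmpdir/my_genes.gtf" | tr -d ' ')
echo "matched lines: $matched"
```

The -w flag matters here: without it, the pattern TP53 would also pull in the TP53I3 line. To compare speeds on the real file, prefix each pipeline with time.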