Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My github papge

Friday, September 26, 2014

everyone will write a blog post on dplyr

dply is a nice R package to manipulate big data. There are several good tutorials I came across:

http://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/
http://stat545-ubc.github.io/block009_dplyr-intro.html
http://stat545-ubc.github.io/bit001_dplyr-cheatsheet.html
http://gettinggeneticsdone.blogspot.com/2014/08/do-your-data-janitor-work-like-boss.html

I followed the example in the early edition of Bioinformatics Data Skills on the dplyr part and put a gist below. The example used dplyr to manipulate and summarize the human annotation gff file downloaded here http://useast.ensembl.org/info/data/ftp/index.html.

I downloaded the ensemble 74 build which is different from the 75 build in the example of the book.
I believe the gencode v19 annotation file is based on ensemble 74.

I also included comments in the R code describing how to do the same job with linux command line.

1 comment: