Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Monday, May 4, 2015

IBash Notebook for reproducible research

I use the command line a lot. It is great for data processing, text formatting, and even exploratory data analysis.

Last week, one of my colleagues complained that she had forgotten how she obtained and processed her data. With a future "me" in mind, you need to document thoroughly where, when, and how you downloaded and processed the data, and record the versions of the tools used in the analysis.
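As a rough illustration of that kind of record keeping, here is a minimal sketch of a shell helper that appends the date, source, output file, and tool versions to a plain-text log. The function name `log_provenance`, the log layout, and the example URL are all made up for illustration. It also assumes each listed tool answers to `--version`, which is not true of every program.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from any tool): append one provenance record
# to a plain-text log file.
# Usage: log_provenance LOG_FILE SOURCE_URL OUTPUT_FILE [TOOL ...]
log_provenance() {
    local log_file="$1" source_url="$2" output_file="$3"
    shift 3
    {
        printf 'date\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        printf 'source\t%s\n' "$source_url"
        printf 'output\t%s\n' "$output_file"
        printf 'bash\t%s\n' "$BASH_VERSION"
        local tool
        for tool in "$@"; do
            # Assumption: the tool prints its version with --version.
            if command -v "$tool" >/dev/null 2>&1; then
                printf '%s\t%s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
            else
                printf '%s\tnot found\n' "$tool"
            fi
        done
    } >> "$log_file"
}

# Example call with a placeholder URL; nothing is downloaded here.
log_provenance provenance.log "https://example.org/data.tsv" "data.tsv" bash
```

Calling it right after each download or processing step keeps the log next to the data, so the future "me" can see what was done and with which versions.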

Although there are many ways to make command-line tasks reproducible, such as Drake and GNU Make, none is as straightforward as using IPython Notebook for Python or R Markdown files for R.

Luckily, I learned from Jeroen Janssens, who wrote the book "Data Science at the Command Line", that there is a bash_kernel for IPython Notebook, and I gave it a try.
See a screenshot of the notebook:

See the whole notebook here: http://nbviewer.ipython.org/gist/crazyhottommy/71e0dcb6d678c137733c#

Essentially, I copied the contents of the .ipynb file (it is a JSON file), pasted them into a gist, and then entered the gist link on the nbviewer website, http://nbviewer.ipython.org/
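The nbviewer link for a gist appears to follow a simple pattern. Here is a small sketch that builds it from a GitHub user name and a gist ID. The helper name `nbviewer_gist_url` is hypothetical, and the URL pattern is inferred from the notebook link in this post rather than from nbviewer documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an nbviewer link from a GitHub user name and a gist ID.
# Assumption: the pattern matches the notebook link shown in this post.
nbviewer_gist_url() {
    local user="$1" gist_id="$2"
    printf 'http://nbviewer.ipython.org/gist/%s/%s\n' "$user" "$gist_id"
}

nbviewer_gist_url crazyhottommy 71e0dcb6d678c137733c
# prints http://nbviewer.ipython.org/gist/crazyhottommy/71e0dcb6d678c137733c
```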

With IBash Notebook, you can document Linux commands as you run them and make your research more reproducible!
