This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My GitHub page

Thursday, August 29, 2013

A new peak calling software: peakzilla

From Github

Peakzilla identifies sites of enrichment and transcription factor binding sites from transcription factor ChIP-seq and ChIP-exo experiments at high accuracy and resolution. It is designed to perform equally well for data from any species. All necessary parameters are estimated from the data. Peakzilla is suitable for both single and paired end data from any sequencing platform.

Note that peakzilla is not suited for the identification of broad regions of enrichment (e.g. ChIP-seq for histone marks). We recommend using MACS instead: Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol (2008) 9(9):R137

web/GUI tools for biological data analysis

If you are not a programmer, you will be happy to look at these websites. Even if you can program, I find it is sometimes convenient to use them :)

1. Galaxy
Founded at PSU, it has been around for several years and is under active development.

2. GenePattern from Broad Institute
I used it some time ago for some microarray data analysis, when I did not know anything about programming. Now I use R/Bioconductor.

3. Taverna
I got to know it from a blog. I have never used it, but it looks promising.

4. GenomeSpace
It integrates many tools together, including Galaxy and GenePattern above.

There is just way too much software out there for different kinds of data analysis. First, get to know what is out there. Then choose the right tool to meet your research needs.

Tuesday, August 27, 2013

Two nature protocols for RNA-seq analysis

One is from Simon Anders, whose group wrote DESeq:

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

The other one is:

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

I will go through a complete RNA-seq analysis following the guidelines.
There are other packages that make RNA-seq analysis easier:

Lots of things are waiting for me to try out!

How to make a heatmap based on ChIP-seq data by R

Update on 04/20/2016:
I noticed that this post is the most frequently visited one. It has been almost 3 years since I wrote it, and there are now various tools for this purpose. See a post on Biostars.

Well, I recently went through the whole process of making a heatmap based on a ChIP-seq data set. If you do not know the technique, google it :)

Often, you have ChIP-seq data that are mapped to the reference genome (a bam file). You want to plot the sequence tag intensity around certain features (transcription start sites, gene bodies, enhancers, or any other genomic regions you define).

You can make an average plot. I will need to rewrite this one, though: the code format is just too bad (I have to learn how to embed R and Python code into the blog), and the Y axis is not normalized to counts per million.
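To make the idea concrete, here is a minimal sketch in plain Python (not the R code from my gists). It assumes you have already pulled read start positions for one chromosome out of the bam file into a list. The function names `signal_matrix`, `counts_per_million` and `average_profile` are made up for this illustration, and strand is ignored to keep it short.

```python
from bisect import bisect_left


def signal_matrix(read_positions, feature_centers, flank=1000, bin_size=100):
    """Count reads in fixed-width bins around each feature center.

    read_positions: read start positions on one chromosome (hypothetical input).
    feature_centers: positions such as transcription start sites.
    Returns one row per feature and one column per bin (raw counts).
    """
    reads = sorted(read_positions)
    n_bins = (2 * flank) // bin_size
    matrix = []
    for center in feature_centers:
        start = center - flank
        # Bin edges run from center - flank to center + flank
        edges = [start + bin_size * k for k in range(n_bins + 1)]
        # Number of reads strictly below each edge
        below = [bisect_left(reads, edge) for edge in edges]
        # Reads inside the half-open bin [edge_k, edge_k+1)
        matrix.append([below[k + 1] - below[k] for k in range(n_bins)])
    return matrix


def counts_per_million(matrix, total_mapped_reads):
    """Scale raw counts so that libraries of different depth are comparable."""
    scale = 1e6 / total_mapped_reads
    return [[count * scale for count in row] for row in matrix]


def average_profile(matrix):
    """Column-wise mean across all features: the values for an average plot."""
    n_rows = len(matrix)
    return [sum(column) / n_rows for column in zip(*matrix)]
```

Each row of the matrix is one feature and each column is one bin, so the same matrix can feed both the average plot (column means) and the heatmap (all rows).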

You may also want to generate a heatmap with the same data. See ngsplot for examples. If you do not want to code in R yourself, try it. It has improved a lot since I last checked it. I once asked a question in the Google group :!topic/ngsplot-discuss/efHQ-P-14XM.

SeqMonk from Simon Andrews can also plot this kind of figure very easily, but I am not satisfied with the picture quality, and I want more control over the figure.

I will just paste my code below. It is heavily commented, so you should be able to follow it fairly easily.
Update on 05/05/2015: I put the code in a gist instead:

The second gist:

That's all!

I hope you have learned something after reading it :)
Update on 09/17/13:
Arrange the rows in the heatmap by coverage, from strong to weak.
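The row ordering can be sketched in a few lines of plain Python (again, not the R code from the gist). It assumes the heatmap data is a list of rows, one per feature, and `order_by_coverage` is a name I made up for this example.

```python
def order_by_coverage(matrix):
    """Return (order, sorted_rows) with the strongest total signal first.

    order holds the original row indices, so feature names can be
    rearranged the same way as the rows.
    """
    totals = [sum(row) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: totals[i], reverse=True)
    return order, [matrix[i] for i in order]


rows = [[1, 1], [5, 0], [2, 2]]
order, sorted_rows = order_by_coverage(rows)
```

Keeping the index order around is handy when you want to label the heatmap rows or apply the same ordering to a second data set.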

Wednesday, August 21, 2013

install GIMP 2.8 in Ubuntu 12.04

I was trying to make some figures, and Google told me GIMP is a very popular alternative to Photoshop.

Ubuntu 12.04 has GIMP 2.6 installed. To update to GIMP 2.8, follow the instructions here:

and install the plugins:

It takes a long time to make a good-looking figure.
I will write a post to demonstrate how to make a heatmap in R next time.

Thursday, August 15, 2013

pdf organizer and reference manager

After being in the research field for a while, I found myself lost in a vast number of PDF research papers.
I need a good PDF organizer,
especially when I start writing a paper: adding the references becomes a headache.
That's where Zotero and Mendeley come in.
A comparison of the two:
Zotero is open source, and it can organize other files such as images and videos besides PDFs.
Mendeley is not open source, but it has better support for PDF files.
Watch YouTube videos on how to use them:

I am currently using Mendeley.

Thursday, August 8, 2013

nice demonstration of *args and **kwargs in python function def

see a post here
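In case the link goes away, here is a short sketch of my own (not taken from the linked post). The function names `describe_call` and `add` are just for illustration.

```python
def describe_call(*args, **kwargs):
    """Collect extra positional arguments in a tuple and keyword arguments in a dict."""
    return {"positional": args, "keyword": kwargs}


result = describe_call(1, 2, sample="ChIP", replicate=2)
# result["positional"] is the tuple (1, 2)
# result["keyword"] is the dict {"sample": "ChIP", "replicate": 2}


def add(a, b, c=0):
    return a + b + c


# The same stars work in the other direction: they unpack a sequence
# or a dict into the arguments of a call.
numbers = (1, 2)
options = {"c": 3}
total = add(*numbers, **options)  # same as add(1, 2, c=3)
```

The names `args` and `kwargs` are only a convention; it is the `*` and `**` that do the work.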

python for biologist

I just found this website:

It teaches biologists how to start programming with Python. I even bought the PDF ebook for $39. I had a look at the book, and I think it is worth it. I went through the first 4 chapters really fast; they are really basic, but they teach biologists in a really friendly way. I am even thinking of holding a bioinformatics training course in the future :) Sounds ambitious, hmm...

Advanced topics can be found on the website:

I've read Practical Computing for Biologists by Haddock and Dunn.
It is a very good introductory book. Besides Python, it also teaches you how to use the Linux command line and some database basics. By the way, I wrote my first regular expression after reading the book.

Well, I feel I can only now get some practical problems (text manipulation, genomic interval calculation using pybedtools, NGS analysis with HTSeq) done in Python, even though I began playing around with Python on 04/24/2012. It takes time, but the investment is definitely rewarding!
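To give a feel for what a genomic interval calculation looks like, here is a tiny sketch in plain Python. It is not the pybedtools API, just the underlying arithmetic, and it assumes BED-style intervals (0-based start, end not included). The function name `overlap_length` is made up for this example.

```python
def overlap_length(a, b):
    """Number of bases shared by two intervals given as (chrom, start, end).

    Coordinates are assumed to be BED-style: 0-based, half-open.
    """
    if a[0] != b[0]:
        return 0  # different chromosomes never overlap
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


peak = ("chr1", 100, 200)
promoter = ("chr1", 150, 300)
shared = overlap_length(peak, promoter)
```

With half-open coordinates, two intervals that merely touch (one ends at 200, the next starts at 200) share zero bases, which is why the formula needs no plus-one correction.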

Thursday, August 1, 2013

UCSC genome browser tutorial

See the tutorial here at OpenHelix.

I use the UCSC genome browser so much, but before watching the video I did not realize that it has many more functions that I had missed.
