Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My github papge

Tuesday, August 27, 2013

How to make a heatmap based on ChIP-seq data by R

update on 04/20/2016.
I noticed that this post is the most frequently visited one. it has been almost 3 years
since I wrote this post. Now, there are various tools for this purpose. See a
post on biostars.

well, I recently just went through the whole process for making a heatmap based on a ChIP-seq data set. If you do not know the technique, google it :)  http://en.wikipedia.org/wiki/ChIP-sequencing

Often, you have a ChIP-seq data that are mapped to the reference genome ( a bam file). You want to plot the sequence tag intensity around certain features ( transcription start sites, gene body, enhancers, or any other genomic region you defined).

you can make an average plot http://crazyhottommy.blogspot.com/2013/04/how-to-make-tss-plot-using-rna-seq-and.html. I will need to re-write this one though, the code format is just too bad (I have to learn how to embed R and python code into the blog) ....and the Y axis is not normalized to counts per million.

you may also want to generate a heatmap with the same data.  see ngsplot for examples https://code.google.com/p/ngsplot/.  If you do not want to code R by yourself, try it. It has been improved a lot since last time I checked it. I once asked a question in the google group : https://groups.google.com/forum/#!topic/ngsplot-discuss/efHQ-P-14XM.

Seqmonk http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/  from Simon Andrews can also plot this kind of figure very easily, I am just not satisfied with the picture quality, and I want more customized control of the picture.


I will just paste my code below, and it is heavily commented, you should be able to follow it fairly easily.
update on 05/05/2015, I put the code in a gist instead:



The second gist:






That's all!

I hope you have learned something after reading it:)
===============================
update on 09/17/13
arrange the rows in the heatmap by the coverage from strong to weak






9 comments:

  1. I asked Simon Anders in Seqanwser how the Seqmonk software generates similar figures. It turns out that no special algorithm is used in the clustering. Basically, the rows in the heatmap are arranged by the coverage from strong to weak.
    And it looks much better.

    ReplyDelete
  2. Nice plot. Can I make it using TSS of all genes of human (bed file) and ChIP-seq peak (bed files)?

    ReplyDelete
    Replies
    1. No, you can only use ChIP-seq bam files or wig files.

      Delete
  3. Great tutorial. How do I add colory key for the heatmap?

    ReplyDelete
    Replies
    1. if you use heatmap.2, it will generate a color key. see this post for more tools and easier way to do it https://www.biostars.org/p/180314/

      Delete
  4. I had a lit of genes, which are significantly DE from RANseq and I also have same sample H3K9AC14 ChIPseq data. I would like to plot the group of genes from RNAseq promoter region binding profile based on the ChIPseq data. Do you know how to do it. My current situation is I extract the list from RNAseq and I know how to plot the TSS plot. But I could not extract the list of genes' chrom information from mm10 db for TSS plot.

    ReplyDelete
    Replies
    1. It seems you need some help to format the data. If you have the gene list and have the mm10 annotation, you can extract all the TSSs with 5kb window easily. Ask a local bioinformatic people for help. Or see a post here for more solutions https://www.biostars.org/p/180314/

      Delete
  5. Hi Tommy,

    How can I plot this over an entire genebody, using rat genome.

    Thank you

    ReplyDelete
    Replies
    1. please refer to this post https://www.biostars.org/p/180314/ ngsplot can do it, Genomation can do it as well

      Delete