I noticed that this post is the most frequently visited one. it has been almost 3 years
since I wrote this post. Now, there are various tools for this purpose. See a
post on biostars.
well, I recently just went through the whole process for making a heatmap based on a ChIP-seq data set. If you do not know the technique, google it :) http://en.wikipedia.org/wiki/ChIP-sequencing
Often, you have a ChIP-seq data that are mapped to the reference genome ( a bam file). You want to plot the sequence tag intensity around certain features ( transcription start sites, gene body, enhancers, or any other genomic region you defined).
you can make an average plot http://crazyhottommy.blogspot.com/2013/04/how-to-make-tss-plot-using-rna-seq-and.html. I will need to re-write this one though, the code format is just too bad (I have to learn how to embed R and python code into the blog) ....and the Y axis is not normalized to counts per million.
you may also want to generate a heatmap with the same data. see ngsplot for examples https://code.google.com/p/ngsplot/. If you do not want to code R by yourself, try it. It has been improved a lot since last time I checked it. I once asked a question in the google group : https://groups.google.com/forum/#!topic/ngsplot-discuss/efHQ-P-14XM.
Seqmonk http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/ from Simon Andrews can also plot this kind of figure very easily, I am just not satisfied with the picture quality, and I want more customized control of the picture.
I will just paste my code below, and it is heavily commented, you should be able to follow it fairly easily.
update on 05/05/2015, I put the code in a gist instead:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# This R script is to generate the TF or histone modification heatmap | |
# at certain genomic features (TSS, enhancers) from the ChIP-seq data | |
# the input matrix is got from Homer software. alternative to R, use cluster3 to cluster, and visualize by # java Treeviewer | |
# generate the matrix by Homer: annotatePeaks.pl peak_file.txt hg19 -size 6000 -hist 10 -ghist -d TF1/ # > outputfile_matrix.txt | |
# see several posts for heatmap: | |
# http://davetang.org/muse/2010/12/06/making-a-heatmap-with-r/ | |
# http://www.r-bloggers.com/r-using-rcolorbrewer-to-colour-your-figures-in-r/ | |
# 08/20/13 by Tommy Tang | |
# it is such a simple script but took me several days to get it work...I mean the desired | |
# color setting of the heatmap. Making a pretty figure takes time :) | |
# use either heatmap.2 from gplots or pheatmap | |
library(gplots) | |
library(pheatmap) | |
# you can compare the heatmaps generated by heatmap.2 and pheatmap | |
setwd("/home/tommy/homer") | |
d1 <- read.table("outputfile_matrix.txt", header=T) | |
# have a look at the matrix | |
head(d1) | |
d1$Gene # the dataframe generated by homer has a column named Gene | |
m1<- as.matrix( d1[,2:ncol(d1)]) # the first column is the TSS id, | |
rownames(m1)<- d1$Gene # heatmap.2 works only on matrix, turn the dataframe to matrix, and # add the TSS id as the row name | |
m1<- log2(m1+1) # log2 transform the raw counts | |
#exam the data range | |
range(m1) | |
hist(m1) # histogram to look at distributions | |
# it is from the ChIP-seq count data, many of them are 0s. I transfromed the raw data | |
# by log2(m+1), so if the log2 value is 0, the raw number is also 0 (count) | |
# to compare different heatmaps, I have to map the value to the color using the | |
# break argument in pheatmap or heatmap.2 That's why I examed the data range. | |
# http://stackoverflow.com/questions/17820143/how-to-change-heatmap-2-color-range-in-r | |
# https://stat.ethz.ch/pipermail/bioconductor/2011-November/041866.html | |
# http://seqanswers.com/forums/showthread.php?p=114275&posted=1#post114275 | |
bk = unique(c(seq(-0.1,3, length=100),seq(3,9.7,length=100))) | |
hmcols<- colorRampPalette(c("white","red"))(length(bk)-1) | |
# you can play around with the break points, 9.7 is the max of the matrix m1 | |
# if just use bk = c(seq(-0.1,1, length=100),seq(1,12.5,length=100)) without the unique function | |
# the pheatmap gave me error message: | |
# Error in cut.default(x, breaks = breaks, include.lowest = T) : | |
# 'breaks' are not unique | |
# It is caused by concatenating several seq() results together, which share the same boundaries. | |
# try ?dist ?hclust | |
png("test.png", width=300, height = 800) # width and height are in pixel | |
heatmap.2(m1, col=hmcols, breaks = bk, Rowv= TRUE , Colv=FALSE, dendrogram="row", useRaster = TRUE, symkey=FALSE, symm=F, symbreaks=T, scale="none", trace="none", labRow=NA, labCol=NA) | |
# do not show the row and column labels | |
# trace argument should be put to "none", otherwise the trace is cyan and it will "eat" the heatmap | |
# I asked in Biostar http://www.biostars.org/p/79444/#79457 | |
# look at the documentation for #heatmap.2 http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/gplots/html/heatmap.2.html | |
# or you can use pheatmap | |
pheatmap(m1, color=hmcols, breaks= bk, cluster_rows=TRUE, cluster_cols=FALSE, legend=FALSE, show_rownames=FALSE, show_colnames=FALSE) | |
dev.off() |
The second gist:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#The X axis is -3 kb to 3 kb around TSS. | |
#It turns out that the heatmap.2 function use default hclust ( Hierachical Clustering) to cluster the #matrix. | |
#alternatively, we can use K-means clustering to cluster the data and to see what's the pattern look like. | |
km<- kmeans(m1,2) # determine how many cluster you want, I specify 2 here | |
m1.kmeans<- cbind(m1, km$cluster) # combine the cluster with the matrix | |
dim(m1.kmeans) | |
# [1] 903 602 | |
# the last column is 602 | |
o<- order(m1.kmeans[,602]) # order the last column | |
m1.keans<- m1.kmeans[o,] # order the matrix according to the order of the last column | |
pheatmap( m1.kmeans[,1:601], cluster_rows = F, cluster_cols = F, col= hmcols, breaks = bk, legend=FALSE, show_rownames=FALSE, show_colnames=FALSE) | |
# no need to cluster by pheatmap, so I set cluster_rows and cluster_cols to FALSE |
That's all!
===============================
update on 09/17/13
arrange the rows in the heatmap by the coverage from strong to weak
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
m.row.sum<- cbind(m1, rowSums(m1)) | |
o1<- rev(order(m.row.sum[,602])) | |
m.row.sum<- m.row.sum[o1,] | |
bk = unique(c(seq(-0.1,3, length=100),seq(3,10.35,length=100))) | |
hmcols<- colorRampPalette(c("white","red"))(length(bk)-1) | |
pheatmap( m.row.sum[,1:601], cluster_rows = F, cluster_cols = F, col= hmcols, breaks = bk, legend=FALSE, show_rownames=FALSE, show_colnames=FALSE) |
I asked Simon Anders in Seqanwser how the Seqmonk software generates similar figures. It turns out that no special algorithm is used in the clustering. Basically, the rows in the heatmap are arranged by the coverage from strong to weak.
ReplyDeleteAnd it looks much better.
Nice plot. Can I make it using TSS of all genes of human (bed file) and ChIP-seq peak (bed files)?
ReplyDeleteNo, you can only use ChIP-seq bam files or wig files.
DeleteGreat tutorial. How do I add colory key for the heatmap?
ReplyDeleteif you use heatmap.2, it will generate a color key. see this post for more tools and easier way to do it https://www.biostars.org/p/180314/
DeleteI had a lit of genes, which are significantly DE from RANseq and I also have same sample H3K9AC14 ChIPseq data. I would like to plot the group of genes from RNAseq promoter region binding profile based on the ChIPseq data. Do you know how to do it. My current situation is I extract the list from RNAseq and I know how to plot the TSS plot. But I could not extract the list of genes' chrom information from mm10 db for TSS plot.
ReplyDeleteIt seems you need some help to format the data. If you have the gene list and have the mm10 annotation, you can extract all the TSSs with 5kb window easily. Ask a local bioinformatic people for help. Or see a post here for more solutions https://www.biostars.org/p/180314/
DeleteHi Tommy,
ReplyDeleteHow can I plot this over an entire genebody, using rat genome.
Thank you
please refer to this post https://www.biostars.org/p/180314/ ngsplot can do it, Genomation can do it as well
Deletelets say the peakfile had 70000 rows , should i need to filter down a bit all plot all?
ReplyDeletefake raybans erika sunglasses
ReplyDeletefake raybans havana collection sunglasses
fake raybans new wayfarer sunglasses
fake raybans online exclusives sunglasses
fake raybans rectangular sunglasses
All thanks to Mr Anderson for helping with my profits and making my fifth withdrawal possible. I'm here to share an amazing life changing opportunity with you. its called Bitcoin / Forex trading options. it is a highly lucrative business which can earn you as much as $2,570 in a week from an initial investment of just $200. I am living proof of this great business opportunity. If anyone is interested in trading on bitcoin or any cryptocurrency and want a successful trade without losing notify Mr Anderson now.Whatsapp: (+447883246472 )
ReplyDeleteEmail: tdameritrade077@gmail.com