Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Thursday, February 13, 2014

how to get a genome-wide motif bed file

Someone asked this question on SEQanswers: http://seqanswers.com/forums/showthread.php?t=40762&highlight=genome+motif+bed

For motif analysis, the most popular program is MEME http://meme.nbcr.net/meme/
The MEME suite contains a bunch of tools, including some useful ones for ChIP-seq.
RSAT http://rsat01.biologie.ens.fr/rsa-tools/index.html and oPOSSUM http://opossum.cisreg.ca/oPOSSUM3/ were also mentioned in the thread.

I analyze ChIP-seq data a lot. Usually, one gets a bed file containing the positions of the peaks. However, many motif analysis programs require a fasta file as input.

There are many ways to get a fasta file from a set of coordinates.
See:
http://www.biostars.org/p/7481/
and my previous post http://crazyhottommy.blogspot.com/2013/04/batch-converting-coordinates-to.html
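
As a rough illustration of what those coordinate-to-fasta tools do, here is a minimal Python sketch. It is not taken from either link above. The function names (read_fasta, bed_to_fasta) and the toy data are made up for this example, and it assumes the bed intervals are 0-based and half-open and that the whole genome fits in memory. For a real genome, a dedicated tool is a better choice.

```python
import io

def read_fasta(handle):
    """Parse a FASTA stream into a dict of {sequence_name: sequence}."""
    genome, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome

def bed_to_fasta(bed_lines, genome):
    """Yield (header, sequence) for each bed interval (0-based, half-open)."""
    for line in bed_lines:
        # Skip blank lines and the usual bed header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        yield f"{chrom}:{start}-{end}", genome[chrom][start:end]

# Toy data (hypothetical): one short "chromosome" and two "peaks".
toy_genome = read_fasta(io.StringIO(">chr1\nACGTACGTAC\nGGGTTT\n"))
toy_bed = ["chr1\t2\t6\tpeak1", "chr1\t10\t13\tpeak2"]
records = list(bed_to_fasta(toy_bed, toy_genome))
for header, seq in records:
    print(f">{header}\n{seq}")
```

With real data, the io.StringIO object and the list would be replaced by open file handles for the genome fasta and the peak bed file.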

Going back to the question, I've found several ways to get a bed file containing the coordinates for the motifs.

1. Homer homepage
http://homer.salk.edu/homer/
At the very bottom of the page, there are several links for human and mouse. Each file contains the coordinates of all known motifs across the whole genome. The files are big (several GB). One can pull out the occurrences of a specific motif with grep.
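
If grep is not handy, the same filtering can be sketched in Python. This is only an illustration under assumptions: it assumes the downloaded file is tab-separated and bed-like with the motif name in the 4th column, which should be checked against the actual file. The function name filter_motif and the toy lines are hypothetical.

```python
def filter_motif(lines, motif_name, name_col=3):
    """Yield lines whose name column contains motif_name (case-insensitive).

    name_col is the 0-based index of the column assumed to hold the motif
    name; 3 corresponds to the 4th column of a bed-like file.
    """
    wanted = motif_name.lower()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > name_col and wanted in fields[name_col].lower():
            yield line

# Toy lines (hypothetical) standing in for the multi-GB download.
toy_lines = [
    "chr1\t100\t120\tCTCF(Zf)\t8.1\t+\n",
    "chr1\t300\t310\tGATA1(Zf)\t7.0\t-\n",
    "chr2\t50\t70\tCTCF(Zf)\t9.3\t-\n",
]
hits = list(filter_motif(toy_lines, "CTCF"))
print("".join(hits), end="")
```

Because the function reads one line at a time, it can be fed an open file handle for a large file without loading it all into memory. Like a plain grep, the substring match also keeps any motif whose name merely contains the search string, so the output is worth a quick look.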


2. ENCODE-motifs, listed among the UCSC ENCODE analysis tools http://genome.ucsc.edu/ENCODE/analysisTools.html
ENCODE-motifs http://compbio.mit.edu/encode-motifs/
Look at the bottom of the page.

Only human data are available.

3. MotifMap http://motifmap.ics.uci.edu/
Click motif search.

Several more organisms are supported.

Search for the motif you want (I use CTCF as an example).

Click save on the bottom right.

Click my motifs on the upper right.

Export to bed or another file format.


