Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My GitHub page

Tuesday, February 18, 2014

hosting bigwig by dropbox for UCSC visualization

First, you need to know what a bigWig file is: https://genome.ucsc.edu/goldenPath/help/bigWig.html

It is the binary form of a wig file and lets the UCSC genome browser fetch only the data in the current window. I usually get wig files from MACS1.4 when calling peaks on ChIP-seq data.
MACS2 no longer has the -w option. https://github.com/taoliu/MACS/

See the discussion in the MACS Google group:
https://groups.google.com/forum/#!searchin/macs-announcement/bedgraph$20ucsc$20track/macs-announcement/LBhAtmC-Zho/uZuxU8ZaqdEJ

MACS2 only creates a bedGraph file http://genome.ucsc.edu/goldenPath/help/bedgraph.html
and one can convert the bedGraph to bigWig:
https://github.com/taoliu/MACS/wiki/Build-Signal-Track
https://gist.github.com/taoliu/2469050
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
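
The bedGraph to bigWig conversion can be scripted. Below is a minimal Python sketch that only assembles the two command lines (bedClip, then bedGraphToBigWig). It assumes both UCSC utilities are on your PATH and that the bedGraph is already sorted by chromosome and start. The file names and the ".clip.bdg" intermediate name are hypothetical, and this is not a copy of the bdg2bw gist linked above.

```python
from pathlib import Path
import subprocess


def bedgraph_to_bigwig_commands(bedgraph, chrom_sizes):
    """Return the two command lines (as argument lists) for the conversion.

    Assumes bedClip and bedGraphToBigWig (UCSC binary utilities) are on PATH
    and that the bedGraph is sorted by chromosome and start position.
    """
    stem = Path(bedgraph).with_suffix("")   # "sample.bdg" -> "sample"
    clipped = f"{stem}.clip.bdg"            # hypothetical intermediate file name
    bigwig = f"{stem}.bw"
    return [
        # usage: bedClip input.bed chrom.sizes output.bed
        ["bedClip", str(bedgraph), str(chrom_sizes), clipped],
        # usage: bedGraphToBigWig in.bedGraph chrom.sizes out.bw
        ["bedGraphToBigWig", clipped, str(chrom_sizes), bigwig],
    ]


def run_conversion(bedgraph, chrom_sizes):
    """Run both steps in order, stopping at the first failure."""
    for cmd in bedgraph_to_bigwig_commands(bedgraph, chrom_sizes):
        subprocess.run(cmd, check=True)


# Print the commands without running them.
for cmd in bedgraph_to_bigwig_commands("sample.bdg", "hg19.chrom.sizes"):
    print(" ".join(cmd))
```

Calling run_conversion("sample.bdg", "hg19.chrom.sizes") would execute the two steps for real, so it needs the binaries and the input files to exist.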

If you have a BAM file, you can produce a bedGraph file with bedtools genomecov (genomeCoverageBed) and its -bg option:
http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html
http://www.biostars.org/p/64495/#64680
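
To see what a coverage bedGraph actually contains, here is a toy Python sketch that turns a handful of read intervals into bedGraph rows (chrom, start, end, depth). It is only an illustration of the output format under the assumption of 0-based, half-open coordinates. It is not how bedtools is implemented, and for real BAM files you should use bedtools itself.

```python
from collections import defaultdict


def coverage_bedgraph(reads):
    """Turn (chrom, start, end) read intervals into bedGraph rows.

    Coordinates are 0-based and half-open, as in BED/bedGraph.
    Returns (chrom, start, end, depth) tuples; zero-depth regions are skipped.
    """
    # +1 where a read starts, -1 where it ends
    deltas = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in reads:
        deltas[chrom][start] += 1
        deltas[chrom][end] -= 1

    rows = []
    for chrom in sorted(deltas):
        depth = 0
        prev = None
        for pos in sorted(deltas[chrom]):
            if prev is not None and depth > 0:
                last = rows[-1] if rows else None
                if last and last[0] == chrom and last[2] == prev and last[3] == depth:
                    # same depth as the adjacent previous row: extend it
                    rows[-1] = (chrom, last[1], pos, depth)
                else:
                    rows.append((chrom, prev, pos, depth))
            depth += deltas[chrom][pos]
            prev = pos
    return rows


reads = [("chr1", 0, 10), ("chr1", 5, 15)]
for chrom, start, end, depth in coverage_bedgraph(reads):
    print(f"{chrom}\t{start}\t{end}\t{depth}")
```

The two overlapping reads give three rows: depth 1 for 0-5, depth 2 for 5-10, and depth 1 for 10-15.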

You need to add a track line (the -trackline option) for the UCSC genome browser.
https://groups.google.com/forum/#!searchin/bedtools-discuss/bedtools$20bedgraph$20ucsc$20track/bedtools-discuss/3CibmlqIdWA/PW_bhgWQfVMJ

I followed the instructions at https://genome.ucsc.edu/goldenPath/help/bigWig.html:

To create a bigWig track from a wiggle file, follow these steps:
  1. Create a wig format file following the directions here. Note that when converting a wig file to a bigWig file, you are limited to one track of data in your input file; you must create a separate wig file for each data track. Note that this is the file that is referred to as input.wig in step 5 below.
  2. Remove any existing 'track' or 'browser' lines from your wig file so that it contains only data.
  3. Download the wigToBigWig program from the directory of binary utilities.
  4. Use the fetchChromSizes script from the same directory to create the chrom.sizes file for the UCSC database you are working with (e.g. hg19). Note that this is the file that is referred to as chrom.sizes in step 5 below.
  5. Create the bigWig file from your wig file using the wigToBigWig utility like so: wigToBigWig input.wig chrom.sizes myBigWig.bw
    (Note that the wigToBigWig program also accepts a gzipped wig input file.)
  6. Move the newly created bigWig file (myBigWig.bw) to a http, https, or ftp location.
  7. Construct a custom track using a single track line. The most basic version of the track line will look something like this:
    track type=bigWig name="My Big Wig" description="A Graph of Data from My Lab" bigDataUrl=http://myorg.edu/mylab/myBigWig.bw
    Optional values can include:
      autoScale         <on|off>                             # default is on
      alwaysZero        <on|off>                             # default is off
      gridDefault       <on|off>                             # default is off
      maxHeightPixels   <max:default:min>                    # default is 128:128:11
      graphType         <bar|points>                         # default is bar
      viewLimits        <lower:upper>                        # default is range found in data
      viewLimitsMax     <lower:upper>                        # suggested bounds of viewLimits, but not enforced
      yLineMark         <real-value>                         # default is 0.0
      yLineOnOff        <on|off>                             # default is off
      windowingFunction <mean+whiskers|maximum|mean|minimum> # default is maximum, mean+whiskers is recommended
      smoothingWindow   <off|[2-16]>                         # default is off
      transformFunc     <NONE|LOG>                           # default is NONE
    For further information on custom bigWig track settings, see the Track Database Definition Document. For further information on how bigWig settings are used in native Browser tracks, see the Configuring graph-based tracks page.
  8. Paste this custom track line into the text box in the custom track management page.
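
Steps 2 and 7 above are easy to script. This is a minimal Python sketch: one helper drops 'track' and 'browser' lines from wig text, the other assembles a bigWig track line. The function names are my own and not part of any UCSC tool, and the optional settings are passed through as given without checking them against the list above.

```python
def strip_track_lines(lines):
    """Drop 'track' and 'browser' lines so only wiggle data remains (step 2)."""
    return [ln for ln in lines if not ln.startswith(("track", "browser"))]


def make_track_line(big_data_url, name, description, **options):
    """Build a single bigWig custom track line (step 7).

    Extra keyword arguments (e.g. autoScale="off") are appended as key=value.
    """
    parts = [
        "track",
        "type=bigWig",
        f'name="{name}"',
        f'description="{description}"',
        f"bigDataUrl={big_data_url}",
    ]
    parts += [f"{key}={value}" for key, value in options.items()]
    return " ".join(parts)


wig = [
    "track type=wiggle_0 name=example",
    "fixedStep chrom=chr1 start=1 step=1",
    "1.0",
]
print(strip_track_lines(wig))
print(make_track_line(
    "http://myorg.edu/mylab/myBigWig.bw",
    "My Big Wig",
    "A Graph of Data from My Lab",
    autoScale="off",
))
```

The printed track line is what gets pasted into the custom track text box in step 8.
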
I ran into the coordinate problem mentioned here https://groups.google.com/forum/#!topic/macs-announcement/yPSPlKdTOwo when I tried to convert wig to bigWig.

The -clip option of the wigToBigWig program seems to resolve the problem, but it still gives warning messages. The best way is to run the bedClip program first; it is available here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
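
To make the coordinate problem concrete, here is a small Python sketch of what clipping means: rows that run past the end of their chromosome are either dropped or trimmed. It is my own illustration, assuming 0-based, half-open bedGraph rows and a chrom.sizes table loaded into a dict. Check the usage message of bedClip for the tool's exact behavior.

```python
def clip_bedgraph(rows, chrom_sizes, truncate=False):
    """Keep only bedGraph rows that fit inside their chromosome.

    rows        : iterable of (chrom, start, end, value), 0-based half-open
    chrom_sizes : dict mapping chromosome name to its length
    truncate    : if True, trim rows that overhang the chromosome end
                  instead of dropping them
    """
    kept = []
    for chrom, start, end, value in rows:
        size = chrom_sizes.get(chrom)
        if size is None or start < 0 or start >= size:
            continue              # unknown chromosome, or row starts outside it
        if end > size:
            if not truncate:
                continue          # overhanging row is dropped
            end = size            # or trimmed to the chromosome end
        kept.append((chrom, start, end, value))
    return kept


sizes = {"chr1": 100}
rows = [("chr1", 0, 50, 1.0), ("chr1", 90, 120, 2.0), ("chrUn", 0, 10, 3.0)]
print(clip_bedgraph(rows, sizes))
print(clip_bedgraph(rows, sizes, truncate=True))
```

The second row ends at 120 on a chromosome of length 100, which is the kind of line that makes the bigWig conversion complain.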

Now I need a public ftp or http location to host the resulting bigWig file. I decided to use Dropbox, as I have used it for a while and have 15 GB of storage space. However, the UCSC genome browser had a problem accepting my link. I found an answer here:
http://bergmanlab.smith.man.ac.uk/?p=1989

"The first problem with the Share Link function is that the URL automatically generated by Dropbox cannot be read by the UCSC Genome Browser. For example, the link generated to the file “test.bed” in my Dropbox folder is “https://www.dropbox.com/s/7sjfbknsqhq6xfw/test.bed”, which gives an “Unrecognized format line 1″ error when pasted into the UCSC Browser.  This can easily be fixed if you just want to load a single custom track  to the UCSC Browser using Dropbox by simply replacing “www.dropbox” in the URL generated by Dropbox with “dl.dropboxusercontent”. In this example, the corrected path to the file would be “https://dl.dropboxusercontent.com/s/7sjfbknsqhq6xfw/test.bed”, which can be loaded by the UCSC Genome Browser automatically."

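The URL fix described in the quote is a simple host swap. A minimal Python sketch, assuming the share link has exactly the "www.dropbox.com" host shown in the quoted example (the function name is my own):

```python
from urllib.parse import urlsplit, urlunsplit


def dropbox_direct_url(share_url):
    """Swap the host of a Dropbox share link as described in the quote.

    Links whose host is not www.dropbox.com are returned unchanged.
    """
    parts = urlsplit(share_url)
    if parts.netloc != "www.dropbox.com":
        return share_url
    return urlunsplit(parts._replace(netloc="dl.dropboxusercontent.com"))


print(dropbox_direct_url("https://www.dropbox.com/s/7sjfbknsqhq6xfw/test.bed"))
```

The result is the link to use as bigDataUrl in the track line.
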
Finally I had the data visualized in UCSC!
LSD1 binding at the Oct4 locus:


Next time, I will get a bedGraph file from MACS1.4 or MACS2 and use the bdg2bw program https://gist.github.com/taoliu/2469050
"conversion to bedgraph is necessary to reduce the final bw size (up to 70%)"

=========================================
I got an email from Dropbox. This problem was mentioned in the blog post http://bergmanlab.smith.man.ac.uk/?p=1989
I guess I need to find another place to host my bigWig files...

Hi Ming,

This email is an automated notification from Dropbox that your Public links have been temporarily suspended for generating excessive traffic. Your Dropbox will continue to function normally with the exception of Public links.

For more information on suspended links, please visit the Help Center. If this is your first suspension, you may remove the suspension by visiting your account page.

Thursday, February 13, 2014

how to get a genome-wide motif bed file

Someone asked this question on SEQanswers: http://seqanswers.com/forums/showthread.php?t=40762&highlight=genome+motif+bed

For motif analysis, the most popular program is MEME http://meme.nbcr.net/meme/
The MEME suite contains a bunch of tools, including some useful ones for ChIP-seq.
I also saw RSAT http://rsat01.biologie.ens.fr/rsa-tools/index.html and oPOSSUM http://opossum.cisreg.ca/oPOSSUM3/ mentioned.

I analyze ChIP-seq data a lot. Usually, one gets a bed file containing the positions of the peaks. However, many motif analysis programs require a FASTA file as input.

There are many ways to get a FASTA file from coordinates.
See:
http://www.biostars.org/p/7481/
and my previous post http://crazyhottommy.blogspot.com/2013/04/batch-converting-coordinates-to.html
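
The idea behind all of these methods is the same: slice the genome sequence at the bed coordinates. Here is a toy Python sketch with the genome held in a dict (a hypothetical in-memory stand-in for a real genome FASTA). For real data, a tool such as bedtools getfasta does this on an indexed genome file.

```python
def bed_to_fasta(bed_rows, genome):
    """Return FASTA text for (chrom, start, end) rows.

    genome is a dict mapping chromosome name to its sequence string.
    BED coordinates are 0-based and half-open, which matches Python slicing.
    """
    records = []
    for chrom, start, end in bed_rows:
        seq = genome[chrom][start:end]
        records.append(f">{chrom}:{start}-{end}\n{seq}\n")
    return "".join(records)


genome = {"chr1": "ACGTACGTAC"}
print(bed_to_fasta([("chr1", 2, 6)], genome), end="")
```

The header format chrom:start-end is just a convention chosen for this sketch.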

Going back to the question, I've found several ways to get a bed file containing the coordinates for the motifs.

1. Homer homepage
http://homer.salk.edu/homer/
At the very bottom, there are several links for human and mouse. Each file contains all the known motif coordinates for the whole genome. The files are big (several GB). One can get the occurrences of a specific motif with grep.
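
If you prefer Python over grep, the same filtering can be done line by line without loading the whole file. This is a minimal sketch that assumes a tab-separated bed file with the motif name in the 4th column; I have not verified that layout against the current Homer files, so check a few lines first.

```python
import io


def motif_sites(bed_lines, motif):
    """Yield bed lines whose name column (4th field) mentions the motif.

    Matching is a case-insensitive substring test, so, like grep, it can
    also pick up other motifs whose names contain the query.
    """
    wanted = motif.lower()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and wanted in fields[3].lower():
            yield line


# Made-up example lines standing in for a multi-GB file.
example = io.StringIO(
    "chr1\t100\t120\tCTCF\t8.5\t+\n"
    "chr1\t300\t315\tOCT4\t7.1\t-\n"
)
for hit in motif_sites(example, "ctcf"):
    print(hit, end="")
```

For a real file, pass an open file handle instead of the StringIO object; the generator reads one line at a time.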


2. From the UCSC ENCODE analysis tools page http://genome.ucsc.edu/ENCODE/analysisTools.html
ENCODE-motifs http://compbio.mit.edu/encode-motifs/
The download links are at the bottom of the page:

Only human data are available.

3. MotifMap http://motifmap.ics.uci.edu/
Click "Motif Search".

Several more organisms are supported.

Search for the motif you want (I use CTCF as an example).


Click "Save" at the bottom right.

Click "My Motifs" at the upper right.

Export to bed or another file format.



Thursday, February 6, 2014

several tools for Hi-C data, ChIP-seq and methylation data

See the links below:
http://omictools.com/differential-peak-calling/
http://omictools.com/chip-seq-and-beyond/3c-4c-5c-hi-c/hicup-s1473.html
http://omictools.com/analytical-pipelines8/
http://omictools.com/dmr/

There are so many tools out there for established high-throughput sequencing based techniques.
1. You need to know what is out there. Never try to re-invent the wheel.
2. Choose the right one for your own analysis (write a script yourself only if none of them does what you want). Choose one that has good documentation and is actively developed.