I've been collecting packages for (interactive) visualization of genomic data.
Here are the ones I've collected so far:
python:
Vispy http://vispy.org/gallery.html
prettyplotlib https://github.com/olgabot/prettyplotlib
D3py https://github.com/mikedewar/d3py
Yhat ggplot for python http://blog.yhathq.com/posts/ggplot-for-python.html
galry https://github.com/rossant/galry
Glue http://www.glueviz.org/en/latest/
Seaborn https://github.com/mwaskom/seaborn
Bokeh https://github.com/ContinuumIO/Bokeh
R:
Click me https://github.com/nachocab/clickme
plotly https://plot.ly/
others:
http://selection.datavisualization.ch/
Gpviz http://icbi.at/software/gpviz/gpviz.shtml
Jheatmap http://jheatmap.github.io/jheatmap/
Caleydo http://caleydo.github.io/caleydo-doc/3.1/index.html#!index.md
There are too many out there; I will need to focus on one or two.
A wet-dry hybrid biologist's take on genetics and genomics. Mostly about Linux, R, python, reproducible research, open science and NGS. Grab my book to transform yourself into a computational biologist https://divingintogeneticsandgenomics.ck.page/
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Monday, February 24, 2014
Tuesday, February 18, 2014
hosting bigwig by dropbox for UCSC visualization
First, you need to know what a bigWig file is: https://genome.ucsc.edu/goldenPath/help/bigWig.html
It is the binary form of a wig file and allows the UCSC Genome Browser to fetch only the data in the current window. I usually get a wig file from MACS1.4 when calling peaks on ChIP-seq data.
MACS2 no longer has the -w option. https://github.com/taoliu/MACS/
See the discussion in the Google group:
https://groups.google.com/forum/#!searchin/macs-announcement/bedgraph$20ucsc$20track/macs-announcement/LBhAtmC-Zho/uZuxU8ZaqdEJ
MACS2 only creates a bedGraph file http://genome.ucsc.edu/goldenPath/help/bedgraph.html
which can then be converted to bigWig:
https://github.com/taoliu/MACS/wiki/Build-Signal-Track
https://gist.github.com/taoliu/2469050
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
If you have a BAM file, you can produce a bedGraph file with bedtools genomeCoverageBed (bedtools genomecov):
http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html
http://www.biostars.org/p/64495/#64680
You need to add a track line (the -trackline option) for the UCSC Genome Browser.
https://groups.google.com/forum/#!searchin/bedtools-discuss/bedtools$20bedgraph$20ucsc$20track/bedtools-discuss/3CibmlqIdWA/PW_bhgWQfVMJ
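As a rough sketch of the BAM-to-bedGraph step, the bedtools call can be assembled from Python. The helper name genomecov_command and the file name sample.bam are made up for illustration; -ibam, -bg and -trackline are bedtools genomecov options. The track line is off by default here because the UCSC steps quoted in this post ask for data-only input when converting to bigWig.

```python
def genomecov_command(bam_path, trackline=False):
    """Build the argument list for `bedtools genomecov` in bedGraph mode."""
    # -ibam reads a BAM file; -bg reports coverage in bedGraph format
    cmd = ["bedtools", "genomecov", "-ibam", bam_path, "-bg"]
    if trackline:
        # -trackline adds a UCSC track line as the first line of the output
        cmd.append("-trackline")
    return cmd


# sample.bam is a placeholder file name
print(" ".join(genomecov_command("sample.bam", trackline=True)))
```

To actually run it, the list can be passed to subprocess.run with stdout pointed at an output file. bedtools expects the BAM file to be sorted by position.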
I followed the instructions https://genome.ucsc.edu/goldenPath/help/bigWig.html
To create a bigWig track from a wiggle file, follow these steps:
1. Create a wig format file following the directions here. Note that when converting a wig file to a bigWig file, you are limited to one track of data in your input file; you must create a separate wig file for each data track. Note that this is the file that is referred to as input.wig in step 5 below.
2. Remove any existing 'track' or 'browser' lines from your wig file so that it contains only data.
3. Download the wigToBigWig program from the directory of binary utilities.
4. Use the fetchChromSizes script from the same directory to create the chrom.sizes file for the UCSC database you are working with (e.g. hg19). Note that this is the file that is referred to as chrom.sizes in step 5 below.
5. Create the bigWig file from your wig file using the wigToBigWig utility like so:
    wigToBigWig input.wig chrom.sizes myBigWig.bw
(Note that the wigToBigWig program also accepts a gzipped wig input file.)
6. Move the newly created bigWig file (myBigWig.bw) to a http, https, or ftp location.
7. Construct a custom track using a single track line. The most basic version of the track line will look something like this:
    track type=bigWig name="My Big Wig" description="A Graph of Data from My Lab" bigDataUrl=http://myorg.edu/mylab/myBigWig.bw
Optional settings for the track line:
    autoScale <on|off> # default is on
    alwaysZero <on|off> # default is off
    gridDefault <on|off> # default is off
    maxHeightPixels <max:default:min> # default is 128:128:11
    graphType <bar|points> # default is bar
    viewLimits <lower:upper> # default is range found in data
    viewLimitsMax <lower:upper> # suggested bounds of viewLimits, but not enforced
    yLineMark <real-value> # default is 0.0
    yLineOnOff <on|off> # default is off
    windowingFunction <mean+whiskers|maximum|mean|minimum> # default is maximum, mean+whiskers is recommended
    smoothingWindow <off|[2-16]> # default is off
    transformFunc <NONE|LOG> # default is NONE
For further information on custom bigWig track settings, see the Track Database Definition Document. For further information on how bigWig settings are used in native Browser tracks, see the Configuring graph-based tracks page.
8. Paste this custom track line into the text box in the custom track management page.
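The track line in the steps above is just a space-separated string, so it is easy to generate when you have many bigWig files. Here is a minimal sketch; the function name bigwig_track_line is hypothetical, and the example values are the ones from the UCSC instructions.

```python
def bigwig_track_line(name, description, big_data_url, **settings):
    """Return a one-line UCSC custom track definition for a bigWig file."""
    parts = [
        "track",
        "type=bigWig",
        f'name="{name}"',
        f'description="{description}"',
        f"bigDataUrl={big_data_url}",
    ]
    # Optional settings such as autoScale or viewLimits are added as key=value
    for key, value in settings.items():
        parts.append(f"{key}={value}")
    return " ".join(parts)


print(bigwig_track_line(
    "My Big Wig",
    "A Graph of Data from My Lab",
    "http://myorg.edu/mylab/myBigWig.bw",
    autoScale="off",
    viewLimits="0:50",
))
```

The printed line can be pasted into the custom track text box as is.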
The -clip option of wigToBigWig, which deals with entries that run past the end of a chromosome, seems to resolve the problem, but it still gives warning messages. A better way is to run the bedClip program first, available here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
Next, I needed a public FTP or HTTP location to host the resulting bigWig file. I decided to use Dropbox, as I have used it for a while and have 15 GB of storage space. The UCSC Genome Browser had a problem accepting my link. I found an answer here:
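To make the idea of clipping concrete, here is a small Python sketch that drops bedGraph entries falling outside the chromosome sizes. It only illustrates the kind of filtering involved and is not how bedClip is implemented; the function names are made up, and for real files bedClip itself is the tool to use.

```python
def read_chrom_sizes(lines):
    """Parse chrom.sizes text (name and size per line) into a dict."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def drop_off_chrom(bedgraph_lines, sizes):
    """Yield only bedGraph lines that lie fully inside a known chromosome."""
    for line in bedgraph_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] not in sizes:
            continue  # skips track lines, blank lines and unknown chromosomes
        start, end = int(fields[1]), int(fields[2])
        if 0 <= start < end <= sizes[fields[0]]:
            yield line


sizes = read_chrom_sizes(["chr1\t1000\n", "chr2\t500\n"])
entries = ["chr1\t0\t100\t1.5\n", "chr1\t990\t1010\t2.0\n", "chr2\t400\t500\t3.0\n"]
for kept in drop_off_chrom(entries, sizes):
    print(kept, end="")
```

In this toy input the second entry ends at 1010 on a chromosome of length 1000, so it is the one that gets dropped.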
http://bergmanlab.smith.man.ac.uk/?p=1989
"The first problem with the Share Link function is that the URL automatically generated by Dropbox cannot be read by the UCSC Genome Browser. For example, the link generated to the file “test.bed” in my Dropbox folder is “https://www.dropbox.com/s/7sjfbknsqhq6xfw/test.bed”, which gives an “Unrecognized format line 1” error when pasted into the UCSC Browser. This can easily be fixed if you just want to load a single custom track to the UCSC Browser using Dropbox by simply replacing “www.dropbox” in the URL generated by Dropbox with “dl.dropboxusercontent”. In this example, the corrected path to the file would be “https://dl.dropboxusercontent.com/s/7sjfbknsqhq6xfw/test.bed”, which can be loaded by the UCSC Genome Browser automatically."
Finally I had the data visualized in UCSC!
(Screenshot: LSD1 binding at the Oct4 locus.)
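The host swap described in the quoted fix is easy to script if you have several links to convert. This is a minimal sketch using the example URL from the quote; the function name dropbox_direct_url is made up, and it assumes Dropbox share links still have the form described in that 2014 post, which may no longer hold.

```python
from urllib.parse import urlsplit, urlunsplit


def dropbox_direct_url(share_url):
    """Swap the www.dropbox.com host for dl.dropboxusercontent.com."""
    parts = urlsplit(share_url)
    if parts.netloc != "www.dropbox.com":
        return share_url  # not a Dropbox share link; leave it unchanged
    return urlunsplit(parts._replace(netloc="dl.dropboxusercontent.com"))


print(dropbox_direct_url("https://www.dropbox.com/s/7sjfbknsqhq6xfw/test.bed"))
```

Only the host name is changed; the path of the shared file is kept as it is.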
Next time, I will get a bedGraph file from MACS1.4 or MACS2 and use the bdg2bw script https://gist.github.com/taoliu/2469050
"conversion to bedgraph is necessary to reduce the final bw size (up to 70%)"
=========================================
I got an email from Dropbox. This issue was mentioned in the blog post http://bergmanlab.smith.man.ac.uk/?p=1989
I guess I need to find another place to host my bigWig files...
Hi Ming, This email is an automated notification from Dropbox that your Public links have been temporarily suspended for generating excessive traffic. Your Dropbox will continue to function normally with the exception of Public links. For more information on suspended links, please visit the Help Center. If this is your first suspension, you may remove the suspension by visiting your account page.
Thursday, February 13, 2014
how to get a genome-wide motif bed file
Someone asked this question on SEQanswers http://seqanswers.com/forums/showthread.php?t=40762&highlight=genome+motif+bed
For motif analysis, the most popular program is MEME http://meme.nbcr.net/meme/
The suite contains a bunch of tools, including some useful ones for ChIP-seq.
I also saw RSAT http://rsat01.biologie.ens.fr/rsa-tools/index.html and oPOSSUM http://opossum.cisreg.ca/oPOSSUM3/ mentioned.
I analyze ChIP-seq data a lot. Usually, one gets a bed file containing the positions of the peaks. However, many motif analysis programs require a FASTA file as input.
There are many ways to get a FASTA file from coordinates.
See:
http://www.biostars.org/p/7481/
and my previous post http://crazyhottommy.blogspot.com/2013/04/batch-converting-coordinates-to.html
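To show what "FASTA from coordinates" means in practice, here is a toy Python sketch that slices sequences out of a genome held in memory. The function name bed_to_fasta and the tiny genome are made up for illustration; for a real genome a tool such as bedtools getfasta is the practical choice.

```python
def bed_to_fasta(bed_lines, genome):
    """Yield (header, sequence) pairs for each BED interval.

    genome is a dict mapping chromosome name to its sequence string.
    BED coordinates are 0-based and half-open, which matches Python slices.
    """
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        yield f">{chrom}:{start}-{end}", genome[chrom][start:end]


# A made-up 10 bp "chromosome" just to exercise the function
genome = {"chr1": "ACGTACGTAC"}
for header, seq in bed_to_fasta(["chr1\t2\t6\n"], genome):
    print(header)
    print(seq)
```

The point to remember is the coordinate convention: a BED interval of 2 to 6 covers four bases, starting at the third base of the chromosome.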
Going back to the question, I've found several ways to get a bed file containing the coordinates for the motifs.
1. Homer homepage
http://homer.salk.edu/homer/
At the very bottom, there are several links for human and mouse. Each file contains all the known motif coordinates for the whole genome. The files are big (several GB). One can get the occurrences of a specific motif with grep.
2. From the UCSC ENCODE software page http://genome.ucsc.edu/ENCODE/analysisTools.html
ENCODE-motifs http://compbio.mit.edu/encode-motifs/
Look at the bottom of the page.
Only human data are available.
3. MotifMap http://motifmap.ics.uci.edu/
Click motif search.
Several more organisms are supported.
Search for the motif you want (I use CTCF as an example).
Click save on the bottom right.
Click my motifs on the upper right.
Export to bed or other file formats.
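For the big genome-wide files from option 1, the grep step can also be done in Python while streaming the file line by line. This sketch assumes the motif name sits in the 4th (name) column of the BED file, which you should check against the file you download; the function name motif_lines and the example rows are made up.

```python
def motif_lines(bed_lines, motif):
    """Yield BED lines whose name column (4th field) contains `motif`."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and motif in fields[3]:
            yield line


# Made-up rows standing in for a genome-wide motif file
rows = [
    "chr1\t100\t120\tCTCF(Zf)\t8.5\t+\n",
    "chr1\t300\t310\tOct4(POU)\t7.1\t-\n",
]
for hit in motif_lines(rows, "CTCF"):
    print(hit, end="")
```

Unlike a plain grep, this only looks at the name column, so a match elsewhere on the line is not reported. Passing an open file handle as bed_lines keeps memory use low even for files of several GB.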
Thursday, February 6, 2014
several tools for Hi-C data, ChIP-seq and methylation data
See the links below:
http://omictools.com/differential-peak-calling/
http://omictools.com/chip-seq-and-beyond/3c-4c-5c-hi-c/hicup-s1473.html
http://omictools.com/analytical-pipelines8/
http://omictools.com/dmr/
There are too many tools out there for established high-throughput sequencing based techniques.
1. You need to know what is out there. Never try to reinvent the wheel.
2. Choose the right one for your own analysis (write a script yourself only if none of them does what you want). Choose one that has good documentation and is being actively developed.