Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My GitHub page

Tuesday, July 19, 2016

Several databases I want to share for RNA-seq data

Recently, several papers have focused on batch re-processing public RNA-seq data sets, using either local clusters or remote cloud computing. I list the resources below:

1. A cloud-based workflow to quantify transcript-expression levels in public cancer compendia. One can download TCGA and CCLE RNA-seq data at the transcript level, in both counts and TPM units. The quantification used GENCODE v24 as the reference and kallisto for the calculation.

2. Rapid and efficient analysis of 20,000 RNA-seq samples with Toil. Access the data here. Note that the data are mapped to the hg38 reference genome, with GENCODE v23 as the annotation. The collection covers TCGA, ICGC, GTEx and CCLE data.

3. intropolis is a list of exon-exon junctions found across 21,504 human RNA-seq samples in the Sequence Read Archive (SRA), identified by spliced alignment of reads to hg19 with Rail-RNA.

4. RESTful RNA-seq Analysis API: a simple RESTful API to access analysis results of all public RNA-seq data for nearly 200 species in the European Nucleotide Archive (ENA).

5. ExpressionAtlas Bioconductor package: this package is for searching for datasets in the EMBL-EBI Expression Atlas and downloading them into R for further analysis. Each Expression Atlas dataset is represented as a SimpleList object with one element per platform. Sequencing data are contained in a SummarizedExperiment object, while microarray data are contained in an ExpressionSet or MAList object.

6. Bgee, as suggested in the comment below.

7. The Digital Expression Explorer (DEE): a repository of digital gene expression profiles mined from public RNA-seq data sets. The data are obtained from the NCBI Sequence Read Archive (SRA).
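
Item 1 offers its estimates in both counts and TPM. For readers who want to see how the two units relate, here is a minimal Python sketch of the standard TPM definition, TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6, where c_i is the estimated read count and l_i the effective length of transcript i. The function name and the toy numbers are hypothetical; this is not the workflow's own code (kallisto already reports a tpm column next to est_counts and eff_length).

```python
def counts_to_tpm(counts, eff_lengths):
    """Convert per-transcript read counts to TPM.

    counts: estimated read counts per transcript
    eff_lengths: effective transcript lengths in the same order (all > 0)
    """
    if len(counts) != len(eff_lengths):
        raise ValueError("counts and eff_lengths must have the same length")
    # Step 1: reads per base of effective length (a per-transcript rate)
    rates = [c / l for c, l in zip(counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        # No reads at all: every transcript gets TPM 0
        return [0.0] * len(rates)
    # Step 2: rescale the rates so they sum to one million
    return [r / total * 1e6 for r in rates]


# Toy example with three hypothetical transcripts
counts = [100.0, 200.0, 300.0]
eff_lengths = [1000.0, 2000.0, 1500.0]
tpm = counts_to_tpm(counts, eff_lengths)
print([round(x, 1) for x in tpm])
# [250000.0, 250000.0, 500000.0]
```

Dividing by effective length before normalizing is what separates TPM from plain count proportions: a longer transcript yields more reads at the same expression level, and the length correction removes that bias. The TPM values of every sample sum to one million.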
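
Items 1 and 2 were built on different annotation releases (GENCODE v24 and v23), so the same gene or transcript may carry a different version suffix, the number after the dot in an ID such as ENSG00000141510.16. A common workaround when joining tables from the two resources is to compare IDs with the version removed. Below is a minimal Python sketch under that assumption; the helper names and example values are hypothetical. Keep in mind that a version bump can mean the underlying model changed, so treat such matches as approximate.

```python
import re

# Matches the ".<version>" part of a versioned Ensembl ID, e.g. ".16" in
# "ENSG00000141510.16". A trailing "_PAR_Y" tag (used by GENCODE for
# pseudoautosomal copies on chrY) is left in place.
_VERSION = re.compile(r"\.\d+")


def strip_version(ensembl_id):
    return _VERSION.sub("", ensembl_id)


def join_on_unversioned(table_a, table_b):
    """Inner-join two {versioned_id: value} dicts on the unversioned ID.

    Returns {unversioned_id: (value_a, value_b)}. If several versioned IDs
    collapse to the same key within one table, the last one wins, so check
    for collisions in real data.
    """
    a = {strip_version(k): v for k, v in table_a.items()}
    b = {strip_version(k): v for k, v in table_b.items()}
    return {k: (a[k], b[k]) for k in a.keys() & b.keys()}


# Hypothetical values keyed by IDs from two annotation releases
v24 = {"ENSG00000141510.16": 12.5, "ENSG00000000003.14": 3.0}
v23 = {"ENSG00000141510.15": 11.9, "ENSG00000000005.5": 0.2}
print(join_on_unversioned(v24, v23))
# {'ENSG00000141510': (12.5, 11.9)}
```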
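
For intropolis (item 3), a small parser makes it easy to filter junctions by read support. This sketch assumes each line has eight tab-separated columns: chromosome, start, end, strand, donor dinucleotide, acceptor dinucleotide, a comma-separated list of sample indexes, and a comma-separated list of read counts for those samples. That layout is my assumption about the format, so check it against the README of the release you download. The class and function names and the demo lines are hypothetical.

```python
import gzip
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class Junction:
    chrom: str
    start: int          # assumed 1-based, inclusive
    end: int            # assumed 1-based, inclusive
    strand: str
    donor: str          # donor dinucleotide, e.g. "GT"
    acceptor: str       # acceptor dinucleotide, e.g. "AG"
    samples: List[int]  # indexes of samples in which the junction was seen
    reads: List[int]    # read counts, parallel to `samples`

    @property
    def total_reads(self) -> int:
        return sum(self.reads)


def parse_junctions(handle: TextIO) -> Iterator[Junction]:
    """Yield Junction records from an intropolis-style TSV stream.

    Assumes the eight-column layout described above; verify it first.
    """
    for line in handle:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        samples = [int(x) for x in f[6].split(",")]
        reads = [int(x) for x in f[7].split(",")]
        if len(samples) != len(reads):
            raise ValueError(f"sample/read lists differ in length: {line!r}")
        yield Junction(f[0], int(f[1]), int(f[2]), f[3], f[4], f[5], samples, reads)


def open_junctions(path: str) -> Iterator[Junction]:
    # Assumes a gzip-compressed file; "rt" decodes the bytes to text lines.
    with gzip.open(path, "rt") as handle:
        yield from parse_junctions(handle)


# Two hypothetical lines standing in for the real file
demo = io.StringIO(
    "chr1\t14830\t14969\t+\tGT\tAG\t0,5,9\t3,1,10\n"
    "chr1\t15039\t15795\t+\tGT\tAG\t2\t4\n"
)
# Keep junctions supported by at least 5 reads summed over all samples
well_supported = [j for j in parse_junctions(demo) if j.total_reads >= 5]
print([(j.chrom, j.start, j.end, j.total_reads) for j in well_supported])
# [('chr1', 14830, 14969, 14)]
```

Parsing lazily with a generator keeps memory flat, which matters because the full junction list is large; swap the demo stream for `open_junctions(path)` on the downloaded file.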


Hi,
You might be interested in the Bgee database, which provides gene expression patterns in multiple animal species, produced from multiple technologies (RNA-Seq, Affymetrix, in situ hybridization and EST data), and is based exclusively on curated "normal", healthy, expression data.
Access to Bgee:
Our blog:
We also have a Bioconductor package to access the data: