Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My GitHub page

Wednesday, September 10, 2014

Converting gene IDs using Bioconductor with biomaRt and annotation packages

In an earlier post I used mygene to convert gene IDs. Bioconductor can do the same job, either with its annotation packages or with biomaRt.
I put the code in a gist on GitHub.

##### use bioconductor annotation packages #######
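# install the packages (biocLite() was the installer when this was written;
# current Bioconductor releases use BiocManager instead)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("org.Hs.eg.db", "GenomicFeatures", "AnnotationDbi"))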
library("org.Hs.eg.db")
library("AnnotationDbi")
library("GenomicFeatures")
# all the possible mappings
ls("package:org.Hs.eg.db")
# convert Entrez_ids to gene_symbols
myEntrez_ids <- c("1","10","100","1000","37690")
mySymbols<- mget(myEntrez_ids, org.Hs.egSYMBOL, ifnotfound=NA)
mySymbols
unlist(mySymbols)
# convert gene_symbols to Entrez_ids
mySymbols_2 <- c("VEGFA","CTCF", "SNAI1","KDM1A")
myEntrez_ids_2<- mget(mySymbols_2, org.Hs.egSYMBOL2EG, ifnotfound=NA)
unlist(myEntrez_ids_2)
?AnnotationDbi::mget # get help
# or use the select function
?AnnotationDbi::select
head(keys(org.Hs.eg.db))
keytypes(org.Hs.eg.db)
select(org.Hs.eg.db, keys = mySymbols_2, columns=c("SYMBOL","REFSEQ","GENENAME","ENTREZID"),keytype="SYMBOL")
select(org.Hs.eg.db, keys = myEntrez_ids, columns=c("SYMBOL","REFSEQ","GENENAME","ENTREZID"),keytype="ENTREZID")
# How many gene symbols
symbol <- keys(org.Hs.eg.db, "SYMBOL")
length(symbol)
############### use biomart ###################
library(biomaRt)
mart<- useMart(biomart = 'ensembl', dataset = 'hsapiens_gene_ensembl')
# get sequences
seq <- getSequence(id = 'BRCA1', type='hgnc_symbol',seqType="3utr", mart = mart) # pretty slow...
show(seq)
seq2 <-getSequence(id="ENST00000520540",type='ensembl_transcript_id',seqType='gene_flank', upstream =30, mart=mart)
show(seq2)
# convert gene symbols to RefSeq mRNA ids
geneList<- c("VEGFA","CTCF", "SNAI1","KDM1A")
results<- getBM(attributes = c("refseq_mrna","hgnc_symbol"), filters="hgnc_symbol", values=geneList, mart=mart)
results
?getBM
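The select() calls in the gist return one row per match, so a symbol with several RefSeq IDs shows up on several rows. When a single value per key is wanted, mapIds() from AnnotationDbi is one option. The following is a minimal sketch, assuming org.Hs.eg.db and AnnotationDbi are installed; the variable names and the "NOT_A_GENE" key are made up for illustration.

```r
library("org.Hs.eg.db")
library("AnnotationDbi")

# same symbols as in the gist, plus a made-up key to show unmatched input
symbols <- c("VEGFA", "CTCF", "SNAI1", "KDM1A", "NOT_A_GENE")

# one Entrez ID per symbol: multiVals = "first" keeps only the first match;
# the unmatched key is expected to come back as NA
entrez <- mapIds(org.Hs.eg.db,
                 keys = symbols,
                 column = "ENTREZID",
                 keytype = "SYMBOL",
                 multiVals = "first")
entrez

# keep every RefSeq ID per symbol: multiVals = "list" returns a named list
refseq <- mapIds(org.Hs.eg.db,
                 keys = symbols,
                 column = "REFSEQ",
                 keytype = "SYMBOL",
                 multiVals = "list")
lengths(refseq)
```

The result of the first call is a character vector named by the input keys, so it can be used directly for lookups such as entrez["CTCF"].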
For more examples see posts from Dave Tang:
http://davetang.org/muse/2013/12/16/bioconductor-annotation-packages/
http://davetang.org/muse/2013/05/23/using-the-bioconductor-annotation-packages/
http://davetang.org/muse/2013/11/25/thoughts-converting-gene-identifiers/
