Update on 10/30/14: a mygene Bioconductor package is now available at http://bioconductor.org/packages/release/bioc/html/mygene.html
Recently I got to know mygene, a Python wrapper for the MyGene.info services that maps gene IDs.
I found it very handy for converting gene IDs; see the gist below.
#! /usr/bin/env python
# ID mapping using mygene
# https://pypi.python.org/pypi/mygene
# http://nbviewer.ipython.org/gist/newgene/6771106
# http://mygene-py.readthedocs.org/en/latest/
# 08/30/14
__author__ = 'tommy'
import fileinput
import sys

import mygene

mg = mygene.MyGeneInfo()

# Map gene symbols to Entrez gene IDs and Ensembl gene IDs.
# fileinput loops over all lines of the files named as command-line arguments,
# or over standard input if no arguments are given.
# Build a list from input that has one gene symbol per line.
def get_gene_symbols():
    gene_symbols = []
    for line in fileinput.input():
        gene_symbol = line.strip()  # assume each line contains only one gene symbol
        if gene_symbol:  # skip blank lines so they are not sent as empty queries
            gene_symbols.append(gene_symbol)
    fileinput.close()
    return gene_symbols

Entrez_ids = mg.querymany(get_gene_symbols(), scopes='symbol', fields='entrezgene,ensembl.gene',
                          species='human', as_dataframe=True, verbose=False)
# as_dataframe=True returns a pandas DataFrame; verbose=False suppresses messages such as "finished".
# Entrez_ids.to_csv(sys.stdout, sep="\t")  # tab-separated output to stdout; values for symbols
#                                          # with no match are written as empty fields, not NaN
sys.stdout.write(Entrez_ids.to_string() + "\n")  # to_string() returns a str, which sys.stdout.write() expects
# Entrez_ids.to_csv("Entrez_ids.txt", sep="\t")  # write the DataFrame to a tab-separated file
Run it as python geneSymbol2Entrez.py input.txt > output.txt, where input.txt contains one gene symbol per line, or pipe the symbols in on standard input instead. Pretty neat!
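Two small details are easy to trip over: reading the input line by line keeps repeated symbols in the query list, and `to_string()` prints NaN wherever a symbol had no match. Below is a minimal sketch of pre- and post-processing that avoids pandas entirely. It assumes the default (non-DataFrame) output of `querymany`, a list of dicts in which each hit carries `query`, unmatched queries carry `notfound: True`, and `ensembl` is either a single dict or a list of dicts when a symbol maps to several Ensembl genes. The helper names `clean_symbols`, `_ensembl_genes` and `hits_to_tsv` are hypothetical, not part of mygene.

```python
from typing import Any, Dict, Iterable, List


def clean_symbols(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blank lines and repeated symbols, keep input order."""
    seen = set()
    symbols = []
    for line in lines:
        symbol = line.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    return symbols


def _ensembl_genes(hit: Dict[str, Any]) -> str:
    """Join the Ensembl gene ID(s) of one hit.

    Assumption: 'ensembl' is a dict or a list of dicts, each with a 'gene' key.
    """
    ens = hit.get("ensembl")
    if ens is None:
        return ""
    if isinstance(ens, dict):
        ens = [ens]
    return ",".join(str(e["gene"]) for e in ens if "gene" in e)


def hits_to_tsv(hits: List[Dict[str, Any]]) -> str:
    """Flatten querymany-style hits into tab-separated text, writing NA for missing values."""
    rows = ["query\tentrezgene\tensembl.gene"]
    for hit in hits:
        if hit.get("notfound"):
            rows.append(f"{hit['query']}\tNA\tNA")
            continue
        entrez = hit.get("entrezgene")
        entrez = "NA" if entrez is None else str(entrez)
        rows.append(f"{hit['query']}\t{entrez}\t{_ensembl_genes(hit) or 'NA'}")
    return "\n".join(rows) + "\n"
```

With mygene installed, the helpers would wrap the original call roughly as `hits = mg.querymany(clean_symbols(fileinput.input()), scopes='symbol', fields='entrezgene,ensembl.gene', species='human')` followed by `sys.stdout.write(hits_to_tsv(hits))`. Leaving `as_dataframe` at its default keeps the raw list of hits, so a symbol with several Ensembl IDs ends up on one row with the IDs comma-joined.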