Using less, I can see that all the sequences are there. To make sure they are right, one can check several of them by hand in the UCSC genome browser.
I really do not want to dig in (Google is your friend) to find a way to write the names using the SYMBOL rather than the ENTREZID.
You can also convert the gene IDs in many ways.
I have two posts on that: http://crazyhottommy.blogspot.com/2014/09/converting-gene-ids-using-bioconductor.html
and http://crazyhottommy.blogspot.com/2014/09/mapping-gene-ids-with-mygene.html
In addition, I prefer to prepare a BED file of all the promoters using bedtools slop (starting from the RefSeq table from UCSC, or from a GENCODE GTF file), and then extract the DNA sequences with bedtools getfasta. To me, working on the command line is more flexible.
See my previous post here: http://crazyhottommy.blogspot.com/2015/02/fetch-genomic-sequences-from-coordinates.html
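
To make the bedtools slop step a bit more concrete, here is a minimal Python sketch of the coordinate arithmetic that strand-aware padding (the -s flag together with -l and -r) works out to. The function name promoter_interval, the 1-bp TSS input, and the numbers in the example are all made up for illustration; this is not a replacement for bedtools, just a way to see what happens to + and - strand genes.

```python
def promoter_interval(chrom, start, end, strand, upstream, downstream, chrom_size):
    """Pad a BED interval (0-based, half-open) in a strand-aware way.

    Hypothetical helper: 'upstream' and 'downstream' are relative to the
    direction of transcription, so they swap sides on the minus strand.
    """
    if strand == "+":
        new_start = start - upstream
        new_end = end + downstream
    elif strand == "-":
        new_start = start - downstream
        new_end = end + upstream
    else:
        raise ValueError("strand must be '+' or '-', got %r" % strand)

    # Clamp to the chromosome boundaries so the interval stays valid.
    new_start = max(0, new_start)
    new_end = min(chrom_size, new_end)
    return (chrom, new_start, new_end, strand)


if __name__ == "__main__":
    # A made-up 1-bp TSS at position 1000 on a made-up 10 kb chromosome.
    for strand in ("+", "-"):
        print(promoter_interval("chr1", 1000, 1001, strand, 500, 100, 10000))
```

On the command line the same idea would look roughly like bedtools slop -i tss.bed -g hg19.genome -l 500 -r 100 -s, followed by bedtools getfasta -fi hg19.fa -bed promoters.bed -s -fo promoters.fa, where the file names are placeholders. The -s flag of getfasta matters here, because it reverse-complements the sequences of minus strand genes.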
It's very useful to combine the R packages BSgenome.Hsapiens.UCSC.hg19 and TxDb.Hsapiens.UCSC.hg19.knownGene. In fact, I think most Bioconductor packages are designed to replace small Linux scripts.
And I believe you are Chinese, right?
If so, maybe we can talk more with each other.
Yes, I am Chinese. Nice to meet you.
Hi, very useful. I'm trying to do the same thing but with maize (Zea mays).
I want to get the promoters, analyse some regulatory elements, and plot them.
How do I do that?
Greetings
I'm not sure whether there is a BSgenome package for maize in Bioconductor; you can search for it.