Please refer to these posts on Biostars:
Have a look at this help page from the UCSC Genome Browser: http://genome.ucsc.edu/FAQ/FAQdownloads.html
in a web browser:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<DASDNA>
<SEQUENCE id="chr1" start="100000" stop="100010" version="1.00">
<DNA length="11">cactaagcaca</DNA>
</SEQUENCE>
</DASDNA>
The coordinates are 1-based!
One can also use the Ensembl DAS server:
http://www.ensembl.org/das/Homo_sapiens.GRCh37.reference/sequence?segment=1:100000,100010
The coordinates are 1-based too!
Segment: 1:100000-100010
100000 cactaagcaca 100010
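To double-check the 1-based, inclusive convention, the UCSC DAS response shown above can be parsed with Python's standard library. This is only a sketch: the XML string is copied from the output above, and parse_das_dna is a hypothetical helper name, not part of any DAS client.

```python
import xml.etree.ElementTree as ET

# XML copied from the UCSC DAS response shown above.
DAS_XML = """<DASDNA>
<SEQUENCE id="chr1" start="100000" stop="100010" version="1.00">
<DNA length="11">cactaagcaca</DNA>
</SEQUENCE>
</DASDNA>"""


def parse_das_dna(xml_text):
    """Return (chrom, start, stop, sequence) from a DASDNA document."""
    root = ET.fromstring(xml_text)
    seq_el = root.find("SEQUENCE")
    dna_el = seq_el.find("DNA")
    # Whitespace and newlines inside <DNA> are not part of the sequence.
    sequence = "".join(dna_el.text.split())
    return seq_el.get("id"), int(seq_el.get("start")), int(seq_el.get("stop")), sequence


chrom, start, stop, sequence = parse_das_dna(DAS_XML)

# With 1-based inclusive coordinates, the length is stop - start + 1.
expected_length = stop - start + 1
print(chrom, start, stop, sequence, len(sequence) == expected_length)
```

For the segment 100000 to 100010, stop - start + 1 gives 11, which matches the length attribute reported by the server.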
Other methods in a gist:
Make sure you know how the software works. Confusing 0-based and 1-based coordinates can be dangerous, because the region you retrieve ends up shifted by one base.
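To illustrate how easily the two conventions get mixed up, here is a small sketch that converts between 1-based inclusive coordinates (as in the DAS queries above) and 0-based half-open coordinates (as used by BED files and Python slicing). The function names and the toy sequence are made up for this example.

```python
def one_based_to_zero_based(start, end):
    """1-based inclusive [start, end] -> 0-based half-open [start - 1, end)."""
    return start - 1, end


def zero_based_to_one_based(start, end):
    """0-based half-open [start, end) -> 1-based inclusive [start + 1, end]."""
    return start + 1, end


# Toy sequence standing in for a chromosome; its first base is position 1.
toy_chrom = "ACGTACGTAC"

# Bases 3 through 6 in 1-based inclusive terms.
zb_start, zb_end = one_based_to_zero_based(3, 6)
correct = toy_chrom[zb_start:zb_end]
print(zb_start, zb_end, correct)

# Using the 1-based numbers directly as a Python slice silently drops a base.
wrong = toy_chrom[3:6]
print(wrong)
```

Note that only the start changes between the two conventions; the end value stays the same, which is why this mistake is easy to miss.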
Concerning speed, I have not run a benchmark yet, but if you have a FASTA file on your local machine, retrieving sequences from it will be faster than querying a remote server. samtools requires you to index the genome FASTA first (with samtools faidx); this takes a while, but you only need to do it once. Similarly, one can use the pyfaidx package: https://github.com/mdshw5/pyfaidx. For the cruzdb package, you can mirror the UCSC database to your local computer for faster access.
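Why a one-time index pays off can be shown with a rough standard-library sketch. It is loosely modeled on the idea behind a FASTA index (sequence length, byte offset of the first base, bases per line, bytes per line) and is not how samtools or pyfaidx are actually implemented. The names build_index and fetch and the toy record are hypothetical, and the sketch assumes every sequence line except the last has the same length.

```python
import os
import tempfile


def build_index(fasta_path):
    """Scan the FASTA once; store [length, offset, line_bases, line_width] per record."""
    index = {}
    name = None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                # handle.tell() is now the byte position of the record's first base.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None:
                bases = len(line.rstrip(b"\r\n"))
                entry = index[name]
                if entry[2] == 0:
                    # The first sequence line defines the line layout.
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return index


def fetch(fasta_path, index, name, start, end):
    """Return bases start..end (1-based, inclusive) by seeking instead of scanning."""
    length, offset, line_bases, line_width = index[name]
    if not 1 <= start <= end <= length:
        raise ValueError("coordinates out of range")
    first, last = start - 1, end - 1  # 0-based positions of the first and last base
    byte_start = offset + (first // line_bases) * line_width + first % line_bases
    byte_end = offset + (last // line_bases) * line_width + last % line_bases + 1
    with open(fasta_path, "rb") as handle:
        handle.seek(byte_start)
        chunk = handle.read(byte_end - byte_start)
    # Drop the line breaks that fall inside the requested window.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


# Toy FASTA with 10 bases per line, written to a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".fa", delete=False) as tmp:
    tmp.write(b">chrToy toy record\nACGTACGTAC\nGGGGCCCCTT\nAACC\n")
    toy_path = tmp.name

toy_index = build_index(toy_path)                      # slow step, done once
region = fetch(toy_path, toy_index, "chrToy", 8, 13)   # fast step, repeated as needed
print(region)
os.remove(toy_path)
```

The index is built with a single pass over the file, and each later lookup only reads the bytes it needs. For real genomes, samtools or pyfaidx are the tools to use.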
Hi Tommy Tang,
Can you please provide a Python script for plotting RNA-Seq data to find differentially expressed genes? I did this with the cummeRbund package in R, but now I need a Python script. Thank you!