Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Thursday, October 31, 2013

Python code for getting the reverse complement of a DNA strand

Following Python for Biologists (http://pythonforbiologists.com/index.php/business-card-solutions/), I wrote a very simple Python script to get the reverse complement of a DNA strand.
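The script itself is not reproduced in this text, so here is a minimal sketch of what such a script could look like, assuming Python 3 and upper-case input containing only A, C, G and T. The function names are hypothetical; only seq_dict matches the name used later in this post.

```python
# Hypothetical sketch, not the original script from this post.
# Assumes upper-case DNA made up of A, C, G and T only.

seq_dict = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Look up each base of the reversed sequence and join the results once."""
    return "".join([seq_dict[base] for base in reversed(seq)])


def reverse_complement_translate(seq):
    """Same result using a translation table, then reversing with a slice."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.translate(table)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCCG"))            # CGGCAT
    print(reverse_complement_translate("ATGCCG"))  # CGGCAT
```

The dictionary version raises a KeyError on any character outside A, C, G and T, which can be a handy check that the input is clean.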


It is a very simple script, but there are several things I want to mention:
1. What is the fastest way to concatenate strings in Python?
In the first function, it is tempting to write something like:

out_seq = ''
for base in reversed(seq):
    out_seq += seq_dict[base]

but this can be very slow for long sequences: strings are immutable, so each += may copy everything built so far. Read this post for more details: http://www.skymind.com/~ocrow/python_string/
According to that post, collecting the pieces with a list comprehension and joining them once with ''.join() is the fastest and most elegant way to build the string.
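To check the difference on your own machine, the two approaches can be timed side by side. This is only a rough sketch: the function names and the synthetic test sequence are made up for illustration, and the actual numbers depend on the Python version and the sequence length, so no timings are claimed here.

```python
import timeit

seq_dict = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp_concat(seq):
    """Build the result one character at a time with +=."""
    out_seq = ""
    for base in reversed(seq):
        out_seq += seq_dict[base]
    return out_seq


def revcomp_join(seq):
    """Build a list with a list comprehension, then join it once."""
    return "".join([seq_dict[base] for base in reversed(seq)])


if __name__ == "__main__":
    seq = "ATGC" * 25000  # synthetic 100 kb test sequence (assumption)
    assert revcomp_concat(seq) == revcomp_join(seq)
    for func in (revcomp_concat, revcomp_join):
        seconds = timeit.timeit(lambda: func(seq), number=10)
        print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```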


2. For sequences that contain the more complicated IUPAC ambiguity codes, read this post:
http://pythonforbiologists.com/index.php/how-to-count-non-dna-bases-in-a-sequence-using-python/
Nucleotide Code:  Base:
----------------  -----
A.................Adenine
C.................Cytosine
G.................Guanine
T (or U)..........Thymine (or Uracil)
R.................A or G
Y.................C or T
S.................G or C
W.................A or T
K.................G or T
M.................A or C
B.................C or G or T
D.................A or G or T
H.................A or C or T
V.................A or C or G
N.................any base
. or -............gap
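The table above also tells us how each ambiguity code complements: R (A or G) pairs with Y (C or T), K (G or T) with M (A or C), B with V, D with H, while S, W and N are their own complements. Below is a minimal sketch of a reverse complement that handles these codes, assuming Python 3. The names are hypothetical, and mapping U to A (so the output is always written as DNA) is my own assumption.

```python
# Hypothetical sketch: reverse complement with IUPAC ambiguity codes.
# Assumption: U is complemented to A, so the result is written as DNA.
_BASES = "ACGTURYSWKMBDHVN"
_COMPS = "TGCAAYRSWMKVHDBN"

# Cover both upper and lower case. Characters that are not in the table,
# such as the gap symbols "." and "-", pass through translate() unchanged.
IUPAC_TABLE = str.maketrans(_BASES + _BASES.lower(), _COMPS + _COMPS.lower())


def reverse_complement_iupac(seq):
    """Complement every code via the table, then reverse the string."""
    return seq.translate(IUPAC_TABLE)[::-1]


if __name__ == "__main__":
    print(reverse_complement_iupac("ATGCRYN-K"))  # M-NRYGCAT
```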
