I have not written in python for so long, I've been using R for a long time. This provide me a good practice opportunity.
The basic problem is to format some files, and I know it can be done by other tools like sed and awk ( see an awk solution below), but python just gives you much more flexibility in terms of what you can do and gives you more readability.
myfasta file
>comp1558_c0_seq1 len=204 path=[4976:0-203]
ATTGTACTTAATCTA.....
>comp8142_c0_seq1 len=222 path=[8835:0-221]
GTACAATCACGGCACATT......
>comp8570_c0_seq1 len=232 path=[3344:0-232]
GCATCGCTTATCGGTTTA......
annotation file:
comp1558_c0_seq1 repressor protein
comp8142_c0_seq1 aspartate aminotransferase
comp8357_c0_seq1 l-xylulose reductase
comp8387_c0_seq1 succinyl- synthetase alpha
comp8570_c0_seq1 fatty acid synthase beta subunit dehydratase
see the code below:
output:
>comp1558_c0_seq1 repressor protein
ATTGTACTTAATCTA.....
>comp8142_c0_seq1 aspartate aminotransferase
GTACAATCACGGCACATT......
>comp8570_c0_seq1 fatty acid synthase beta subunit dehydratase
GCATCGCTTATCGGTTTA......
if the dictionary file has only two records for each line, this awk should work:
http://stackoverflow.com/questions/11678939/replace-text-based-on-a-dictionary
Usage:
http://www.gnu.org/software/gawk/manual/html_node/String-Functions.html
http://www.gnu.org/software/gawk/manual/html_node/Arrays.html
awk -f foo.awk dict.dat user.dat
http://www.gnu.org/software/gawk/manual/html_node/String-Functions.html
http://www.gnu.org/software/gawk/manual/html_node/Arrays.html
NR == FNR {
rep[$1] = $2
next
}
{
for (key in rep) {
gsub(key, rep[key])
}
print
}
happy it is useful!
ReplyDelete