I was trained as a wet-lab biologist and started learning to code in April 2012 with my first Python book, Python Programming for the Absolute Beginner. I still remember the days when, after work, I would sit down in front of the computer and work through the book until 10 pm every day.
It was not very practical in terms of translating what I had learned into what I wanted to analyze in the lab, but still, I had entered a new world!
In the fall semester of 2012, I took a beginner bioinformatics course at the University of Florida that used Practical Computing for Biologists as a reference book. It is a great book, and it taught me regular expressions, Unix commands, and some Python directly related to biology. I was deeply attracted by the beauty of code and was surprised and satisfied by how useful learning to code can be.
Lessons I learned from that class:
Regular expressions are extremely useful! You need to know at least the basics; for anything beyond that, you can always google and find a solution.
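To give a feel for those basics, here is a minimal sketch in Python. The FASTA-style header lines and the `parse_header` helper are made up for illustration; real headers vary a lot between databases, so the pattern would need adjusting.

```python
import re

# Hypothetical FASTA-style header lines, invented for this example.
headers = [
    ">gene001 length=1523 organism=E_coli",
    ">gene002 length=87 organism=E_coli",
]

# Capture the identifier right after ">" and the integer after "length=".
pattern = re.compile(r"^>(\S+)\s+length=(\d+)")


def parse_header(line):
    """Return (gene_id, length) for a header line, or None if it does not match."""
    match = pattern.match(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


parsed = [parse_header(h) for h in headers]
print(parsed)  # [('gene001', 1523), ('gene002', 87)]
```

The two pairs of parentheses in the pattern are capture groups: `\S+` grabs the run of non-space characters after `>`, and `\d+` grabs the digits after `length=`.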
Bioinformatics is a field that evolves so fast that many of the tools you use today may become obsolete tomorrow. Unix skills, however, will never fade. I urge every wet-lab biologist like me to learn Unix commands first. It takes time to become fluent in the terminal. It took me 2 years to feel really comfortable working in the terminal, so stop worrying and take your time.
The statistical programming language R is very popular in the bioinformatics field. I started using R because I could take advantage of the rich collection of packages in Bioconductor. I started from the basics with The Art of R Programming. Once you have the basics, learning to use packages like dplyr and ggplot2 will greatly reduce the complexity of your code and enhance your productivity. Surprisingly, all these awesome packages were developed by the same person: Hadley Wickham.
Learn some git. Git is a version control system that tracks changes to your code. I am still a beginner, but I have realized how important it is to version control my code. For this reason, I have a GitHub repo where I put my code. I am still learning git every day.
When a project grows big, you need to manage it well. There are several resources that I recommend reading before starting any project:
2. Designing project by Vince Buffalo
Vince Buffalo has a book that I highly recommend to everyone: Bioinformatics Data Skills. It covers many of the points that I want to make in this post. I might write a review of it after finishing all the chapters.
The take-home message for me is that it is not enough to just run the code, get some results and then publish them.
One needs to be aware that:
1. Computers make mistakes. They can give you nonsense results and exit without an error, so test your code extensively before running your analysis.
2. Share your code. Even if your code is correct, you need to share it so that other people can look at it and perhaps improve it.
3. Make your code reusable. Do not hard-code values in your scripts. If a script takes a file path as input, make the path an argument to the script.
4. Modularize your scripts. Data can come in at different stages of processing and in different formats. Take ChIP-sequencing data analysis as an example: suppose you have a script that processes the data all the way from fastq to the final peaks. You may want to split it into two modules: one for mapping fastq to bam, and the other for going from bam to peaks. That way, people can still use your code when their data come in bam format.
5. Comment your scripts heavily. It will not only help other people understand your code better, but also help the future you understand what you did.
6. You need to make your analysis reproducible. Each step of your analysis should be documented in a markdown file. I say every step, and yes, every command that you type in the terminal to generate intermediate files needs to be written down. Moreover, how, when and where you downloaded the data needs to be documented. This will save the future you! Many experienced programmers overlook this point.
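Points 3 and 4 can be sketched together in a few lines of Python. This is only a toy layout, assuming a two-step ChIP-seq workflow: the function names, the subcommand names `map` and `peaks`, and the file names are all hypothetical, and the two step functions are placeholders that only describe what a real module would do rather than calling an actual aligner or peak caller.

```python
import argparse


def map_reads(fastq_path, bam_path):
    """Placeholder for the fastq -> bam module (a real one would run an aligner)."""
    return f"map {fastq_path} -> {bam_path}"


def call_peaks(bam_path, peaks_path):
    """Placeholder for the bam -> peaks module (a real one would run a peak caller)."""
    return f"call peaks {bam_path} -> {peaks_path}"


def build_parser():
    """Every file path is a command-line argument; nothing is hard-coded."""
    parser = argparse.ArgumentParser(description="Toy two-module ChIP-seq pipeline")
    steps = parser.add_subparsers(dest="step", required=True)

    map_step = steps.add_parser("map", help="fastq to bam")
    map_step.add_argument("fastq")
    map_step.add_argument("bam")

    peaks_step = steps.add_parser("peaks", help="bam to peaks")
    peaks_step.add_argument("bam")
    peaks_step.add_argument("peaks")
    return parser


def main(argv=None):
    """Dispatch to the module that matches the requested step."""
    args = build_parser().parse_args(argv)
    if args.step == "map":
        return map_reads(args.fastq, args.bam)
    return call_peaks(args.bam, args.peaks)


# Someone whose data already come as bam can start at the second module.
print(main(["peaks", "sample.bam", "sample_peaks.bed"]))
```

Because each step is its own function with its own arguments, a user who already has a bam file never has to touch the mapping code.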
I am glad that I have come to this stage. I love what I am doing now and feel satisfied when I learn new things every day. I want to encourage all wet-lab biologists: believe that you can program as well :)