## Thursday, April 30, 2015

### get all the promoter sequences of human hg19 genome

One of my former friends (a biologist who is not very familiar with computers) asked me to help her get all the promoter sequences from the human genome. It is a simple task, and I think every biologist should know how to do it. There are many ways to do it; here I will show how to do it using Bioconductor.

Using `less`, I can see that all the sequences are there. To make sure the sequences are right, one can manually check several of them against the UCSC genome browser.

I did not dig into how to write the sequence names using the SYMBOL rather than the ENTREZID (Google is your friend).
There are also many ways to convert the gene IDs.
I have two posts on that: http://crazyhottommy.blogspot.com/2014/09/converting-gene-ids-using-bioconductor.html
and http://crazyhottommy.blogspot.com/2014/09/mapping-gene-ids-with-mygene.html
In addition, I prefer to prepare a BED file of all the promoters with bedtools slop (starting from the RefSeq table from UCSC or from a GENCODE GTF file), and then extract the DNA sequences with bedtools getfasta. To me, working on the command line is more flexible.
See my previous post here: http://crazyhottommy.blogspot.com/2015/02/fetch-genomic-sequences-from-coordinates.html
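
To make the coordinate logic behind the BED-based approach concrete, here is a minimal Python sketch. It is not the Bioconductor or bedtools code from this post. The helper names (`promoter_interval`, `promoter_sequence`), the default window of 2000 bp upstream and 200 bp downstream of the TSS, and the toy genome are all assumptions made up for illustration. The point it shows is that the promoter window has to be strand-aware: for a minus-strand gene the TSS is at the end coordinate, and the extracted sequence must be reverse-complemented.

```python
# Illustrative sketch only: helper names, window sizes and the toy genome
# are hypothetical and not taken from the original post.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_interval(start, end, strand, chrom_size,
                      upstream=2000, downstream=200):
    """Return a BED-style (0-based, half-open) promoter window.

    start/end are the gene's BED coordinates. The window covers
    `upstream` bases before the TSS and `downstream` bases starting
    at the TSS, and is clipped to the chromosome boundaries.
    """
    if strand == "+":
        # TSS is the first base of the interval.
        p_start, p_end = start - upstream, start + downstream
    elif strand == "-":
        # TSS is the last base of the interval (end - 1); upstream
        # means higher coordinates on the reference strand.
        p_start, p_end = end - downstream, end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, p_start), min(chrom_size, p_end)


def promoter_sequence(genome, chrom, start, end, strand,
                      upstream=2000, downstream=200):
    """Extract the promoter sequence in the gene's own orientation.

    `genome` is a dict mapping chromosome name to sequence string.
    """
    chrom_seq = genome[chrom]
    p_start, p_end = promoter_interval(
        start, end, strand, len(chrom_seq), upstream, downstream)
    seq = chrom_seq[p_start:p_end]
    return reverse_complement(seq) if strand == "-" else seq


# Toy example with a made-up 16 bp "chromosome".
toy_genome = {"chrT": "AAAACCCCGGGGTTTT"}
print(promoter_sequence(toy_genome, "chrT", 8, 12, "+", upstream=4, downstream=2))
print(promoter_sequence(toy_genome, "chrT", 4, 8, "-", upstream=4, downstream=2))
```

For a real genome, the same interval arithmetic is what a strand-aware BED preparation step has to get right before the sequences are pulled from the hg19 FASTA file.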

## Thursday, April 23, 2015

### simulation of the distribution of means drawn from an exponential distribution

"central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution." [1]

I am going to draw 40 numbers from an exponential distribution [2] (you can do this for any distribution), repeat that 1000 times, and examine the distribution of the 1000 sample means. In R, one sample can be drawn with rexp(40, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. I set lambda = 0.2 for all of the simulations in this test, so the sample means should be approximately normal, centered at 1/lambda = 5 with a standard deviation of (1/lambda)/sqrt(40) ≈ 0.79.
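
The simulation described above can be sketched in a few lines. This is a Python stand-in using only the standard library, not the R code from the post; the function name `simulate_means` and the fixed seed are assumptions added for illustration and reproducibility. `random.expovariate(lam)` takes the rate parameter, like `rexp` in R.

```python
import math
import random
import statistics


def simulate_means(n_sims=1000, sample_size=40, lam=0.2, seed=1):
    """Return n_sims sample means, each from sample_size exponential draws.

    `lam` is the rate parameter, so each draw has mean 1/lam.
    The seed is an arbitrary choice to make the run reproducible.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_sims):
        sample = [rng.expovariate(lam) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


lam = 0.2
sample_size = 40
means = simulate_means(n_sims=1000, sample_size=sample_size, lam=lam)

# Values predicted by the CLT for the distribution of the sample mean.
theoretical_mean = 1 / lam
theoretical_sd = (1 / lam) / math.sqrt(sample_size)

print("mean of sample means:", statistics.fmean(means))
print("theoretical mean:    ", theoretical_mean)
print("sd of sample means:  ", statistics.stdev(means))
print("theoretical sd:      ", theoretical_sd)
```

Plotting a histogram of `means` next to a histogram of 1000 raw exponential draws makes the contrast clear: the raw draws are strongly right-skewed, while the means cluster symmetrically around 5.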

See the gist below.

[1] http://en.wikipedia.org/wiki/Central_limit_theorem
[2] http://en.wikipedia.org/wiki/Exponential_distribution

## Monday, April 6, 2015

### My first Software Carpentry workshop as an instructor

I just came back from the Software Carpentry workshop held on April 2 and 3 at the University of Miami. The workshop page is here: http://xuf12.github.io/2015-04-02-umiami/