Tuesday, December 27, 2016

The End of 2016

I guess I should write something for the end of 2016. A lot of things (good and bad) happened, and I am thankful for all of them!

I started my postdoc in Dr. Roel Verhaak's lab in March 2015. The past year and a half has been transformative for me in terms of computational training.

Roel moved to JAX Genomics for Medicine in October 2016. I could not move for family reasons: I just had a little girl, Phoebe, in August! She is such a joyful and sweet girl. Being a new parent is an amazing experience, and of course a challenging one too.

I decided to stay at MD Anderson with Dr. Andrew Futreal and Dr. Jianjun Zhang's group and began studying intra-tumor heterogeneity and tumor evolution by analyzing multi-region whole-exome sequencing and DNA methylation array data. I always want to learn something new. Although it may take me a while to learn the details of this subject, I remain positive.

I am so grateful for this invaluable experience in Roel's lab. I have gained much more experience handling large data sets (TB size!) and enhanced my computational skills in many ways.

For the coming 2017, I should be:

1. busy with Phoebe.
2. writing 1-2 papers.
3. writing a book chapter on ChIP-seq for the Biostar Handbook. It should come out in mid-2017.
4. writing a small R package for practice.
5. learning a piano song.

I try to commit myself to writing at least 1-2 posts every month to keep my audience. 2016 looks good. I am also thinking of moving my blog to GitHub, but it is going to be a long-term project (learning how to use Jekyll will take a while).

That being said, I am looking forward to a new and prosperous 2017!

Friday, November 25, 2016

compare boxplot, violin plot and sina plot

sina-plot

Some time ago, I came across the sina plot.

sinaplot is inspired by the strip chart and the violin plot. By letting the normalized density of points restrict the jitter along the x-axis the plot displays the same contour as a violin plot, but resemble a simple strip chart for small number of data points. In this way the plot conveys information of both the number of data points, the density distribution, outliers and spread in a very simple, comprehensible and condensed format.
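To make the description above a bit more concrete, here is a rough base-R sketch of the idea: estimate the density of the values in each group, then let each point jitter horizontally only as far as the (normalized) density at its value allows. This is my own illustration, not the actual ggforce or sinaplot implementation; `sina_offsets()` and its `maxwidth` argument are hypothetical names made up for this example.

```r
# Rough sketch of density-restricted jitter (NOT the ggforce implementation).
# sina_offsets() and maxwidth are hypothetical names used only for illustration.
sina_offsets <- function(y, maxwidth = 0.4) {
  d <- density(y)                                # kernel density estimate of y
  dens_at_y <- approx(d$x, d$y, xout = y)$y      # density interpolated at each data point
  half_width <- maxwidth * dens_at_y / max(d$y)  # normalize so the densest region is maxwidth wide
  runif(length(y), min = -half_width, max = half_width)  # jitter inside the density contour
}

set.seed(1)
species_index <- as.integer(iris$Species)        # 1, 2, 3 as x positions
offsets <- ave(iris$Sepal.Length, iris$Species, FUN = sina_offsets)
x_jittered <- species_index + offsets

# Points spread widely where the data are dense and stay near the center in the tails.
plot(x_jittered, iris$Sepal.Length, col = species_index, pch = 16,
     xaxt = "n", xlab = "Species", ylab = "Sepal.Length")
axis(1, at = 1:3, labels = levels(iris$Species))
```

With only a handful of points per group the offsets stay small and the result looks like a strip chart; with many points the cloud fills out the violin-like contour.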

I know that ggforce has a geom_sina() for the same purpose and want to try it out.

Let’s load the libraries first:

library(ggforce)
library(ggplot2)
head(iris)

	Sepal.Length <dbl>	Sepal.Width <dbl>	Petal.Length <dbl>	Petal.Width <dbl>	Species <fctr>
1	5.1	3.5	1.4	0.2	setosa
2	4.9	3.0	1.4	0.2	setosa
3	4.7	3.2	1.3	0.2	setosa
4	4.6	3.1	1.5	0.2	setosa
5	5.0	3.6	1.4	0.2	setosa
6	5.4	3.9	1.7	0.4	setosa

A violin plot:

p <- ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_violin(aes(fill=Species))
p

We can add the mean and standard deviation for a bit more information:

# mean_sdl() needs the Hmisc package to be installed; mult=1 gives mean +- 1 sd
p + stat_summary(fun.data=mean_sdl, fun.args = list(mult=1), 
                 geom="pointrange", color="red")
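As a sanity check on what the red point ranges represent, the same numbers can be computed by hand. The sketch below assumes that `mean_sdl` with `mult=1` yields the mean as `y` and mean minus/plus one standard deviation as `ymin`/`ymax`; the `summary_table` name is just for illustration.

```r
# Mean and standard deviation of Sepal.Length per species, computed by hand.
group_mean <- tapply(iris$Sepal.Length, iris$Species, mean)
group_sd   <- tapply(iris$Sepal.Length, iris$Species, sd)

# Same layout as the summary drawn by geom="pointrange": y, ymin, ymax.
summary_table <- data.frame(
  Species = names(group_mean),
  y       = as.numeric(group_mean),
  ymin    = as.numeric(group_mean - group_sd),
  ymax    = as.numeric(group_mean + group_sd)
)
print(summary_table)
```

The `y` column should line up with the red dots in the plot and `ymin`/`ymax` with the ends of the red lines.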

Very nice! What if I want to see all the points?

# violin plot with dot plot
p + geom_dotplot(binaxis='y', stackdir='center', dotsize=0.5)

How about showing the jittered points instead?

# violin plot with jittered points
# 0.2 : degree of jitter in x direction
p + geom_jitter(shape=16, position=position_jitter(0.2))

Let’s combine a boxplot with the violin plot and the jittered points:

# violin plot with boxplot and jittered points
p + geom_boxplot(width = 0.2) + geom_jitter(shape=16, position=position_jitter(0.2))

Compare with the sina plot:

## sina plot
ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_sina(aes(color=Species))

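Summary statistics such as the mean, median and standard deviation can be overlaid. Here, the mean with one standard deviation on each side:

# fun.y was renamed to fun in ggplot2 3.3.0; use fun.y=mean on older versions
ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_sina(aes(color=Species)) + 
        stat_summary(fun.data=mean_sdl, fun.args = list(mult=1), geom="errorbar", color="red", width=0.2) +
        stat_summary(fun=mean, geom="point", color="red") + 
        ggtitle("sina with mean +- standard deviation")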

You can decide which plot suits your data. I would argue that they are all better than a bar plot with error bars: showing all the data points, so that their distribution can be judged, together with summary statistics is preferable.
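A small simulated example of why this matters (the data are made up purely for illustration): two samples can share the same mean and standard deviation, so a bar plot with error bars would look identical for both, while the points themselves tell very different stories.

```r
# Simulated data for illustration only: one unimodal and one bimodal sample.
set.seed(42)
unimodal <- rnorm(200, mean = 5, sd = 1)
bimodal  <- c(rnorm(100, mean = 3, sd = 0.3), rnorm(100, mean = 7, sd = 0.3))

# Rescale the bimodal sample so it has the same mean and sd as the unimodal one.
bimodal <- as.numeric(scale(bimodal)) * sd(unimodal) + mean(unimodal)

# Identical summary statistics, so identical bars and error bars ...
print(c(mean_unimodal = mean(unimodal), mean_bimodal = mean(bimodal)))
print(c(sd_unimodal = sd(unimodal), sd_bimodal = sd(bimodal)))

# ... but plotting the points shows one cluster versus two.
stripchart(list(unimodal = unimodal, bimodal = bimodal),
           method = "jitter", vertical = TRUE, pch = 16)
```

The same comparison can be drawn with geom_violin() or geom_sina() by putting the two samples in a data frame with a grouping column.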

Diving into Genetics and Genomics
