I guess I should write something for the end of 2016. A lot of things (good and bad) happened, and I am thankful for all of them!
I started my postdoc in Dr. Roel Verhaak's lab in March 2015. The past year and a half has been transformative for me in terms of computational training.
Roel moved to JAX Genomics for Medicine in October 2016. I could not move for family reasons. I just had a little girl, Phoebe, in August! She is such a joyful and sweet girl. Being a new parent is an amazing experience, and of course a challenging one too.
I decided to stay at MD Anderson with Dr. Andrew Futreal and Dr. Jianjun Zhang's group and began studying intra-tumor heterogeneity and tumor evolution by analyzing multi-region whole-exome sequencing and DNA methylation array data. I always want to learn something new. Although it may take me a while to learn the details of this subject, I remain positive.
I am so grateful for this invaluable experience in Roel's lab. I have gained much more experience handling large data sets (TB size!) and enhanced my computational skills in many ways.
For the coming 2017, I should be:
1. busy with Phoebe.
2. writing 1-2 papers.
3. writing a book chapter on ChIP-seq for the Biostar Handbook. It should come out in mid-2017.
4. writing a small R package for practice.
5. learning a piano song.
I try to commit myself to writing at least 1-2 posts every month to keep my audience. 2016 looks good. I am also thinking of moving my blog to GitHub, but it is going to be a long-term project (learning how to use Jekyll will take a while).
That being said, I am looking forward to a new and prosperous 2017!
A wet-dry hybrid biologist's take on genetics and genomics. Mostly about Linux, R, python, reproducible research, open science and NGS. Grab my book to transform yourself into a computational biologist https://divingintogeneticsandgenomics.ck.page/
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Tuesday, December 27, 2016
Friday, November 25, 2016
compare boxplot, violin plot and sina plot
Popular plots for showing the distribution of data are the boxplot and the violin plot. Both have their own advantages and disadvantages, and combining them can be more informative. In a plot, we want to show people all the data, not only summary statistics such as the mean and median. Full article: http://rpubs.com/crazyhottommy/sina-plot
Some time ago, I came across the sina plot.
sinaplot is inspired by the strip chart and the violin plot. By letting the normalized density of points restrict the jitter along the x-axis the plot displays the same contour as a violin plot, but resemble a simple strip chart for small number of data points. In this way the plot conveys information of both the number of data points, the density distribution, outliers and spread in a very simple, comprehensible and condensed format.
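To make the idea of density-restricted jitter concrete, here is a minimal base-R sketch. It is not how ggforce implements geom_sina(); the variable names and the max_width value of 0.4 are assumptions chosen only for illustration.

```r
# Minimal base-R sketch of density-scaled jitter (illustrative only,
# not the ggforce implementation).
set.seed(1)                                  # reproducible jitter
y <- iris$Sepal.Length[iris$Species == "setosa"]

d <- density(y)                              # kernel density estimate of the values
dens_at_y <- approx(d$x, d$y, xout = y)$y    # density evaluated at each data point
rel_width <- dens_at_y / max(d$y)            # normalise so the densest region is 1

max_width <- 0.4                             # assumed half-width of the widest part
# Each point is jittered uniformly, but only as far as its local density allows
x <- 1 + runif(length(y), -1, 1) * rel_width * max_width

plot(x, y, xlim = c(0.5, 1.5), pch = 16, xaxt = "n",
     xlab = "setosa", ylab = "Sepal.Length")
```

Points in dense regions spread out horizontally and points in sparse regions stay near the centre line, which is why the cloud traces the outline of a violin.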
I know that ggforce has a geom_sina() for the same purpose and want to try it out.
Let’s load the libraries first:
library(ggforce)
library(ggplot2)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
A violin plot:
p<- ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_violin(aes(fill=Species))
p
We can add the mean and standard deviation for a bit more information (mean_sdl needs the Hmisc package to be installed):
p + stat_summary(fun.data=mean_sdl, fun.args = list(mult=1),
geom="pointrange", color="red")
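As a rough check on what the red point and range represent, the same numbers can be computed by hand. This is a base-R sketch under the assumption that mean_sdl with mult=1 gives the mean and the mean plus or minus one standard deviation; the object name stats is just for illustration.

```r
# Per-species mean and mean +/- 1 standard deviation, computed by hand
stats <- do.call(rbind, lapply(split(iris$Sepal.Length, iris$Species), function(v) {
  c(y = mean(v), ymin = mean(v) - sd(v), ymax = mean(v) + sd(v))
}))
print(round(stats, 2))
```

Each row should line up with the centre and the two ends of the corresponding pointrange in the plot.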
Very nice! What if I want to see all the points?
# violin plot with dot plot
p + geom_dotplot(binaxis='y', stackdir='center', dotsize=0.5)
How about showing the jittered points instead?
# violin plot with jittered points
# 0.2 : degree of jitter in x direction
p + geom_jitter(shape=16, position=position_jitter(0.2))
Let’s combine a boxplot with the violin plot:
# violin plot with boxplot and jittered points
p + geom_boxplot(width = 0.2) + geom_jitter(shape=16, position=position_jitter(0.2))
Compare with the sina plot:
## sina plot
ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_sina(aes(color=Species))
The mean and standard deviation can be overlaid:
# fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_sina(aes(color=Species)) +
stat_summary(fun.data=mean_sdl, fun.args = list(mult=1), geom="errorbar", color="red", width=0.2) +
stat_summary(fun=mean, geom="point", color="red") +
ggtitle("sina with mean +- standard deviation")
You can decide which plot to use for your data. I would argue that they are all better than a bar plot with error bars. Showing all the data points, so that their distribution can be judged, together with summary statistics is preferred.
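A small simulation illustrates why summary statistics alone can mislead. This is a sketch on made-up data (the two vectors below are simulated, not from iris): both samples are rescaled to the same mean and standard deviation, so a bar plot with error bars would draw them identically, yet their shapes differ.

```r
# Two simulated samples with identical mean and sd but different shapes
set.seed(42)
unimodal <- as.numeric(scale(rnorm(200)))
bimodal  <- as.numeric(scale(c(rnorm(100, -2, 0.5), rnorm(100, 2, 0.5))))

# The summaries a bar plot with error bars would show are the same
round(c(mean = mean(unimodal), sd = sd(unimodal)), 3)
round(c(mean = mean(bimodal),  sd = sd(bimodal)), 3)

# The fraction of points close to the mean is very different
near_mean_unimodal <- mean(abs(unimodal) < 0.5)
near_mean_bimodal  <- mean(abs(bimodal) < 0.5)
c(unimodal = near_mean_unimodal, bimodal = near_mean_bimodal)
```

A violin or sina plot of these two vectors would show one hump versus two, which the mean and error bar cannot.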