Popular plots for showing the distribution of data are the boxplot and the violin plot. Each has its own advantages and disadvantages, and combining the two can be more informative. In a plot, we want to show people all the data, not only summary statistics such as the mean and median. Full article: http://rpubs.com/crazyhottommy/sina-plot
Some time ago, I came across the sina plot.
sinaplot is inspired by the strip chart and the violin plot. By letting the normalized density of points restrict the jitter along the x-axis, the plot displays the same contour as a violin plot, but resembles a simple strip chart for a small number of data points. In this way the plot conveys information about the number of data points, the density distribution, outliers and spread in a very simple, comprehensible and condensed format.
I know that ggforce has a geom_sina() for the same purpose, and I want to try it out.
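To make the description above concrete, here is a minimal base-R sketch of the idea: estimate the density of the values, normalize it, and use it to cap the horizontal jitter of each point. This is an illustration only, not the ggforce implementation; the function name sina_offsets and its max_width argument are made up for this example.

```r
# Hypothetical helper: horizontal offsets whose spread follows the density of y.
sina_offsets <- function(y, max_width = 0.4) {
  dens <- density(y)                              # kernel density estimate of y
  d_at_y <- approx(dens$x, dens$y, xout = y)$y    # density evaluated at each data point
  half_width <- max_width * d_at_y / max(dens$y)  # normalize: the density peak gets max_width
  runif(length(y), min = -half_width, max = half_width)  # jitter within the violin contour
}

set.seed(1)  # the jitter is random; fix the seed to make the plot reproducible
y <- iris$Sepal.Length[iris$Species == "setosa"]
x <- 1 + sina_offsets(y)
plot(x, y, xlim = c(0.4, 1.6), pch = 16, xlab = "setosa", ylab = "Sepal.Length")
```

Points in dense regions are spread widely and points in sparse regions stay near the center line, which is why the cloud traces the outline of a violin plot.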
Let’s load the libraries first:
library(ggforce)
library(ggplot2)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
A violin plot:
p<- ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_violin(aes(fill=Species))
p
We can add the mean and standard deviation for a bit more information (mean_sdl needs the Hmisc package to be installed):
p + stat_summary(fun.data=mean_sdl, fun.args = list(mult=1),
geom="pointrange", color="red")
Very nice! What if I want to see all the points?
# violin plot with dot plot
p + geom_dotplot(binaxis='y', stackdir='center', dotsize=0.5)
How about showing the jittered points instead?
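# violin plot with jittered points
# width = 0.2: degree of jitter in the x direction
# height = 0: no jitter in the y direction, so the plotted values stay exact
p + geom_jitter(shape=16, position=position_jitter(width=0.2, height=0))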
Let’s combine a boxplot with the violin plot and the jittered points:
# violin plot with boxplot and jittered points
p + geom_boxplot(width = 0.2) + geom_jitter(shape=16, position=position_jitter(width=0.2, height=0))
Compare with the sina plot:
## sina plot
ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_sina(aes(color=Species))
The mean, median and standard deviation can be overlaid; here are the mean and standard deviation:
# fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_sina(aes(color=Species)) +
stat_summary(fun.data=mean_sdl, fun.args = list(mult=1), geom="errorbar", color="red", width=0.2) +
stat_summary(fun=mean, geom="point", color="red") +
ggtitle("sina with mean +- standard deviation")
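The code above overlays only the mean and standard deviation. The median can be added in the same way; the following is a minimal sketch, assuming ggplot2 3.3.0 or later (for the fun, fun.min and fun.max arguments). The crossbar geom, its width and its color are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggforce)

# Setting fun, fun.min and fun.max all to the median collapses the crossbar
# into a single horizontal line at the median of each species.
p_median <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_sina(aes(color = Species)) +
  stat_summary(fun = median, fun.min = median, fun.max = median,
               geom = "crossbar", width = 0.4, color = "black") +
  ggtitle("sina with median")
p_median
```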
You can decide which plot to use for your data. I would argue that they are all better than a bar plot with error bars. Showing all the data points together with summary statistics is preferred, because readers can then judge the distribution of the data for themselves.
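For comparison, here is a sketch of the bar plot with error bars for the same data. The helper mean_sd is a hypothetical function written for this example so that the code does not depend on Hmisc; it returns the mean and the mean plus or minus one standard deviation in the format that stat_summary expects from fun.data.

```r
library(ggplot2)

# Hypothetical helper: mean and mean +/- 1 standard deviation as y, ymin, ymax.
mean_sd <- function(y) {
  data.frame(y = mean(y), ymin = mean(y) - sd(y), ymax = mean(y) + sd(y))
}

# Bar height is the mean; the error bar spans one standard deviation either side.
p_bar <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun = mean, geom = "bar", aes(fill = Species)) +
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.2) +
  ggtitle("bar plot with mean +- standard deviation")
p_bar
```

Each species is reduced to two numbers here, so the number of points, the shape of the distribution and any outliers are no longer visible.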