Casa Romana dell’Ara Coeli

How to add different segment, annotation and color to each facet in ggplot

今回もRネタである.論文の図を作成していて,ggplot2のfacetを使用して作成した図の中のfacet毎に異なる有意差を示す群を線分で結んで,その上にasteriskを色を変えてつけようとしたところ,結構苦労したので忘れないうちにまとめておく.annotate()というのもあるが,これだとすべてのfacetに同じ内容が入ってしまう.今回の目的であるfacetによって異なる内容の注釈を入れるためには,geom_segmentやgeom_textを使う.

References

Data Preparation

まず,架空のデータを作成する.Drug1とDrug2を投与して1日後と7日後の物質Xの血中濃度変化を対照,つまり投与前と比較するという実験において,Drug1では差がなく,Drug2では差があるという結果にする.

set.seed(100)
data.df1 <- data.frame(Control = rnorm(20, mean = 5, sd = 1),
                       Day1 = rnorm(20, mean = 5, sd = 1.5),
                       Day7 = rnorm(20, mean = 5, sd = 2))
library(reshape)
data_melt.df1 <- melt(data.df1)

data.df2 <- data.frame(Control = rnorm(20, mean = 5, sd = 1.8),
                       Day1 = rnorm(20, mean = 10, sd = 5),
                       Day7 = rnorm(20, mean = 20, sd = 7))
data_melt.df2 <- melt(data.df2)

data_melt.df <- cbind.data.frame(data_melt.df1, data_melt.df2$value)
colnames(data_melt.df) <- c("Day","Drug1","Drug2") # chage column name of dataframe
head(data_melt.df)
Using  as id variables
Using  as id variables
      Day    Drug1    Drug2
1 Control 4.497808 4.528408
2 Control 5.131531 4.876081
3 Control 4.921083 4.318010
4 Control 5.886785 9.647526
5 Control 5.116971 5.233701
6 Control 5.318630 3.716555

これで解析用のデータが出来上がった.一応,差を確認してみる.

Tukey multiple comparison

Tukeyの多重比較試験を行う.2つの方法で確認しておく.

TukeyHSD

TukeyHSD(aov(data_melt.df$Drug1~data_melt.df$Day))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = data_melt.df$Drug1 ~ data_melt.df$Day)

$`data_melt.df$Day`
                    diff       lwr      upr     p adj
Day1-Control  0.03086351 -1.274284 1.336011 0.9982163
Day7-Control -0.14426065 -1.449408 1.160887 0.9617765
Day7-Day1    -0.17512416 -1.480272 1.130023 0.9442066
TukeyHSD(aov(data_melt.df$Drug2~data_melt.df$Day))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = data_melt.df$Drug2 ~ data_melt.df$Day)

$`data_melt.df$Day`
                  diff       lwr       upr     p adj
Day1-Control  4.874433  1.128239  8.620628 0.0076229
Day7-Control 15.114597 11.368403 18.860791 0.0000000
Day7-Day1    10.240164  6.493970 13.986358 0.0000000

テューキーの方法による多重比較

source("http://aoki2.si.gunma-u.ac.jp/R/src/tukey.R", encoding="euc-jp")
tukey(data_melt.df$Drug1, data_melt.df$Day)
$result1
        n     Mean Variance
Group1 20 5.107867 0.516335
Group2 20 5.138731 1.188671
Group3 20 4.963606 7.119660

$Tukey
             t         p
1:2 0.05690583 0.9982163
1:3 0.26598637 0.9617765
2:3 0.32289220 0.9442066

$phi
[1] 57

$v
[1] 2.941555
tukey(data_melt.df$Drug2, data_melt.df$Day)
$result1
        n      Mean Variance
Group1 20  4.811422  4.07251
Group2 20  9.685855 32.74563
Group3 20 19.926019 35.88609

$Tukey
           t            p
1:2 3.131158 7.622857e-03
1:3 9.709065 1.156397e-11
2:3 6.577907 4.783876e-08

$phi
[1] 57

$v
[1] 24.23474

以上で,Drug1では物質Xの濃度はコントロールと差がないこと,Drug2ではコントロール,Day1,Day7の間に有意差が認められることが確認された.そのようにデータを作ったので当たり前である...(^^;;;;;

Boxplot by ggplot2

ようやくここから上記のデータを使って,ggplot2でboxplotを描いてみる.まずはmeltを用いてwide formatからlong formatへのデータの整形を行う.

DataM <- melt(data_melt.df, id = "Day")
head(DataM)
      Day variable    value
1 Control    Drug1 4.497808
2 Control    Drug1 5.131531
3 Control    Drug1 4.921083
4 Control    Drug1 5.886785
5 Control    Drug1 5.116971
6 Control    Drug1 5.318630

Calculate mean and SE

平均とSEも求めておく.

library(plyr)
DataM_summary <- ddply(DataM, .(variable, Day), summarise, N = length(value), mean = mean(value), sd = sd(value), se = sd(value)/sqrt(length(value)))
DataM_summary
  variable     Day  N      mean        sd        se
1    Drug1 Control 20  5.107867 0.7185645 0.1606759
2    Drug1    Day1 20  5.138731 1.0902617 0.2437899
3    Drug1    Day7 20  4.963606 2.6682692 0.5966431
4    Drug2 Control 20  4.811422 2.0180460 0.4512488
5    Drug2    Day1 20  9.685855 5.7223794 1.2795629
6    Drug2    Day7 20 19.926019 5.9904997 1.3395165

ついで,ggplot2のggplotでboxplotを描く.個々のデータをgeom_jitter,あるいは,geom_pointを用いて重ねてプロットしておく.どちらの方法でも下記のように同じ図になる.

geom_jitter

library(ggplot2)
TestBoxPlot <- ggplot(DataM, aes(x = Day, y = value, colour = Day, fill = Day)) +
    geom_boxplot(alpha = 0.40) +
    facet_wrap(~variable, ncol = 3, scales="fixed") +
    coord_cartesian(ylim = c(0,38)) +
    theme_bw() +
    theme(axis.text.x  = element_text(size=14), axis.text.y = element_text(size=14), strip.text.x = element_text(size =16)) +
    theme(axis.title.x = element_text(size=14), axis.title.y = element_text(size=16), plot.title=element_text(size=0)) +
    xlab("") +
    ylab("Relative value to control") +
    theme(legend.position = "none") +   # delete legend
    geom_jitter(shape=16, size=2, position=position_jitter(0.1))  # plot individual point with jittering

TestBoxPlot

geom_point

TestBoxPlot2 <- ggplot(DataM, aes(x = Day, y = value, colour = Day, fill = Day)) +
    geom_boxplot(alpha = 0.40) +
    facet_wrap(~variable, ncol = 3, scales="fixed") +
    coord_cartesian(ylim = c(0,38)) +
    theme_bw() +
    theme(axis.text.x  = element_text(size=14), axis.text.y = element_text(size=14), strip.text.x = element_text(size =16)) +
    theme(axis.title.x = element_text(size=14), axis.title.y = element_text(size=16), plot.title=element_text(size=0)) +
    xlab("") +
    ylab("Relative value to control") +
    theme(legend.position = "none") +   # delete legend
    geom_point(aes(fill = Day), size = 2, shape = 16, position = position_jitterdodge())  # plot individual point with jittering

TestBoxPlot2

Add mean bar

Ref: combine ggplot facet_wrap with geom_segment to draw mean line in scatterplot
平均値のバーを書き込む.これはgeom_segmentを使うが,すべてのfacetに書き込むので,単純である.

TestBoxPlot3 <- TestBoxPlot +
    geom_segment(data = DataM_summary, aes(x=as.numeric(as.factor(Day)) - 0.5,
                                           xend=as.numeric(as.factor(Day)) + 0.5,
                                           yend=mean,
                                           y=mean,
                                           colour=Day,
                                           alpha=0.7),
                 size = 1.5, linetype = 1)

TestBoxPlot3

Add segment and asterisk to Drug2 facet of boxplot

Dataframe for annotation

ここからが本番である.上記で作成したグラフを見ながら,どこからどこに線を引けばよいのか,どこにasteriskを置けばよいのか大体の見当をつけたうえで,注釈用のデータフレームを別途作成する.これは手作業でやらざるを得ない.できたグラフを見て微調整をしていく.

anno <- data.frame(
    x=c(0.9, 0.9, 3.1, 1.1, 1.1, 1.9, 2.1, 2.1, 2.9),
    y=c(10.5, 37, 37, 10.5, 26, 26, 23.5, 34, 34),
    xend=c(0.9, 3.1, 3.1, 1.1, 1.9, 1.9, 2.1, 2.9, 2.9),
    yend=c(37, 37, 32.5, 26, 26, 23.5, 34, 34, 32.5),
    variable="Drug2",
    xstar = c(1.5, 2, 2.5, NA, NA, NA, NA, NA, NA),
    ystar = c(27, 38, 35, NA, NA, NA, NA, NA, NA),
    lab = c("**", "***", "***", NA, NA, NA, NA, NA, NA),
    ast.color = c("red", "blue", "green", NA, NA, NA, NA, NA, NA))

anno
    x    y xend yend variable xstar ystar  lab ast.color
1 0.9 10.5  0.9 37.0    Drug2   1.5    27   **       red
2 0.9 37.0  3.1 37.0    Drug2   2.0    38  ***      blue
3 3.1 37.0  3.1 32.5    Drug2   2.5    35  ***     green
4 1.1 10.5  1.1 26.0    Drug2    NA    NA <NA>      <NA>
5 1.1 26.0  1.9 26.0    Drug2    NA    NA <NA>      <NA>
6 1.9 26.0  1.9 23.5    Drug2    NA    NA <NA>      <NA>
7 2.1 23.5  2.1 34.0    Drug2    NA    NA <NA>      <NA>
8 2.1 34.0  2.9 34.0    Drug2    NA    NA <NA>      <NA>
9 2.9 34.0  2.9 32.5    Drug2    NA    NA <NA>      <NA>

x, y, xend, yendは各線分の始点と終点で,xstar, ystarは注釈,今回はasteriskの位置を示す.labはasteriskそのものを指示し,colorはasteriskの色を指定している.

Add segment with geom_segment and asterisk with geom_text (black)

geom_segmentで線を引いて,geom_textでasteriskをつける.まずは黒色でやってみる. inherit.aes=FALSE をgeom_text()とgeom_segment()の内部に追加してggplot()内のfill=Dayを無視させる.

TestBoxPlot3 +
    geom_text(data = anno, aes(x = xstar, y = ystar, label = lab, colour = NULL), size = 7, family = "Times New Roman", inherit.aes = FALSE) +
    geom_segment(data = anno, aes(x = x,  y = y, xend=xend, yend=yend), inherit.aes = FALSE)

Add segment with geom_segment and asterisk with geom_text (color)

asteriskに色をつける.データフレーム annoに書き込んだ色データを明示的に指示して利用する.

TestBoxPlot3 +
    geom_text(data = anno, aes(x = xstar, y = ystar, label = lab), colour = anno$ast.color, size = 7, family = "Times New Roman", inherit.aes = FALSE) +
    geom_segment(data = anno, aes(x = x,  y = y, xend=xend, yend=yend), inherit.aes = FALSE)

問題はここである.どうしても, colour = anno$ast.color とデータフレームと変数を明示的に指示しないと色がおかしくなるか,エラーになってしまう.もっとうまくggplotにデータを読ませる方法をどなたかご教示いただければ幸甚である.

Barplot by ggplot2

次にbarplotを描いて同じことをやってみる.エラーバーは慣例通りSEにする.

Barplot

TestBarPlot <- ggplot(DataM_summary, aes(x = Day, y = mean, colour = Day, fill=Day)) +
    geom_errorbar(aes(ymin = mean, ymax = mean + se), width = 0.2) +
    geom_bar(position=position_dodge(), stat="identity", alpha=1/2, width=0.5) +
    facet_wrap(~variable, scales = "fixed", ncol=3) +
    coord_cartesian(ylim = c(0,30)) +
    theme_bw() +
    theme(axis.text.x  = element_text(size=14), axis.text.y = element_text(size=14), strip.text.x = element_text(size =16)) +
    theme(axis.title.x = element_text(size=14), axis.title.y = element_text(size=16), plot.title=element_text(size=0)) +
    xlab("") +
    ylab("Relative value to control") +
    theme(legend.position = "none")   # delete legend

TestBarPlot

Add segment and asterisk to Drug2 facet of barplot

Dataframe for annotation

anno2 <- data.frame(
    x=c(0.9, 0.9, 3.1, 1.1, 1.1, 1.9, 2.1, 2.1, 2.9),
    y=c(6.5, 29, 29, 6.5, 17, 17, 12.5, 25, 25),
    xend=c(0.9, 3.1, 3.1, 1.1, 1.9, 1.9, 2.1, 2.9, 2.9),
    yend=c(29, 29, 22.5, 17, 17, 12.5, 25, 25, 22.5),
    variable="Drug2",
    xstar = c(1.5, 2, 2.5, NA, NA, NA, NA, NA, NA),
    ystar = c(17.5, 29.5, 25.5, NA, NA, NA, NA, NA, NA),
    lab = c("**", "***", "***", NA, NA, NA, NA, NA, NA),
    ast.color = c("red", "blue", "green", NA, NA, NA, NA, NA, NA))

anno2
    x    y xend yend variable xstar ystar  lab ast.color
1 0.9  6.5  0.9 29.0    Drug2   1.5  17.5   **       red
2 0.9 29.0  3.1 29.0    Drug2   2.0  29.5  ***      blue
3 3.1 29.0  3.1 22.5    Drug2   2.5  25.5  ***     green
4 1.1  6.5  1.1 17.0    Drug2    NA    NA <NA>      <NA>
5 1.1 17.0  1.9 17.0    Drug2    NA    NA <NA>      <NA>
6 1.9 17.0  1.9 12.5    Drug2    NA    NA <NA>      <NA>
7 2.1 12.5  2.1 25.0    Drug2    NA    NA <NA>      <NA>
8 2.1 25.0  2.9 25.0    Drug2    NA    NA <NA>      <NA>
9 2.9 25.0  2.9 22.5    Drug2    NA    NA <NA>      <NA>

Add segment with geom_segment and asterisk with geom_text (color)

TestBarPlot +
    geom_text(data = anno2, aes(x = xstar, y = ystar, label = lab), colour = anno$ast.color, size = 7, family = "Times New Roman", inherit.aes = FALSE) +
    geom_segment(data = anno2, aes(x = x,  y = y, xend=xend, yend=yend), inherit.aes = FALSE)

barplotでも全く同様のグラフを作成することができた.

なお,通常のグラフをpdfで出力して,それをOmniGraffleなどのお絵かきソフトに持っていき,手作業で線やasteriskを描いて,再びpdfで出力する,という荒業も使えないことはない.しかし,ggplotの中で完結できるので,余分で面倒な手作業が不要になった.まぁ,上記の作業も面倒ではあるが,再現性があり,他の人にも渡せるというところが重要であると思う.

Combine boxplot and barplot into the same graphic

Ref1: patchwork
Ref2: patchworkを使って複数のggplotを組み合わせる

patchworkを使えば,上記の2種のグラフを簡単に一つの図にできる.比較しやすいようにbarplotのy軸のスケールをboxplotと同じに修正しておく.

# Boxplot
P1 <- TestBoxPlot3 +
    geom_text(data = anno, aes(x = xstar, y = ystar, label = lab), colour = anno$ast.color, size = 7, family = "Times New Roman", inherit.aes = FALSE) +
    geom_segment(data = anno, aes(x = x,  y = y, xend=xend, yend=yend), inherit.aes = FALSE)

# Barplot
TestBarPlot2 <- ggplot(DataM_summary, aes(x = Day, y = mean, colour = Day, fill=Day)) +
    geom_errorbar(aes(ymin = mean, ymax = mean + se), width = 0.2) +
    geom_bar(position=position_dodge(), stat="identity", alpha=1/2, width=0.5) +
    facet_wrap(~variable, scales = "fixed", ncol=3) +
    coord_cartesian(ylim = c(0,38)) +
    theme_bw() +
    theme(axis.text.x  = element_text(size=14), axis.text.y = element_text(size=14), strip.text.x = element_text(size =16)) +
    theme(axis.title.x = element_text(size=14), axis.title.y = element_text(size=16), plot.title=element_text(size=0)) +
    xlab("") +
    ylab("Relative value to control") +
    theme(legend.position = "none")   # delete legend
P2 <- TestBarPlot2 +
    geom_text(data = anno2, aes(x = xstar, y = ystar, label = lab), colour = anno$ast.color, size = 7, family = "Times New Roman", inherit.aes = FALSE) +
    geom_segment(data = anno2, aes(x = x,  y = y, xend=xend, yend=yend), inherit.aes = FALSE)

library(patchwork)
P1 + P2

このpatchworkは足し算だけで2つの図の合体ができてしまうすぐれもの.ちゃんと位置合わせなども自動的にしてくれる.素晴らしい.

しかし,こうして並べて比べてみると,barplotが如何に情報量の少ないグラフであるかが一目瞭然である.

Related

Next
Previous
comments powered by Disqus