Annotated Forest Plots using ggplot2

programming
Author

Katherine Hoffman

Published

September 15, 2022

This post contains a short R code walkthrough to make annotated forest plots like the one shown above. There are packages to make plots like these such as forester, forestplot, and ggforestplot, but sometimes I still prefer to make my own.

The big picture of this is that we’ll be making three separate ggplot2 objects and putting them together with patchwork. You could also use packages like cowplot, gridarrange or ggarrange to put the intermediate plot objects together.


Step 0: Load libraries and data

First we will load the necessary libraries and the data set. The data we’ll use for this plot are the effect estimates for 10 Cox regression models. The models, titled A-J, are stored in a data set called res, stored as a csv on my Github.

library(tidyverse)
library(gt)

res_log <- read_csv("https://raw.githubusercontent.com/kathoffman/steroids-trial-emulation/main/output/res_log.csv")
res <- read_csv("https://raw.githubusercontent.com/kathoffman/steroids-trial-emulation/main/output/res.csv")

res <- res_log |>
      rename_with(~str_c("log.", .), estimate:conf.high) |>
  select(-p.value) |>
  full_join(res)

The results object contains:

  • model: the model label A-J
  • log.estimate: log hazard ratio, since these were Cox regressions
  • log.conf.low and log.conf.high: log hazard ratio 95% confidence intervals
  • estimate: hazard ratio
  • conf.low and conf.high: hazard ratio 95% confidence intervals
  • p.value: corresponding p-value
glimpse(res)
Rows: 10
Columns: 8
$ model         <chr> "A", "B", "C", "D", "E", "F", "G", "H", "I", "J"
$ log.estimate  <dbl> -0.68468788, -0.05255784, -0.08640963, -0.12147024, -0.4…
$ log.conf.low  <dbl> -0.8947661, -0.4201151, -0.4567781, -0.5869119, -0.88192…
$ log.conf.high <dbl> -0.474609681, 0.314999396, 0.283958850, 0.343971392, 0.0…
$ estimate      <dbl> 0.5042476, 0.9487994, 0.9172184, 0.8856174, 0.6559699, 0…
$ conf.low      <dbl> 0.4087032, 0.6569712, 0.6333208, 0.5560418, 0.4139865, 0…
$ conf.high     <dbl> 0.6221278, 1.3702585, 1.3283783, 1.4105383, 1.0393974, 0…
$ p.value       <dbl> 1.681527e-10, 7.792783e-01, 6.474743e-01, 6.089951e-01, …

Step 1: Make point and line range section of the plot

We will first work on making the standard “forest plot”, or the middle section of the figure. This section uses points and lines to indicate the estimate and 95% confidence interval around the estimate.

In my experience, journal editors sometimes ask for these estimates to look a certain way during the revision process. For this graph, the journal editor told me that it was “journal standard to plot relative measures (ratio data), such as ORs, on log scales to preserve the correct spatial relationship between values.” So, I’m going to be visually showing the log hazard ratio, and annotating later with the hazard ratio.

Let’s look at how we can plot the log hazard ratio estimates. We first want the models to show in alphabetical order from the top to the bottom of the graph.

p <- 
  res |>
  ggplot(aes(y = fct_rev(model))) + 
  theme_classic()
p

Then we show all of our information (point estimate and 95% confidence interval) on the graph.

p <- p +
  geom_point(aes(x=log.estimate), shape=15, size=3) +
  geom_linerange(aes(xmin=log.conf.low, xmax=log.conf.high)) 
p

We can now add a vertical line at 0 with geom_vline and rename the x axis.

p <- p +
  geom_vline(xintercept = 0, linetype="dashed") +
  labs(x="Log Hazard Ratio", y="")
p

Next we’ll use coord_cartesian() which will allow us to zoom to the exact height and width we want. I want to zoom out a bit to leave myself room for the text “Corticosteroids protective” vs. “Corticosteroids harmful” so I’m going to set my limits to y=c(1,11). Each of the models (10 in total) is one unit, so this will give me one extra unit of space at the top of the plot. The x-limit I played around with a bit based upon the range of my log hazard ratios, and I ultimately arrived at xlim=c(-1, .5).

p <- p +
  coord_cartesian(ylim=c(1,11), xlim=c(-1, .5))
p

Now we have space to add our text about protective vs. harmful using the annotate layer.

p <- p +
  annotate("text", x = -.32, y = 11, label = "Corticosteroids protective") +
  annotate("text", x = .3, y = 11, label = "Corticosteroids harmful")
p

Finally, we will remove everything on the y axis, because this plot is going to align with the next plot we make, showing the hazard ratios.

p_mid <- p + 
  theme(axis.line.y = element_blank(),
        axis.ticks.y= element_blank(),
        axis.text.y= element_blank(),
        axis.title.y= element_blank())
p

We’ll save this ggplot object as p_mid and move on to the next section of the figure.

Step 2: Create estimate annotations plot

To plot the hazard ratio estimates, we first need to modify the data set a bit. We’ll start by rounding our estimates to the significant figures the journal requires. For this figure, I need two decimal places.

# wrangle results into pre-plotting table form
res_plot <- res |>
  mutate(across(
    c(estimate, conf.low, conf.high),
    ~ str_pad(
      round(.x, 2),
      width = 4,
      pad = "0",
      side = "right"
    )
  ),
  estimate_lab = paste0(estimate, " (", conf.low, "-", conf.high, ")")) |>
  mutate(p.value = case_when(
    p.value < .001 ~ "<0.001",
    round(p.value, 2) == .05 ~ as.character(round(p.value,3)),
    p.value < .01 ~ str_pad(
      as.character(round(p.value, 3)),
      width = 4,
      pad = "0",
      side = "right"
    ),
    TRUE ~ str_pad(
      as.character(round(p.value, 2)),
      width = 4,
      pad = "0",
      side = "right"
    )
  )) |>
  bind_rows(
    data.frame(
      model = "Model",
      estimate_lab = "Hazard Ratio (95% CI)",
      conf.low = "",
      conf.high = "",
      p.value = "p-value"
    )
  ) |>
  mutate(model = fct_rev(fct_relevel(model, "Model")))

glimpse(res_plot)
Rows: 11
Columns: 9
$ model         <fct> A, B, C, D, E, F, G, H, I, J, Model
$ log.estimate  <dbl> -0.68468788, -0.05255784, -0.08640963, -0.12147024, -0.4…
$ log.conf.low  <dbl> -0.8947661, -0.4201151, -0.4567781, -0.5869119, -0.88192…
$ log.conf.high <dbl> -0.474609681, 0.314999396, 0.283958850, 0.343971392, 0.0…
$ estimate      <chr> "0.50", "0.95", "0.92", "0.89", "0.66", "0.77", "1.05", …
$ conf.low      <chr> "0.41", "0.66", "0.63", "0.56", "0.41", "0.60", "0.77", …
$ conf.high     <chr> "0.62", "1.37", "1.33", "1.41", "1.04", "0.99", "1.45", …
$ p.value       <chr> "<0.001", "0.78", "0.65", "0.61", "0.07", "0.045", "0.75…
$ estimate_lab  <chr> "0.50 (0.41-0.62)", "0.95 (0.66-1.37)", "0.92 (0.63-1.33…

Creating the hazard ratios isn’t very difficult. We’ll take the modified data frame and again organize the model order on the y axis. Note that this time, I modified model within the data frame, which we could have done for the middle plot instead of modifying it within the aesthetic argument.

p_left <-
  res_plot  |>
  ggplot(aes(y = model))
p_left

Next, we will add the model as text (instead of as a label on the y axis) using geom_text.

p_left <- 
  p_left +
  geom_text(aes(x = 0, label = model), hjust = 0, fontface = "bold")
p_left

We can use the same idea to add the hazard ratios and their confidence intervals.

p_left <- 
  p_left +
  geom_text(
    aes(x = 1, label = estimate_lab),
    hjust = 0,
    fontface = ifelse(res_plot$estimate_lab == "Hazard Ratio (95% CI)", "bold", "plain")
  )

p_left

Finally we can remove the background and edit the sizing so that this left size of the plot will match up neatly with the middle and right sides of the plot.

p_left <-
  p_left +
  theme_void() +
  coord_cartesian(xlim = c(0, 4))

p_left

Step 3: Create p-value annotations

# right side of plot - pvalues
p_right <-
  res_plot  |>
  ggplot() +
  geom_text(
    aes(x = 0, y = model, label = p.value),
    hjust = 0,
    fontface = ifelse(res_plot$p.value == "p-value", "bold", "plain")
  ) +
  theme_void() 

p_right

Step 4: Put the three plots together with patchwork

layout <- c(
  area(t = 0, l = 0, b = 30, r = 3),
  area(t = 1, l = 4, b = 30, r = 9),
  area(t = 0, l = 9, b = 30, r = 11)
)
# final plot arrangement
p_left + p_mid + p_right + plot_layout(design = layout)

Step 5: Export your plot!

ggsave("forest-plot.eps", width=9, height=4)

Just the code

## load up the packages we will need: 
library(tidyverse)
library(gt)
library(patchwork)
## ---------------------------
## load data
# load in results generated from Cox PH hazards models
res_log <- read_csv("model-first-results-log.csv")
res <- read_csv("model-first-results.csv")
res <- res_log |>
      rename_with(~str_c("log.", .), estimate:conf.high) |>
  select(-p.value) |>
  full_join(res)

## plotting
## ---------------------------
# create forest plot on log scale (middle section of figure)
p_mid <-
  res |>
  ggplot(aes(y = fct_rev(model))) +
  theme_classic() +
  geom_point(aes(x=log.estimate), shape=15, size=3) +
  geom_linerange(aes(xmin=log.conf.low, xmax=log.conf.high)) +
  labs(x="Log Hazard Ratio") +
  coord_cartesian(ylim=c(1,11), xlim=c(-1, .5))+
  geom_vline(xintercept = 0, linetype="dashed") +
  annotate("text", x = -.32, y = 11, label = "Corticosteroids protective") +
  annotate("text", x = .3, y = 11, label = "Corticosteroids harmful") +
  theme(axis.line.y = element_blank(),
        axis.ticks.y= element_blank(),
        axis.text.y= element_blank(),
        axis.title.y= element_blank())
# wrangle results into pre-plotting table form
res_plot <- res |>
  mutate(across(c(estimate, conf.low, conf.high), ~str_pad(round(.x, 2), width=4, pad="0", side="right")),
         estimate_lab = paste0(estimate, " (", conf.low, "-", conf.high,")"),
         color = rep(c("gray","white"),5)) |>
  mutate(p.value = case_when(p.value < .01 ~ "<0.01", TRUE ~ str_pad(as.character(round(p.value, 2)),width=4,pad="0",side="right"))) |>
  bind_rows(data.frame(model = "Model", estimate_lab = "Hazard Ratio (95% CI)", conf.low = "", conf.high="",p.value="p-value")) |>
  mutate(model = fct_rev(fct_relevel(model, "Model")))
# left side of plot - hazard ratios
p_left <-
  res_plot  |>
  ggplot(aes(y = model)) + 
  geom_text(aes(x=0, label=model), hjust=0, fontface = "bold") +
  geom_text(aes(x=1, label=estimate_lab), hjust=0, fontface = ifelse(res_plot$estimate_lab == "Hazard Ratio (95% CI)", "bold", "plain")) +
  theme_void() +
  coord_cartesian(xlim=c(0,4))
# right side of plot - pvalues
p_right <-
  res_plot  |>
  ggplot() +
  geom_text(aes(x=0, y=model, label=p.value), hjust=0, fontface = ifelse(res_plot$p.value == "p-value", "bold", "plain")) +
  theme_void() 
# layout design (top, left, bottom, right)
layout <- c(
  area(t = 0, l = 0, b = 30, r = 3),
  area(t = 1, l = 4, b = 30, r = 9),
  area(t = 0, l = 9, b = 30, r = 11)
)
# final plot arrangement
p_left + p_mid + p_right + plot_layout(design = layout)
## save final figure
#ggsave("forest-plot.eps", width=9, height=4)

Session Info

sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] patchwork_1.1.1 gt_0.6.0        forcats_0.5.1   stringr_1.4.1  
 [5] dplyr_1.0.9     purrr_0.3.4     readr_2.1.2     tidyr_1.2.0    
 [9] tibble_3.1.8    ggplot2_3.3.6   tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] lubridate_1.8.0   assertthat_0.2.1  digest_0.6.29     utf8_1.2.2       
 [5] R6_2.5.1          cellranger_1.1.0  backports_1.4.1   reprex_2.0.1     
 [9] evaluate_0.15     httr_1.4.2        pillar_1.8.1      rlang_1.0.4      
[13] curl_4.3.2        readxl_1.4.0      rstudioapi_0.13   rmarkdown_2.13   
[17] labeling_0.4.2    htmlwidgets_1.5.4 bit_4.0.4         munsell_0.5.0    
[21] broom_0.8.0       compiler_4.1.3    modelr_0.1.8      xfun_0.32        
[25] pkgconfig_2.0.3   htmltools_0.5.2   tidyselect_1.1.2  fansi_1.0.3      
[29] crayon_1.5.1      tzdb_0.3.0        dbplyr_2.1.1      withr_2.5.0      
[33] cabinets_0.6.0    grid_4.1.3        jsonlite_1.8.0    gtable_0.3.0     
[37] lifecycle_1.0.1   DBI_1.1.2         magrittr_2.0.3    scales_1.2.1     
[41] cli_3.4.1         stringi_1.7.8     vroom_1.5.7       farver_2.1.1     
[45] fs_1.5.2          xml2_1.3.3        ellipsis_0.3.2    generics_0.1.3   
[49] vctrs_0.4.1       tools_4.1.3       bit64_4.0.5       glue_1.6.2       
[53] hms_1.1.1         parallel_4.1.3    fastmap_1.1.0     yaml_2.3.5       
[57] colorspace_2.0-3  rvest_1.0.2       knitr_1.38        haven_2.5.0