Using ggplot2 to create Treatment Timelines with Multiple Variables

This post walks through code to create a timeline in R using ggplot2. These types of plots can help visualize treatment or measurement patterns, time-varying covariates, outcomes, and loss to follow-up in longitudinal data settings.

Skip the step-by-step, just the code, please!

Background

Treatment timelines, or “swimmer plots”, are a visualization technique I find useful in exploring longitudinal data structures. A few years ago I shared how I make treatment timelines for a single treatment (categorical or continuous) in the post Patient Treatment Timelines for Longitudinal Survival Data.

Sometimes when I share these plots with collaborators, they ask me to add additional variables to the timelines. This post shows how to do that.

I’ll use a toy dataset on hospitalized COVID-19 patients, available to download on this Github repository. It is derived from a dataset from Electronic Health Record data during Spring 2020. This is a time period when there was large variation in provider practice in administering steroids, a type of drug that combats hyper-inflammation. Steroids are usually given to patients which exhibit an inflammatory profile; we will identify this using a threshold for low oxygen levels (severe hypoxia).

We will look at the treatment patterns of steroids as it relates to the timing of patients (1) reaching severe hypoxia and (2) being put on a ventilator (intubation). We will also include whether patients died. I used a similar figure in a recent manuscript on this topic, if you’re interested in learning more!

Exploring the data

The data set is in long format with one row per patient. Let’s first load the data set and libraries we’ll need, then look at the first 20 rows:

# install.packages(c("tidyverse","gt","RCurl"))
library(tidyverse)
library(gt)

dat_long <- read_csv("https://raw.githubusercontent.com/kathoffman/steroids-trial-emulation/main/data/dat_trt_timeline.csv", col_types = list(id  = "c", steroids = "c", death = "c", severe = "c"))

If we look at the first patient (id = 797), we can see they were in the hospital for 17 days, never intubated, never receive steroids, and ultimately die (death is 1 on the last day).

head(dat_long, n=20) |>
  gt()
id day intubation_status steroids death severe
797 0 Not intubated 0 0 0
797 1 Not intubated 0 0 0
797 2 Not intubated 0 0 1
797 3 Not intubated 0 0 0
797 4 Not intubated 0 0 0
797 5 Not intubated 0 0 0
797 6 Not intubated 0 0 0
797 7 Not intubated 0 0 0
797 8 Not intubated 0 0 0
797 9 Not intubated 0 0 0
797 10 Not intubated 0 0 0
797 11 Not intubated 0 0 0
797 12 Not intubated 0 0 0
797 13 Not intubated 0 0 0
797 14 Not intubated 0 0 0
797 15 Not intubated 0 0 0
797 16 Not intubated 0 1 0
285 0 Not intubated 0 0 0
285 1 Not intubated 0 0 0
285 2 Not intubated 0 0 0

We can plot all patients’ hospital length of stay, colored by intubation status using ggplot2’s geom_line():

dat_long |>
  ggplot(aes(x=day, y=id, col = intubation_status, group=id)) +
  geom_line() +
  theme_bw()

We can add our steroids column to the plot by adding a point designating whether steroids exposure was 1 (yes) or 0 (no) that day. We can see this results in points of two different colors on the lines of our plot. This can work just fine! …unless you want to add another variable to the timeline.

dat_long |>
  ggplot(aes(x=day, y=id, col = intubation_status, group=id)) +
  geom_point(aes(day, id, col = steroids)) +
  geom_line() +
  theme_bw() 

Modify the data

We could edit the colors of the dots we don’t want so that they’re transparent (using NA), but when you have other non-mutually exclusive dots you want to show, it’s simpler to just edit the data instead. So, we will now edit our data so that our three binary columns are turned into three *_this_day column, where:

  • The value is NA if the observation did not experience that exposure/outcome that day (remember each day is a new row)

  • The value is the day if the observation did experience the exposure/outcome. This is to make our x axis easy to specify in ggplot2.

dat_swim <-
  dat_long |>
  mutate(severe_this_day = case_when(severe == 1 ~ day),
         steroids_this_day = case_when(steroids == 1 ~ day),
         death_this_day = case_when(death == 1 ~ day))

While we’re at it, let’s modify the patient’s IDs so that we can rearrange our plot by length of each individual’s timeline. To do this, we will refactor the id variable by a new variable id_sorted.

There are probably other ways to do this, but this is what I do. I first make a new variable called max_day that is the maximum day that the patient is in the study. I then nest() the data by id (by calling all the other columns).

I then make my new column using row_number(), and unnest() the data, which gives me clean IDs in a sorted fashion:

dat_swim <- 
  dat_swim |>
  group_by(id) |>
  mutate(max_day = max(day)) |>
  ungroup() |>
  nest(cols = day:death_this_day) |>
  arrange(max_day) |>
  mutate(id_sorted = factor(row_number())) |>
  unnest()
## Warning: `cols` is now required when using unnest().
## Please use `cols = c(cols)`
head(dat_swim) |> gt()
id max_day day intubation_status steroids death severe severe_this_day steroids_this_day death_this_day id_sorted
767 1 0 Not intubated 0 0 0 NA NA NA 1
767 1 1 Not intubated 0 0 0 NA NA NA 1
156 1 0 Not intubated 0 0 0 NA NA NA 2
156 1 1 Not intubated 0 1 0 NA NA 1 2
804 2 0 Not intubated 0 0 0 NA NA NA 3
804 2 1 Not intubated 0 0 0 NA NA NA 3

Back to plotting

Now, we can re-plot the steroids and intubation statuses using our new data. This time without all the 0 values for steroids showing.

dat_swim |> 
  ggplot() +
  geom_line(aes(x=day, y=id_sorted, col = intubation_status, group=id)) +
  geom_point(aes(x=steroids_this_day, y=id_sorted, col="Steroids", shape="Steroids")) +
  theme_bw()
## Warning: Removed 387 rows containing missing values (geom_point).

From this point on I’ll save the plot as p and just keep adding onto it so you can see the new step.

We’ll see why we’re doing this in a second, but in creating this first iteration of p using geom_line() and geom_point(), we also want to set the col and shape to match how we want the marker for steroids to appear in the legend. I’m also going to make minor edits to the size of the point geometry right now, as well as the width of each timeline itself (also using the size argument). Lastly, we are now going to use id_sorted.

p <- 
  dat_swim |> 
  ggplot() +
  geom_line(aes(x=day, y=id_sorted, col = intubation_status, group=id_sorted),
            size=1.8) +
  geom_point(aes(x=steroids_this_day, y=id_sorted, col="Steroids", shape="Steroids"), stroke=2) +
  theme_bw()
p
## Warning: Removed 387 rows containing missing values (geom_point).

Let’s add hypoxia and death to the figure. We’ll use geom_point() again, and again specify legend names for the col and shape arguments, and modify the size and stroke of our point geometries.

p <- p +
  geom_point(aes(x=severe_this_day, y=id_sorted, col="Severe hypoxia", shape="Severe hypoxia"), size=2, stroke=1.5) +
  geom_point(aes(x=death_this_day, y=id_sorted, col="Death", shape="Death"), size=2, stroke=1.5) 
p
## Warning: Removed 387 rows containing missing values (geom_point).
## Warning: Removed 403 rows containing missing values (geom_point).
## Warning: Removed 414 rows containing missing values (geom_point).

Note that we get warning messages that values with NA are removed. This is fine since we just created all those NAs! I’m going to set my options so that warnings are suppressed for future code outputs to keep this post tidy.

knitr::opts_chunk$set(message=F, warning=F)

Modify the colors and shapes

Next let’s start changing our color and shape scales. We can change colors using scale_color_manual() and and filling in the values argument with a vector where the names of the vector match the names in the col in our geom_point() aesthetics.

I define my cols in a vector outside the plotting code to keep everything cleaner. Note that the order we’re specifying here will continue throughout the rest of the plotting code!

# define colors for all geometries with a color argument
cols <- c("Severe hypoxia" = "#b24745", # red
          "Intubated" = "darkslateblue", # navy
          "Not intubated" = "#74aaff", # lighter blue
          "Steroids"="#ffd966", # gold
          "Death" = "black") 

After we set values = cols, the name argument is simply the title we want for our legend (I chose “Patient Status”).

p <- p +
  scale_color_manual(values = cols, name="Patient Status") 
p

We can do the same to change shapes with scale_shape_manual(). If you don’t want a shape for a certain status (e.g. intubation status), just leave it out of your vector.

# define shapes for all geometries with a shape argument (geom_point)
shapes <- c("Severe hypoxia" = 21,
            "Steroids" = 15, # square
            "Death" = 4) # cross # empty circle (control inside with fill argument if desired)

p <- p + scale_shape_manual(values = shapes, name = "Patient Status")
p

Fix the Legend

You’ll notice that our legend does not match the changes we made to the colors and shapes right now. This is because we’re stretching (possibly breaking) the grammar of graphics (the “gg” of ggplot2) rules by assigning different colors and shapes across variables. Although ggplot2 wasn’t designed to do what we’re doing, we can override the legend aesthetics and still create a plot that shows correct and useful information.

We will do this by using the guides() function. We can control each aesthetic here. We will first override the colors legend with the code guide_legend(override.aes = list(...)).

This allows us to change the shapes of the color legend by specifying a vector with the shapes we want in the order the labels appear in the legend. If we don’t want a shape to appear on the legend, we will use NA.

I only want to show shapes for certain statuses (severe hypoxia, steroid administration, death), and not the intubation status of a patient, so I’ll set up my shape override vector accordingly. Note that the order in the legend follows the order of my color specification vector (cols).

shape_override <- c(21, NA, NA, 15, 4) # order matches `cols`:severe, intubation (yes/no), steroids, death

p +
  guides(color = guide_legend(
                      override.aes = list(
                          shape = shape_override) # modify the color legend to include shapes
                      )
         ) 

To remove the line through Death, Severe hypoxia, and Steroids in our legend, we can override the aesthetics for linetype with NA’s for those three labels. We will specify the default, linetype=1, for our intubation status color labels.

We can additionally override the stroke and size arguments to correspond to our point geometries.

line_override <- c(NA,1,1,NA,NA) # order matches `cols`:severe, intubation (yes/no), steroids, death
stroke_override <- c(.8,1,1,1,1) # order matches `cols`:severe, intubation (yes/no), steroids, death
size_override <- c(2.5,2.5,2.6,2,2) # order matches `cols`:severe, intubation (yes/no), steroids, death

p +  guides(color = guide_legend(
                        override.aes = list(
                                shape = shape_override,
                                linetype = line_override)
                                )
             )

Now that our color legend contains all necessary information, we can remove the shape legend using the argument shape = "none" in guides().

p <- p + 
  guides(color = guide_legend(
                        override.aes = list(
                                shape = shape_override,
                                linetype = line_override)
                                ),
         shape = "none"
             )

p

Ok, the challenging parts are done! Now we can make some minor aesthetic edits using labs, scale_x_continuous(), and theme(). I won’t go into detail on these edits because they’re fairly self-explanatory, but check out the help files if you’re unsure what these arguments in theme do!

p <- p +
  labs(x="Days since hospitalization",y="Patient\nnumber",title="Treatment Timeline for N=30 Patients") +
  scale_x_continuous(expand=c(0,0)) + # remove extra white space 
  theme(text=element_text(family="Poppins", size=11),
        title = element_text(angle = 0, vjust=.5, size=12, face="bold"),
        axis.title.y = element_text(angle = 0, vjust=.5, size=12, face="bold"),
        axis.title.x = element_text(size=15, face="bold", vjust=-0.5, hjust=0),
        axis.text.y = element_text(size=6, hjust=1.5),
        axis.ticks.y = element_blank(),
        legend.position = c(0.8, 0.3),
        legend.title = element_text(colour="black", size=13, face=4),
        legend.text = element_text(colour="black", size=10),
        legend.background = element_rect(size=0.5, linetype="solid", colour ="gray30"),
        panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank()
  ) 
p

Hope this is helpful! As always let me know if you have any feedback or suggestions. If you’d like to copy-paste the code, here it is:

Just the Code

library(tidyverse)
library(gt)

dat_long <- read_csv("https://raw.githubusercontent.com/kathoffman/steroids-trial-emulation/main/data/dat_trt_timeline.csv", col_types = list(id  = "c", steroids = "c", death = "c", severe = "c"))

# define colors for all geometries with a color argument
cols <- c("Severe hypoxia" = "#b24745", # red
          "Intubated" = "darkslateblue", # navy
          "Not intubated" = "#74aaff", # lighter blue
          "Steroids"="#ffd966", # gold
          "Death" = "black") 
shapes <- c("Severe hypoxia" = 21,
            "Steroids" = 15, # square
            "Death" = 4) # cross # empty circle (control inside with fill argument if desired)

shape_override <- c(21, NA, NA, 15, 4) # order matches `cols`:severe, intubation (yes/no), steroids, death
line_override <- c(NA,1,1,NA,NA) # order matches `cols`:severe, intubation (yes/no), steroids, death
stroke_override <- c(.8,1,1,1,1) # order matches `cols`:severe, intubation (yes/no), steroids, death
size_override <- c(2.5,2.5,2.6,2,2) # order matches `cols`:severe, intubation (yes/no), steroids, death

# modify swimmer data to 1) only show events if yes 2) have an id ordered by max follow up
dat_swim <- 
   dat_long |>
  mutate(severe_this_day = case_when(severe == 1 ~ day),
         steroids_this_day = case_when(steroids == 1 ~ day),
         death_this_day = case_when(death == 1 ~ day)) %>%
  group_by(id) |>
  mutate(max_day = max(day)) |>
  ungroup() |>
  nest(cols = day:death_this_day) |>
  arrange(max_day) |>
  mutate(id_sorted = factor(row_number())) |>
  unnest()

dat_swim |>
  ggplot() +
  geom_line(aes(x=day, y=id_sorted, col = intubation_status, group=id_sorted),
            size=1.8) +
  geom_point(aes(x=steroids_this_day, y=id_sorted, col="Steroids", shape="Steroids"), stroke=2) +
  geom_point(aes(x=severe_this_day, y=id_sorted, col="Severe hypoxia", shape="Severe hypoxia"), size=2, stroke=1.5) +
  geom_point(aes(x=death_this_day, y=id_sorted, col="Death", shape="Death"), size=2, stroke=1.5) +
  theme_bw() +
  scale_color_manual(values = cols, name="Patient Status") +
  scale_shape_manual(values = shapes, name = "Patient Status") +
  guides(color = guide_legend(
                        override.aes = list(
                                shape = shape_override,
                                linetype = line_override)
                                ),
         shape = "none"
             )+
  labs(x="Days since hospitalization",y="Patient\nnumber",title="Treatment Timeline for N=30 Patients") +
  scale_x_continuous(expand=c(0,0)) + # remove extra white space 
  theme(text=element_text(family="Poppins", size=11),
        title = element_text(angle = 0, vjust=.5, size=12, face="bold"),
        axis.title.y = element_text(angle = 0, vjust=.5, size=12, face="bold"),
        axis.title.x = element_text(size=15, face="bold", vjust=-0.5, hjust=0),
        axis.text.y = element_text(size=6, hjust=1.5),
        axis.ticks.y = element_blank(),
        legend.position = c(0.8, 0.3),
        legend.title = element_text(colour="black", size=13, face=4),
        legend.text = element_text(colour="black", size=10),
        legend.background = element_rect(size=0.5, linetype="solid", colour ="gray30"),
        panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank()
  ) 

Session Info

sessionInfo()

R version 4.1.3 (2022-03-10) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Catalina 10.15.7

Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] gt_0.6.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9
[5] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.6
[9] ggplot2_3.3.5 tidyverse_1.3.1

loaded via a namespace (and not attached): [1] lubridate_1.8.0 here_1.0.1 assertthat_0.2.1 rprojroot_2.0.3 [5] digest_0.6.29 utf8_1.2.2 R6_2.5.1 cellranger_1.1.0 [9] backports_1.4.1 reprex_2.0.1 evaluate_0.15 highr_0.9
[13] httr_1.4.2 blogdown_1.9 pillar_1.7.0 rlang_1.0.2
[17] curl_4.3.2 readxl_1.4.0 rstudioapi_0.13 jquerylib_0.1.4 [21] checkmate_2.0.0 rmarkdown_2.13 labeling_0.4.2 bit_4.0.4
[25] munsell_0.5.0 broom_0.8.0 compiler_4.1.3 modelr_0.1.8
[29] xfun_0.30 pkgconfig_2.0.3 htmltools_0.5.2 tidyselect_1.1.2 [33] bookdown_0.26 fansi_1.0.3 crayon_1.5.1 tzdb_0.3.0
[37] dbplyr_2.1.1 withr_2.5.0 cabinets_0.6.0 grid_4.1.3
[41] jsonlite_1.8.0 gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.2
[45] magrittr_2.0.3 scales_1.2.0 cli_3.3.0 stringi_1.7.6
[49] vroom_1.5.7 farver_2.1.0 fs_1.5.2 xml2_1.3.3
[53] bslib_0.3.1 ellipsis_0.3.2 generics_0.1.2 vctrs_0.4.1
[57] tools_4.1.3 bit64_4.0.5 glue_1.6.2 hms_1.1.1
[61] parallel_4.1.3 fastmap_1.1.0 yaml_2.3.5 colorspace_2.0-3 [65] rvest_1.0.2 knitr_1.38 haven_2.5.0 sass_0.4.1

Avatar
Katherine Hoffman
Research Biostatistician

I am passionate about meaningful, reproducible medical research.

Related