In early May I attended the New York R Conference. There were 24 speakers, including my coworker at Weill Cornell Medicine, Elizabeth Sweeney! Each person did a 20-minute presentation on some way they use R for their work and/or hobbies. There was a ton of information, and even though not all of it was directly useful for my workflow as a statistical consultant in an academic setting, I really enjoyed being around so many people who love R.
I’ve linked some videos of my favorite talks and put together some of the topics, packages, and functions I found most intriguing or useful in my day-to-day work as a research biostatistician. (This was originally a presentation for my biostatistics team’s computing club.)
Visualizing data with naniar
Brooke Watson, a data scientist at the American Civil Liberties Union, gave a great presentation on how she uses R to defend immigrants. She shared several data wrangling tips. One function that was new to me was naniar::vis_miss(), which lets you visualize your missing data quickly.
#install.packages("tidyverse")
#install.packages("naniar")
library(tidyverse)
library(naniar)
vis_miss(airquality) # a base R data set
It returns a ggplot2 object so you can edit titles, colors, etc. if necessary. You can also add various sorting and clustering arguments to make it easier to see patterns of missingness in your data.
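As a rough sketch of those two points, the example below turns on the clustering and sorting arguments of vis_miss() and then adds a title the usual ggplot2 way. The title text is only an illustration, and the exact look of the plot may differ between naniar versions.

```r
library(ggplot2)
library(naniar)

# cluster = TRUE groups rows with similar missingness patterns together;
# sort_miss = TRUE orders the columns from most to least missing
p <- vis_miss(airquality, cluster = TRUE, sort_miss = TRUE)

# The result is a ggplot2 object, so ordinary ggplot2 layers can be added
p + labs(title = "Missing values in airquality")  # illustrative title
```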
Checking out data differences with daff
Brooke also demoed daff, a neat package for checking whether and where two data sets differ.
#install.packages("daff")
library(daff)
dat1 <- data.frame(A = 1:3, B = c(TRUE, FALSE, TRUE))
dat2 <- data.frame(A = 1:4, C = c("apple", NA, NA, "banana"))
my_diff <- diff_data(dat1, dat2)
my_diff
## Daff Comparison: 'dat1' vs. 'dat2'
##       +++    ---
## @@  A C      B
## +   1 apple  TRUE
##     2 <NA>   FALSE
##     3 <NA>   TRUE
## +++ 4 banana <NA>
I thought this would be useful for when you receive new data sets and want to make sure column names, patients, etc. haven’t changed. Check out the full documentation here.
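For the narrower question of whether column names or patients have changed between two deliveries, a quick base R check can sit alongside the daff output. This is only a sketch and not part of daff: it reuses the toy data frames from above and treats column A as a stand-in for a patient ID.

```r
# Same toy data frames as in the daff example
dat1 <- data.frame(A = 1:3, B = c(TRUE, FALSE, TRUE))
dat2 <- data.frame(A = 1:4, C = c("apple", NA, NA, "banana"))

# Columns that disappeared from, or appeared in, the newer data set
dropped_cols <- setdiff(names(dat1), names(dat2))
added_cols   <- setdiff(names(dat2), names(dat1))

# IDs that only exist in the newer data set (A stands in for a patient ID)
new_ids <- setdiff(dat2$A, dat1$A)

dropped_cols  # "B"
added_cols    # "C"
new_ids       # 4
```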
Noam Ross shared code for editable figures using David Gohel’s officer and rvg packages. I shared some example code for my team on GitHub after I saw him present it at an R-Ladies event in the fall. Essentially, you can run a few pretty simple lines of code to output figures (base R, ggplot2, or otherwise) as editable figures in PowerPoint. Noam reminded us that whoever you give these figures to will be able to edit anything, even data points, so keep that in mind before you freely give away editable figures… :)
#install.packages("rvg")
#install.packages("officer")
library(rvg)
library(officer)

#sample data
dat <- data.frame(x = rnorm(100, 10), y = rnorm(100, 100), z = rnorm(100, 1))

#make an empty ppt
read_pptx() %>%
  #add a slide, must specify the slide layout and master name
  add_slide(layout="Title and Content", master="Office Theme") %>%
  #specify what you want on the slide (code = ...)
  #type="body" means the plot is going in the body part of the layout
  #width and height are in inches
  ph_with_vg(code = plot(dat$x, dat$y, main="Edit me!", pch=16), type="body", width=6, height=4) %>%
  #output your ppt (target argument is just the file destination/name)
  print(target="plot.pptx")
##  "/Users/katherinehoffman/Desktop/basic/content/blog/nyrconf/plot.pptx"
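As far as I know, more recent releases of rvg no longer provide ph_with_vg(), so the code above may fail on an up-to-date installation. A sketch of the same slide using officer's ph_with() together with rvg's dml() is below. It assumes current versions of both packages, and writing to a temporary folder is just a choice made for this example.

```r
library(officer)
library(rvg)

# Sample data
dat <- data.frame(x = rnorm(100, 10), y = rnorm(100, 100))

# Start an empty deck and add one slide
doc <- read_pptx()
doc <- add_slide(doc, layout = "Title and Content", master = "Office Theme")

# dml() wraps the plotting code as editable vector graphics;
# ph_location_type("body") places it in the body placeholder of the layout
doc <- ph_with(
  doc,
  value = dml(code = plot(dat$x, dat$y, main = "Edit me!", pch = 16)),
  location = ph_location_type(type = "body")
)

# Write the deck (a temporary folder is used here only for illustration)
out <- file.path(tempdir(), "plot.pptx")
print(doc, target = out)
```

For a ggplot2 figure, dml() also takes a ggobj argument in place of code.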
Going from RMarkdown to Word, and back again with redoc
Noam also shared his new package, redoc, which allows you to reload an Rmd-generated Word file back into R as a modified Rmd file.
This is part of his goal to decrease the pain of “the valley of heartbreak.”
redoc is installed from GitHub (the repository is noamross/redoc) rather than from CRAN.
You may need to update several packages to get it to run correctly, but after that the main commands are just redoc() and dedoc(). To see for yourself, try running my GitHub code, making some changes to your Word doc, and reloading it back into R Markdown with dedoc().
This could definitely be an entire computing club presentation… but for long projects that you have to redo with new data often, drake is becoming really popular. Amanda Dobbyn gave an awesome presentation and you can see her slides here.
A super informative bookdown guide by the authors can be found here. Essentially their motto is “what gets done stays done” so that you are not redoing work you’ve already done in order to update your results. Yet, you’re still redoing what needs to be done in a reproducible way!
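To make the “what gets done stays done” idea a bit more concrete, here is a minimal sketch of a drake workflow. The target names (raw_data, clean_data, fit) and the toy model are made up for illustration, and running it creates a hidden .drake cache folder in the working directory.

```r
library(drake)

# Each named target is one step; drake works out the order from the
# names each step refers to
plan <- drake_plan(
  raw_data   = airquality,                           # stand-in for a real data import
  clean_data = raw_data[complete.cases(raw_data), ], # depends on raw_data
  fit        = lm(Ozone ~ Temp, data = clean_data)   # depends on clean_data
)

make(plan)  # first run builds all three targets
make(plan)  # second run finds everything up to date and rebuilds nothing

# Pull a finished target back out of the cache
model <- readd(fit)
```

If the command for clean_data changes later, the next make(plan) should rebuild only clean_data and fit, leaving raw_data alone.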
Git merge conflicts
I went to a whole-day workshop on Git, so if you’re interested in talking more about this, let me know. BUT the biggest thing I learned was that if you are ever using Git and find your code has strange characters like <<<<<<< HEAD, followed by ======= and then >>>>>>> with a branch name or a long set of letters/numbers, this means you have a merge conflict. The markers are meant to be a flag so you know where to fix the differences between the two versions of the file you’re trying to merge! I spent days struggling with this problem before, so I thought I’d pass the message along in case anyone runs into it someday. :)
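Here is a small sketch of what a conflicted R script can look like, along with a quick way to spot leftover markers from within R. The script contents, file names, and branch name are all hypothetical. The part between <<<<<<< and ======= is your current version, and the part between ======= and >>>>>>> is the incoming one.

```r
# A hypothetical R script as it might look after a conflicted merge
conflicted <- c(
  "library(tidyverse)",
  "<<<<<<< HEAD",
  "dat <- read_csv(\"data_v1.csv\")",
  "=======",
  "dat <- read_csv(\"data_v2.csv\")",
  ">>>>>>> new-data-branch",
  "summary(dat)"
)

# Conflict markers are lines starting with seven <, =, or > characters
is_marker <- grepl("^(<{7}|={7}|>{7})", conflicted)

which(is_marker)  # 2 4 6
```

To resolve the conflict, keep the version you want (or combine the two), delete all three marker lines, and commit the file. On a real file, readLines() on the script path would take the place of the conflicted vector.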
Talks to check out
Some of my favorite talks from the conference were…
This was not from the New York R Conference but I saw it on Twitter while making this presentation for my computing club and I really enjoyed it…
#install.packages("genius")
library(genius)
genius_lyrics("the beatles", "hey jude")
## # A tibble: 53 x 3
##    track_title  line lyric
##    <chr>       <int> <chr>
##  1 Hey Jude        1 Hey Jude, don't make it bad
##  2 Hey Jude        2 Take a sad song and make it better
##  3 Hey Jude        3 Remember to let her into your heart
##  4 Hey Jude        4 Then you can start to make it better
##  5 Hey Jude        5 Hey Jude, don't be afraid
##  6 Hey Jude        6 You were made to go out and get her
##  7 Hey Jude        7 The minute you let her under your skin
##  8 Hey Jude        8 Then you begin to make it better
##  9 Hey Jude        9 And anytime you feel the pain, hey Jude, refrain
## 10 Hey Jude       10 Don't carry the world upon your shoulders
## # … with 43 more rows