Become a Superlearner! An Illustrated Guide to Superlearning

Why use one machine learning algorithm when you could use all of them?! This post contains a step-by-step walkthrough of how to build a superlearner prediction algorithm in R. A Visual Guide… Over the winter, I read Targeted Learning by Mark van der Laan and Sherri Rose. This “visual guide” I made for Chapter 3: Superlearning by Rose, van der Laan, and Eric Polley is a condensed version of the following tutorial.

Customizable correlation plots in R

TL;DR: If you’ve ever felt limited by correlogram packages in R, this post will show you how to write your own function to tidy the many correlations into a ggplot2-friendly form for plotting. By the end, you will be able to run one function to get a tidied data frame of correlations: formatted_cors(mtcars) %>% head() %>% kable() measure1 measure2 r n p sig_p p_if_sig r_if_sig mpg mpg 1.

Tips and Tricks from the New York R Conference

In early May I attended the New York R Conference. There were 24 speakers, including my coworker at Weill Cornell Medicine, Elizabeth Sweeney! Each person did a 20-minute presentation on some way they use R for their work and/or hobbies. There was a ton of information, and even though not all of it was directly useful for my workflow as a statistical consultant in an academic setting, I really enjoyed being around so many people who love R.

A Day in the Life of a Biostatistician

It seems fitting that my first blog post is on a topic that I tried and failed to find via Google search a few years ago. I’ll back up for a second. A few years ago I was a recent college graduate, and trying hard to “figure out my life.” My major was biochemistry, which is one of those degrees where 99%* of people just keep on going to school.

Data Wrangling with dplyr

A Presentation for Weill Cornell Medicine’s Biostatistics Computing Club. Image courtesy of Allison Horst’s Twitter: @allison_horst. Introduction. Why dplyr? Powerful but efficient; consistent syntax; fast; function chaining; works well with entire tidyverse suite. Efficiency*; simple syntax; function chaining; ability to analyze external databases; works well with other packages in tidyverse suite (ggplot2, tidyr, stringr, forcats, purrr). *if you start dealing with data sets with > 1 million rows, data.