Why use one machine learning algorithm when you could use all of them?! This post contains a step-by-step walkthrough of how to build a superlearner prediction algorithm in R.
HTML Image as link A Visual Guide… Over the winter, I read Targeted Learning by Mark van der Laan and Sherri Rose. This “visual guide” I made for Chapter 3: Superlearning by Rose, van der Laan, and Eric Polley is a condensed version of the following tutorial.
In non-coronavirus times, I am the biostatistician for a team of NYC pulmonologists and intensivists. When the pandemic hit NYC in mid-March, I immediately became a 100% 200% COVID-19 statistician. I received many analysis requests, though not all of them from official investigators:
My family recently learned I am the statistician for my hospital’s pulmonologists and now I get COVID-19 analysis requests from them, too pic.twitter.com/wlHmUaBh6Y — Kat Hoffman (@rkatlady) April 10, 2020 Jokes aside, I was really, really busy during the outbreak.
TL;DR If you’re ever felt limited by correlogram packages in R, this post will show you how to write your own function to tidy the many correlations into a ggplot2-friendly form for plotting.
By the end, you will be able to run one function to get a tidied data frame of correlations:
formatted_cors(mtcars) %>% head() %>% kable() measure1 measure2 r n p sig_p p_if_sig r_if_sig mpg mpg 1.
TL;DR You can a regress an outcome on a grouping variable plus any other variable(s) and the unadjusted and adjusted group means will be identical.
We can see this in a simple example using the palmerpenguins data:
#remotes::install_github("allisonhorst/palmerpenguins") library(palmerpenguins) library(tidyverse) library(gt) # use complete cases for simplicity penguins <- drop_na(penguins) penguins %>% # fit a linear regression for bill length given bill depth and species # make a new column containing the fitted values for bill length mutate(preds = predict(lm(bill_length_mm ~ bill_depth_mm + species, data = .
I am a biostatistician at a research university, and I often find myself working with longitudinal survival data. As with any data analysis, I need to examine the quality of my data before deciding which statistical methods to implement.
This post contains reproducible examples for how I prefer to visually explore survival data containing longitudinal exposures or covariates. I create a “treatment timeline” for each patient, and the end product looks something like this:
Background In September 2019, I gave an R-Ladies NYC presentation about using the package sl3 to implement the superlearner algorithm for predictions. You can download the slides for it here. This post is a modification to the original demo I gave.
For a better background on what the superlearner algorithm is, please see my more recent blog post.
Step 0: Load your libraries, set a seed, and load the data You’ll likely need to install sl3 from the tlverse github page, as it was not yet on CRAN at the time of writing this post.
It seems fitting that my first blog post is on a topic that I tried and failed to find via Google search a few years ago.
I’ll back up for a second. A few years ago I was a recent college graduate, and trying hard to “figure out my life.” My major was biochemistry, which is one of those degrees where 99%* of people just keep on going to school.