R

An Illustrated Guide to TMLE, Part III: Properties, Theory, and Learning More

This is the third and final post in a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). In this section, I discuss more statistical properties of TMLE, offer a brief explanation for the theory behind TMLE, and provide resources for learning more. Properties of TMLE 📈 To reiterate a point from Parts I and II, a main motivation for TMLE is that it allows the use of machine learning algorithms while still yielding asymptotic properties for inference.

An Illustrated Guide to TMLE, Part II: The Algorithm

The second post of a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). This section walks through the TMLE algorithm for the mean difference in outcomes for a binary treatment and binary outcome. This post is an expansion of a printable “visual guide” available on my GitHub. I hope it helps analysts who feel out-of-practice reading mathematical notation follow along with the TMLE algorithm.

An Illustrated Guide to TMLE, Part I: Introduction and Motivation

The introductory post of a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). This section contains a brief overview of the targeted learning framework and motivation for semiparametric estimation methods for inference, including causal inference. Table of Contents This blog post series has three parts: Part I: Motivation TMLE in three sentences 🎯 An Analyst’s Motivation for Learning TMLE 👩🏼‍💻 Is TMLE Causal Inference?

Become a Superlearner! An Illustrated Guide to Superlearning

Why use one machine learning algorithm when you could use all of them?! This post contains a step-by-step walkthrough of how to build a superlearner prediction algorithm in R. A Visual Guide… Over the winter, I read Targeted Learning by Mark van der Laan and Sherri Rose. This “visual guide” I made for Chapter 3: Superlearning by Rose, van der Laan, and Eric Polley is a condensed version of the following tutorial.

Customizable correlation plots in R

TL;DR If you’ve ever felt limited by correlogram packages in R, this post will show you how to write your own function to tidy the many correlations into a ggplot2-friendly form for plotting. By the end, you will be able to run one function to get a tidied data frame of correlations: formatted_cors(mtcars) %>% head() %>% kable() measure1 measure2 r n p sig_p p_if_sig r_if_sig mpg mpg 1.

Rethinking Conditional and Iterated Expectations with Linear Regression Models

An “aha!” moment: the day I realized I should rethink all the probability theorems using linear regressions. TL;DR You can regress an outcome on a grouping variable plus any other variable(s) and the unadjusted and adjusted group means will be identical. We can see this in a simple example using the palmerpenguins data: #remotes::install_github("allisonhorst/palmerpenguins") library(palmerpenguins) library(tidyverse) library(gt) # use complete cases for simplicity penguins <- drop_na(penguins) penguins %>% # fit a linear regression for bill length given bill depth and species # make a new column containing the fitted values for bill length mutate(preds = predict(lm(bill_length_mm ~ bill_depth_mm + species, data = .
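
The TL;DR claim above can be checked with a minimal base-R sketch. As an assumption made purely so it runs without extra packages, it uses the built-in iris data instead of palmerpenguins, and base functions instead of the tidyverse pipeline from the post; the variable names are illustrative, not the author's.

```r
# Minimal sketch using base R and the built-in iris data
# (assumption: the post itself uses palmerpenguins and the tidyverse).

# Regress the outcome on a grouping variable plus one other covariate
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Unadjusted group means: plain averages of the outcome within each species
unadjusted <- tapply(iris$Sepal.Length, iris$Species, mean)

# "Adjusted" group means: averages of the fitted values within each species
adjusted <- tapply(fitted(fit), iris$Species, mean)

# Compare the two sets of means side by side
print(cbind(unadjusted, adjusted))

# TRUE when the means agree up to floating-point tolerance
means_match <- isTRUE(all.equal(unadjusted, adjusted))
print(means_match)
```

The agreement follows from ordinary least squares with an intercept and group indicators: the residuals are orthogonal to each indicator, so they sum to zero within every group, and the fitted values must average to the observed group mean.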

Lessons learned: my top five coding 'tricks' during the NYC COVID-19 outbreak

In non-coronavirus times, I am the biostatistician for a team of NYC pulmonologists and intensivists. When the pandemic hit NYC in mid-March, I immediately became a 100% (make that 200%) COVID-19 statistician. I received many analysis requests, though not all of them from official investigators: My family recently learned I am the statistician for my hospital’s pulmonologists and now I get COVID-19 analysis requests from them, too pic.twitter.com/wlHmUaBh6Y — Kat Hoffman (@rkatlady) April 10, 2020 Jokes aside, I was really, really busy during the outbreak.

Patient Treatment Timelines for Longitudinal Survival Data

I am a biostatistician at a research university, and I often find myself working with longitudinal survival data. As with any data analysis, I need to examine the quality of my data before deciding which statistical methods to implement. This post contains reproducible examples for how I prefer to visually explore survival data containing longitudinal exposures or covariates. I create a “treatment timeline” for each patient, and the end product looks something like this:

A short and sweet tutorial on using `sl3` for superlearning

Background In September 2019, I gave an R-Ladies NYC presentation about using the package sl3 to implement the superlearner algorithm for predictions. You can download the slides for it here. This post is a modification to the original demo I gave. For a better background on what the superlearner algorithm is, please see my more recent blog post. Step 0: Load your libraries, set a seed, and load the data You’ll likely need to install sl3 from the tlverse GitHub page, as it was not yet on CRAN at the time of writing this post.

Data Wrangling with dplyr

A Presentation for Weill Cornell Medicine’s Biostatistics Computing Club Image courtesy of Allison Horst’s Twitter: @allison_horst Introduction Why dplyr? Powerful but efficient Consistent syntax Fast Function chaining Works well with entire tidyverse suite Efficiency* Simple syntax Function chaining Ability to analyze external databases Works well with other packages in tidyverse suite ggplot2 tidyr stringr forcats purrr *if you start dealing with data sets with > 1 million rows, data.