It’s been over a year and a half since I began this blog with my, “A Day in the Life of a Biostatistician” post. To my surprise, it is hands-down the post I receive the most emails about. I really enjoy hearing from everyone who reaches out, but I thought I’d give detailed answers to the most common questions I’ve received in this follow up post.
Caveat all my answers with “this is just one early-career biostatistician’s opinion,” please!

The is the third and final post in a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). In this section, I discuss more statistical properties of TMLE, offer a brief explanation for the theory behind TMLE, and provide resources for learning more.
Properties of TMLE 📈 To reiterate a point from Parts I and II, a main motivation for TMLE is that it allows the use of machine learning algorithms while still yielding asymptotic properties for inference.

The second post of a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). This section walks through the TMLE algorithm for the mean difference in outcomes for a binary treatment and binary outcome. This post is an expansion of a printable “visual guide” available on my Github. I hope it helps analysts who feel out-of-practice reading mathematical notation follow along with the TMLE algorithm.

The introductory post of a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). This section contains a brief overview of the targeted learning framework and motivation for semiparametric estimation methods for inference, including causal inference.
Table of Contents This blog post series has three parts:
Part I: Motivation TMLE in three sentences 🎯 An Analyst’s Motivation for Learning TMLE 👩🏼💻 Is TMLE Causal Inference?

Why use one machine learning algorithm when you could use all of them?! This post contains a step-by-step walkthrough of how to build a superlearner prediction algorithm in R.
HTML Image as link A Visual Guide… Over the winter, I read Targeted Learning by Mark van der Laan and Sherri Rose. This “visual guide” I made for Chapter 3: Superlearning by Rose, van der Laan, and Eric Polley is a condensed version of the following tutorial.

A personal narrative about my experience on the data analytics side of NYC’s COVID-19 outbreak response.
December 15, 2018. My coworker is moving to California. She’s a statistician for a group of pulmonary and critical care physicians at our New York City hospital, and I’m a statistician who’s trying not to do too many things wrong, only three months into my first job out of school. “I think you’d be good with this research team,” she tells me.

TL;DR If you’re ever felt limited by correlogram packages in R, this post will show you how to write your own function to tidy the many correlations into a ggplot2-friendly form for plotting.
By the end, you will be able to run one function to get a tidied data frame of correlations:
formatted_cors(mtcars) %>% head() %>% kable() measure1 measure2 r n p sig_p p_if_sig r_if_sig mpg mpg 1.

An “aha!” moment: the day I realized I should rethink all the probability theorems using linear regressions.
TL;DR You can a regress an outcome on a grouping variable plus any other variable(s) and the unadjusted and adjusted group means will be identical.
We can see this in a simple example using the palmerpenguins data:
#remotes::install_github("allisonhorst/palmerpenguins") library(palmerpenguins) library(tidyverse) library(gt) # use complete cases for simplicity penguins <- drop_na(penguins) penguins %>% # fit a linear regression for bill length given bill depth and species # make a new column containing the fitted values for bill length mutate(preds = predict(lm(bill_length_mm ~ bill_depth_mm + species, data = .

In non-coronavirus times, I am the biostatistician for a team of NYC pulmonologists and intensivists. When the pandemic hit NYC in mid-March, I immediately became a 100% 200% COVID-19 statistician. I received many analysis requests, though not all of them from official investigators:
My family recently learned I am the statistician for my hospital’s pulmonologists and now I get COVID-19 analysis requests from them, too pic.twitter.com/wlHmUaBh6Y — Kat Hoffman (@rkatlady) April 10, 2020 Jokes aside, I was really, really busy during the outbreak.

I am a biostatistician at a research university, and I often find myself working with longitudinal survival data. As with any data analysis, I need to examine the quality of my data before deciding which statistical methods to implement.
This post contains reproducible examples for how I prefer to visually explore survival data containing longitudinal exposures or covariates. I create a “treatment timeline” for each patient, and the end product looks something like this:

Powered by the Academic theme for Hugo.