# R

## An Illustrated Guide to TMLE, Part III: Properties, Theory, and Learning More

This is the third and final post in a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). In this section, I discuss more statistical properties of TMLE, offer a brief explanation of the theory behind TMLE, and provide resources for learning more. Properties of TMLE: To reiterate a point from Parts I and II, a main motivation for TMLE is that it allows the use of machine learning algorithms while still yielding asymptotic properties for inference.

## An Illustrated Guide to TMLE, Part II: The Algorithm

The second post of a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). This section walks through the TMLE algorithm for the mean difference in outcomes for a binary treatment and binary outcome. This post is an expansion of a printable "visual guide" available on my GitHub. I hope it helps analysts who feel out-of-practice reading mathematical notation follow along with the TMLE algorithm.

## An Illustrated Guide to TMLE, Part I: Introduction and Motivation

The introductory post of a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). This section contains a brief overview of the targeted learning framework and motivation for semiparametric estimation methods for inference, including causal inference. Table of Contents: This blog post series has three parts. Part I: Motivation, TMLE in three sentences, An Analyst's Motivation for Learning TMLE, Is TMLE Causal Inference?

## Become a Superlearner! An Illustrated Guide to Superlearning

Why use one machine learning algorithm when you could use all of them?! This post contains a step-by-step walkthrough of how to build a superlearner prediction algorithm in R. A Visual Guide… Over the winter, I read Targeted Learning by Mark van der Laan and Sherri Rose. This "visual guide" I made for Chapter 3: Superlearning by Rose, van der Laan, and Eric Polley is a condensed version of the following tutorial.

## Customizable correlation plots in R

TL;DR If youāre ever felt limited by correlogram packages in R, this post will show you how to write your own function to tidy the many correlations into a ggplot2-friendly form for plotting. By the end, you will be able to run one function to get a tidied data frame of correlations: formatted_cors(mtcars) %>% head() %>% kable() measure1 measure2 r n p sig_p p_if_sig r_if_sig mpg mpg 1.

## Rethinking Conditional and Iterated Expectations with Linear Regression Models

An āaha!ā moment: the day I realized I should rethink all the probability theorems using linear regressions. TL;DR You can a regress an outcome on a grouping variable plus any other variable(s) and the unadjusted and adjusted group means will be identical. We can see this in a simple example using the palmerpenguins data: #remotes::install_github("allisonhorst/palmerpenguins") library(palmerpenguins) library(tidyverse) library(gt) # use complete cases for simplicity penguins <- drop_na(penguins) penguins %>% # fit a linear regression for bill length given bill depth and species # make a new column containing the fitted values for bill length mutate(preds = predict(lm(bill_length_mm ~ bill_depth_mm + species, data = .

## Lessons learned: my top five coding 'tricks' during the NYC COVID-19 outbreak

In non-coronavirus times, I am the biostatistician for a team of NYC pulmonologists and intensivists. When the pandemic hit NYC in mid-March, I immediately became a ~~100%~~ 200% COVID-19 statistician. I received many analysis requests, though not all of them from official investigators: My family recently learned I am the statistician for my hospital's pulmonologists and now I get COVID-19 analysis requests from them, too pic.twitter.com/wlHmUaBh6Y - Kat Hoffman (@rkatlady) April 10, 2020 Jokes aside, I was really, really busy during the outbreak.

## Patient Treatment Timelines for Longitudinal Survival Data

I am a biostatistician at a research university, and I often find myself working with longitudinal survival data. As with any data analysis, I need to examine the quality of my data before deciding which statistical methods to implement. This post contains reproducible examples for how I prefer to visually explore survival data containing longitudinal exposures or covariates. I create a "treatment timeline" for each patient, and the end product looks something like this:
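The basic ggplot2 recipe for such a timeline is one horizontal line per patient with points colored by daily treatment status. A minimal sketch on made-up data (the variable names and toy data are my own, not the post's):

```r
library(tidyverse)

# Hypothetical long-format data: one row per patient-day
dat <- tibble(
  id        = rep(c("A", "B", "C"), times = c(5, 3, 7)),
  day       = c(1:5, 1:3, 1:7),
  treatment = rep(c("On treatment", "Off treatment"), length.out = 15)
)

# One timeline per patient: a grey line for follow-up time,
# points colored by that day's treatment status
ggplot(dat, aes(x = day, y = id, group = id)) +
  geom_line(color = "grey70") +
  geom_point(aes(color = treatment), size = 3) +
  labs(x = "Day of follow-up", y = "Patient", color = NULL)
```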

## A short and sweet tutorial on using `sl3` for superlearning

Background: In September 2019, I gave an R-Ladies NYC presentation about using the package sl3 to implement the superlearner algorithm for predictions. You can download the slides for it here. This post is a modification of the original demo I gave. For a better background on what the superlearner algorithm is, please see my more recent blog post. Step 0: Load your libraries, set a seed, and load the data. You'll likely need to install sl3 from the tlverse GitHub page, as it was not yet on CRAN at the time of writing this post.
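For orientation, the sl3 workflow boils down to: define a task, stack candidate learners, and train a superlearner on the task. A minimal sketch (assuming the current tlverse sl3 API, with mtcars standing in for the post's data):

```r
# remotes::install_github("tlverse/sl3")
library(sl3)
set.seed(7)

# define the prediction task: outcome and covariates
task <- make_sl3_Task(
  data = mtcars,
  covariates = c("wt", "hp"),
  outcome = "mpg"
)

# stack candidate learners and wrap them in a superlearner
sl <- Lrnr_sl$new(learners = Stack$new(Lrnr_mean$new(), Lrnr_glm$new()))
fit <- sl$train(task)
head(fit$predict())
```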

## Data Wrangling with dplyr

A Presentation for Weill Cornell Medicine's Biostatistics Computing Club

Image courtesy of Allison Horst's Twitter: @allison_horst

Introduction: Why dplyr?

- Efficiency* (powerful but fast)
- Simple, consistent syntax
- Function chaining
- Ability to analyze external databases
- Works well with the entire tidyverse suite: ggplot2, tidyr, stringr, forcats, purrr

*if you start dealing with data sets with > 1 million rows, data.
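Function chaining is the feature that makes dplyr code read as a pipeline. A small example of the chained style (my own illustration, not taken from the slides):

```r
library(dplyr)

# filter -> group -> summarise -> sort, all in one readable chain
mtcars %>%
  filter(hp > 100) %>%                # keep cars with more than 100 hp
  group_by(cyl) %>%                   # one group per cylinder count
  summarise(mean_mpg = mean(mpg),     # average mpg within each group
            n = n()) %>%              # group sizes
  arrange(desc(mean_mpg))             # best fuel economy first
```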