The is the third and final post in a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). In this section, I discuss more

statistical properties of TMLE, offer a briefexplanation for the theory behind TMLE, and provideresources for learning more.

# Properties of TMLE đ

To reiterate a point from *Parts I* and *II*, a main motivation for TMLE is that it **allows the use of machine learning algorithms while still yielding asymptotic properties for inference**. This is notably *not* true for many estimators.

For example, in *Part II* we walked through TMLE for the Average Treatment Effect (ATE). Two frequently used alternatives to estimating the ATE are G-computation and Inverse Probability of Treatment Weighting (see *Part II, Step 1* and *references*). In general, neither yield valid standard errors unless *a-priori* specified parametric models are used, and this reliance on parametric assumptions can bias results. There are many simulation studies that show this.

Another beneficial property of TMLE is that it is a **doubly robust** estimator. This means that if either the regression to estimate the expected outcome, or the regression to estimate the probability of treatment, are correctly specified (formally, their bias goes to zero as sample size grows large, meaning they are **consistent**), the final TMLE estimate will be consistent.

If both regressions are consistent, the **final estimate will reach the smallest possible variance at a rate of \(\sqrt{n}\)**, which is the fastest possible rate of convergence and equivalent to parametric maximum likelihood estimation. The reason we use superlearning for estimating the outcome and treatment regressions is to give us the best possible chance of having two correctly specified models and obtaining an **efficient estimate**.

Even among other doubly robust estimators, TMLE is appealing because its estimates will always stay within the bounds of the original outcome. This is because it is part of a class of **substitution estimators**. There is another class of doubly robust, semiparametric estimation methods frequently used in causal inference that are referred to as **one-step estimators**, but they can sometimes yield final estimates that are outside the original outcome scale. The one-step estimator for the ATE is called **Augmented Inverse Probability Weighting (AIPW)**.

# Why does TMLE work? â¨

Truly understanding why TMLE works requires semiparametric theory that falls far outside the scope of this tutorial. However, the theory is interesting, so Iâll give a brief, high-level explanation, and then you can look at the references if youâre curious to learn more. Importantly, the explanation I outline here is more than sufficient and certainly not necessary to appropriately implement TMLE as an analyst.

**TMLE relies on the following ideas:**

Some estimands allow for

**asymptotically linear estimation**. This means that estimators can be represented as sample averages (plus a term that converges to zero).The quantities being averaged for asymptotically linear estimators are called

**influence functions**. An influence function is a function that quantifies how much influence each observation has on the estimator. For this reason, it is very**useful to characterize the variance of the estimator**. In parametric maximum likelihood estimation, the influence function is related the score function.The

**efficient influence function**(EIF) is the influence function that achieves the efficiency bound (think Cramer Rao Lower Bound from parametric maximum likelihood estimation) and**can be used to create efficient estimators.**If we want to

**construct an estimator that is efficient**, we can take advantage of the EIF to endow the estimator with useful asymptotic properties.

This is the reason TMLE allows us to use machine learning models âunder the hoodâ while still obtaining asymptotic properties for inference: our **estimand** of interest admits **asymptotically linear estimation**, and we are **using properties of the EIF** to **construct an estimator** with **optimal statistical properties** (e.g.Â double robustness).

# Resources to learn more

I could only cover so much in this post, but here are the resources Iâve used the most to learn about TMLE, semiparametric estimation, and causal inference. If you are new to any or all of it, there is a good chance it will take *several* reads of these materials before the concepts begin to make any sense. Donât get discouraged!

### TMLE

The paper I referred to most often while learning TMLE was

*Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies*by Megan S. Schuler and Sherri Rose. It has a nice step-by-step written explanation and details the statistical advantages of TMLE for an applied thinker.I also really like the written explanations in the

*Targeted Learning*book (Chapters 4 and 5) by Mark van der Laan and Sherri Rose. The notation was often too difficult for me to follow, but the words themselves make a lot of sense.Miguel Angel Luque-Fernandez wrote an excellent bookdown tutorial on TMLE, also with step-by-step

`R`

code. It is more technical and thorough than my post, but still aimed at an applied audience. He also has a tutorial on the functional delta method which is part of the theory behind the way we compute the standard errors (see*Part II, Step 6*).Other code-based web-based tutorials on TMLE include David Benkeser and Antoine Chambazâ A Ride in Targeted Learning Territory and the authors of the

`R`

package suite`tlverse`

âs Hitchhikerâs Guide to Targeted Learning: The TMLE Framework.

### Semiparametric Theory and Influence Functions

Edward Kennedy has several well-written pieces on semiparametric estimation in causal inference. I recommend starting with:

His introductory paper on Semiparametric Theory

His slideshow tutorial

*Nonparametric efficiency theory and machine learning in causal inference*

My favorite resource so far for learning specifically about influence functions has been Visually Communicating Influence Functions by Aaron Fisher and Edward Kennedy. However, this paper didnât make sense to me until I worked through this interactive tutorial by Herb Susmann. I suggest playing around with the interactive examples first, and then trying to work through the paper.

The derivation of the Efficient Influence Function (EIF) in TMLE is in the Appendix of

*Targeted Learning*.

### Causal Inference

As emphasized in

*Part I*, TMLE is an estimation technique which*can*be used for causal inference. If you want to learn about the foundations of causal inference, I suggest two different introductory texts below. Note that these provide fairly different frameworks (notation, descriptions of assumptions) to reach the same conclusions, but both provide useful perspectives.*Causal Inference in Statistics: A Primer*by Judea Pearl. Pearl does not discuss estimation methods, but rather focuses on the assumptions, or identification, side of causal inference. Thus, you will not find TMLE mentioned in this text.*What If*by Miguel Hernan and James Robins. Notably, Hernan and Robins only discuss parametric estimation methods, so you will also not find TMLE or AIPW in this text.

I also think the introductory chapters of the previously mentioned

*Targeted Learning*book (Chapters 1 and 2) do an excellent job of setting up the âroadmapâ of causal inference.

Iâll continue to update this page with beginnerâs resources as I discover them.

Feedback or clarifications on this post is welcome, either from the new learners of TMLE or experts in causal inference. The best way to reach me is through email.

# Acknowledgements

This tutorial would not have been possible without my colleague IvĂĄn DĂaz patiently answering many, many questions on TMLE. I am also very appreciative of Miguel Angel Luque-Fernandezâs helpful feedback on the visual guide.

Lastly, many thanks to Axel Martin, Nick Williams, Anjile An, Adam Peterson, Alan Wu, and Will Simmons for providing suggestions on various drafts of this art project!