An Illustrated Guide to TMLE, Part III: Properties, Theory, and Learning More

This is the third and final post in a three-part series to help beginners and/or visual learners understand Targeted Maximum Likelihood Estimation (TMLE). In this section, I discuss more statistical properties of TMLE, offer a brief explanation of the theory behind TMLE, and provide resources for learning more.

Properties of TMLE 📈

To reiterate a point from Parts I and II, a main motivation for TMLE is that it allows the use of machine learning algorithms while still yielding asymptotic properties for inference. This is notably not true for many estimators.

For example, in Part II we walked through TMLE for the Average Treatment Effect (ATE). Two frequently used alternatives for estimating the ATE are G-computation and Inverse Probability of Treatment Weighting (see Part II, Step 1 and references). In general, neither yields valid standard errors unless a priori specified parametric models are used, and this reliance on parametric assumptions can bias results. There are many simulation studies that show this.

Another beneficial property of TMLE is that it is a doubly robust estimator. This means that if either the regression to estimate the expected outcome or the regression to estimate the probability of treatment is correctly specified (formally, its bias goes to zero as the sample size grows large, meaning it is consistent), the final TMLE estimate will be consistent.

If both regressions are consistent, the final estimate is efficient: it attains the smallest possible asymptotic variance and converges at a rate of \(\sqrt{n}\), which is the fastest rate generally attainable and the same rate achieved by parametric maximum likelihood estimation. The reason we use superlearning for estimating the outcome and treatment regressions is to give us the best possible chance of having two correctly specified models and obtaining an efficient estimate.
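
To see double robustness in action, here is a minimal simulation sketch in Python with NumPy. Everything about it is an assumption made for illustration: the data-generating process, the function names, the single-epsilon logistic update, and especially the shortcut of plugging in the true treatment probabilities instead of estimating them. The outcome regression is deliberately wrong (it predicts the overall mean of Y for everyone), so the untargeted plug-in ATE is exactly zero. Because the treatment mechanism is right, the targeting step should still pull the estimate close to the true ATE.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def solve_epsilon(y, offset, h, lo=-10.0, hi=10.0, iters=60):
    """Bisection for the fluctuation parameter: find eps such that
    sum(h * (y - expit(offset + eps * h))) = 0 (the score decreases in eps)."""
    def score(eps):
        return np.sum(h * (y - expit(offset + eps * h)))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate_double_robustness(n=50_000, seed=1):
    rng = np.random.default_rng(seed)

    # Hypothetical data-generating process (an assumption of this sketch)
    w = rng.uniform(-1.0, 1.0, size=n)   # one confounder
    g = expit(w)                         # true P(A = 1 | W), assumed known here
    a = rng.binomial(1, g)               # treatment

    def q_true(a_, w_):
        return expit(-0.5 + a_ + w_)     # true E[Y | A, W]

    y = rng.binomial(1, q_true(a, w))    # binary outcome
    true_ate = np.mean(q_true(1, w) - q_true(0, w))

    # Deliberately misspecified outcome regression: ignores both A and W
    q_bad = np.full(n, y.mean())
    naive_ate = np.mean(q_bad - q_bad)   # plug-in ATE from the wrong model

    # Targeting step: fluctuate the wrong model on the logit scale
    h = a / g - (1 - a) / (1 - g)        # clever covariate
    eps = solve_epsilon(y, logit(q_bad), h)
    q1_star = expit(logit(q_bad) + eps / g)
    q0_star = expit(logit(q_bad) - eps / (1 - g))
    tmle_ate = np.mean(q1_star - q0_star)

    return true_ate, naive_ate, tmle_ate

true_ate, naive_ate, tmle_ate = simulate_double_robustness()
print(f"true ATE: {true_ate:.3f}")
print(f"plug-in ATE from the wrong outcome model: {naive_ate:.3f}")
print(f"targeted ATE (wrong outcome model, right treatment model): {tmle_ate:.3f}")
```

Swapping the roles (a correct outcome regression with wrong treatment probabilities) is a useful exercise to try with the same skeleton.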

Even among other doubly robust estimators, TMLE is appealing because its estimates will always stay within the bounds of the original outcome. This is because it is part of a class of substitution estimators. There is another class of doubly robust, semiparametric estimation methods frequently used in causal inference that are referred to as one-step estimators, but they can sometimes yield final estimates that are outside the original outcome scale. The one-step estimator for the ATE is called Augmented Inverse Probability Weighting (AIPW).
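
Written out for the ATE, the difference between the two classes is easy to see. The notation below is assumed for this sketch and may not match Part II exactly: \(\hat{Q}(A, W)\) is the estimated outcome regression, \(\hat{g}(W)\) is the estimated probability of treatment, and \(\hat{Q}^{*}(A, W)\) is the outcome regression after the targeting step.

\[
\hat{\psi}_{\text{TMLE}} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{Q}^{*}(1, W_i) - \hat{Q}^{*}(0, W_i)\right]
\]

\[
\hat{\psi}_{\text{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{Q}(1, W_i) - \hat{Q}(0, W_i) + \left(\frac{A_i}{\hat{g}(W_i)} - \frac{1 - A_i}{1 - \hat{g}(W_i)}\right)\left(Y_i - \hat{Q}(A_i, W_i)\right)\right]
\]

With a binary outcome and an update done on the logit scale, every \(\hat{Q}^{*}\) is a predicted probability, so the TMLE average must land between -1 and 1. AIPW adds a weighted residual to the plug-in, and those weights can get very large when \(\hat{g}(W_i)\) is close to 0 or 1, so nothing forces its average to stay in that range.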

Why does TMLE work? ✨

Truly understanding why TMLE works requires semiparametric theory that falls far outside the scope of this tutorial. However, the theory is interesting, so I’ll give a brief, high-level explanation, and then you can look at the references if you’re curious to learn more. Importantly, the explanation I outline here goes beyond what you need, and is certainly not required, to appropriately implement TMLE as an analyst.

TMLE relies on the following ideas:

  1. Some estimands allow for asymptotically linear estimation. This means that estimators can be represented as sample averages (plus a term that converges to zero).

  2. The quantities being averaged for asymptotically linear estimators are called influence functions. An influence function is a function that quantifies how much influence each observation has on the estimator. For this reason, it is very useful for characterizing the variance of the estimator. In parametric maximum likelihood estimation, the influence function is related to the score function.

  3. The efficient influence function (EIF) is the influence function that achieves the efficiency bound (think of the Cramér-Rao lower bound from parametric maximum likelihood estimation) and can be used to create efficient estimators.

  4. If we want to construct an estimator that is efficient, we can take advantage of the EIF to endow the estimator with useful asymptotic properties.
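
In symbols, and with notation assumed for this sketch (\(\psi\) is the estimand, \(\hat{\psi}\) the estimator, \(O_i\) the observed data for person \(i\), and \(D\) the influence function), ideas 1 and 2 say:

\[
\hat{\psi} - \psi = \frac{1}{n}\sum_{i=1}^{n} D(O_i) + o_P\left(n^{-1/2}\right),
\qquad
\sqrt{n}\,(\hat{\psi} - \psi) \xrightarrow{d} N\left(0, \operatorname{Var}[D(O)]\right)
\]

For the ATE with \(O = (W, A, Y)\), the efficient influence function from idea 3 has the well-known form

\[
D^{*}(O) = \left(\frac{A}{g(W)} - \frac{1 - A}{1 - g(W)}\right)\left(Y - Q(A, W)\right) + Q(1, W) - Q(0, W) - \psi
\]

where \(Q(A, W) = E[Y \mid A, W]\) and \(g(W) = P(A = 1 \mid W)\). The sample variance of the estimated \(D^{*}(O_i)\), divided by \(n\), is what gives the estimated variance of \(\hat{\psi}\), and its square root is the standard error.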

This is the reason TMLE allows us to use machine learning models “under the hood” while still obtaining asymptotic properties for inference: our estimand of interest admits asymptotically linear estimation, and we are using properties of the EIF to construct an estimator with optimal statistical properties (e.g. double robustness).
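
The link between influence functions and standard errors can be checked numerically on a much simpler estimand than the ATE. The sketch below is a toy illustration and not part of the TMLE procedure: it uses the variance of a simulated normal variable, whose influence function is \((X - \mu)^2 - \sigma^2\), and compares the influence-function-based standard error from a single sample with the spread of the estimator across many repeated samples. The sample size, number of repetitions, seed, and function name are arbitrary choices.

```python
import numpy as np

def if_standard_error(x):
    """Standard error of the plug-in variance estimate, computed from
    the estimated influence function (x - mean)^2 - variance."""
    mu_hat = x.mean()
    var_hat = np.mean((x - mu_hat) ** 2)
    influence = (x - mu_hat) ** 2 - var_hat
    return np.sqrt(np.mean(influence ** 2) / len(x))

rng = np.random.default_rng(2024)
n, n_reps = 5_000, 1_000

# Standard error from the influence function, using one sample only
one_sample = rng.normal(loc=0.0, scale=2.0, size=n)
se_from_if = if_standard_error(one_sample)

# "Ground truth" spread: repeat the study many times and look at the estimates
var_estimates = [np.var(rng.normal(loc=0.0, scale=2.0, size=n)) for _ in range(n_reps)]
se_from_repeats = np.std(var_estimates)

print(f"standard error from the influence function: {se_from_if:.4f}")
print(f"standard deviation across repeated samples: {se_from_repeats:.4f}")
```

The two numbers should be close, which is the practical payoff of asymptotic linearity: one sample is enough to approximate how much the estimator would vary across repeated studies.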

Resources to learn more

I could only cover so much in this post, but here are the resources I’ve used the most to learn about TMLE, semiparametric estimation, and causal inference. If you are new to any or all of it, there is a good chance it will take several reads of these materials before the concepts begin to make any sense. Don’t get discouraged!


Semiparametric Theory and Influence Functions

  • Edward Kennedy has several well-written pieces on semiparametric estimation in causal inference. I recommend starting with:

  • My favorite resource so far for learning specifically about influence functions has been Visually Communicating Influence Functions by Aaron Fisher and Edward Kennedy. However, this paper didn’t make sense to me until I worked through this interactive tutorial by Herb Susmann. I suggest playing around with the interactive examples first, and then trying to work through the paper.

  • The derivation of the Efficient Influence Function (EIF) in TMLE is in the Appendix of Targeted Learning.

Causal Inference

  • As emphasized in Part I, TMLE is an estimation technique which can be used for causal inference. If you want to learn about the foundations of causal inference, I suggest two different introductory texts below. Note that these provide fairly different frameworks (notation, descriptions of assumptions) to reach the same conclusions, but both provide useful perspectives.

    • Causal Inference in Statistics: A Primer by Judea Pearl. Pearl does not discuss estimation methods, but rather focuses on the assumptions, or identification, side of causal inference. Thus, you will not find TMLE mentioned in this text.

    • What If by Miguel Hernan and James Robins. Notably, Hernan and Robins only discuss parametric estimation methods, so you will also not find TMLE or AIPW in this text.

  • I also think the introductory chapters of the previously mentioned Targeted Learning book (Chapters 1 and 2) do an excellent job of setting up the “roadmap” of causal inference.

I’ll continue to update this page with beginner’s resources as I discover them.

Feedback or clarifications on this post are welcome, whether from new learners of TMLE or experts in causal inference. The best way to reach me is through email.


This tutorial would not have been possible without my colleague Iván Díaz patiently answering many, many questions on TMLE. I am also very appreciative of Miguel Angel Luque-Fernandez’s helpful feedback on the visual guide.

Lastly, many thanks to Axel Martin, Nick Williams, Anjile An, Adam Peterson, Alan Wu, and Will Simmons for providing suggestions on various drafts of this art project!

Katherine Hoffman
Research Biostatistician
