TL;DR If you’re ever felt limited by correlogram packages in R, this post will show you how to write your own function to tidy the many correlations into a ggplot2-friendly form for plotting.
By the end, you will be able to run one function to get a tidied data frame of correlations:
formatted_cors(mtcars) %>% head() %>% kable() measure1 measure2 r n P sig_p p_if_sig r_if_sig mpg mpg 1.
TL;DR You can a regress an outcome on a grouping variable plus any other variable(s) and the unadjusted and adjusted group means will be identical.
We can see this in a simple example using the iris data:
iris %>% # fit a linear regression for sepal length given sepal width and species # make a new column containing the fitted values for sepal length mutate(preds = predict(lm(Sepal.
I am a biostatistician at a research university, and I often find myself working with longitudinal survival data. As with any data analysis, I need to examine the quality of my data before deciding which statistical methods to implement.
This post contains reproducible examples for how I prefer to visually explore survival data containing longitudinal exposures or covariates. I create a “treatment timeline” for each patient, and the end product looks something like this:
A Presentation for Weill Cornell Medicine’s Biostatistics Computing Club Image courtesy of Allison Horst’s Twitter: @allison_horst
Introduction Why dplyr? Powerful but efficient
Works well with entire tidyverse suite Efficiency*
Ability to analyze external databases
Works well with other packages in tidyverse suite ggplot2 tidyr stringr forcats purrr *if you start dealing with data sets with > 1 million rows, data.