TL;DR If you’ve ever felt limited by correlogram packages in R, this post will show you how to write your own function to tidy the many correlations into a ggplot2-friendly form for plotting.
By the end, you will be able to run one function to get a tidied data frame of correlations:
formatted_cors(mtcars) %>% head() %>% kable()
measure1 measure2 r n P sig_p p_if_sig r_if_sig
mpg mpg 1.
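As a rough illustration of the idea, here is a minimal sketch of a function that returns one row per pair of variables. It is not the post's `formatted_cors()`: the name `tidy_cors()` is hypothetical, it relies only on base R's `cor.test()`, and it returns just the `r`, `n`, and `p` columns.

```r
# Hypothetical helper (not the post's formatted_cors()): tidy all pairwise
# Pearson correlations of a numeric data frame into one row per pair.
tidy_cors <- function(df) {
  vars <- names(df)
  # every ordered pair of column names, including each variable with itself
  pairs <- expand.grid(measure1 = vars, measure2 = vars,
                       stringsAsFactors = FALSE)
  rows <- lapply(seq_len(nrow(pairs)), function(i) {
    x <- df[[pairs$measure1[i]]]
    y <- df[[pairs$measure2[i]]]
    ok <- complete.cases(x, y)          # pairwise-complete observations
    test <- cor.test(x[ok], y[ok])      # Pearson correlation test
    data.frame(r = unname(test$estimate), n = sum(ok), p = test$p.value)
  })
  cbind(pairs, do.call(rbind, rows))
}

cors <- tidy_cors(mtcars)
head(cors)
```

Because the result is in long form, it could be passed to ggplot2 with `measure1` and `measure2` mapped to the axes and `r` mapped to the fill.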
TL;DR You can regress an outcome on a grouping variable plus any other variable(s) and the unadjusted and adjusted group means will be identical.
We can see this in a simple example using the iris data:
iris %>%
  # fit a linear regression for sepal length given sepal width and species
  # make a new column containing the fitted values for sepal length
  mutate(preds = predict(lm(Sepal.
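The excerpt above is cut off, so the following is a separate base R sketch of the same check, not the post's code. It assumes the model regresses `Sepal.Length` on `Sepal.Width` and `Species`, as the comments describe, and compares the raw group means with the group means of the fitted values.

```r
# Assumed model: sepal length given sepal width and species
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Add the fitted values as a new column
iris_preds <- transform(iris, preds = fitted(fit))

# Unadjusted group means of the outcome
raw_means <- tapply(iris_preds$Sepal.Length, iris_preds$Species, mean)

# Group means of the fitted values from the adjusted model
fitted_means <- tapply(iris_preds$preds, iris_preds$Species, mean)

# Compare the two sets of means up to numerical tolerance
all.equal(raw_means, fitted_means)
```

The two agree because least-squares residuals are orthogonal to the group indicator columns, so the residuals sum to zero within each species.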
I am a biostatistician at a research university, and I often find myself working with longitudinal survival data. As with any data analysis, I need to examine the quality of my data before deciding which statistical methods to implement.
This post contains reproducible examples for how I prefer to visually explore survival data containing longitudinal exposures or covariates. I create a “treatment timeline” for each patient, and the end product looks something like this:
Using sl3 to build ensemble learning models in R
An introduction to coding power simulations in R
R Projects and here::here()
It seems fitting that my first blog post is on a topic that I tried and failed to find via Google search a few years ago.
I’ll back up for a second. A few years ago I was a recent college graduate, and trying hard to “figure out my life.” My major was biochemistry, which is one of those degrees where 99%* of people just keep on going to school.
When doing long, identical analyses on different data sets or variables, it can be useful to have one function that outputs your analyses in an R Markdown-friendly format (i.e., with headers). This is a simple example of how multiple mini-analyses can be combined into one run-all function containing headers. Let’s say we have two separate data sets, dat1 and dat2, and we want to do two analyses on each data set.
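A minimal sketch of such a run-all function might look like the following. Everything here is an assumption for illustration: the contents of `dat1` and `dat2`, the two analyses chosen, and the helper names `print_verbatim()` and `run_all()`. It is meant to be called from a chunk with `results = "asis"` so that the `cat()` output is rendered as Markdown headers.

```r
# Stand-in data sets (assumptions for illustration only)
dat1 <- mtcars[, c("mpg", "wt", "hp")]
dat2 <- iris[, c("Sepal.Length", "Sepal.Width")]

# Hypothetical helper: print an object inside a fenced block so it renders
# verbatim when the chunk uses results = "asis"
print_verbatim <- function(x) {
  fence <- strrep("`", 3)
  cat(fence, "\n", sep = "")
  print(x)
  cat(fence, "\n\n", sep = "")
}

# Hypothetical run-all function: one header per data set, one per analysis
run_all <- function(dat, dat_name) {
  cat("\n## ", dat_name, "\n\n", sep = "")
  cat("### Summary statistics\n\n")
  print_verbatim(summary(dat))
  cat("### Correlation matrix\n\n")
  print_verbatim(round(cor(dat), 2))
  invisible(NULL)
}

run_all(dat1, "dat1")
run_all(dat2, "dat2")
```

Adding a third mini-analysis then means adding one more header and one more call inside `run_all()`, and every data set picks it up automatically.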
A Presentation for Weill Cornell Medicine’s Biostatistics Computing Club. Image courtesy of Allison Horst’s Twitter: @allison_horst
Introduction
Why dplyr?
Powerful but efficient
Works well with entire tidyverse suite
Efficiency*
Ability to analyze external databases
Works well with other packages in tidyverse suite: ggplot2, tidyr, stringr, forcats, purrr
*if you start dealing with data sets with > 1 million rows, data.