TL;DR If you’re ever felt limited by correlogram packages in R, this post will show you how to write your own function to tidy the many correlations into a ggplot2-friendly form for plotting.
By the end, you will be able to run one function to get a tidied data frame of correlations:
formatted_cors(mtcars) %>% head() %>% kable() measure1 measure2 r n P sig_p p_if_sig r_if_sig mpg mpg 1.
TL;DR You can a regress an outcome on a grouping variable plus any other variable(s) and the unadjusted and adjusted group means will be identical.
We can see this in a simple example using the iris data:
iris %>% # fit a linear regression for sepal length given sepal width and species # make a new column containing the fitted values for sepal length mutate(preds = predict(lm(Sepal.
I am a biostatistician at a research university, and I often find myself working with longitudinal survival data. As with any data analysis, I need to examine the quality of my data before deciding which statistical methods to implement.
This post contains reproducible examples for how I prefer to visually explore survival data containing longitudinal exposures or covariates. I create a “treatment timeline” for each patient, and the end product looks something like this:
Earlier this year I wrote my first blog post, “A Day in the Life of a Biostatistician,” documenting the granular details of my work as an early career academic research biostatistician. I’m excited to announce I am turning that post into a “day in the life” series in which I interview other biostatisticians with differing roles. My hope is that it will enlighten anyone interested in the field of biostatistics, and especially help undergraduate and current biostatistics Masters students make informed decisions about their careers.
In early May I attended the New York R Conference. There were 24 speakers, including my coworker at Weill Cornell Medicine, Elizabeth Sweeney! Each person did a 20-minute presentation on some way they use R for their work and/or hobbies. There was a ton of information, and even though not all of it was directly useful for my workflow as a statistical consultant in an academic setting, I really enjoyed being around so many people who love R.
It seems fitting that my first blog post is on a topic that I tried and failed to find via Google search a few years ago.
I’ll back up for a second. A few years ago I was a recent college graduate, and trying hard to “figure out my life.” My major was biochemistry, which is one of those degrees where 99%* of people just keep on going to school.
When doing long, identical analyses on different data sets or variables, it can be useful to have one function which outputs your analyses in an Rmarkdown friendly (ie., with headers) format. This is a simple example of how multiple mini-analyses can be combined into one run-all function containing headers. Let’s say we have two separate data sets, dat1 and dat2, and we want to look do two analyses on each data set.
A Presentation for Weill Cornell Medicine’s Biostatistics Computing Club Image courtesy of Allison Horst’s Twitter: @allison_horst
Introduction Why dplyr? Powerful but efficient
Works well with entire tidyverse suite Efficiency*
Ability to analyze external databases
Works well with other packages in tidyverse suite ggplot2 tidyr stringr forcats purrr *if you start dealing with data sets with > 1 million rows, data.