EPID 701 – R for Epidemiologists – Spring 2020

NOTE: Because of anticipated COVID-19-related funding impacts, this course is currently not being offered at UNC in the 2020-2021 school year. It may be offered in the future. I will leave these recordings / resources up for current students and practicing epidemiologists looking for an introduction to R for public health practice, though please note these slides are a bit out of date at this point. Students and epidemiologists are welcome to reach out to me with questions. -Mike, Dec 2020

Welcome!

EPID 701 : R for Epidemiologists is a 3-credit class intended to be the most effective and efficient way for UNC Epidemiology students to establish a foundation in the R programming language, RStudio IDE, and functional programming modalities. We give special attention to R topics and packages relevant for epidemiological data management, analysis, and visualization – including basic maps and spatial analysis. Homework assignments are designed to ease epidemiology students into the language efficiently by building on the UNC Epidemiology core curriculum, using familiar datasets and questions seen in other classes. In addition to homework, each student completes a project of their choosing to incorporate R programming into their existing research or coursework. There are no prerequisites, but students who have taken EPID 716 will benefit from their past assignments, and students who have taken core epidemiology courses (EPID 710, 715, etc.) will benefit from having been introduced to fundamental epidemiology concepts used in this course. Initially a workshop designed by students for students, EPID 701 (previously 799C) is now in its fourth year as a for-credit course.

The Spring 2020 class will be held T/Th 9:30-10:45 am (Rosenau 235).

If you are looking for past years’ courses, see here: 2017 Fall and 2018 Fall. R is a living language – we update course materials every year to reflect ongoing improvements to key packages and the language itself, so course materials for previous years should be treated as snapshots in time.

How to get ready for the class:

  1. Fill out the 2020 Class Roster and join the google group to get class messages.
  2. Download the Spring 2020 Syllabus and NC births data* for homework.
  3. Download and install (or update) both R and RStudio (instructions here if you need them).

If you joined the class late: Welcome! We suggest you review (and work through in R!) the previous slides, download and load the data, and review the past messages in the google group to get you up to speed.

Non-student / public health practitioners: If you are a public health practitioner (e.g. local or state epidemiologist, data scientist, public health analyst, etc.) interested in this course material but unable to register for the in-person class, please contact Mike for ways to be involved, join the course listserv, and other professional development and collaboration options for R in public health practice.

On DataCamp: This course previously used DataCamp for supplemental course examples, but we have dropped that resource following the lead of R-Ladies Global. Other R instructors at UNC-CH and elsewhere may consider doing the same.

Interested in the future of this course? Though there seems to be plenty of student interest in this course, the future of the course in the EPID department is frankly a bit uncertain. Those interested in taking this course in the future at UNC Chapel Hill should let Mike know about their interest – it helps us make the case for the course with leadership. In the future these course materials may move to a more R-friendly format like bookdown with other examples here.

Questions? Email Mike at mike.dolan.fliss@unc.edu.


Course Schedule

Week L# Lecture Homework Recommended
Th 1/9 1 Introduction to the Course and RStudio
(recording, notes) (Mike)
In Class: Fill out Roster. Join google group. Install R & RStudio (instructions). (1) Writing Code in RStudio – First of a great series of free recorded webinars. (2) Article: The future of R. (3) Keyboard shortcuts overall, and in R. (4) Most recent RStudio release notes. (5) R or Python for Data Science?
Tu 1/14 2 Coding with R I: RStudio, structures, & subsetting
(recording, notes) (Mike)
Start In Class: Follow the “TOUR” activity slides from class. Make sure you can load / read data and load the tidyverse with library(tidyverse). Nothing to hand in.

Previous class setup if you’re new!

(1) Hadley’s Adv-R: Intro. (2) Jenny’s Project-oriented workflow instead of setwd(), etc. – covered lightly later in class. (3) Jenny’s File naming conventions. (4) Tidyverse Style Guide. (5) Code for the maintainer.
Th 1/16 3 Coding with R I: structures & subsetting, cont.
(same slides as Tue!)
(script, recording, notes) (Mike)
HW1 introduced
(see slides for background, due date 1/30 below for assignment)
Connecting to the community! Blogs, Awesome R, webinars, RStudio mailing list and blogs, etc. Use something like BlogTrottr to subscribe to blogs if there’s no built in tech.
Tue 1/21 4 Coding with R II: strings, factors & dates
(recording, notes, script) (Mike)
R for Data Science Ch 14, 15 & 16: Strings, Factors and Dates. Articles: Scientists rename human genes….
Thu 1/23 5 Epi Review & Coding I: control & functions
(recording, notes, script) (Mike)
HW Project Deeper Dive. HW1 Due in 1 week. Adv-R: Ch 1-3. Packages: desctable, skimr, tableone, etc.
Tu 1/28 6 Recoding I: Numerical & Graphical Descriptives
(recording, notes)
(Hillary)
Adv-R: Foundations (stop at functions)
Thu 1/30 7 Recoding II & Inclusion
(recording, notes)
(Mike)
Read Jenny Bryan’s (excellent) purrr tutorial; suggest lessons 1-4.
Advanced Recommendations: Adv-R – Functional Programming 1-3. On functions: “worse is better.”
Tue 2/4 8 Dplyr (recording, notes) (Hillary) Due: HW1
(Due date bumped from 1/30)
(answers)
Highly recommend: RStudio data wrangling and tidyverse webinars. Regular Expressions in stringr in R. Tidyverse blog. Tidy Tuesday weekly exercises and highlights.
Thu 2/6 9 Dplyr 2 (recording, notes) (Mike) More on tidyr
(note this uses spread/gather instead of the newer pivot_wider / pivot_longer)
Tue 2/11 10 Graphics: ggplot 1 (recording, no notes today) (Mike) Due: HW2
(answers)
Suggest: subscribe to Visualizing Data Blog
Thu 2/13 11 Graphics: ggplot 2
(recording, notes)
(Mike)
Neat esquisse package for ggplot GUI development. Data viz checklist. Check out ggplot extensions on your own. More workshops on ggplot. Multi-plotting packages: egg, patchwork, cowplot, ggpubr. Highlighted extensions and improvements. Advanced text formatting. Visual libraries. An excellent primer on design.
Tue 2/18 12 GLM I
(recording, notes) (Hillary)
Note: Recording failed for this lecture. See the 2018 and 2017 courses for an alternate recording.
Thu 2/20 13 GLMs II
(recording, notes) (Mike)
Tue 2/25 14 — Class Canceled —
Use time to work on HW3 or project
Due: HW3
(answers)
On Structural Confounding: Effects of segregation on preterm birth. On structural competency. 5 Steps for anti-racist data science.
Thu 2/27 15 Confounding, DAGs, & Effect Measure Modification
(recording, notes) (Mike)
On race-ethnicity: (1) What Makes Someone Native American? Spirit of 1848 listserv. Infant mortality disparities. Measures of racism (Krieger, 2020). Association of race/ethnicity with preterm neonatal morbidities & commentary. ICE and neighborhoods.
Tue 3/3 16 Outputs and Reports
(excel, recording, notes) (Mike)
 Promising future development: gt grammar of tables package
Thu 3/5 17 R Markdown
(notes, recording) (Hillary)
Due: HW4
(answers)
Tue 3/10 SPRING BREAK:
NO CLASS
Thu 3/12 SPRING BREAK:
NO CLASS
Tue 3/17 COVID-19:
NO CLASS
Thu 3/19 COVID-19:
NO CLASS
Tue 3/24 18 Maps 1
(notes, simplified maps, recording)
COVID-19: Classes will be entirely online and shorter (30-40 minutes), more lightning style, with recommended reading to supplement. We’ll start live this week and discuss the feasibility of live (and recorded) Zoom vs. pre-recorded Zoom given students’ needs and capacity.
Thu 3/26 19 Maps 2
(notes, recording)
(Mike, Hillary)
rayshader for 3D maps, extra practice w/ ggplot. RQGIS for tighter QGIS integration. Vox on John Snow.
Tue 3/31 20 Special Topics: Spatial Auto-correlation  (Paul Delamater) (notes, code, recording part 1, part 2, & part 3) COVID-19: HW5 due date bumped to 3/31
Due: HW5
(answers)
Articles for today: Getis, Bivand
Thu 4/2 21 Interactive Design: Shiny & Tableau
(notes, recording)
(1) Dos and don’ts on designing for accessibility. (2) Tufte / Few. (3) Shiny SIR example, NC opioid dashboard.
Tue 4/7 22 Special Topics: Algorithms, Speed, Patterns, Purrr
(notes, recording)
COVID-19: HW6 due date bumped to 4/7
Due: HW6
(answers, received HW matrix)
dtplyr, dbplyr.
Webinar: Hadley on list-columns or Jenny’s fantastic tutorial and repurrrsive package or list-columns to reduce cognitive burden of coding. (These webinars are high level and take multiple listenings, but *highly* recommended.)
Thu 4/9 23 Special Topics: Other Models: Survival & Multilevel (notes, recording)
Tue 4/14 24 Special Topics: Machine Learning
(notes, recording)
COVID-19: Project due date bumped to 4/14 at midnight. Project slides / doc + 1-2 min intro + questions.

Due: Project. See 2017 projects or 2018 projects for ideas.

Stop explaining black box models…use interpretable models instead (Rudin et al., 2019). Teaching yourself about structural racism will improve your machine learning (Robinson, Renson, & Naimi, 2019). Racist facial recognition models. On minimum wage and infant mortality disparities (Rosenquist et al., 2020). Tidymodels.
Thu 4/16 25 Presentations
COVID-19: 1-2 min intro, questions
  Presenting: Bhavna, Joelle, Zakiya, Ishrat, Eugene, Molly, Jihye, Yitian, Riju, Thomas, Kate
Tue 4/21 26 Presentations
COVID-19: 1-2 min intro, questions
Presenting: Adams (from Tue), Alice, Maddie, Isabella, Hanna, Katie, Maggie, Julie, Emma, Elyse, Charley & Cary, Tricia
Thu 4/23 27 Wrap-up session
(notes)
Quite advanced: production-level R and meta-programming. renv for reproducibility.
* License and data information: Please note that the NC Birth Data has recently moved from being publicly available online to being publicly available *upon request*. We have hosted the dataset here with permission from the North Carolina State Center for Health Statistics (NC SCHS) for the class and associated workshop purposes. For uses outside of those, contact SCHS and request the data using the F-14 data request form. The slides, recordings, and other class materials are offered under the GNU General Public License (GPL), though we request that those planning group uses “at a distance” (e.g. not in class or workshops) contact the instructor staff for permission. Individuals using class material for self-training are encouraged to reach out to the instructor staff to say hello and provide feedback!