Step-by-step of my workflow
- Create a new repository on GitHub
- Create a new RStudio Project by choosing the option to “Checkout a project from a version control repository”
  - I find it easier to create the GitHub repository first and then the RStudio Project, rather than the other way around
- Create folders/directories:
  - rawdata
  - data
    - intermediate data outputs
  - reports
    - usually R Notebooks or R Markdown files recording summary statistics, analyses
    - Share HTML reports by pushing them to the GitHub repository and viewing them with https://htmlpreview.github.io/
  - scripts
    - If you’re using multiple languages you can have subdirectories for different languages, e.g. scripts/R, scripts/python
    - functions.R: script to keep functions separate from other programming
    - source.R: script that runs all other scripts and (at least in theory!) can generate final output from raw data
  - presentations
    - slides for presentations; most recently I’ve been using revealjs
  - manuscripts
    - papers, extended abstracts
  - plots
    - I often save plots/graphs as .rds files instead of image files (like JPEG, PNG etc.) so that I can do the editing in R Markdown with ggplot2
  - Others:
    - models
      - save model output after, say, running a regression
    - results
      - typically regression coefficients stored as a dataframe
- GitHub Issues (click on the Issues tab in your GitHub repository): I use this as my to-do list. It’s a nice way of keeping your tasks organized by project.
- Communicating with collaborators: I use Slack but there are other options; good ol’ email is fine but it can get hard to dig up old conversations
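
As a rough sketch of the folder setup described above, the directories could be created from R itself, and the models/results convention could look something like the following. The folder names come from the list; everything else is an assumption for illustration only: the helper `create_project_dirs()`, the temporary directory standing in for the project root, the file names, and the `mtcars` regression.

```r
# Folder names taken from the list above; everything else is illustrative.
project_dirs <- c(
  "rawdata", "data", "reports",
  "scripts/R", "scripts/python",
  "presentations", "manuscripts", "plots",
  "models", "results"
)

# Hypothetical helper: create each folder under `root` if it does not exist yet.
create_project_dirs <- function(root, dirs = project_dirs) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# A temporary directory stands in for the RStudio Project root here.
project_root <- file.path(tempdir(), "my-project")
create_project_dirs(project_root)

# Save the model output after running a regression (built-in mtcars data) ...
fit <- lm(mpg ~ wt, data = mtcars)
saveRDS(fit, file.path(project_root, "models", "fit_mpg_wt.rds"))

# ... and keep the regression coefficients as a data frame in results/.
coef_table <- as.data.frame(coef(summary(fit)))
coef_table$term <- rownames(coef_table)
rownames(coef_table) <- NULL
saveRDS(coef_table, file.path(project_root, "results", "coef_mpg_wt.rds"))

# Later, e.g. in an R Markdown report, read the saved model back in.
fit_again <- readRDS(file.path(project_root, "models", "fit_mpg_wt.rds"))
```

The same `saveRDS()`/`readRDS()` pair works for a ggplot2 object saved into plots/, which is what makes it possible to keep editing the plot later instead of being stuck with a fixed image file.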
There are plenty of great resources out there that go into detail on the hows and whys of setting up a reproducible project. A couple of places to get started: