July 1, 2015
Which best describes your work/role?
Are these scenarios familiar?
No matter what clients/funders/bosses say, what happens is often very different
All these situations need to be well organised and well documented
Standardised systems help too
Good computing tools (R and non-R) can also help with this process
Reference:
See (Long 2009) The Workflow of Data Analysis Using Stata. StataCorp LP.
Need to consider
R, make and git can help in many of these
Standard data analysis project directory set up
admin/ backups/ configFile.rds data/ doc/ extra/ lib/ Makefile posted/ readMergeData/ reports/ src/ test/ work/
Data directories initially (raw data and codebook files)
data
├── codebook
│   ├── data1_codebook.csv
│   └── small2_codebook.csv
├── derived
└── original
    ├── data1-birth.csv
    ├── data1-yr21.csv
    └── small2.csv

3 directories, 5 files
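A minimal shell sketch of how a skeleton like the one above might be created by hand. The project root name `myproject` and the `PROJ_DIR` variable are assumptions for illustration only; the directory names are taken from the listing.

```shell
#!/bin/sh
# Hypothetical project root (an assumption): override with PROJ_DIR=... if desired.
proj="${PROJ_DIR:-myproject}"

# Top-level directories from the listing, plus the three data subdirectories.
for d in admin backups data/codebook data/derived data/original \
         doc extra lib posted readMergeData reports src test work; do
    mkdir -p "$proj/$d"
done

# Empty placeholder Makefile. configFile.rds is not created here because it
# is an R data file that would be written from R.
[ -e "$proj/Makefile" ] || : > "$proj/Makefile"

ls "$proj"
```

Raw data and codebook files would then be copied into `data/original` and `data/codebook`, leaving `data/derived` empty until processing starts.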
target: dependencies
<TAB> command to run
.PHONY: all
all: read.Rout
read.Rout: read.R bmi2009.dta
<TAB> R CMD BATCH read.R
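To see why make only reruns `read.R` when needed, here is a rough shell imitation of the check make performs for a single target: rebuild if the target is missing or if any dependency is newer than it. The function name `needs_rebuild` is hypothetical, the files are empty stand-ins in a scratch directory, and the sketch only mimics the timestamp comparison; it is not how make is implemented.

```shell
#!/bin/sh
# Rough imitation of make's decision for one target (not make's real code):
# rebuild when the target is missing or any dependency is newer than it.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Scratch directory with fixed timestamps; the files are empty stand-ins.
demo=$(mktemp -d)
touch -t 202001010000 "$demo/read.R" "$demo/bmi2009.dta"
touch -t 202001020000 "$demo/read.Rout"

if needs_rebuild "$demo/read.Rout" "$demo/read.R" "$demo/bmi2009.dta"; then
    first="rebuild"
else
    first="up to date"
fi

# Editing read.R gives it a newer timestamp than read.Rout.
touch -t 202001030000 "$demo/read.R"

if needs_rebuild "$demo/read.Rout" "$demo/read.R" "$demo/bmi2009.dta"; then
    second="rebuild"
else
    second="up to date"
fi

echo "before edit: $first"
echo "after edit:  $second"
```

With the timestamps set this way, the first check reports the target as up to date and the second reports that a rebuild is needed, which is the behaviour the `read.Rout` rule relies on.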
.PHONY: all
all: report.pdf
report.pdf: report.Rmd analysis.Rout
<TAB> Rscript -e "library(rmarkdown);render('report.Rmd')"
analysis.Rout: analysis.R read.Rout
<TAB> R CMD BATCH --vanilla analysis.R
read.Rout: read.R bmi2009.dta
<TAB> R CMD BATCH --vanilla read.R
Each target depends on
No Problem
.PHONY: all
all: analysis.Rout
report.pdf: ${@:.pdf=.Rmd} analysis.Rout
analysis.Rout: ${@:.Rout=.R} read.Rout
read.Rout: ${@:.Rout=.R} bmi2009.dta
include ~/lib/common.mk
Version control helps all projects
Easy to learn?
Good info online or see (Loeliger and McCullough 2012)
Don't Repeat Yourself workflow
Early stage of development but currently can customise:
See blog at http://www.petebaker.id.au
Loeliger, Jon, and Matthew McCullough. 2012. Version Control with Git: Powerful Tools and Techniques for Collaborative Software Development. 2nd ed. O’Reilly Media, Inc.
Long, J. Scott. 2009. The Workflow of Data Analysis Using Stata. StataCorp LP.
Mecklenburg, Robert. 2004. Managing Projects with GNU Make. 3rd ed. O’Reilly Media, Inc.