UCSC-SOE-14-12: Redo: Reproducibility at Scale

Ivo Jimenez, Carlos Maltzahn, Adam Moody, Kathryn Mohror
10/14/2014 04:28 PM
Computer Science
A key component of the scientific method is the ability to revisit and reproduce previous results. At large scale, reproducibility is extremely challenging because of the sheer size of the computing environment. In this work, we introduce Redo, a framework that addresses reproducibility in computational and data science. Redo characterises the reproducibility space, whose domain consists of the possible changes over time to one or more dependencies of an experiment; these changes produce distinct effects, such as incorrect results or performance degradation. Redo can track and explain changes in both small (desktop) and large (Cloud and HPC) computational environments, allowing a scientist to quickly determine the source of an experiment's irreproducibility. By making reproducibility a first-class component of the framework, Redo enables computational and data scientists to produce reproducible research.
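The abstract does not say how Redo records or compares dependencies. As a minimal illustration of the underlying idea only, the sketch below assumes each run of an experiment is summarised as a snapshot mapping dependency names to versions, and lists the dependencies that changed between two runs; those are the candidate sources of irreproducibility. The function name, snapshot format, and example values are hypothetical and are not Redo's actual interface.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot format: dependency name -> version observed at run time.
Snapshot = Dict[str, str]
# (name, old_version, new_version); None marks an added or removed dependency.
Change = Tuple[str, Optional[str], Optional[str]]


def changed_dependencies(before: Snapshot, after: Snapshot) -> List[Change]:
    """Return every dependency whose version differs between two runs."""
    changes: List[Change] = []
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes.append((name, old, new))
    return changes


if __name__ == "__main__":
    # Illustrative values only, not measurements from the report.
    run_a = {"gcc": "4.6", "libblas": "3.4", "input-data": "v1"}
    run_b = {"gcc": "4.8", "libblas": "3.4", "input-data": "v1", "mpi": "1.8"}
    for name, old, new in changed_dependencies(run_a, run_b):
        print(f"{name}: {old} -> {new}")
```

A plain set difference over names would miss version bumps, which is why the sketch compares versions for the union of names. Mapping a change to its effect (wrong results versus slower runs) would require re-running the experiment and is outside this sketch.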