
Reproducible research

On average, two PhD students graduate each year from our laboratory, the Stanford Exploration Project. Years ago, junior students who built on their seniors' work often spent considerable effort merely reproducing their colleagues' old computational results.

Indeed, the problem occurs wherever traditional methods of scientific publication are used to describe computational research. In a traditional article the author merely outlines the relevant computations: the limitations of a paper medium prohibit complete documentation, including experimental data, parameter values, and the author's programs. Consequently, the reader must laboriously re-implement the author's work before verifying and using it. Even if the reader receives the author's source files (a feasible assumption considering the recent progress in electronic publishing), the results can be recomputed only if the various programs are invoked exactly as in the original publication. The reader must spend valuable time merely rediscovering minutiae that the author was unable to communicate conveniently.

To facilitate efficient technology transfer, our laboratory developed the concept of a reproducible electronic document (ReDoc). A reader of a reproducible document can remove and rebuild the document's results without any application-specific knowledge. An author whose research involves scientific computations on a UNIX computer can easily create reproducible documents. Beyond a traditional article and the application's source code, a reproducible document contains three additional components: (1) makefiles, (2) a small set of universal make rules (fewer than 100 lines), and (3) naming conventions for files.

A makefile contains the commands to build a software package; make, the program that interprets it, is a standard UNIX utility for software maintenance. More powerful than a simple script of commands, a makefile embodies the notion that result files (targets) are up to date when they are younger than their corresponding source files (dependencies). The make utility, designed for maintaining software, elegantly maintains reproducible research projects. Fortunately, fine tutorial books about the make language exist (Oram and Talbott, 1991; Stallman and McGrath, 1991), and students easily begin using make even without formal introduction.
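The timestamp test behind this notion of "up to date" can be sketched in a few lines. The Python fragment below is an illustration only and is not part of make or of the ReDoc rules; the function name is_up_to_date is hypothetical, and the sketch assumes that a target with the same timestamp as a dependency counts as current.

```python
import os


def is_up_to_date(target, dependencies):
    """Return True if target exists and no dependency is newer than it.

    Illustrative sketch of make's timestamp test (hypothetical helper).
    A missing dependency raises FileNotFoundError, since a target
    cannot be judged against a source file that is not there.
    """
    if not os.path.exists(target):
        return False  # a missing target always has to be built
    target_time = os.path.getmtime(target)
    return all(os.path.getmtime(dep) <= target_time for dep in dependencies)
```

A call such as is_up_to_date("figure.ps", ["migrate.c", "params.txt"]), with hypothetical file names, would report whether the figure needs to be recomputed after an edit to either source file.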

The second component in making scientific computations reproducible is the consistent availability of a small set of standard make rules. These rules allow a reader to interact with the document without knowing the underlying application-specific commands or files. The inclusion of a universal set of rules in every makefile ensures the rules' consistency across documents and enables authors to concentrate exclusively on the application-specific part of the makefile. Furthermore, a central set of rules accumulates the wisdom of a community on how to organize a reproducible electronic document. Our laboratory offers about 100 lines of GNU make code (ReDoc rules) that, when included in the application makefile, implement a simple but powerful reader interface. In the electronic version of any reproducible electronic document, these ReDoc rules provide four commands: make burn removes the result files (usually figures), make build recomputes them, make view displays the figures, and make clean removes any files that are neither source nor result files. The process of recomputing the author's results allows a reader to understand and to modify the interaction of the various components.
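The effect of the four commands on a document directory can be summarized in a toy model. The Python sketch below is not the ReDoc rules themselves (those are GNU make code); it only models which files are present after each command. The function apply_command and its arguments are hypothetical, and the model ignores any intermediate files that a real build would leave behind.

```python
def apply_command(command, present, sources, results):
    """Return the set of files present after a reader command.

    present: files currently in the directory
    sources: files the author wrote (never removed)
    results: result files, usually figures
    Any other file is treated as an intermediate file.
    """
    if command == "burn":
        return present - results              # remove the result files
    if command == "build":
        return present | results              # recompute the result files
    if command == "clean":
        return present & (sources | results)  # drop everything else
    if command == "view":
        return set(present)                   # displaying changes nothing
    raise ValueError("unknown command: " + command)
```

In this model a burn followed by a build restores exactly the result files, which is the property that a reproducibility check relies on.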

The third component of a reproducible document system is its naming conventions for all files. These conventions allow a community to formulate universal rules that recognize a file's type and handle it appropriately. For example, a cleaning rule removes all intermediate files. The rule identifies intermediate files based on a community's naming conventions: typically files with suffix .o (object files) or files with the name stem junk (temporary files) are removed. A cleaning rule is important since a cleaned directory is more accessible and inviting to a reader than a cluttered one. Furthermore, a cleaning rule saves resources such as disk space by removing superfluous files. Naming conventions are also needed for result files. The rules for displaying, removing, or recomputing a result file are based on the result's file name. For example, at our laboratory result files (which are invariably figures) can have various formats such as PostScript or GIF. Our laboratory's naming conventions require the author to indicate the result file's format by a suffix, such as .ps or .gif. Consequently, our laboratory can supply a universal, format-independent rule for displaying result files: the rule identifies the result file's suffix, infers the file's format, and invokes the appropriate viewing program, such as ghostview for PostScript or xview for GIF files.
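How naming conventions let one universal rule handle every file can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the laboratory's make rules: the suffixes and viewers are the examples named above, while the function names classify and viewer_command, and the choice to treat every other file as a source file, are hypothetical.

```python
import os

# Conventions taken from the examples in the text.
RESULT_VIEWERS = {".ps": "ghostview", ".gif": "xview"}
INTERMEDIATE_SUFFIXES = {".o"}
TEMPORARY_STEM = "junk"


def classify(filename):
    """Classify a file as 'intermediate', 'result', or 'source' by name alone."""
    stem, suffix = os.path.splitext(os.path.basename(filename))
    # Intermediate files are checked first, so junk.ps counts as temporary.
    if suffix in INTERMEDIATE_SUFFIXES or stem == TEMPORARY_STEM:
        return "intermediate"
    if suffix in RESULT_VIEWERS:
        return "result"
    return "source"  # assumption: anything else belongs to the author


def viewer_command(filename):
    """Return the command line that would display a result file."""
    suffix = os.path.splitext(filename)[1]
    if suffix not in RESULT_VIEWERS:
        raise ValueError("no viewer known for suffix " + repr(suffix))
    return [RESULT_VIEWERS[suffix], filename]
```

A cleaning rule then amounts to removing every file for which classify returns "intermediate", and a display rule to running viewer_command on every file classified as "result".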

ReDoc rules are easy to use. An author who already uses makefiles only needs to adhere to the ReDoc naming conventions and include the ReDoc rules to make a traditional document reproducible. Our laboratory distributes its ReDoc rules, this article, and the accompanying example on its World Wide Web site (Schwab and Claerbout, 1996). A different community may need to adapt these ReDoc rules to its own peculiarities and naming conventions.

At our laboratory the software readily accessible to any researcher has increased tremendously. Today students commonly take up projects of former students, starting by easily removing and recomputing the original result files. Students who graduated and left our laboratory were able to seamlessly continue their own research at their new locations.

For example, in 1995, we successfully employed the ReDoc rules in our laboratory's sponsor report (14 articles by 15 authors) and three of Jon Claerbout's textbooks on seismic imaging. These documents contain a total of 483 result files: 276 easily reproducible, 21 conditionally reproducible, and 186 non-reproducible figures. Before publication, automatic scripts removed and rebuilt all 276 easily reproducible result files (see Figure 1). We use the same scripts and documents to benchmark computer platforms. Additionally, our laboratory published 12 PhD theses that use an earlier version of the reader interface based on a dialect of make called cake. Before the web, we distributed these documents on CD-ROMs (Claerbout, 1996). Nowadays these documents are available on our web site, sepwww.stanford.edu.

Figure 1: Quality control. A concrete test of a document's reproducibility is a cycle of burning and rebuilding its results. A simple script can implement such a reproducibility test by invoking the ReDoc rules described in this article. The ReDoc rules remove and regenerate the document's results independently of the document's content. The graph plots the number of successfully reproduced figures against the series of tests that removed and rebuilt the figures. The document contained 14 articles with 112 easily reproducible figures by 15 authors. After each test the authors were given time for corrections. After the first test, only 60% of the document's easily reproducible figures were in fact reproducible. After the fourth test, almost all figures were reproducible and the document was published.
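A burn-and-rebuild test of this kind might look like the following Python sketch. It is not the laboratory's actual script: the function names run_make and reproducibility_test are hypothetical, the list of result files is assumed to be supplied by the caller, and the sketch assumes that make is installed and that the document's makefile provides the burn and build commands.

```python
import os
import subprocess


def run_make(target, directory):
    """Run one reader command, such as 'burn' or 'build', in a document directory."""
    return subprocess.run(["make", "-C", directory, target]).returncode


def reproducibility_test(directory, result_files, run=run_make):
    """Burn and rebuild the results; return the fraction that was regenerated."""
    if not result_files:
        return 1.0  # nothing to reproduce
    run("burn", directory)
    # A file that survives the burn was never removed, so its later
    # presence proves nothing about reproducibility.
    survivors = {name for name in result_files
                 if os.path.exists(os.path.join(directory, name))}
    run("build", directory)
    rebuilt = [name for name in result_files
               if name not in survivors
               and os.path.exists(os.path.join(directory, name))]
    return len(rebuilt) / len(result_files)
```

The run argument exists so that the cycle can be exercised without invoking make, for instance with a stand-in function during testing. Repeating the test after each round of corrections yields a series of fractions like the one plotted in Figure 1.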

We chose to implement the current reader interface in GNU make, since it is platform-independent, excels in the efficient maintenance of even complex software packages, and is equipped with a special mechanism to handle intermediate files, which we will discuss later (see the section "Simultaneously clean and up to date"). Conceptually the ReDoc reader interface is independent of the document format (TeX, HTML, etc.) and independent of the underlying computational software, such as Matlab, Mathematica, or C and FORTRAN programs. Even though this paper restricts itself to UNIX systems and the make utility, the concept of a reader interface to reproduce a document's computational results should apply to electronic documents in other computer environments as well.

What is next? Of course, we want to publish our results on the World Wide Web. The Web conveniently distributes the combination of reading material for researchers and software for computers. Ideally, each computed figure in a future World Wide Web document should be accompanied by push-buttons for the burn, build, clean, and view commands. Currently we are closely watching the development of Java (SUN, 1996), a computer language for software on the Internet.


Stanford Exploration Project
3/8/1999