CD-ROM versus The Web

by Jon Claerbout, Mathias Schwab, and Martin Karrenbach

Published in SEP Report 83, 1994.


CD-ROM technology has remained essentially static since 1990, while web browsing technology has exploded. The economics now favor the web. SEP has learned how to deliver reproducible research on CD-ROM, but not yet on the web.


INTRODUCTION

The Stanford Exploration Project (SEP) has been producing research reports on CD-ROM since 1991. Now it is time to reassess this effort and this medium.

Purpose

The purpose of our CD-ROM publication effort is to provide reproducible research and technology transfer to:

It may seem strange to put the author's own name at the top of the list of those to whom we wish to provide reproducible research, but the greatest beneficiary of preparing work in reproducible form often seems to be the original author! With the passage of time, authors lose the ability to reproduce their own work, and leaving Stanford makes the loss worse.

The CD/reproducibility effort has two parts. The first is defining and learning the standards for reproducible research. The second is assembling and building the CD-ROM. These efforts are independent: with the system we have in place, the researcher does almost none of the CD-ROM work beyond what is required to prepare reproducible research and a paper document. The researcher only copies his or her work to a staging area.

Space limitations and the "read-only" nature of the medium may also create a few minor extra chores. Because a CD-ROM is read-only, we enforce the expectation (good practice anyway) that researchers have in place a rule for removing all "intermediate" files, namely, files that programs create from other files. This cleanup is required for the CD-ROM to enable push-button reproducibility. It also makes revisiting an old directory less onerous for humans. The required existence of a makefile with a clean rule for intermediate files and a burn rule for targets defines our test for reproducible research.
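As an illustration only, the clean/burn test can be modeled in a few lines of Python. This is a hypothetical sketch, not SEP's makefile system: Python stands in for make, and each entry in the made-up `rules` table maps one output file to its input files and a function that computes its contents. Here `clean` deletes files produced by a rule that are not final results, `burn` deletes the final results, and the hypothetical `is_reproducible` check rebuilds everything from the source files afterward and compares the results with what was there before.

```python
"""Toy model of the clean/burn test for reproducible research.

Hypothetical sketch: Python stands in for make. Each rule maps an output
file to (input files, function computing the output text from the input
texts). Files that no rule produces are treated as source files.
"""
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Rules = Dict[str, Tuple[List[str], Callable[[List[str]], str]]]


def build(work: Path, rules: Rules, target: str) -> None:
    """Create `target` if absent, first building any missing inputs."""
    if (work / target).exists():
        return
    if target not in rules:
        # No rule produces this file, so it is a source and must already exist.
        raise FileNotFoundError(f"source file {target!r} is missing")
    inputs, compute = rules[target]
    for name in inputs:
        build(work, rules, name)
    texts = [(work / name).read_text() for name in inputs]
    (work / target).write_text(compute(texts))


def clean(work: Path, rules: Rules, results: List[str]) -> None:
    """Remove intermediates: rule outputs that are not final results."""
    for name in rules:
        if name not in results:
            (work / name).unlink(missing_ok=True)


def burn(work: Path, results: List[str]) -> None:
    """Remove the final results (e.g. figures) themselves."""
    for name in results:
        (work / name).unlink(missing_ok=True)


def is_reproducible(work: Path, rules: Rules, results: List[str]) -> bool:
    """Build the results, clean and burn, rebuild from sources, and compare."""
    for name in results:
        build(work, rules, name)
    before = {name: (work / name).read_text() for name in results}
    clean(work, rules, results)
    burn(work, results)
    for name in results:
        build(work, rules, name)
    return all((work / name).read_text() == before[name] for name in results)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        work = Path(d)
        (work / "data.txt").write_text("3 1 2")  # the only source file
        rules: Rules = {
            "sorted.txt": (["data.txt"], lambda t: " ".join(sorted(t[0].split()))),
            "fig.txt": (["sorted.txt"], lambda t: "plot of " + t[0]),
        }
        print(is_reproducible(work, rules, ["fig.txt"]))  # prints True
```

Comparing rebuilt results with the originals only catches missing sources or nondeterministic programs; it does not judge whether the computation is correct.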

ACCEPTANCE

We expected widespread adoption of CD-ROM in the PC, Mac, and UNIX worlds. We were wrong about UNIX.

Our CD-ROMs have failed to work on many systems. They contain source code, documents, a few data sets, fonts, word-processing software, and binary programs that we hoped would run on popular computers as well as on all workstations manufactured by our sponsoring companies. The problems we have had include:

ECONOMICS

The economics of CD-ROM usage have improved in the sense that CD-ROM readers have dropped in price from about a thousand dollars to a few hundred. The economics of CD-ROM production have not improved for us. We still consider a CD-ROM report to be a $2000 event, about the same as producing a paper report. We typically spend a couple hundred dollars on one-offs. We have some licensed software for making UNIX CD-ROMs. Mastering costs about a thousand dollars, pressing 250 disks costs about $350, and there are printing costs for the instruction folder. In addition, a skilled computer person will spend more than a week testing.

We might be able to cut costs by not mastering the CD-ROMs and instead making many singles. We could get by with about 45 singles (one for each sponsor and one for each researcher). I would guess that about 50 singles at about $40 each would cost about $2000. That assumes we do not buy equipment but continue to use outside suppliers. The incremental cost per single would be lower if we purchased equipment for about $2000 to $3000 and UNIX software estimated at $5000 to $10,000.
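To make the comparison concrete, here is a rough cost model using the figures quoted above. The $200 for one-offs is our reading of "a couple hundred dollars," and the printing of the instruction folder and the week of testing are left out because no dollar figure is given for them, so the mastered cost is an underestimate. The function names are hypothetical.

```python
"""Rough CD-ROM cost model using the figures quoted in the text.

Assumptions: "a couple hundred dollars" of one-offs is taken as $200;
printing of the instruction folder and testing labor are omitted because
no dollar figure is given for them.
"""
ONE_OFFS = 200      # one-off test disks (assumed value)
MASTERING = 1000    # mastering fee
PRESS_RUN = 350     # price of one pressing run
RUN_SIZE = 250      # disks in that run
SINGLE = 40         # estimated price of one single from an outside supplier


def mastered_cost(copies: int) -> int:
    """Cost of one mastered pressing run of up to RUN_SIZE disks."""
    if not 1 <= copies <= RUN_SIZE:
        raise ValueError("the quoted figures cover one run of up to 250 disks")
    return ONE_OFFS + MASTERING + PRESS_RUN


def singles_cost(copies: int) -> int:
    """Cost of making every copy as a single."""
    return copies * SINGLE


def break_even() -> int:
    """Smallest number of copies at which singles cost at least as much."""
    n = 1
    while singles_cost(n) < mastered_cost(n):
        n += 1
    return n


if __name__ == "__main__":
    print(mastered_cost(45), singles_cost(45), break_even())  # 1550 1800 39
```

Under these assumptions a mastered run costs about $1550 for any number of copies up to 250, while singles stop being cheaper at 39 copies. Whether singles actually save money at 45 copies therefore hinges on the omitted printing and testing costs.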

PROSPECTS FOR WWW REPLACING CD-ROM

The World Wide Web (WWW), coupled with bargain prices for hard disks, offers the promise of an alternative approach. We recently considered purchasing a large CD-ROM jukebox. On comparing its price with that of hard disk, we found them comparable, with the hard disk offering a much clearer upgrade path.

For $2500 we could purchase a 9-GB disk and half fill it with our 10 previous CD-ROMs. Many machine-dependent binaries and fonts could be thrown away, so the cost of keeping this information on the web would be about $1500 for the lifetime of the disk. The disk should be made read-only. Material could be partitioned into public and restricted areas. We should probably do this anyway, regardless of our CD-ROM plans.

People would not be able to do reproducible work directly on our archival read-only file system. They would need to make copies and work from the copies. Naturally, if reproducibility is to be achieved, the difficult chore of building a complete environment, as we do on our CD-ROM, must still be done. We have not demonstrated a prototype for such local reproducibility, but it seems possible. Enabling people at remote sites to build the required environment is even more problematic, but not an unrealistic goal.
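One way a reader might make such a working copy is sketched below in Python, assuming the archive is an ordinary directory tree whose files and directories lack write permission. The function name `make_workspace` is hypothetical. Because `shutil.copytree` also copies permission bits, the copy would start out just as read-only as the archive, so the sketch adds owner-write permission throughout the copy while leaving the archive untouched. It does not address the harder problem of building the complete environment.

```python
"""Copy a read-only archive into a private, writable working directory.

Hypothetical sketch: assumes the archive is an ordinary directory tree
whose files and directories lack write permission.
"""
import os
import shutil
import stat
from pathlib import Path


def make_workspace(archive: Path, workspace: Path) -> Path:
    """Copy `archive` to a new `workspace` and make the copy owner-writable.

    shutil.copytree also copies permission bits, so without the chmod pass
    the copy would be as read-only as the archive. The archive itself is
    never modified.
    """
    shutil.copytree(archive, workspace)
    for root, dirs, files in os.walk(workspace):
        for name in dirs + files:
            path = Path(root) / name
            path.chmod(path.stat().st_mode | stat.S_IWUSR)
    workspace.chmod(workspace.stat().st_mode | stat.S_IWUSR)
    return workspace
```

A reader would then run the clean and burn rules and rebuild inside the workspace, never inside the archive.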

My goal is for people who graduated and left Stanford a couple of years ago to be able to rebuild the illustrations in their theses. It turns out that by the time you reach that goal, you have approximately met the goal of providing reproducibility for colleagues, sponsors, and the general public. The University's long-range position is that research results should be reported in such a way that the work is reproducible by others. SEP is the first place where we have taken this quite literally.

Some sponsors have requested web service in preference to CD-ROM. We don't know how many sponsors have web access, but we are certain that some do not. In a few years we expect that most will have web access, and we can supply the rest with magnetic tape.

CONCLUSION

We plan to continue making CD-ROM theses and reports until we learn how to deliver reproducible research on the Web.