SEP report reproducibility test guideline

Recent updates

Starting from sep150, we introduced an independent test account, 'rep', on SEP-owned servers. This account has all the basic environment settings. In addition to testing with your own account, you must test your report using the 'rep' account (following this guideline and the items in the scoresheet) before you notify me for testing. This will greatly reduce my burden and make you appreciate my efforts a little more :) Unfortunately, due to the university's IT security regulations on CEES computers, we cannot adopt this practice in the CEES environment.

Requirement for repro test

If your reproduction strategy is more complex than typing make, you need to include a readme file or put the instructions at the top of the makefile as a comment.

The quality of reproducibility will be evaluated in five aspects:

1. All code compiles in the SEP computing environment

2. The flow for CR figures is present in the makefile

3. ER figures can be reproduced

4. Makefile is written in a clear/concise/neat manner

5. CR figures can be reproduced

All requirements except the last two are mandatory (everyone must meet them in order to pass the test). We will also check that the proper reproducibility label is assigned to each figure.

In addition, you should also have the following:

1. Have a backup folder for the figures in the paper (./Fig_bkup). This serves as a backup and also lets other users compare their computed results with those in the paper.

2. Store your input/raw data files in a designated folder, not a temporary scratch folder, because scratch files will be deleted at some point. You have the following choices:

- If the file is small (say <10MB) and does not involve copyright issues, you can just put it in the report folder, either using the out=stdout trick or storing the .H@ file in the same folder as the .H file.

- If it comes from data in our data library (/data or /data1), you are expected to write rules in your makefile that extract the portion of the data you use from the data library.

- If neither of the previous two options applies to your case, put your data files in /data1/wrk/sep1XX/${your_paper_name}. Again, remember to keep the binaries in the same folder.
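As a rough illustration of the second option, a makefile rule that extracts a subset from the data library might look like the sketch below. The library path, file names, and window sizes are hypothetical, and the use of SEPlib's Window3d with the n2/f2 parameters is an assumption; substitute whatever extraction tool and parameters your data actually need.

```makefile
# Hypothetical location inside the data library; replace with the real path.
DATALIB = /data1/some_survey

# Extract the first 200 traces along axis 2 (assumed Window3d parameters n2, f2).
# Recipe lines must begin with a tab character.
raw_subset.H: $(DATALIB)/full_volume.H
	Window3d < $< n2=200 f2=0 > $@
```

Because the rule names the library file as its prerequisite, a tester can see exactly where the input comes from and rebuild it with make raw_subset.H.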

Makefile format

It is not rare for a makefile to have dozens of entries, and we are not able to check all of them. Therefore, we require the makefile to have several common targets, listed below. (The exact forms may differ, but they should be self-explanatory; otherwise, you need to provide instructions that make them easy to follow.)

1. make burn; remove (burn) all the figures in the paper.

2. make clean; burn all figures, and also remove the executables and the intermediate result files (.H).

3. make EXE; build all executables used for this report

4. make NR; build NR figures; (this could be a simple copy from a backup folder).

5. make ER; build ER figures

6. make CR; build CR figures

Again, you can back up your figures to a separate folder, but it is not OK to simply copy them back to the Fig directory in the rules that make ER and CR figures.
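A minimal skeleton of these targets could look like the following sketch. All file names, the R and BKUP variables, and the choice of gfortran are assumptions for illustration. The rules that build each ER and CR figure from the raw data are deliberately omitted and must be supplied by the author, so make ER and make CR will not work until they are added.

```makefile
# Sketch only: figure names, source files, and the compiler are hypothetical.
R    = Fig
BKUP = Fig_bkup
FC   = gfortran

ER_FIGS = $R/fig1.pdf $R/fig2.pdf
CR_FIGS = $R/fig3.pdf
NR_FIGS = $R/sketch.pdf

# Intermediate results only; never list raw input files here.
INTERMEDIATE = model.H result.H

.PHONY: EXE NR ER CR burn clean

EXE: bin/exe.x

bin/exe.x: src/main.f90
	mkdir -p bin
	$(FC) -O2 -o $@ $<

# Figures are listed as prerequisites so Make checks each one's dependency tree.
ER: $(ER_FIGS)
CR: $(CR_FIGS)

# NR figures may simply be copied from the backup folder.
NR: $(NR_FIGS)
$(NR_FIGS): $R/%.pdf: $(BKUP)/%.pdf
	mkdir -p $R
	cp $< $@

# (Rules that build every ER and CR figure from the raw data go here.)

# burn removes the figures; clean also removes executables and intermediates.
burn:
	rm -f $(ER_FIGS) $(CR_FIGS) $(NR_FIGS)

clean: burn
	rm -f bin/exe.x $(INTERMEDIATE)
```

Listing the intermediate files explicitly, rather than removing *.H, is a design choice that keeps make clean from deleting a small raw input file stored in the report folder.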

Reproducibility file organization on CEES computers

People using CEES computers for their computation usually rely on PBS, Python scripts, etc., and putting the entire workflow into a makefile is sometimes difficult. Nonetheless, you can still organize your reproducibility workflow around the makefile, and the criteria for reproducibility do not change: the code should compile, the ER flow is present and can be run successfully by a tester, and the CR flow is present.

So in the makefile, you should still have a complete flow for each figure in your paper. If computing a figure involves running a script, record the command that runs the script in the makefile. You can also add comments for better readability.
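For example, a flow that submits a PBS job and then plots the result with a Python script could be recorded as in the sketch below. The script names, file names, and command-line flags are hypothetical; only qsub itself is the standard PBS submission command.

```makefile
R = Fig

# Heavy computation: submitted to the PBS queue on CEES.
# qsub returns as soon as the job is queued, so result.H appears only
# after the job has finished.
result.H: raw_input.H run_inversion.pbs
	qsub run_inversion.pbs

# Plotting: the exact script invocation is recorded so a tester can rerun it.
# (--input and --output are made-up flags of the hypothetical plot_result.py.)
$R/fig3.pdf: result.H plot_result.py
	python plot_result.py --input result.H --output $@
```

Since qsub is asynchronous, a single make invocation will not carry this flow from start to finish. The makefile here mainly documents the commands and their order, which is what the "flow is present" criterion asks for.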


1. What is the rule of thumb of choosing NR/ER/CR labels for a plot?

ER denotes Easily Reproducible. ER figures are the results of processing described in the paper. The author claims that you can reproduce such a figure from the programs, parameters, and makefiles included in the electronic document. The data must either be included in the electronic distribution, be easily available to all researchers (e.g., SEG-EAGE data sets), or be available in the SEP data library. We assume you have a UNIX workstation with Fortran, Fortran90, C, C++, the X-Windows system and the software downloadable from our website (SEP makerules, SEPlib, and the SEP latex package), or other free software such as SU. Before the publication of the electronic document, someone other than the author tests the author’s claim by destroying and rebuilding all ER figures. Some ER figures may not be reproducible by outsiders because they depend on data sets that are too large to distribute, or data that we do not have permission to redistribute but are in the SEP data library.

CR denotes Conditional Reproducibility. The author certifies that the commands are in place to reproduce the figure if certain resources are available. The primary reason for the CR designation is that the processing requires 20 minutes or more. Figures produced with MPI- or CUDA-based code should also be labeled CR.

NR denotes Non-Reproducible figures. SEP discourages authors from flagging their figures as NR except for figures that are used solely for motivation, comparison, or illustration of the theory, such as artist drawings, scans, or figures taken from SEP reports not by the authors or from non-SEP publications.

Please refer to the up-to-date instructions at vostok:/wrk/sep1xx/Adm/preface.tex

2. Do I need to include all the .H files I used in the figures in the local directory?

You may keep them there for backup purposes. However, your makefile flows for building these figures should start from the raw input data file, perform the computations, and then generate the figures. The only exception is plotting the raw data, since no computation needs to be performed. Simply grabbing backed-up intermediate results (.H files) and plotting them is not considered a valid reproducible flow, unless the figure is NR.
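The difference between a valid and an invalid flow can be sketched as follows. The program bin/exe.x, its niter parameter, the plotting script, and all file names are hypothetical stand-ins, and bin/exe.x is assumed to be built by a separate rule.

```makefile
R = Fig

# Valid: the intermediate result is computed from the raw input.
result.H: raw_input.H bin/exe.x
	./bin/exe.x niter=10 < raw_input.H > $@

# Valid: the figure depends on the computed result, so rebuilding the
# figure after a burn reruns the whole chain when needed.
$R/fig1.pdf: result.H
	./plot_to_pdf.sh result.H $@

# NOT valid for ER/CR figures: restoring a backed-up intermediate file
# instead of recomputing it.
# result.H:
#	cp backup/result.H $@
```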

3. My makefile rules take a long time to compute. How can I test them more efficiently?

Use the '-n' option of make. This option lets you see all the commands that GNU Make would execute for a target, without actually executing them. For example, to test target.H, enter make -n target.H rather than make target.H. The terminal will display the commands that would be run and report any mistakes you may have made, such as a filename typo in the dependency list.
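A small sketch of what such a dry run checks, using a hypothetical rule (the program and file names are placeholders):

```makefile
# Hypothetical rule used to illustrate a dry run.
target.H: input.H bin/exe.x
	./bin/exe.x < input.H > $@

# From the shell, print the commands without executing them:
#   make -n target.H
#
# If a prerequisite is misspelled (say inptu.H instead of input.H) and no
# rule can build it, GNU Make stops with an error of the form:
#   No rule to make target 'inptu.H', needed by 'target.H'.
```

One caveat: GNU Make still executes recipe lines that contain $(MAKE) or begin with '+', even under -n, so a dry run of a makefile that uses those is not entirely free of side effects.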

Common Dos and Don'ts

1. Unless you intend to, DO NOT set the permissions of any of your files to "rwx------"; this would block everybody else's access.


1. Put a dash '-' at the beginning of a command if you want Make to ignore any error it encounters and continue the flow. For example:

    -mkdir bin
    make bin/exe.x

2. Use the '-t' option of make to tell gmake that a certain target and all its dependencies are up to date, which prevents gmake from regenerating them by accident. Conversely, use 'make -B' to force gmake to rebuild a target from scratch.

3. One suggested practice is to list the file names of your reproducible figures as the dependencies of the 'ER' and 'CR' targets, i.e.:

ER: $R/fig1.pdf $R/fig2.pdf

is much better than:

  make $R/fig1.pdf 
  make $R/fig2.pdf

Because in the former case, GMake is able to check the dependency tree for each figure.

wiki/reproguide.txt · Last modified: 2015/05/27 02:06 (external edit)
CC Attribution-Share Alike 4.0 International