* Under Construction

Data Catalog

The first port of call for information on any SEP dataset is our data catalog, developed by Abdullah Altheyeb. You can find information on all the SEP datasets here.

An older version of the data catalog is available here.

Frequently asked questions

Where are the data sets and how can I access them?

  • Currently all the SEP datasets are stored on the bricks* filesystem and can be accessed at /data on any SEP machine.
  • The /data directory is made up of bricks[1-4], which form our gluster parallel file system (currently with 16TB disk capacity). /data is connected to the lost cities and the moods through Infiniband.

What if the /data directory appears to be empty on an SEP machine?

  • Talk to the data admins.
  • If you have root access, run 'sudo sh /etc/rc.local'. ('source' is a shell builtin, so it cannot be run through sudo directly.)
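
Before contacting the data admins, it can help to tell apart a /data directory that is missing, one that is empty, and one that has content. The following is a minimal Python sketch, not an official SEP tool; the function name check_data_dir and the status strings are hypothetical, and it assumes /data is an ordinary mount point on the machine.

```python
import os


def check_data_dir(path="/data"):
    """Return a short status string describing the state of a data directory."""
    if not os.path.isdir(path):
        return "missing"
    if not os.listdir(path):
        # An empty directory usually means the file system is not mounted.
        return "empty"
    if not os.path.ismount(path):
        # The directory has content but is not itself a mount point.
        return "not-a-mount-point"
    return "ok"


if __name__ == "__main__":
    print(check_data_dir())
```

A result of "empty" or "missing" is the case described above: talk to the data admins, or rerun /etc/rc.local if you have root access.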

How do I convert from segy to SEP format?

  • Many of the existing data folders contain makerules for data conversion. The easiest way is to follow the procedure in one of the old folders. A formal discussion of converters is available in the SEPlib documentation.

What are my responsibilities when I receive a new data set?

  • Talk to the data admins to ensure the data are properly copied into the data library.
  • Document the data in the data catalog. This documentation must state the restrictions on the use of these data.
  • Have a data admin create a new Unix group specific to that set of data.
  • Once a user has signed the written agreement to the terms and restrictions, the data admin can then add that user to the group by running the command “gpasswd -a <username> <groupname>” on oas followed by “cd /var/yp && make”.
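
The group step above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the admins' actual procedure: the helper names admin_commands and is_member and the example user and group names are hypothetical. It only assembles the two commands quoted above ('make -C /var/yp' is equivalent to 'cd /var/yp && make') and checks membership through Python's standard grp module.

```python
import grp


def admin_commands(username, groupname):
    """Return the commands a data admin would run, as argument lists."""
    return [
        ["gpasswd", "-a", username, groupname],
        ["make", "-C", "/var/yp"],  # same effect as: cd /var/yp && make
    ]


def is_member(username, groupname):
    """Return True if username is listed as a member of groupname.

    Only supplementary members are listed; a user whose primary group is
    groupname does not appear here. Unknown groups return False.
    """
    try:
        return username in grp.getgrnam(groupname).gr_mem
    except KeyError:
        return False


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in admin_commands("jdoe", "dataset_x"):
        print(" ".join(cmd))
    print(is_member("jdoe", "dataset_x"))
```

Running is_member after the admin has rebuilt the maps is a quick way to confirm that the new membership is visible on your machine.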

What should I do before using an existing data set?

  • Read the proprietary information to ensure all usage/publication policies fit your intended plans.
  • Consider reading previous reports/papers where the dataset has been used.

What should I do if I notice something wrong with the data binaries in /data?

  • Inform the data admins or Biondo.

What should I do if I notice incorrect/missing information in the data catalog?

  • You can correct the information yourself, but inform the data admins so that such corrections/updates are noted.

Don'ts

  • Do not modify the original data set in the /data directory.
  • Do not create new data sets in, or transfer them into, the /data directory without informing the data admins.


Backup policies

  • Koko mirrors all of its information to an identical set of internal disks. The home directories are mirrored every night, while the personal devices are mirrored once a week. Remember that the mirror is not an archive: if you erase a file that you need, it will also be removed from the mirror the next time the mirroring script runs.

What do we back up and how often?

  • Home directories are backed up daily.
  • Everything on koko including your personal devices is backed up quarterly.

Where are the backup disks?

  • Nightly backups are stored in /net/koko/backup.
  • Check the “Backup schedule” section for location of specific backup disks. By default, they are sent to a location outside the Mitchell building.

How do I set up my own backup system?

  • Arrange your own backup system. Talk to Bob or Biondo if you need additional storage for this purpose.
  • It is also strongly recommended that you set up a mirroring script on your own machine. Take the mirroring script at /net/koko/backup/mirror_personal.py, adapt it for your home directory, personal device and desktop machine, and add it to the crontab on your desktop machine. To do so, type the command “crontab -e” and insert the following line:
      0 3 * * * python /net/koko/backup/mirror_personal.py >>/dev/null
    Change the path of the Python mirroring script and the time (currently set for nightly at 3 AM) to what best suits you. Again, the important thing to realize is that you should not trust any SEP backup measures, and should have at least one failsafe that is your own responsibility.
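
As a starting point for a personal mirror, here is a minimal Python sketch. It is not the contents of mirror_personal.py, which are not reproduced on this page; the function names, the example paths and the choice of rsync are all assumptions. It builds an rsync command (the -a and --delete options are standard rsync flags) and the matching crontab entry.

```python
import shlex
import subprocess


def rsync_command(source, destination, delete=False):
    """Build an rsync argument list that mirrors source into destination."""
    cmd = ["rsync", "-a"]
    if delete:
        # --delete propagates removals to the mirror; leave it off to keep
        # copies of files that were erased by mistake.
        cmd.append("--delete")
    # A trailing slash on the source copies its contents, not the directory.
    cmd += [source.rstrip("/") + "/", destination]
    return cmd


def mirror(source, destination, delete=False, dry_run=True):
    """Print the rsync command, and run it unless dry_run is True."""
    cmd = rsync_command(source, destination, delete)
    print(" ".join(shlex.quote(part) for part in cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


def crontab_line(script_path, hour=3, minute=0):
    """Return a crontab entry that runs the script daily at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"{minute} {hour} * * * python {shlex.quote(script_path)} >>/dev/null"


if __name__ == "__main__":
    # Hypothetical paths; replace them with your home directory and backup disk.
    mirror("/home/username", "/path/to/backup/username")
    print(crontab_line("/path/to/my_mirror.py"))
```

Leaving delete off by default is a deliberate choice in this sketch: a mirror that propagates deletions offers no protection against erasing a file by accident.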

Backup schedule


Schedule        Drive location          Notes
December 2008   Data Cabinet (Rm 452)   Six 1-TB disks
December 2009   -
December 2010   -
December 2011   -
December 2012   -


Schedule        Drive location          Notes
October 2008    Biondo's
May 2009        -
January 2010    -
April 2010      -
July 2010       -
October 2010    -
January 2011    -
April 2011      -
July 2011       -
October 2011    -
January 2012    -
April 2012      -
July 2012       -
October 2012    -


Data Admin I : Gboyega Ayeni
Data Admin II : Xukai Shen
Data Admin III: Elita Li

sep/internal/datalib.txt · Last modified: 2017/07/20 18:27 by stew
CC Attribution-Share Alike 4.0 International