SEP Data Catalog


The motivation for the SEP Data Catalog project is the long-recognized need for bookkeeping of everything related to the datasets in our data library. The SEP Data Catalog is a database meant to help SEP researchers find the metadata for all the datasets in our data library. The catalog will also make the data maintainers' job easier.

What's new

  1. We are moving from a log format, in which all the metadata for a particular dataset are listed in a single text/LaTeX file, to a relational database.
  2. We relate our publications to the datasets used in our research. This lets us find all the papers that use any particular dataset in our library.
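To make the idea in item 2 concrete, here is a minimal sketch of how datasets and publications could be linked through a junction table, so a single query lists every paper that uses a given dataset. The table and column names (dataset, publication, publication_dataset) and the sample rows are hypothetical, not the catalog's actual schema. The sketch uses Python's built-in sqlite3 instead of MySQL so it runs standalone; with MySQLdb the query placeholder would be %s rather than ?.

```python
import sqlite3

# Hypothetical schema; the real catalog's tables and columns may differ.
SCHEMA = """
CREATE TABLE dataset (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    path TEXT NOT NULL
);
CREATE TABLE publication (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    year  INTEGER
);
-- Junction table: one row per (publication, dataset) pair, so a paper can
-- use many datasets and a dataset can appear in many papers.
CREATE TABLE publication_dataset (
    publication_id INTEGER NOT NULL REFERENCES publication(id),
    dataset_id     INTEGER NOT NULL REFERENCES dataset(id),
    PRIMARY KEY (publication_id, dataset_id)
);
"""

def papers_using(conn, dataset_name):
    """Return the titles of all publications linked to the named dataset."""
    cur = conn.execute(
        """
        SELECT p.title
        FROM publication AS p
        JOIN publication_dataset AS pd ON pd.publication_id = p.id
        JOIN dataset AS d ON d.id = pd.dataset_id
        WHERE d.name = ?
        ORDER BY p.title
        """,
        (dataset_name,),
    )
    return [title for (title,) in cur]

def build_demo_db():
    """In-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?)",
                     [(1, "dataset_a", "/data/dataset_a"),
                      (2, "dataset_b", "/data/dataset_b")])
    conn.executemany("INSERT INTO publication VALUES (?, ?, ?)",
                     [(1, "Paper one", 2013), (2, "Paper two", 2014)])
    conn.executemany("INSERT INTO publication_dataset VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2)])
    return conn
```

For example, papers_using(build_demo_db(), "dataset_a") returns ['Paper one', 'Paper two']. Keeping the links in their own table, rather than as a column on either side, is what allows the many-to-many relationship.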

How does it work?!

This web application is based on two processes that run on the host machine (sep2). The first is the Apache (apache2) web server, with mod_python installed on top of it to run Python web pages. The second is the database management system (DBMS), which holds the database design and the metadata. This project uses MySQL as its DBMS, and the Python module MySQLdb is used to communicate with it.

What is a database management system

Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke is a popular textbook for database courses. The few class notes below should be enough to understand how a DBMS works and how to use one:

Preparing the host machine

On the host machine, install the following packages (use a package installer: sudo apt-get install packagename on Ubuntu or sudo yum install packagename on CentOS):

  • apache2 or httpd
  • libapache2-mod-python or mod_python
  • mysql
  • mysql-server
  • python
  • MySQL-python

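Start the Apache and MySQL servers. For the apache2 web server:

sudo /etc/init.d/apache2 start

or, for the httpd web server:

sudo /etc/init.d/httpd start

Then start MySQL:

sudo /etc/init.d/mysqld start

On Ubuntu the MySQL init script is usually named mysql rather than mysqld (sudo /etc/init.d/mysql start).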
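Once the servers are started, a quick way to confirm they are accepting connections is to open a TCP connection to each one. This minimal sketch assumes the default ports (80 for Apache, 3306 for MySQL); adjust SERVICES if the host is configured differently. The names check_services and SERVICES are illustrative.

```python
import socket

# Assumed default ports; change them if httpd.conf or my.cnf say otherwise.
SERVICES = {"apache": 80, "mysql": 3306}

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in services.items()}

if __name__ == "__main__":
    for name, ok in check_services().items():
        print("%-8s %s" % (name, "up" if ok else "DOWN"))
```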

In httpd.conf, add:

<Directory /web/html/sepdata>
  Options Indexes FollowSymLinks MultiViews
  # None would make Apache ignore .htaccess; Limit allows Order/Allow/Deny there
  AllowOverride Limit
  Order allow,deny
  allow from all
  AddHandler mod_python .py
  PythonHandler mod_python.publisher
  PythonDebug On
</Directory>

In the /web/html/sepdata directory, add an .htaccess file that limits access to the SEP whitelist.
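The page does not say what the whitelist contains. As a hypothetical sketch, the helper below writes an .htaccess file in the same Apache 2.2 Order/Allow/Deny style as the <Directory> block above, denying everyone except the listed hosts or networks. WHITELIST holds placeholder documentation addresses, not SEP's real ones, and Apache only honors these rules if AllowOverride for the directory includes Limit (under AllowOverride None the file is ignored).

```python
import os

# Placeholder whitelist: replace with SEP's real hosts and networks.
WHITELIST = ["127.0.0.1", "192.0.2.0/24"]

def render_htaccess(allowed):
    """Apache 2.2 access rules: deny everyone, then allow only the listed entries."""
    # With "Order deny,allow", Deny is evaluated first and a matching Allow overrides it.
    lines = ["Order deny,allow", "Deny from all"]
    lines += ["Allow from %s" % entry for entry in allowed]
    return "\n".join(lines) + "\n"

def write_htaccess(directory, allowed=WHITELIST):
    """Write .htaccess into `directory` and return its path."""
    path = os.path.join(directory, ".htaccess")
    with open(path, "w") as f:
        f.write(render_htaccess(allowed))
    return path
```

For example, write_htaccess("/web/html/sepdata") would write the rules for the placeholder whitelist into the catalog directory.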

Backing up the database

mysqldump -u [username] -p[password] [databasename] > [backupfile.sql]

  • [username] - your database username
  • [password] - the password for your database user; it must follow -p with no space. If you give -p alone, mysqldump prompts for the password, which keeps it out of your shell history.
  • [databasename] - the name of your database
  • [backupfile.sql] - the file to which the backup should be written

For more information, see Backing Up and Restoring Your MySQL Database.
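For regular backups it helps to give each dump a timestamped file name and to keep the password off the command line. This is a minimal sketch, assuming the password is stored in the [client] section of the backup user's ~/.my.cnf (which mysqldump reads automatically) and using a hypothetical database name, sepdata. run_backup calls the real mysqldump, so it needs the MySQL client tools installed.

```python
import datetime
import os
import subprocess

def backup_path(backup_dir, database, when=None):
    """Timestamped dump file name, e.g. <backup_dir>/sepdata-20150527-0206.sql."""
    if when is None:
        when = datetime.datetime.now()
    name = "%s-%s.sql" % (database, when.strftime("%Y%m%d-%H%M"))
    return os.path.join(backup_dir, name)

def mysqldump_command(user, database, outfile):
    # No password on the command line: mysqldump reads it from the [client]
    # section of ~/.my.cnf (assumed to exist and be readable only by its owner).
    return ["mysqldump", "-u", user, "--result-file=" + outfile, database]

def run_backup(user, database, backup_dir):
    """Dump `database` into a new timestamped file; raise if mysqldump fails."""
    outfile = backup_path(backup_dir, database)
    subprocess.run(mysqldump_command(user, database, outfile), check=True)
    return outfile
```

For example, run_backup("sepuser", "sepdata", "/backups") would write a file such as /backups/sepdata-20150527-0206.sql; the user name and directory are placeholders. Calling it from a cron job is one way to schedule it.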

Progress

TODO

  • Finish entering metadata into the database.
  • Implement an alert and backup monitoring system.

DONE

  • Create a database entry for every dataset under /data
  • Enter the metadata for a few datasets inherited from the old data library. The goals were to familiarize the data catalog and data library maintainers with the database and the website, and to see whether the current design needs new entities or fields. [Abdul, Gboyega, Xukai, Ellita]
  • Move the database and website to sep2 and limit access to local machines (and the host machine for MySQL). [Abdul]
sep/internal/sepdata.txt · Last modified: 2015/05/27 02:06 (external edit)
CC Attribution-Share Alike 4.0 International