SEP Data Catalog
The SEP Data Catalog project addresses a long-recognized need: bookkeeping for everything related to the data sets in our data library. The SEP data catalog is a database meant to help SEP researchers find the metadata of all the data sets in our data library. The catalog will also make the data maintainer's job easier.
What's new
- We are moving from a log format, where all the metadata for each data set is listed in a single text/LaTeX file, to a relational database.
- We try to relate our publications to the data sets used in our research. This will help us find all the papers that use any particular data set in our library.
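The relation between publications and data sets can be sketched as a many-to-many link table. The example below is only an illustration: the table and column names (dataset, publication, publication_dataset) and the sample rows are hypothetical, not the catalog's actual schema, and it uses Python's built-in sqlite3 module in memory as a stand-in for MySQL so that it runs without a server.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the catalog's real design.
SCHEMA = """
CREATE TABLE dataset (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE publication (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
-- Link table: one row per (publication, dataset) pair.
CREATE TABLE publication_dataset (
    publication_id INTEGER NOT NULL REFERENCES publication(id),
    dataset_id     INTEGER NOT NULL REFERENCES dataset(id),
    PRIMARY KEY (publication_id, dataset_id)
);
"""


def build_demo_catalog():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dataset (id, name) VALUES (?, ?)",
                     [(1, "demo_survey_a"), (2, "demo_survey_b")])
    conn.executemany("INSERT INTO publication (id, title) VALUES (?, ?)",
                     [(1, "Example paper one"), (2, "Example paper two")])
    conn.executemany(
        "INSERT INTO publication_dataset (publication_id, dataset_id) "
        "VALUES (?, ?)",
        [(1, 1), (2, 1), (2, 2)])
    conn.commit()
    return conn


def papers_using(conn, dataset_name):
    """Return the titles of all publications linked to one data set."""
    rows = conn.execute(
        """
        SELECT p.title
        FROM publication AS p
        JOIN publication_dataset AS pd ON pd.publication_id = p.id
        JOIN dataset AS d ON d.id = pd.dataset_id
        WHERE d.name = ?
        ORDER BY p.title
        """,
        (dataset_name,),
    ).fetchall()
    return [title for (title,) in rows]


if __name__ == "__main__":
    demo = build_demo_catalog()
    print(papers_using(demo, "demo_survey_a"))
```

Keeping the links in their own table means a paper can reference several data sets, and a data set can appear in several papers, without duplicating either record.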
How does it work?!
This web application relies on two processes that run on the host machine (sep2). The first is the Apache (apache2) web server, with mod_python installed on top of it to run Python web pages. The second is the database management system (DBMS), which holds the database design and the metadata. MySQL is the DBMS used for this project, and the Python module MySQLdb is used to communicate with it.
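As a rough illustration of how a Python page could talk to MySQL through MySQLdb, the sketch below runs a parameterized query through a DB-API cursor. The table name dataset, its columns name and description, and any connection settings are hypothetical placeholders, not the catalog's real schema. MySQLdb is imported inside connect() so the rest of the module can be loaded on machines without the driver.

```python
def connect(host, user, passwd, db):
    """Open a MySQL connection with MySQLdb (needs the MySQL-python package)."""
    import MySQLdb  # imported lazily; only required when actually connecting
    return MySQLdb.connect(host=host, user=user, passwd=passwd, db=db)


def find_datasets(conn, name_fragment):
    """Return (name, description) rows whose name contains name_fragment.

    `conn` is any DB-API connection that uses the %s parameter style,
    as MySQLdb does. The `dataset` table and its columns are assumptions.
    """
    cursor = conn.cursor()
    try:
        # Let the driver quote the value; never paste user input into SQL.
        cursor.execute(
            "SELECT name, description FROM dataset "
            "WHERE name LIKE %s ORDER BY name",
            ("%" + name_fragment + "%",),
        )
        return list(cursor.fetchall())
    finally:
        cursor.close()
```

Passing the search term as a query parameter, rather than formatting it into the SQL string, leaves the quoting to the driver and avoids SQL injection from web form input.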
What is a database management system
Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke is a very popular book that is used as a text for courses on databases. The few class notes below should be sufficient to give an understanding of how DBMSs work and how to use them:
Preparing the host machine
On the host machine, install the following packages using a package installer (sudo apt-get install packagename on Ubuntu, or yum install packagename on CentOS):
- apache2 or httpd
- mysql
- mysql-server
- python
- MySQL-python
Start the Apache and MySQL servers:
sudo /etc/init.d/apache2 start
for the apache2 web server, or
sudo /etc/init.d/httpd start
for the httpd web server, and
sudo /etc/init.d/mysqld start
for the MySQL server.
In httpd.conf include:
<Directory /web/html/sepdata>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride None
    Order allow,deny
    allow from all
    AddHandler mod_python .py
    PythonHandler mod_python.publisher
    PythonDebug On
</Directory>
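In the /web/html/sepdata directory, add a .htaccess file to limit access to the SEP whitelist. Apache ignores .htaccess files where AllowOverride None applies, so the AllowOverride setting for this directory must permit the directives the file uses (for example, AllowOverride Limit for allow/deny rules).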
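With the mod_python.publisher handler, functions in a .py file under the configured directory are exposed as URLs, and form or query-string fields are passed as keyword arguments. The page below is a minimal sketch of that mechanism, not one of the catalog's actual pages: the file name catalog.py and the name parameter are hypothetical.

```python
try:
    from html import escape  # Python 3
except ImportError:
    from cgi import escape   # Python 2, as used by older mod_python installs


def index(req, name=""):
    """Default function published for catalog.py (a hypothetical file name).

    mod_python.publisher passes the request object as `req` and maps
    query-string or form fields to keyword arguments (here, `name`).
    """
    req.content_type = "text/html"
    if name:
        # Escape user input before placing it in the HTML output.
        heading = "Data sets matching '%s'" % escape(name)
    else:
        heading = "All data sets"
    # A real page would query the database here; this sketch only echoes input.
    return "<html><body><h1>%s</h1></body></html>" % heading
```

Assuming the directory is served under /sepdata, a request for /sepdata/catalog.py?name=foo would call index(req, name="foo") and send the returned string to the browser.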
Backing up the database
mysqldump -u [username] -p[password] [databasename] > [backupfile.sql]
- [username]: your database username
- [password]: the password for your database (written with no space after -p; if the password is omitted, mysqldump prompts for it)
- [databasename]: the name of your database
- [backupfile.sql]: the file to which the backup should be written
For more information, see Backing Up and Restoring Your MySQL Database.
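The same backup could be run from a script, for example from a scheduled job. The sketch below is one possible wrapper, assuming mysqldump is on the PATH; the function names are made up for illustration. Note that a password passed on the command line can be visible to other users in the process list, so a MySQL option file may be preferable on a shared host.

```python
import subprocess


def build_dump_command(username, password, database):
    """Return the mysqldump argument list for one database."""
    return [
        "mysqldump",
        "--user=" + username,
        "--password=" + password,
        database,
    ]


def backup_database(username, password, database, backup_path):
    """Write a SQL dump of `database` to `backup_path`.

    Raises subprocess.CalledProcessError if mysqldump exits with an error.
    """
    command = build_dump_command(username, password, database)
    with open(backup_path, "wb") as backup_file:
        # mysqldump writes the dump to stdout; redirect it into the file.
        subprocess.check_call(command, stdout=backup_file)
```

Building the command as an argument list, rather than a shell string, avoids quoting problems if the password contains spaces or shell metacharacters.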
Progress
TODO
- Finish entering metadata into the database.
- Implement an alert and backup monitoring system.
DONE
- Create a database entry for every dataset under /data
- Enter the metadata for a few datasets inherited from the old data library. One goal is to familiarize data catalog and data library maintainers with the database and the website. Another goal is to see whether the current design needs new entities or fields. [Abdul, Gboyega, Xukai, Ellita]
- Move the database and website to sep2 and limit access to local machines (and the host machine for MySQL). [Abdul]