Our problems tend to be not only computationally intensive
but also large in data volume. As a result, disk space had to be
worked into our design.
The two most common approaches are either to put significant
disk on each node and then create a virtual filesystem across
those disks, using something like the Parallel Virtual File
System (PVFS), or to build a large disk server with a high-speed
connection to the cluster.
Both approaches have drawbacks. The PVFS approach is dangerous
because it means relying on fairly immature software. In addition,
unless we used a Redundant Array of Inexpensive Disks (RAID) level
greater than zero (meaning we would have to mirror our data), we
could lose access to the whole filesystem whenever a node became
unavailable.
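To make that failure mode concrete, the following toy sketch (plain
Python, not PVFS code; the file contents and node count are made-up
examples) shows why a file striped RAID-0 style across nodes becomes
unrecoverable when a single node drops out, which is exactly what a
mirrored RAID level would protect against.

    from itertools import zip_longest

    def stripe(data, nodes, chunk=4):
        """Round-robin the file's chunks across node disks (RAID-0 style)."""
        disks = [[] for _ in range(nodes)]
        for i, off in enumerate(range(0, len(data), chunk)):
            disks[i % nodes].append(data[off:off + chunk])
        return disks

    def reassemble(disks):
        """Interleave the surviving chunks back into byte order."""
        return b"".join(c for row in zip_longest(*disks, fillvalue=b"")
                        for c in row)

    data = b"hypothetical seismic data volume"   # stand-in for a real dataset
    disks = stripe(data, nodes=3)
    assert reassemble(disks) == data             # all nodes up: file intact

    disks[1] = []                                # one node becomes unavailable
    print(reassemble(disks) == data)             # False: every third chunk gone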
The disk server approach also has drawbacks. Many of our applications
have significant I/O requirements, and a single gigabit connection
still has latency issues and must be shared by the whole cluster.
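Some rough arithmetic shows the scale of the problem. The figures
below are illustrative assumptions (node count, link efficiency, and
dataset size are hypothetical, not measurements from our cluster),
but they show how quickly one shared gigabit link is divided away
when many nodes do I/O at once.

    LINK_GBPS = 1.0      # nominal gigabit link to the disk server
    EFFICIENCY = 0.8     # assume ~80% of nominal after protocol overhead
    NODES = 32           # hypothetical number of nodes reading at once
    DATASET_GB = 10.0    # hypothetical per-node input volume

    link_mb_s = LINK_GBPS * 1000 / 8 * EFFICIENCY   # ~100 megabytes/second
    per_node_mb_s = link_mb_s / NODES               # fair share per node
    minutes = DATASET_GB * 1000 / per_node_mb_s / 60

    print("per-node share: %.1f MB/s" % per_node_mb_s)   # ~3 MB/s
    print("stage-in time : %.0f minutes for %g GB" % (minutes, DATASET_GB))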
We decided to follow a model that allows the greatest
flexibility in programming styles.
Our solution was a combination of the two
approaches. We built a disk server in a large case containing:
- two dual-processor 1.2 GHz Athlon boards
- a gigabit network card
- a 100-megabit network card
- IDE RAID controller cards with 2.4 terabytes of disk
The total cost of this unit was less than $9000.
In addition, we put a 60-gigabyte disk on each node.
Jobs with significant I/O requirements can use the
local disk, while large initial and final datasets move through the
disk server over its high-speed connection.
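In this model a typical job follows a stage-in, compute, stage-out
pattern. The sketch below illustrates that pattern; the paths and the
run_job command are hypothetical placeholders, not part of our actual
setup.

    import os, shutil, subprocess, tempfile

    SERVER_IN  = "/server/data/input.dat"    # lives on the central disk server
    SERVER_OUT = "/server/results/"          # final results return to the server
    SCRATCH    = "/scratch"                  # the node's local 60 GB disk

    # Stage in: one bulk copy over the high-speed link to the server.
    workdir = tempfile.mkdtemp(dir=SCRATCH)
    local_in = os.path.join(workdir, "input.dat")
    shutil.copy(SERVER_IN, local_in)

    # Compute: all intermediate I/O hits the fast local disk, not the network.
    local_out = os.path.join(workdir, "output.dat")
    subprocess.run(["run_job", local_in, local_out], check=True)

    # Stage out: one bulk copy of the final result, then free scratch space.
    shutil.copy(local_out, SERVER_OUT)
    shutil.rmtree(workdir)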