
Disk storage

Our problems tend to be not only computationally intensive but also large, so disk space had to be worked into our design. The two most common approaches are to put significant disk on each node and create a virtual filesystem with something like the Parallel Virtual File System (PVFS), or to create a large disk server with a high-speed connection to the cluster. Both approaches have drawbacks. The PVFS approach is dangerous because it means relying on fairly immature software. In addition, unless we wanted to use a Redundant Array of Inexpensive Disks (RAID) level greater than zero (meaning we would have to mirror our data), we could run into problems whenever a node became unavailable. The disk server approach also has drawbacks: many of our applications have significant I/O requirements, and a gigabit connection still has latency issues.

We decided to follow a model that allows the greatest flexibility in programming styles. Our solution was a combination of the two approaches. For the disk server, we put in a large case:

The total cost of this unit was less than $9000. In addition, we put a 60-gigabyte disk on each node. Jobs with significant I/O requirements can use the local disk; for large initial and final data, the disk server is available over the high-speed connection.
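As a rough sketch of how a job might use both tiers of storage, the following Python script stages large input data from the disk server to the node-local disk, does its I/O-intensive work locally, and copies only the final result back to the server. The paths and the solver executable are hypothetical placeholders, not the actual layout or software on our cluster.

  # Staging pattern: server for initial/final data, local disk for heavy I/O.
  import shutil
  import subprocess
  from pathlib import Path

  SERVER = Path("/server/data/run01")   # hypothetical mount of the disk server
  SCRATCH = Path("/scratch/run01")      # hypothetical node-local 60 GB disk

  def run_job():
      SCRATCH.mkdir(parents=True, exist_ok=True)

      # Stage the large initial data from the server to local disk once.
      shutil.copy(SERVER / "input.dat", SCRATCH / "input.dat")

      # I/O-intensive intermediate work reads and writes only the local disk.
      subprocess.run(
          ["./solver", str(SCRATCH / "input.dat"), str(SCRATCH / "result.dat")],
          check=True,
      )

      # Push only the final data back over the high-speed link to the server.
      shutil.copy(SCRATCH / "result.dat", SERVER / "result.dat")

  if __name__ == "__main__":
      run_job()

The point of the pattern is that the gigabit link to the server is touched only twice per job, while the frequent intermediate reads and writes stay on the local 60-gigabyte disk.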

