
Parameter handling

One of the biggest problems with running on cheap hardware is that the failure rate is high. Writing migration code that can figure out where it died and continue from there is challenging, and when the migration is part of a larger inversion problem the task becomes even more difficult. One of the goals of the library is to make check-pointing easier. Any thread can write a status parameter to a distributed tag. This status parameter is written to the thread's local sections rather than to the global tag, so threads clobbering the shared text file is not an issue. Restarting then becomes a much simpler matter: you can request the status parameter from each section with a single call, and working out which portion of the job has finished and which portion remains becomes trivial.
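
The check-pointing pattern described above can be illustrated with a minimal sketch. This is not the library's interface: the function names (write_status, collect_status, remaining_sections), the one-file-per-section layout, and the "done"/"todo" status strings are all assumptions made for illustration. The sketch only shows why per-section status files avoid clobbering a shared file and how a restart can find the unfinished work.

```python
import os

def _status_path(tag_dir, section):
    # Hypothetical layout: one small status file per section of the tag.
    return os.path.join(tag_dir, f"section_{section}.status")

def write_status(tag_dir, section, status):
    """Record a status string in the section's own file.

    Each worker touches only its own section's file, so no two
    workers ever write to the same file.
    """
    path = _status_path(tag_dir, section)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(status + "\n")
    # Rename over the old file so a crash mid-write leaves the
    # previous status intact.
    os.replace(tmp, path)

def collect_status(tag_dir, n_sections, default="todo"):
    """Return the status of every section, in section order.

    Sections that never wrote a status get the default value.
    """
    statuses = []
    for section in range(n_sections):
        path = _status_path(tag_dir, section)
        if os.path.exists(path):
            with open(path) as f:
                statuses.append(f.read().strip())
        else:
            statuses.append(default)
    return statuses

def remaining_sections(tag_dir, n_sections):
    """List the sections a restarted job still has to process."""
    statuses = collect_status(tag_dir, n_sections)
    return [i for i, s in enumerate(statuses) if s != "done"]
```

On restart, a job under these assumptions would call remaining_sections once and hand out only the returned section numbers; anything that was in progress when the node died is simply redone.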


Stanford Exploration Project
5/23/2004