The ARPACK interface requires the user to supply the subroutine that performs the matrix-vector multiplication. This multiplication is the most expensive part of computing the eigenvector, and is therefore the obvious target for parallelization. We implement the eigenvector calculation in a modified master-slave scheme.
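To make the callback structure concrete, the following is a minimal sketch of supplying a user-defined matrix-vector product to ARPACK, here through SciPy's `eigsh`, which wraps ARPACK's reverse-communication routines. The dense test matrix and the `matvec` body are illustrative placeholders; in the parallel code, `matvec` is where the master-slave pipeline described below would run.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = (M + M.T) / 2  # illustrative symmetric matrix

def matvec(x):
    # User-supplied matrix-vector product. ARPACK only ever sees this
    # callback; it never needs the matrix itself.
    return A @ x

op = LinearOperator(A.shape, matvec=matvec, dtype=A.dtype)
vals, vecs = eigsh(op, k=1, which="LA")  # largest algebraic eigenvalue
```

Because ARPACK only calls back into `matvec`, the matrix can be stored and multiplied in any distributed form, which is what the scheme below exploits.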
The slave nodes are initialized with a portion of the off-diagonal elements of the matrix. The master node is given a vector by the ARPACK library. It sends that vector to the first slave node and then begins to create the output vector by multiplying the diagonal terms of the matrix. Upon receiving the input vector, the first slave node passes it to the second slave node and then multiplies the input vector by its portion of the off-diagonal terms, creating its own output vector. This process is repeated by all of the slave nodes. When the master node finishes multiplying the diagonal terms, it passes its output vector to the first slave node, which adds its own contribution and passes the sum to the second slave node. The process continues down the chain until the last slave node, which passes the completed matrix-vector product back to the master node.
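The two-stage pipeline above can be sketched in serial form as follows. This is a single-process simulation, not the actual message-passing implementation: the row-block partition of the off-diagonal elements and the node count are illustrative assumptions, and the two loops stand in for the vector being forwarded down the chain of slave nodes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_slaves = 12, 3
A = rng.standard_normal((n, n))

# Master holds the diagonal; each slave holds a slice of the
# off-diagonal elements (here: a block of rows with the diagonal zeroed).
D = np.diag(np.diag(A))
off = A - D
row_blocks = np.array_split(np.arange(n), n_slaves)  # hypothetical partition

def pipelined_matvec(x):
    # Stage 1: the input vector travels down the chain; each node
    # multiplies its own portion of the matrix concurrently.
    y = D @ x  # master's diagonal contribution
    partials = []
    for rows in row_blocks:
        part = np.zeros(n)
        part[rows] = off[rows] @ x
        partials.append(part)
    # Stage 2: the partial output vector travels down the chain; each
    # slave adds its contribution, and the last returns the result.
    for part in partials:
        y = y + part
    return y
```

Forwarding the input vector before starting the local multiplication is what overlaps communication with computation: by the time each node finishes its block, the accumulating output vector is arriving from its predecessor.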
Implementing the matrix multiplication in this form achieves good load balancing with minimal communication wait time. Running this approach on an InfiniBand network, we were able to perform 200 iterations on a matrix with 2 billion non-zero elements in 55 minutes.