previous up next print clean
Next: CODING, TESTING AND PERFORMANCE Up: Rub: Singular value decomposition Previous: HESTENES' METHOD

PARALLEL IMPLEMENTATION ON THE CM

In the discussion to follow we assume that A has an even number of columns. If the number of columns is odd, we pad A with a column of zeroes. We use $\frac{n}{2} + 1$ processors to store an $m \times n$ matrix Ak, where each processor stores two columns of Ak. Following the explanation given in Golub and Van Loan (1989), for purposes of illustration we assume that A has 8 columns. Then we can store A in the following way :


\begin{picture}
(440,100)(0,12)
\put (60,70){
\framebox 
(30,30){P1}}
\put (120,...
 ...0,25){
\framebox 
(30,15){7}}
\put (240,25){
\framebox 
(30,15){8}}\end{picture}
where U1,U2 are 2-dimensional arrays, Pn is processor n and the numbers indicate columns of Ak. Then we can orthogonalize the four sets of two columns in parallel, using the method described in the previous section. After the columns are orthogonalized, we keep column 1 fixed and rotate the ring defined by (2 3 4 8 7 6 5), obtaining the following configuration :


\begin{picture}
(440,100)(0,12)
\put (60,70){
\framebox 
(30,30){P1}}
\put (120,...
 ...0,25){
\framebox 
(30,15){8}}
\put (240,25){
\framebox 
(30,15){4}}\end{picture}

\begin{picture}
(440,75)
\put(0,50){Illustrating the original layout of A as}
\p...
 ...ox 
(20,20){X}}
\put (110,10){
\framebox 
(20,20){X}}\end{picture}}\end{picture}

\begin{picture}
(440,15)
\put(0,5){the rotation is achieved in six steps on the CM :}\end{picture}

\begin{picture}
(440,30)
\put(15,15){\circle{15}}
\put (0,0){\makebox(30,30){1}}...
 ...0,0){\makebox(30,30){4}}
\put(245,0){U2 = cshift(U2,dim=2,shift=1)}\end{picture}

\begin{picture}
(440,100)
\put (90,95){\vector(0,-1){20}}
\put (20,50){\makebox(...
 ...ox 
(20,20){6}}
\put (110,10){
\framebox 
(20,20){4}}\end{picture}}\end{picture}

\begin{picture}
(440,30)
\put(15,15){\circle{15}}
\put (0,0){\makebox(30,30){2}}...
 ...t(220,0){\makebox(30,30){5}}
\put(245,0){U1(:,1:2) = buffer(:,1:2)}\end{picture}

\begin{picture}
(440,100)
\put (90,95){\vector(0,-1){20}}
\put (20,50){\makebox(...
 ...ox 
(20,20){6}}
\put (110,10){
\framebox 
(20,20){4}}\end{picture}}\end{picture}

\begin{picture}
(440,30)
\put(15,15){\circle{15}}
\put (0,0){\makebox(30,30){3}}...
 ...
\put(220,0){\makebox(30,30){6}}
\put(245,0){U2(:,4) = buffer(:,4)}\end{picture}

\begin{picture}
(440,100)
\put (90,95){\vector(0,-1){20}}
\put (20,50){\makebox(...
 ...ox 
(20,20){6}}
\put (110,10){
\framebox 
(20,20){4}}\end{picture}}\end{picture}

The rotation can be achieved in less than six steps. We chose this approach because the only data movement between processors is done efficiently with cshift instructions.

The resulting column pairs can then be orthogonalized in parallel. After seven rotations we have processed all the column pairs in A. In general, for an n column matrix, it takes n-1 steps to process all the column pairs. Each set of n-1 steps is referred as a sweep. Throughout the sweep we check the values of all $\lambda = \frac{\vert\gamma\vert}{\sqrt{\alpha \beta}}$,with $\alpha, \beta$ and $\gamma$ having been defined in the previous section. If $\lambda < \epsilon$ for a sufficiently small $\epsilon \gt 0$ holds for each set of $\alpha,\beta,\gamma$ in a sweep, then we can say that we have orthogonalized A and obtained W. We can also obtain V if we begin with an $n \times n$ identity matrix and at every step of the algorithm we process each Vk in a manner similar to each Ak.

In Brent and Luk (1985), it is stated that although the authors know of no example for which the algorithm fails to give convergence, convergence is not guaranteed. In order to enforce convergence, a threshold approach (Wilkinson, 1965) can be chosen. A threshold value can be associated with each sweep. Any pair of columns in that sweep, with $\frac{\vert\gamma\vert}{\sqrt{\alpha \beta}}$ below the threshold, is not processed.


previous up next print clean
Next: CODING, TESTING AND PERFORMANCE Up: Rub: Singular value decomposition Previous: HESTENES' METHOD
Stanford Exploration Project
12/18/1997