210 likes | 512 Views
Similarities in the Structure Prediction of Sparse QR Factorization and Sparse LU Factorization with Partial Pivoting. Laura Grigori INRIA – FRANCE Joint work with John Gilbert, UCSB. Talk Outline. Introduction Sparse direct methods. Combinatorial tools
E N D
Similarities in the Structure Prediction ofSparse QR Factorization and Sparse LU Factorization with Partial Pivoting Laura Grigori INRIA – FRANCE Joint work with John Gilbert, UCSB
Talk Outline • Introduction • Sparse direct methods. • Combinatorial tools • Directed graphs (expose path structure in Cholesky and LU factorization). • Bipartite graphs (expose path structure in QR and LU factorization with partial pivoting). • Structure prediction • Consider reducible matrices. • Present tight bounds for the factors Q, R. • Identify tight bounds for the factors L, U of PA=LU. • Conclusions and future work • Algorithms, experiments.
U 1 2 3 4 L 5 6 7 Sparse LU Factorization for i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for k = i+1 to n st A(i,k) != 0 for j = i+1 to n st A(j,i) != 0 A(j,k) = A(j,k) - A(j,i) * A(i,k) • Order equations and variables. • Structure prediction and symbolic factorization: Identify nonzero structure of L, U factors. • Numeric factorization (most time consuming): Use supernodes to exploit memory hierarchy. Benefit from additional parallelism due to sparsity. • Triangular solves + iterative refinement.
Sparse QR factorization by Householder transformations • Apply orthogonal transformations to annihilate subdiagonal entries • Householder transformation • Traditional Householder algorithm eliminates columns of A from left to right (right-looking) using one Householder transformation for each column.
Sparse Direct Methods • The fill-in is more important for QR than for LU. • Since , the QR factorization of A is related to the Cholesky factorization of . • The structure of Q, R (for A=QR factorization) and L, U (for A = LU factorization) can be determined before the numerical computation. • For PA = LU, nonzero structure of L and U is determined during the numerical factorization. • A structure prediction phase allows to identify bounds of L and U. • Studied by George, Gilbert, Hare, Johnson, Liu, Ng, Olesky, Pothen … for LU, matrices are irreducible (strong Hall property). • Our contribution: identified tight bounds for matrices that are reducible (Hall property).
1 2 3 4 1’ 2’ 3’ 4’ A 2 1 3 4 G(A) Directed graphs, bipartite graphs, matchings • A is n x n sparse, unsymmetric matrix. Directed graph G (A): - n vertices. - edge (i,j) for each nonzero Aij. • A is m x n sparse, unsymmetric matrix. Bipartite graph H(A): - m row vertices and n column vertices. - edge (r,c) for each nonzero Arc. Matching M on H(A): - set of edges, no two of which are adjacent. - M is a column complete matching if it has n edges. - M is a perfect matching if it has m=n edges. 1’ 1 2’ 2 3’ 3 4’ 4 H(A)
Hall property • A is Hall if every n-by-k submatrix (1 k n) has at least k nonzero rows. • A is strong Hall if every n-by-k submatrix (1 k n) has at least k+1 nonzero rows. • Facts: • Every nonsingular matrix is a Hall matrix. • A Hall matrix is reducible, while a strong Hall matrix is irreducible. • The rows (or columns) of a Hall matrix can be permuted so that the permuted matrix has a zero-free diagonal. • A bipartite graph has a column-complete matching if and only if it has the Hall property. • Coleman/Edenbrandt/Gilbert (‘86).
Results in Structure Prediction A – strong Hall matrix - Filled column intersection graph G(A): symbolic Cholesky factor of ATA; G(U) G(A) and G(L) G(A) bound is best possible for the structure of U and R. A – Hall matrix Use Dulmage-Mendelsohn to decompose A into strong Hall submatrices. Structure prediction for Hall matrices. Tight prediction for Q and R [Hare, Johnson, Olesky, van den Driessche, Pothen] -It identifies zeros forced by the orthogonality required in Q. Tight bounds for L and U of PA = LU [Grigori, Gilbert] -It identifies numerical cancellations, related to submatrices of A that are structurally singular and that force entries in L and U to be zero.
A 4 Hall Sets • A is a m x n Hall matrix with m ≥ n. Definition Hall Set [Hare et al. 93]: k columns form a Hall set if they have colectively nonzeros in exactly k rows. Notation: Sw = Hall set of maximum cardinality in the first w columns of A. sw = row vertices covered by columns in Sw. Lemma: Let S be a Hall set of A, and M be a column complete matching of H(A). Then each column vertex c of S is matched by M to a row vertex r’ of s. 1 2 3 4 5 6 Example: A4= A[1’:6’][1:4] Sw={3}, sw={3’} H(A4)-{3} is a strong Hall graph. M={(5’1),(6’2),(3’3),(2’4)} 1’ 1’ 1 2’ 2’ 2 3’ 3’ 3 4’ 4’ 4 5’ 5’ 6’ 6’
1 2 3 4 5 6 1’ 2’ 3’ 4’ 5’ 6’ A Sparse QR Factorization [Hare et al.] (1) Suppose:-A is Hall matrix factored as A=QR. Identify zeros forced by the orthogonality required in Q. Rely on two facts: • Qj span{A1, . . . , Aj} and • Qj is orthogonal to A1, . . . , Aj-1. Example: fill element. bogus fill. 1 2 3 4 5 6 1 2 3 4 5 6 1’ 1’ 2’ 2’ 3’ 3’ 4’ 4’ 5’ 5’ 6’ 6’ Q R
1 2 3 4 5 6 1’ 2’ 3’ 4’ 5’ 6’ A Sparse QR Factorization [Hare et al.] (2) Suppose:the QR factorization arrives at computation of column c of Q. Result: H(A) contains a path T[r’:c] with vertices in Sc-1 υ sc-1 => Qrc = 0 regardless the nonzero values of A. Example: S2 υ s2= {2 } υ { 2’ } => Q23 = 0. - Qj is orthogonal to every Ah Sc-1. - The columns Ah[sc-1] of Sc-1 span a subspace of dimension |sc-1|. =>Qj[sc-1] is orthogonal to every vector in the above subspace, and so must be zero. 1 2 3 4 5 6 1 2 3 4 5 6 1’ 1’ 2’ 2’ 3’ 3’ 4’ 4’ 5’ 5’ 6’ 6’ Q R
Sparse QR Factorization [Hare et al.] (3) Structure prediction for Q: Theorem:H is graph of a Hall matrix and there is a path { r’,c1, …, r’t ,c} with intermediate column vertices in {1,..,c-1} and no vertex in Sc-1 υ sc-1 => there is A with H(A) = H and of rank n such that A = QR and Qrc≠ 0. Structure prediction for R: Theorem: H is graph of a Hall matrix and the structure of Q is determined as above. => the structure of R is given by the upper triangular part of .
H(A2) H(B) 1’ 1 1’ 1 2’ 2 2’ 2 6’ 3 Characterization of fill during LU elimination Suppose:-A is Hall matrix factored as PA=LU. Result: H(A) contains a path T[r’:c] with intermediate vertices in Sc-1 υ sc-1 => for any permutation P with r’ permuted in position q’, Lqc = 0. Example: There is a path T[6’:3] = { 6’,1,1’,3 } with vertex in S2 υ s2= {2 } υ { 1’ } => L63 = 0 regardless the nonzero values of A. 1 2 3 4 5 6 A2= A[1’:2’][1:2] B = A[1’,2’,6’][1:3] L63 = det B / det A2 B is structurally singular =>L63 = 0 1’ 2’ 3’ 4’ 5’ 6’ A
Characterization of fill during LU elimination (cont.) • A2 is nonsingular => there is a perfect matching M2 in A2. Proof by contradiction: suppose L63 ≠ 0 • B is nonsingular => there is a perfect matching MB in B. • 6’, 3 are vertices matched by MB, but not matched by M2. Trace a path from 6’ to 3 as follows: 1. i = 0, r’0 = 6’. 2. (r’i , ci) MB, and r’i s2, then ciS2. 3. stop if ci = 5. 4. (ci, r’i+1) M2 and ciS2 , then r’i+1 s2. 5. i = i + 1 and go to 2. Contradiction The path traced has no intermediate vertices in S2 υ s2= {2 } υ { 1’ } => 1 2 3 4 5 6 1’ 2’ H(A2) H(B) 3’ 1’ 1 1’ 1 4’ 2’ 2 2’ 2 5’ 6’ 3 6’ A
1 2 3 4 5 6 7 1’ 2’ 3’ 4’ 5’ 6’ 7’ Upper bound for nonzero structure of L Theorem: H is graph of a Hall matrix with nonzero diagonal and there is a path { r’,c1, …, r’t ,c} with intermediate column vertices in {1,..,c-1} and no vertex in Sc-1 υ sc-1 => there is P and A such that r’ is permuted in some position q≥c and Lqc≠ 0. Example: Path T={7’,4,2’,1,5’,5}, S4 = {3}, s4 = {3’} Find P and A such that 7’ is permuted in position q and Lq5≠ 0. Approach: Build a matching M that defines a permutation P. 1’ 1 2’ 2 3’ 3 4’ 4 5’ 5 6’ 7’ H(A)
1 2 3 4 5 6 7 5’ 2’ 3’ 4’ 1’ 6’ 7’ Upper bound for nonzero structure of L (cont.) Choose values of A: nonzeros corresponding to edges of M to be larger than n, all other to be between 0 and 1, all Hall square submatrices to be nonsingular. =>Pivots during the first 4 steps of elimination correspond to edges of M. PA[1’:4’][1:4] is Hall matrix - matching M. PA[1’:4’,7’][1:5] is Hall matrix – use M and the path T to obtain a perfect matching. => Element L75≠ 0. 1’ 1 1’ 1 2’ 2 2’ 2 3’ 3 3’ 3 4’ 4 4’ 4 5 5’ 5’ 6’ 6’ 7’ 7’ H(A) H(A)
Upper bound for nonzero structure of U Theorem: H is graph of a Hall matrix with nonzero diagonal, w<c two column vtcs. there is a path { r’,c1, …, r’t ,c} with intermediate column vertices in {1,..,w-1} and no vertex in Sw-1 υ sw-1 => there is P and A such that r’ is permuted in some position s ≤ c and Usc≠0. Example:S1 = S2 = ø, S3 = S4 = {2,3}. Path {5’,1,2’,4} in H(A) and w=3 satisfy the conditions => There is P and A such that 5’ is permuted in position 3’ and U34≠0. 1 2 3 4 5 1 2 3 4 5 1’ 2’ 1’ 1 2’ 3’ 2’ 2 3’ 5’ 3’ 3 4’ 4’ 4’ 4 5’ 1’ 5’ 5 A H(A) PA
Algorithms • Similarities with symbolic QR for Hall matrices [Hare et al. 93, Pothen]. • Compute Hall sets of maximum cardinality Sj ,sj, for 1 ≤ j ≤ n-1. - complexity dominated by maximal transversal [Hare et al.]. • Use Hall sets to compute fill during LU with partial pivoting: - to be investigated. • Use Hall sets to compute upper bounds for the factors L and U: - complexity , where h is the number of distinct (nonempty) maximal Hall sets Sj , 1 ≤ j ≤ n-1 [Hare et al.].
Conclusions Results: • Characterized fill for the LU factorization with partial pivoting. • Determined tight exact bounds for the structure of L and U. Future work: • Study algorithms for the efficient computation of the bounds. • Compute the number of numerical cancellations, improvements in memory bounds and dependencies estimations. • Compare with decomposing A into Strong Hall submatrices.
Structure Prediction for L and U Upper bound for L: Theorem:H is graph of a Hall matrix with nonzero diagonal and there is a path { r’,c1, …, r’t ,c} with intermediate column vertices in {1,..,c-1} and no vertex in Sc-1 υ sc-1 • there is P and A such that r’ is permuted in some position q≥c and Lqc≠ 0. - The nonzero structure of PQ is a tight bound for the structure of L. Upper bound for U: Theorem:H is graph of a Hall matrix with nonzero diagonal, w<c two column vertices and there is a path { r’,c1, …, r’t ,c} with intermediate column vertices in {1,..,w-1} and no vertex in Sw-1 υ sw-1 => there is P and A such that r’ is permuted in some position s ≤ c and Usc≠0. - The nonzero structure of R is a tight bound for the nonzero structure of U.