Gabriel Cramer (1704-1752). A Condensation-based Low Communication Linear Systems Solver Utilizing Cramer's Rule. Ken Habgood, Itamar Arel, Department of Electrical Engineering & Computer Science, The University of Tennessee.
Outline • Motivation & problem statement • Algorithm review • Numerical accuracy & stability • Parallel Implementation • Communication Results Source: http://tridane.faculty.asu.edu
Introduction • Mainstream approach: Gaussian elimination • e.g. LU decomposition • Goal: an efficient parallel solver with lower communication overhead • Targeting an unpopular approach: Cramer's Rule
LU Communication Pattern • Communication for distributed LU decomposition (figure: 3x3 block layout with block rows [L00/U00, U01, U02], [L10, A11, A12], [L20, A21, A22]) • Three sequential steps: • Top-left processor computes and sends • Row and column leads compute and send • Remaining processors factorize their blocks • One-to-one communication • Idle time while the leads are processing Source: http://www.caam.rice.edu/~timwar/MA471F03/
Matrix “Mirroring” • Mirroring example • Applying Chio’s condensation yields:
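For readers unfamiliar with Chio's condensation, the following is a minimal serial sketch of the textbook method combined with a naive Cramer's Rule solve. It is not the mirroring scheme from the talk: it recomputes one determinant per unknown, so it costs O(N^4) rather than O(N^3). The function names `chio_det` and `cramer_solve` are illustrative, and exact `Fraction` arithmetic is an assumption made here so the result can be checked without rounding error.

```python
from fractions import Fraction


def chio_det(matrix):
    """Determinant by repeated Chio condensation (illustrative sketch)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign = 1
    scale = Fraction(1)  # accumulated divisors pivot**(n - 2), one per step
    while n > 1:
        # Chio's step needs a nonzero pivot in the top-left corner.
        pivot_row = next((i for i in range(n) if a[i][0] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # an all-zero first column means det = 0
        if pivot_row != 0:
            a[0], a[pivot_row] = a[pivot_row], a[0]
            sign = -sign  # a row swap flips the sign of the determinant
        p = a[0][0]
        # Condense: each new entry is the 2x2 minor formed with the pivot.
        a = [[p * a[i][j] - a[i][0] * a[0][j] for j in range(1, n)]
             for i in range(1, n)]
        scale *= p ** (n - 2)
        n -= 1
    return sign * a[0][0] / scale


def cramer_solve(A, b):
    """Solve A x = b with Cramer's Rule, one determinant per unknown."""
    d = chio_det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    x = []
    for k in range(len(A)):
        # Replace column k of A with the right-hand side b.
        Ak = [list(row[:k]) + [b[i]] + list(row[k + 1:])
              for i, row in enumerate(A)]
        x.append(chio_det(Ak) / d)
    return x


A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
solution = cramer_solve(A, b)
```

Each condensation step reduces the matrix order by one using only the pivot row and pivot column, which is the property a parallel implementation can exploit by broadcasting that row and column.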
Accuracy and Numerical Stability • Backward error estimation • Theoretical estimate of rounding error • E matrix depends on two items • The largest element in A or b • The growth factor of the algorithm • Same growth factor as LU-decomposition with partial pivoting
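Backward error can also be measured empirically for a computed solution. The sketch below evaluates the standard normwise relative backward error in the infinity norm, ||b - A x̂|| / (||A|| ||x̂|| + ||b||). It is a generic diagnostic, not the theoretical bound referred to on the slide, and the function name `backward_error` is an assumption made for illustration.

```python
def backward_error(A, b, x_hat):
    """Normwise relative backward error of x_hat for A x = b (infinity norm)."""
    n = len(A)
    # Residual r = b - A @ x_hat, computed row by row.
    residual = [b[i] - sum(A[i][j] * x_hat[j] for j in range(n))
                for i in range(n)]
    norm_r = max(abs(r) for r in residual)
    norm_A = max(sum(abs(v) for v in row) for row in A)  # max absolute row sum
    norm_x = max(abs(v) for v in x_hat)
    norm_b = max(abs(v) for v in b)
    return norm_r / (norm_A * norm_x + norm_b)


A = [[2.0, 0.0], [0.0, 4.0]]
b = [2.0, 4.0]
exact = backward_error(A, b, [1.0, 1.0])      # exact solution
perturbed = backward_error(A, b, [1.0, 1.5])  # deliberately wrong second entry
```

A value near machine precision indicates that the computed solution exactly solves a nearby system, which is the sense in which an algorithm is called backward stable.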
Serial Performance • Results support the theoretical ~2.5x complexity ratio
Communication Complexity • Two phases of parallel communication: • Parallel Chio's condensation • Gather columns • Overall bandwidth (N: original matrix size, P: number of processors, F: gather-columns size)
Where’s the Breakeven Point? • The point at which communication “dead time” matches the computational workload • Assuming dC = 0.05 and N = 1000, the breakeven number of processors would be P ≈ 142
Closing Thoughts … • Proposed O(N^3) Cramer’s Rule method • Significantly lower communication overhead • Many more “broadcasts” than “unicasts” • Communication is a function of problem size, not of the number of processors • Next steps … • Optimize the parallel implementation • Sparse matrix version