Solving Irregular Problems Through Parallel Irregular Trees
Fabrizio Baiardi, Paolo Mori, Laura Ricci
Dipartimento di Informatica, Università di Pisa
Istituto di Informatica e Telematica, CNR - Pisa

Outline
• Main features of irregular problems
• Hierarchical representation of the domain
• Parallel Irregular Tree library
• Experimental results
• Future work
PDCN 2005

Irregular Problems
• the domain includes a set of elements characterised by
  • their position in the domain
  • other problem-specific properties
• the distribution of the elements is
  • non-homogeneous
  • dynamic and unpredictable
• the evolution of an element
  • depends upon that of other elements (locality)
  • updates the element properties
• Examples
  • Barnes-Hut
  • Adaptive Multigrid Methods
  • Radiosity methods

Hierarchical Representation
• the domain is recursively partitioned into a set of spaces by applying a problem-dependent condition
• the Hierarchical Tree (Htree) represents the decomposition, and each Hnode represents either a space or an element

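The recursive partitioning described above can be sketched as follows. This is a minimal Python illustration and not the PIT library's code: the names `HNode` and `build_htree`, the 2D square domain with points in [0, 1) x [0, 1), and the splitting condition ("a space holds more than `max_elems` elements") are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field

@dataclass
class HNode:
    """A node of the hierarchical tree: an internal space or a leaf holding elements."""
    x: float                  # lower-left corner of the square space
    y: float
    size: float               # side length of the space
    elements: list = field(default_factory=list)   # (px, py) points, leaves only
    children: list = field(default_factory=list)   # four sub-spaces, internal nodes only

def build_htree(points, x=0.0, y=0.0, size=1.0, max_elems=1, max_depth=16):
    node = HNode(x, y, size)
    # Assumed problem-dependent condition: split while a space holds too many elements.
    if len(points) <= max_elems or max_depth == 0:
        node.elements = list(points)
        return node
    half = size / 2.0
    for dx in (0, 1):
        for dy in (0, 1):
            cx, cy = x + dx * half, y + dy * half
            inside = [(px, py) for (px, py) in points
                      if cx <= px < cx + half and cy <= py < cy + half]
            node.children.append(
                build_htree(inside, cx, cy, half, max_elems, max_depth - 1))
    return node

def depth(node):
    """Height of the tree below this node (0 for a leaf)."""
    return 0 if not node.children else 1 + max(depth(c) for c in node.children)

def count_elements(node):
    """Number of elements stored in the subtree rooted at this node."""
    return len(node.elements) + sum(count_elements(c) for c in node.children)
```

With a non-homogeneous distribution, for example two points close together and one far away, the tree becomes deeper only where the elements cluster, which is the irregularity the Htree is meant to capture.
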
Distributed Hierarchical Tree
The Htree representation is distributed among the p-nodes: pt = <{h0, ..., hn-1}, mHt>
• private Htree (pHt): a subtree assigned to a p-node
• mapping Htree (mHt): represents the hierarchical relations among the private Htrees
[Figure: private Htrees h0, h1, h2 and h3]

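One way to picture the pair <{h0, ..., hn-1}, mHt> is a small mapping tree whose leaves name the private Htrees and the p-node that owns each of them. The sketch below is only an illustration under that assumption: the nested-dictionary layout, the owner numbers and the helper names `private_htrees` and `phts_of` are hypothetical and are not taken from the PIT library.

```python
# Hypothetical mHt: internal nodes have "children"; leaves name a private Htree
# (pHt) and the p-node that owns it. The shape and the owners are made up.
mht = {"children": [
    {"pht": "h0", "owner": 0},
    {"children": [
        {"pht": "h1", "owner": 1},
        {"pht": "h2", "owner": 1},
    ]},
    {"pht": "h3", "owner": 2},
]}

def private_htrees(node):
    """Collect (pHt label, owner p-node) pairs in left-to-right order."""
    if "pht" in node:
        return [(node["pht"], node["owner"])]
    result = []
    for child in node["children"]:
        result.extend(private_htrees(child))
    return result

def phts_of(node, pnode):
    """Labels of the private Htrees assigned to one p-node."""
    return [label for label, owner in private_htrees(node) if owner == pnode]
```

Because the mHt is small compared with the private Htrees, every p-node can hold a copy of it and use it to find out which p-node owns a given part of the domain.
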
PIT Library
• defines:
  • PITree
  • PIT operations
• key point: both the sequential and the parallel versions of the application are structured in terms of operations on Htrees
• aims
  • be a simple, complete and effective parallelization tool
  • hide the details of parallel programming from the user
  • preserve most of the sequential code

PIT API
• main operations
  • PITree creation
  • PITree completion
  • PITree update
• alternative API
  • standard
  • advanced
• composition of the adopted API
  • standard structure
  • customised for the specific problem

PITree Creation
• it creates the PITree starting from the domain elements
  • one (or more) pHt for each p-node
  • one mHt replicated on each p-node
• it implements a distributed strategy to make the best use of memory
• it needs some user-defined functions to manage the elements of the target problem

PITree Completion (I)
• standard API:
  • fault prevention and informed fault prevention
  • a single function implements the strategy
  • invoked before each operator

    PITree_completion(pht_root, stencil_0)
    tp_op_0(pht_root)    <- this comes from the sequential code

PITree Completion (II)
• advanced API:
  • informed fault prevention only
  • two distinct functions
    • PITree_det_neighbors: invoked each time the neighbourhood relations among the elements change
    • PITree_exch_neighbors: invoked before each operator

    PITree_det_neighbors(pht_root, stencil_0)
    PITree_exch_neighbors(pht_root, stencil_0)
    tp_op_0(pht_root)    <- this comes from the sequential code

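The split into a "determine" step and an "exchange" step can be illustrated on a one-dimensional grid. The sketch below assumes that completion means collecting, before an operator runs, copies of the remote elements that the operator's stencil will read; the slides do not spell this out. The functions `det_neighbors` and `exch_neighbors` are hypothetical stand-ins written for this example, not the PIT library functions, and the exchange is simulated in a single process.

```python
def det_neighbors(owner, stencil):
    """owner: dict cell index -> p-node; stencil: iterable of integer offsets.
    Returns dict p-node -> sorted list of remote cells its operator will read."""
    needed = {p: set() for p in set(owner.values())}
    for cell, p in owner.items():
        for off in stencil:
            other = cell + off
            if other in owner and owner[other] != p:
                needed[p].add(other)
    return {p: sorted(cells) for p, cells in needed.items()}

def exch_neighbors(values, owner, needed):
    """Builds each p-node's local view: its own cells plus copies of the
    remote cells listed in `needed` (a real exchange would send messages)."""
    views = {p: {c: values[c] for c, q in owner.items() if q == p}
             for p in needed}
    for p, cells in needed.items():
        for c in cells:
            views[p][c] = values[c]
    return views
```

The neighbour lists only change when the distribution of the elements changes, whereas the values must be refreshed before every operator. This is why the determination step can be invoked less often than the exchange step.
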
PITree Update (I)
• advanced API: two distinct functions
  • PITree correction:
    • updates the mapping of the elements that violate the mapping strategy
    • it is invoked after each operator that updates the distribution

      tp_op_0(pht_root)
      PITree_correction(pht_root)

  • PITree balance:
    • updates the mapping to redistribute the workload among the p-nodes
    • it is invoked after each operator that modifies the workload

      tp_op_0(pht_root)
      PITree_balance(pht_root, Tresh)

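The role of the threshold argument can be illustrated with a toy balancing step. The slides do not describe the strategy used by PITree_balance, so everything below is an assumption: the imbalance measure, the greedy "move the lightest subtree" policy and the names `imbalance` and `balance` are hypothetical, and workloads are assumed to be positive numbers.

```python
def imbalance(loads):
    """Relative excess of the most loaded p-node over the mean load."""
    mean = sum(loads) / len(loads)
    return 0.0 if mean == 0 else (max(loads) - mean) / mean

def balance(assignment, thresh):
    """assignment: one list of subtree workloads per p-node.
    While the imbalance exceeds thresh, moves the lightest subtree from the
    most loaded p-node to the least loaded one, as long as the move helps."""
    assignment = [list(a) for a in assignment]   # work on a copy
    while True:
        loads = [sum(a) for a in assignment]
        if imbalance(loads) <= thresh:
            break
        src = loads.index(max(loads))
        dst = loads.index(min(loads))
        w = min(assignment[src])
        if loads[dst] + w >= loads[src]:
            break   # the move would not lower the maximum load
        assignment[src].remove(w)
        assignment[dst].append(w)
    return assignment
```

A larger threshold tolerates more imbalance and triggers fewer redistributions; a smaller one balances more aggressively at the price of moving more data.
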
PITree Update (II)
• standard API:
  • a single function, PITree update, implements both the PITree correction and the balancing
  • PITree update is invoked after each operator

    tp_op_0(pht_root)
    PITree_update(pht_root, Tresh)

Parallelization
• Standard
  • the functions of the sequential version are inserted into the standard structure
  • the development is straightforward
  • a deep knowledge of the target problem is not required
• Customised
  • the PIT operations are inserted into the sequential code according to the semantics of the target problem
  • a deep knowledge of the target problem is required
  • both the standard and the advanced API can be adopted
  • it achieves better efficiency

Sequential Code
irregular_problem(tElementList *dom) {
  ...
  root = Htree_creation(dom)
  ...
  while (not solution_computed) {
    tp_op_0(root)
    …
    tp_op_n(root)
  }
}
problem operator (tp_op_i): mainly consists of a visit of the Htree

Standard Structure
irregular_problem(tElementList *dom) {
  ...
  pht_root = PITree_creation(dom, dec_el, incl_el, rem_el)
  ...
  while (not solution_computed) {
    PITree_completion(pht_root, stencil_0)
    tp_op_0(pht_root)
    pht_root = PITree_update(pht_root, T)
    …
    PITree_completion(pht_root, stencil_n)
    tp_op_n(pht_root)
    pht_root = PITree_update(pht_root, T)
  }
}

Customised Structure
irregular_problem(tElementList *dom) {
  …
  pht_root = PITree_creation(dom, dec_el, incl_el, rem_el)
  ...
  while (not solution_computed) {
    PITree_det_neighbors(pht_root, stencil_0+..+stencil_i)
    PITree_exch_neighbors(pht_root, stencil_0)
    tp_op_0(pht_root)
    …
    PITree_exch_neighbors(pht_root, stencil_i)
    tp_op_i(pht_root)
    PITree_correction(pht_root)
    PITree_det_neighbors(pht_root, stencil_i+1+..+stencil_n)
    …
    PITree_exch_neighbors(pht_root, stencil_n)
    tp_op_n(pht_root)
    PITree_update(pht_root)
  }
}

Validation
• Applications
  • Adaptive Multigrid Methods
  • Hierarchical Radiosity
• Parallel architectures
  • PC cluster
    • Intel Pentium II 266 MHz
    • 128 MB
    • 100 Mb/s Fast Ethernet
  • IBM Beowulf (x330)
    • Intel Pentium III 1.133 GHz
    • 1 GB per p-node (2 procs)
    • Myricom LAN (264MB)

Adaptive Multigrid Methods
• fast iterative methods to solve partial differential equations
• discretized, multi-level representation of the domain through a grid hierarchy
• adaptive problem:
  • the discretization is finer where the equation is irregular
  • new grids are added during the computation
• test case: Poisson problem

Sequential Code
amm(tElementList *initial_grid) {
  root = Htree_creation(initial_grid)
  while (not end) {
    smoothing(root, v, f, all_levels)
    for level from Lmax downto Lg {
      rest(root, level)
      restriction(root, level-1)
      smoothing(root, e, r, level-1)
    }
    for level from Lg+1 to Lmax {
      prolongation(root, level)
      correction(root, e, level)
      smoothing(root, e, r, level)
    }
    correction(root, v, all_levels)
    end = norm(root)
    if (not end) Lmax = refinement(root)
  }
}

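For readers unfamiliar with the operators named in this loop, the sketch below shows a textbook multigrid V-cycle for the one-dimensional Poisson problem -u'' = f with zero boundary values. It is not the authors' code: it works on uniform grids stored as Python lists rather than on an Htree, it omits the adaptive refinement step, and the function names and the choice of Gauss-Seidel smoothing, full-weighting restriction and linear interpolation are assumptions made for the example.

```python
def smooth(u, f, h, sweeps=3):
    """Gauss-Seidel sweeps for -u'' = f; boundary values are left untouched."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])

def residual(u, f, h):
    """r = f - A u at the interior points, 0 on the boundary."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(r):
    """Full weighting from a fine grid to the next coarser one."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc

def prolong(e):
    """Linear interpolation from a coarse grid to the next finer one."""
    nf = 2 * (len(e) - 1)
    ef = [0.0] * (nf + 1)
    for i in range(len(e)):
        ef[2 * i] = e[i]
    for i in range(1, nf, 2):
        ef[i] = 0.5 * (ef[i - 1] + ef[i + 1])
    return ef

def v_cycle(u, f, h):
    """One V-cycle: smooth, solve the coarse error equation, correct, smooth."""
    smooth(u, f, h)
    if len(u) > 3:                       # more than one interior point
        rc = restrict(residual(u, f, h))
        ec = [0.0] * len(rc)
        v_cycle(ec, rc, 2 * h)
        for i, e in enumerate(prolong(ec)):
            u[i] += e                    # correction step
        smooth(u, f, h)
    return u

def max_norm(v):
    return max(abs(x) for x in v)
```

The grid must have a power-of-two number of intervals. Each call to `v_cycle` plays the role of one iteration of the `while` loop above, with `max_norm(residual(u, f, h))` as a possible stopping test.
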
Parallel Code (I)
amm(tElementList *initial_grid) {
  pht_root = PITree_creation(initial_grid, dec_el, incl_el, rem_el)
  while (not end) {
    PITree_det_neighbors(pht_root, stencil_union)
    PITree_exch_neighbors(pht_root, smooth-rest_stencil, all_levels)
    smoothing(pht_root, v, f, all_levels)
    for level from Lmax downto Lg {
      PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
      rest(pht_root, level)
      PITree_exch_neighbors(pht_root, restriction_stencil, level)
      restriction(pht_root, level-1)
      PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
      smoothing(pht_root, e, r, level-1)
    }

Parallel Code (II)
    for level from Lg+1 to Lmax {
      PITree_exch_neighbors(pht_root, prolongation_stencil, level)
      prolongation(pht_root, level)
      correction(pht_root, e, level)
      PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
      smoothing(pht_root, e, r, level)
    }
    correction(pht_root, v, all_levels)
    PITree_exch_neighbors(pht_root, norm_stencil, level)
    end = norm(pht_root)
    if (not end) Lmax = refinement(pht_root)
    pht_root = PITree_update(pht_root, T)
  }
}

[Figure: Domain Hierarchical Decomposition After 10 Iterations]

[Figure: Load Balancing]

[Figure: Efficiency]

Hierarchical Radiosity
• a model of the light exchanges to compute the illumination of a scene
• representation of the scene
  • discretized and hierarchical
  • adaptive
• locality: interactions among objects at distinct abstraction levels

Sequential Code
hierarchical_rad(segment_list *scene) {
  root = Htree_creation(scene)
  visib_list_det(root)
  while (not end) {
    Gather_H(root)
    for level from L_min to L_max
      Push_H(root, level)
    for level from L_max downto L_min
      Pull_H(root, level)
    end = RefineLink_H(root)
  }
}

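The push and pull phases can be illustrated with the push-pull step commonly used in hierarchical radiosity: the radiosity gathered at each level is pushed down to the leaves, and the leaf values are pulled back up as area-weighted averages. The sketch below is an assumption-laden illustration and not the Push_H and Pull_H of the slides: it merges both phases into one recursive function instead of two level-by-level loops, places emission only at the leaves, and the `Patch` class and `push_pull` name are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Patch:
    area: float
    emission: float = 0.0     # used at leaves only in this sketch
    gathered: float = 0.0     # radiosity gathered over this patch's own links
    radiosity: float = 0.0    # result of the push-pull step
    children: list = field(default_factory=list)

def push_pull(patch, down=0.0):
    """Push gathered radiosity down to the leaves, then pull
    area-weighted averages back up. Returns the patch radiosity."""
    total = down + patch.gathered           # push: accumulate on the way down
    if not patch.children:
        patch.radiosity = patch.emission + total
    else:
        # pull: the parent value is the area-weighted average of its children
        patch.radiosity = sum(
            push_pull(c, total) * c.area for c in patch.children
        ) / patch.area
    return patch.radiosity
```

In this sketch the children of a patch are assumed to tile it exactly, so that the child areas sum to the parent area.
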
Parallel Code (I)
hierarchical_rad(segment_list *scene) {
  pht_root = PITree_creation(scene, dec_el, incl_el, rem_el)
  PITree_exch_neighbors(pht_root, vis_stencil, all_levels)
  visib_list_det(pht_root)
  while (not end) {
    PITree_exch_neighbors(pht_root, int_list, all_levels)
    Gather_H(pht_root)
    for level from L_min to L_max {
      PITree_exch_neighbors(pht_root, push_stencil, level)
      Push_H(pht_root, level)
    }

Parallel Code (II)
    for level from L_max downto L_min {
      PITree_exch_neighbors(pht_root, pull_stencil, level)
      Pull_H(pht_root, level)
    }
    end = RefineLink_H(pht_root)
    pht_root = PITree_balance(pht_root)
  }
}

Test Scene
• 192 polygons
• 896 segments

[Figure: Efficiency]

Future Work
• the definition of the set of problems that cannot be solved by adopting our methodology
• the definition of programming constructs for the considered class of problems