Global Trees: A Framework for Linked Data Structures on Distributed Memory Parallel Systems • D. Brian Larkins, James Dinan, Sriram Krishnamoorthy, Srinivasan Parthasarathy, Atanas Rountev, P. Sadayappan
Background • Trees and graphs can concisely represent relationships between data • Data sets are becoming increasingly large and can require compute-intensive processing • Developing efficient, memory hierarchy-aware applications is hard
Sample Applications • n-body simulation • Fast Multipole Methods (FMM) • multiresolution analysis • clustering and classification • frequent pattern mining
Key Contributions • Efficient fine-grained data access with a global view of data • Exploit linked structure to provide fast global pointer dereferencing • High-level, locality-aware, parallel operations on linked data structures • Application-driven customization • Empirical validation of the approach
Global Chunk Layer (GCL) • API and run-time library for managing chunks - built on ARMCI • Abstracts common functionality for handling irregular, linked data • Provides a global namespace with access and modification operations • Extensible and highly customizable to maximize functionality and performance
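To make the chunk abstraction concrete, the sketch below shows what a minimal C interface to such a layer might look like. All gcl_* names and signatures are illustrative assumptions about a chunk layer built on ARMCI, not the published GCL API.

```c
/* Illustrative sketch of a chunk layer over ARMCI. The gcl_* names and
 * signatures are assumptions for exposition, not the real GCL interface. */
#include <stddef.h>

/* A global name for a chunk: which process owns it and which slot it is. */
typedef struct {
    int  owner;      /* rank of the process that physically holds the chunk */
    long chunk_id;   /* index of the chunk within that process's pool       */
} gcl_chunk_ref_t;

/* Collective setup/teardown, analogous to ARMCI_Init / ARMCI_Finalize.    */
void gcl_init(int *argc, char ***argv);
void gcl_finalize(void);

/* Allocation from the shared chunk pool and one-sided access by name:
 * any process may read or write any chunk without involving its owner.    */
gcl_chunk_ref_t gcl_chunk_alloc(size_t elem_size, size_t n_elems);
void gcl_get(gcl_chunk_ref_t ref, void *local_copy, size_t nbytes);
void gcl_put(gcl_chunk_ref_t ref, const void *local_copy, size_t nbytes);
```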
Chunks • A chunk is: • Contiguous memory segment • Globally accessible • Physically local to only one process • Collection of user-defined elements • Unit of data transfer
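A chunk descriptor reflecting these properties might look roughly like the following; the field names and layout are illustrative guesses, not the actual GCL data structures.

```c
/* Hypothetical local descriptor for one chunk, mirroring the bullet list
 * above. Field names are illustrative only.                               */
#include <stddef.h>

typedef struct {
    int    home_rank;   /* physically local to exactly one process         */
    void  *base;        /* start of the contiguous memory segment          */
    size_t elem_size;   /* size of one user-defined element                */
    size_t capacity;    /* number of element slots in the chunk            */
    size_t in_use;      /* slots currently handed out to the application   */
} chunk_desc_t;

/* Because the chunk is the unit of transfer, touching one remote element
 * moves the whole chunk (elem_size * capacity bytes) in a single get.     */
```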
Programming Model • SPMD with MIMD-style parallelism • Global pointers permit fine-grained access • Chunks allow coarse-grained data movement • Uses get/compute/put model for globally shared data access • Provides both uniform global view and chunked global view of data
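The get/compute/put discipline could be written as in the sketch below; gcl_get, gcl_put, gcl_chunk_ref_t, and body_t are assumed, illustrative names rather than the actual GT/GCL interface.

```c
/* Get/compute/put skeleton over a globally shared chunk. All names here
 * (gcl_*, body_t) are illustrative assumptions, not the published API.    */
#include <stddef.h>

enum { CHUNK_CAP = 256 };                        /* illustrative capacity  */

typedef struct { double mass, x, y, z; } body_t; /* user-defined element   */
typedef struct { int owner; long chunk_id; } gcl_chunk_ref_t;

void gcl_get(gcl_chunk_ref_t ref, void *buf, size_t nbytes);
void gcl_put(gcl_chunk_ref_t ref, const void *buf, size_t nbytes);

/* Scale the masses stored in one remote chunk (n <= CHUNK_CAP).           */
void scale_masses(gcl_chunk_ref_t ref, size_t n, double factor)
{
    body_t local[CHUNK_CAP];                    /* coarse-grained buffer   */

    gcl_get(ref, local, n * sizeof(body_t));    /* 1. get: fetch the chunk */
    for (size_t i = 0; i < n; i++)              /* 2. compute locally      */
        local[i].mass *= factor;
    gcl_put(ref, local, n * sizeof(body_t));    /* 3. put: write it back   */
}
```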
Global Pointers • [figure: dereferencing a global pointer from a parent node p to a child node c — the link stores a chunk offset (ci) and a node offset (no), so c = &p.child[i] + p.child[i].ci + p.child[i].no; worked example: 4252 + (-4252) + 4340]
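The arithmetic in the figure suggests that a link stores two offsets (ci and no) that are simply added to the link's own address, so dereferencing a locally resident child needs no translation table. The sketch below is a speculative reconstruction of that idea; only the field names ci and no come from the figure, everything else is assumed.

```c
/* Speculative reconstruction of the dereference shown in the figure:
 * c = &p.child[i] + p.child[i].ci + p.child[i].no                         */
#include <stddef.h>

typedef struct {
    ptrdiff_t ci;   /* offset from this link toward the target chunk       */
    ptrdiff_t no;   /* offset of the target node within that chunk         */
} gt_link_t;

typedef struct node node_t;
struct node {
    gt_link_t child[8];        /* fan-out of 8 is illustrative             */
    /* ... user-defined body follows ... */
};

static node_t *deref_child(node_t *p, int i)
{
    char *link = (char *)&p->child[i];
    return (node_t *)(link + p->child[i].ci + p->child[i].no);
}
```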
Global Trees (GT) • Run-time library and API for global-view programming of trees on distributed-memory clusters • Built on the GCL chunk communication framework • High-level tree operations that work in parallel and are locality-aware • Each process can asynchronously access any portion of the shared tree structure
GT Concepts • Tree Groups • set of global trees • allocations are made from the same chunk pool • Global Node Pointers • Tree Nodes • link structure managed by GT • body is user-defined structure
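Putting these concepts together, a GT node might be laid out roughly as below; the structs and field names are illustrative guesses, not the actual GT definitions.

```c
/* Hypothetical layout of a GT tree node: a link header managed by the
 * runtime plus an application-defined body. Names are illustrative.       */
typedef struct {
    int  owner;      /* process whose chunk pool holds the target node     */
    long chunk_id;   /* chunk within that process's pool                   */
    int  slot;       /* element slot inside the chunk                      */
} gt_nodeptr_t;      /* global node pointer                                */

typedef struct {
    gt_nodeptr_t parent;      /* link structure managed by GT ...          */
    gt_nodeptr_t child[8];    /* (fan-out of 8 is illustrative)            */
} gt_links_t;

typedef struct {
    gt_links_t links;         /* GT-managed part                           */
    struct {                  /* ... body is a user-defined structure      */
        double mass, x, y, z;
    } body;
} gt_node_t;
```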
Tree Traversals • GT provides optimized, parallel traversals for common traversal orders • Visitor callbacks are application-defined computations on a single node • GT currently provides top-down, bottom-up, and level-wise traversals
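A visitor-based traversal might be used as sketched below; gt_traverse_topdown, gt_node_t, and the callback signature are assumed names for illustration, not the documented GT API.

```c
/* Sketch of a visitor-style parallel traversal. The traversal function
 * and types are hypothetical stand-ins for the GT interface.              */
typedef struct gt_node gt_node_t;

/* Application-defined computation applied to a single node.               */
typedef void (*gt_visitor_fn)(gt_node_t *node, void *arg);

/* The runtime walks the tree in parallel, invoking the visitor per node.  */
void gt_traverse_topdown(gt_node_t *root, gt_visitor_fn visit, void *arg);

/* Example visitor: count the nodes this process visits.                   */
static void count_visit(gt_node_t *node, void *arg)
{
    (void)node;
    (*(long *)arg)++;
}

/* Usage: long n = 0; gt_traverse_topdown(root, count_visit, &n);          */
```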
Custom Allocation • No single mapping of data elements to chunks will be optimal • GT/GCL supports custom allocators to improve spatial locality • Allocators can use a hint from call-site and can keep state between calls • Default allocation is local-open
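A custom allocator hook could plausibly take the form below; the signature, the hint argument, and the example policy are all assumptions made for illustration rather than the real GT/GCL allocator interface.

```c
/* Sketch of a pluggable node-allocation hook. Signature and policy are
 * assumptions for exposition, not the actual GT/GCL interface.            */
typedef struct chunk chunk_t;          /* opaque chunk handle              */

typedef struct {
    /* Choose the chunk for a new node. 'hint' comes from the call site
     * (e.g. the parent node's chunk); 'state' persists between calls.     */
    chunk_t *(*select_chunk)(const void *hint, void *state);
    void     *state;
} gt_allocator_t;

/* Example policy: improve spatial locality by co-locating a new child
 * with its parent, reusing the chunk passed in as the hint.               */
static chunk_t *near_parent_select(const void *hint, void *state)
{
    (void)state;
    return (chunk_t *)hint;   /* assumes the hint is the parent's chunk    */
}
```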
Experimental Results • Evaluate using: • Barnes-Hut from SPLASH-2 • Compression operation from MADNESS • GT compared with: • Intel’s Cluster OpenMP and TreadMarks runtime • UPC
Global Pointer Overhead • [charts: global pointer dereference overhead for the MADNESS compress() operation and for Barnes-Hut]
Chunk Size and Bandwidth • Experiments run on the department WCI cluster: 2.33 GHz Intel Xeon, 6 GB RAM, InfiniBand
Impact of Chunk Utilization (Barnes-Hut) • Experiments run on the department WCI cluster: 2.33 GHz Intel Xeon, 6 GB RAM, InfiniBand
Barnes-Hut Chunk Size Selection • Barnes-Hut application from the SPLASH-2 suite
Barnes-Hut Scaling • chunk size = 256, bodies = 512k
Related Approaches • Distributed Shared Memory (DSM) • Cluster OpenMP, TreadMarks, Cashmere, Midway • Distributed Shared Objects (DSO) • Charm++, Linda, Orca • Partitioned Global Address Space (PGAS) Languages and Systems • UPC, Titanium, CAF, ARMCI, SHMEM, GASNet • Shared pointer-based data structure support on distributed memory clusters • Parallel Irregular Trees, Olden • HPCS Programming Languages • Chapel, X10
Future Work • Global Graphs • GT data reference locality tools • More applications
Questions • email: larkins@cse.ohio-state.edu