Building Hierarchical Grid Storage Using the GFarm Global File System and the JuxMem Grid Data-Sharing Service
Gabriel Antoniu, Loïc Cudennec, Majd Ghareeb (INRIA/IRISA Rennes, France)
Osamu Tatebe (University of Tsukuba, Japan)
Context: Grid Computing
• Target architecture: cluster federations (e.g. Grid’5000)
• Focus: large-scale data sharing
[Figure labels: Solid mechanics, Optics, Dynamics, Satellite design, Thermodynamics]
Approaches for Data Management on Grids
• Use of data catalogs: Globus
  • GridFTP, Replica Location Service, etc.
• Logistical networking of data: IBP
  • Buffers available across the Internet
• Unified access to data: SRB
  • From file systems to tapes and databases
• Limitations
  • No transparency => increased complexity at large scale
  • No consistency guarantees for replicated data
Towards Transparent Access to Data
• Desirable features
  • Uniform access to distributed data via global identifiers
  • Transparent data localization and transfer
  • Consistency models and protocols for replicated data
• Examples of systems taking this approach
  • On clusters
    • Memory level: DSM systems (Ivy, TreadMarks, etc.)
    • File level: NFS-like systems
  • On grids
    • Memory level: data sharing services
      • JuxMem - INRIA Rennes, France
    • File level: global file systems
      • Gfarm - AIST/University of Tsukuba, Japan
Idea: Collaborative Research on Memory-Level and File-Level Data Sharing
• Study possible interactions between
  • The JuxMem grid data-sharing service
  • The Gfarm global file system
• Goal
  • Enhance global data sharing functionality
  • Improve performance and reliability
  • Build a memory hierarchy for global data sharing by combining the memory level and the file system level
• Approach
  • Enhance JuxMem with persistent storage using Gfarm
• Support
  • The DISCUSS Sakura bilateral collaboration (2006-2007)
  • NEGST (2006-2008)
JuxMem: a Grid Data-Sharing Service
• Generic grid data-sharing service
  • Grid scale: 10^3 to 10^4 nodes
  • Transparent data localization
  • Data consistency
  • Fault tolerance
• JuxMem ~= DSM + P2P
• Implementation
  • Multiple replication strategies
  • Configurable consistency protocols
  • Based on JXTA 2.0 (http://www.jxta.org/)
• Integrated into 2 grid programming models
  • GridRPC (DIET, ENS Lyon)
  • Component models (CCM & CCA)
[Figure labels: JuxMem group, Data group D, Cluster group A, Cluster group B, Cluster group C]
http://juxmem.gforge.inria.fr
JuxMem’s Data Group: a Fault-Tolerant, Self-Organizing Group
• Data availability despite failures is ensured through replication and fault-tolerant building blocks
• Hierarchical self-organizing groups
  • Cluster level: Local Data Group (LDG)
  • Grid level: Global Data Group (GDG)
[Figure labels: Self-organizing group, Adaptation layer, Group membership, Atomic multicast, Consensus, Failure detectors; Data group D with one GDG, several LDGs, and a Client]
JuxMem: Memory Model and API
• Memory model (currently): entry consistency
  • Explicit association of data to locks
  • Multiple Reader Single Writer (MRSW)
    • juxmem_acquire, acquire_read, release
  • Explicit lock acquire/release before/after access
• API
  • Allocate memory for JuxMem data
    • ptr = juxmem_malloc (size, #clusters, #replicas per cluster, &ID…)
  • Map existing JuxMem data to local memory
    • ptr = juxmem_mmap (ID), juxmem_unmap (ptr)
  • Synchronization before/after data access
    • juxmem_acquire(ptr), juxmem_acquire_read(ptr), juxmem_release(ptr)
  • Read and write data: direct access through pointers!
    • int n = *ptr;
    • *ptr =…
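The acquire/access/release discipline on this slide can be illustrated with a minimal single-process sketch. Everything prefixed `mock_` below is a hypothetical stand-in written for this example, not the JuxMem library: it only mimics the Multiple Reader Single Writer rule and the call order (acquire, direct access, release), with no distribution or replication.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a JuxMem data block (NOT the real library). */
typedef struct {
    int readers;   /* number of read locks currently held */
    int writer;    /* 1 while the exclusive (write) lock is held */
    int value;     /* the shared data itself */
} mock_block;

/* Stand-in for allocation: returns a zero-initialized local block. */
mock_block *mock_malloc(void) {
    return calloc(1, sizeof(mock_block));
}

/* Exclusive lock: granted only when no reader and no writer holds the lock. */
int mock_acquire(mock_block *b) {
    if (b->writer || b->readers > 0) return -1;
    b->writer = 1;
    return 0;
}

/* Shared lock: granted unless a writer holds the lock. */
int mock_acquire_read(mock_block *b) {
    if (b->writer) return -1;
    b->readers++;
    return 0;
}

/* Release whichever lock is held (write lock first, else one read lock). */
void mock_release(mock_block *b) {
    if (b->writer) b->writer = 0;
    else if (b->readers > 0) b->readers--;
}

/* Write critical section: acquire, modify the data directly, release. */
int increment_shared(mock_block *b) {
    if (mock_acquire(b) != 0) return -1;
    b->value += 1;
    mock_release(b);
    return 0;
}

/* Read critical section: acquire for reading, copy the data out, release. */
int read_shared(mock_block *b, int *out) {
    if (mock_acquire_read(b) != 0) return -1;
    *out = b->value;
    mock_release(b);
    return 0;
}
```

In this sketch a failed acquire returns -1 instead of blocking, which keeps the example single-threaded; a real lock would make the caller wait.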
Gfarm: a Global File System [CCGrid 2002]
• Commodity-based distributed file system that federates the storage of each site
• It can be mounted from all cluster nodes and clients
• It provides scalable I/O performance with respect to the number of parallel processes and users
• It avoids access concentration by automatic replica selection
[Figure: Gfarm File System, showing global namespace mapping and file replica creation; the /gfarm tree contains directories ggf, jp, aist, gtrc and files file1 to file4]
Gfarm: a Global File System (2)
• Files can be shared among all nodes and clients
• Physically, a file may be replicated and stored on any file system node
• Applications can access a file regardless of its location
• File system nodes can be distributed
[Figure: a client PC and a notebook PC access /gfarm through the Gfarm file system metadata; replicas of File A, File B and File C are stored on nodes in the US and in Japan]
Our Goal: Build a Memory Hierarchy for Global Data Sharing
• Approach
  • Applications use JuxMem’s API (memory-level sharing)
  • Applications DO NOT use Gfarm directly
  • JuxMem uses Gfarm to enhance data persistence
    • Without Gfarm, JuxMem tolerates some crashes of memory providers thanks to the self-organizing groups
    • With Gfarm, persistence is further enhanced thanks to secondary storage
• How does it work?
  • Basic principle: on each lock release, data can be flushed to Gfarm
  • Flush frequency can be tuned to trade off efficiency against fault tolerance
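One way to picture the tunable flush frequency is a counter that triggers a flush once every N lock releases. This is a minimal sketch under that assumption; the slides do not say how the frequency is actually expressed, and the `flush_policy` type and its functions are hypothetical names introduced here.

```c
#include <assert.h>

/* Hypothetical flush policy: flush once every `period` lock releases. */
typedef struct {
    unsigned period;    /* 1 = flush on every release; larger = rarer flushes */
    unsigned pending;   /* releases seen since the last flush */
    unsigned flushes;   /* total number of flushes triggered so far */
} flush_policy;

void flush_policy_init(flush_policy *p, unsigned period) {
    assert(period > 0);
    p->period = period;
    p->pending = 0;
    p->flushes = 0;
}

/* Call on each lock release. Returns 1 when the data should be flushed now. */
int flush_policy_on_release(flush_policy *p) {
    p->pending++;
    if (p->pending < p->period) return 0;
    p->pending = 0;
    p->flushes++;
    return 1;
}

/* Number of released critical sections not yet covered by a flush. */
unsigned flush_policy_at_risk(const flush_policy *p) {
    return p->pending;
}
```

With `period = 1` every release is persisted (most fault tolerant, most I/O); a larger period reduces I/O but leaves up to `period - 1` updates that exist only in memory replicas.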
Step 1: A Single Flush by One Provider
• One particular JuxMem provider (GDG leader) flushes data to Gfarm
• Then, other Gfarm copies can be created using Gfarm’s gfrep command
[Figure: a JuxMem Global Data Group (GDG) with a GDG leader and other JuxMem providers, above four Gfarm GFSD nodes spread over Cluster #1 and Cluster #2]
Step 2: Parallel Flush by LDG Leaders
• One particular JuxMem provider in each cluster (LDG leader) flushes data to Gfarm (parallel copy creation, one copy per cluster)
• The copies are registered as the same Gfarm file
• Then, extra Gfarm copies can be created using Gfarm’s gfrep command
[Figure: JuxMem Local Data Groups LDG #1 and LDG #2, each with an LDG leader and another JuxMem provider, above four Gfarm GFSD nodes spread over Cluster #1 and Cluster #2]
Step 3: Parallel Flush by All Providers
• All JuxMem providers in each cluster (not only the LDG leader) flush data to Gfarm
• All copies are registered as the same Gfarm file
• Useful to create multiple copies of the Gfarm file per cluster
• No more replication using gfrep
[Figure: a JuxMem Global Data Group (GDG) with four JuxMem providers, above four Gfarm GFSD nodes spread over Cluster #1 and Cluster #2]
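The three steps differ only in which providers write a copy to Gfarm. The sketch below encodes that choice and counts the copies created directly by JuxMem (before any gfrep replication). The `provider` struct and the function names are hypothetical, and the example data in the usage note assumes that the GDG leader is also the LDG leader of its own cluster.

```c
#include <assert.h>

/* The three flush strategies described in Steps 1 to 3. */
typedef enum {
    FLUSH_GDG_LEADER = 1,     /* Step 1: single flush by the GDG leader */
    FLUSH_LDG_LEADERS = 2,    /* Step 2: one flush per cluster, by each LDG leader */
    FLUSH_ALL_PROVIDERS = 3   /* Step 3: every provider flushes */
} flush_step;

/* Hypothetical description of one JuxMem provider's role. */
typedef struct {
    int cluster;        /* index of the cluster hosting the provider */
    int is_ldg_leader;  /* 1 if leader of its Local Data Group */
    int is_gdg_leader;  /* 1 if leader of the Global Data Group */
} provider;

/* Does this provider write a copy to Gfarm under the given strategy? */
int should_flush(flush_step step, const provider *p) {
    switch (step) {
    case FLUSH_GDG_LEADER:    return p->is_gdg_leader;
    case FLUSH_LDG_LEADERS:   return p->is_ldg_leader;
    case FLUSH_ALL_PROVIDERS: return 1;
    }
    return 0;
}

/* Number of Gfarm copies created directly by the providers. */
int count_copies(flush_step step, const provider *ps, int n) {
    int copies = 0;
    for (int i = 0; i < n; i++) copies += should_flush(step, &ps[i]);
    return copies;
}
```

For two clusters with two providers each, this yields 1 copy in Step 1, 2 copies in Step 2 (one per cluster) and 4 copies in Step 3, which is why Step 3 no longer needs gfrep for extra replicas.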
Deployment Issues
• Application deployment on large-scale infrastructures
  • Reserve resources
  • Configure the nodes
  • Manage dependencies between processes
  • Start processes
  • Monitor and clean up the nodes
• Mixed deployment of Gfarm and JuxMem
  • Manage dependencies between processes of both applications
  • Make the JuxMem provider able to act as a Gfarm client
• Approach: use a generic deployment tool: ADAGE (INRIA, Rennes, France)
  • Design specific plugins for Gfarm and JuxMem
ADAGE: Automatic Deployment of Applications in a Grid Environment
• IRISA/INRIA Paris Research Group
• Deploy the same application on different kinds of resources
  • From clusters to grids
• Support multi-middleware applications
  • MPI+CORBA+JXTA+GFARM...
• Network topology description
  • Latency and bandwidth hierarchy
  • NAT, non-IP networks
  • Firewalls, asymmetric links
• Planner as plugin
  • Round robin & random
• Preliminary support for dynamic applications
• Some successes
  • 29,000 JXTA peers on ~400 nodes
  • 4003 components on 974 processors on 7 sites
[Figure labels: Gfarm Application Description, JuxMem Application Description, Generic Application Description, Resource Description, Control Parameters, Deployment Planning, Deployment Plan Execution, Application Configuration]
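The slide names round robin as one of the available planners. As an illustration of the general technique only (this is not ADAGE code, and `plan_round_robin` is a hypothetical name), a round-robin placement assigns process i to node i modulo the number of nodes:

```c
#include <assert.h>
#include <stddef.h>

/* Round-robin placement: process i goes to node (i mod n_nodes).
 * Writes one node index per process into `placement`.
 * Returns 0 on success, -1 if there is no node to place processes on. */
int plan_round_robin(size_t n_procs, size_t n_nodes, size_t *placement) {
    if (n_nodes == 0) return -1;
    for (size_t i = 0; i < n_procs; i++) {
        placement[i] = i % n_nodes;
    }
    return 0;
}
```

Round robin keeps the per-node load within one process of the average, which is a reasonable default when nodes are homogeneous; it ignores the latency and bandwidth hierarchy that a topology-aware planner could exploit.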
Roadmap Overview (1)
• Design of the common architecture: 2006
  • Discussions on possible interactions between JuxMem and Gfarm
    • May 2006, Singapore (CCGRID 2006)
    • June 2006, Paris (HPDC 2006 and NEGST workshop)
  • October 2006: Gabriel Antoniu and Loïc Cudennec visited the Gfarm team
    • First deployment tests of Gfarm on G5K
    • Overall Gfarm/JuxMem design
  • December 2006: Osamu Tatebe visited the JuxMem team
    • Refinement of the Gfarm/JuxMem design
• Implementation of JuxMem on top of Gfarm: 2007
  • April 2007: Gabriel Antoniu and Loïc Cudennec visited the Gfarm team
    • One JuxMem provider (GDG leader) flushes data to Gfarm after each critical section (step 1 done)
  • Master internship: Majd Ghareeb
  • December 2007: Osamu Tatebe visited the JuxMem team
  • Joint paper at Euro-Par 2008
Read performance
• Worst case: 39 MB/s
• Gfarm: 69 MB/s
• Usual case: 100 MB/s
Write performance
• Worst case: 28.5 MB/s
• Gfarm: 42 MB/s
• Usual case: 89 MB/s
Roadmap (2)
• Design the Gfarm plugin for ADAGE (April 2007)
  • Propose a specific application description language for Gfarm
  • Translate the specific description into a generic description
  • Start processes while respecting the dependencies
  • Transfer the Gfarm configuration files from:
    • The Metadata Server to the Agents
    • The Agents to their GFSD and Clients
• Deployment of JuxMem on top of Gfarm (May 2007; first prototype running on G5K)
  • ADAGE deploys Gfarm, then JuxMem (separate deployment)
  • Limitations: the user still needs to indicate the Gfarm client hostname and the Gfarm configuration file location
• Design a meta-plugin for ADAGE that automatically deploys a mixed description of a Gfarm+JuxMem configuration (December 2007)
  • Gfarm v1 and v2
• Work in progress (2008)
  • Fault-tolerant, distributed metadata server: Gfarm on top of JuxMem
  • Master internship: Andre Lage
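The dependency chain described in this roadmap (metadata server, then agents, then GFSD and clients, then JuxMem) can be sketched as a small ordering routine. This is an illustration of dependency-ordered startup under stated assumptions, not the ADAGE plugin: the role names, the `depends_on` table and `start_order` are hypothetical, and the last dependency assumes the JuxMem provider must wait for a working Gfarm client setup because it acts as a Gfarm client.

```c
#include <assert.h>

#define N_ROLES 5

/* Process roles taking part in the mixed Gfarm+JuxMem deployment. */
enum role { METADATA_SERVER, AGENT, GFSD, GFARM_CLIENT, JUXMEM_PROVIDER };

/* depends_on[r] is the role that must be started before r, or -1 if none. */
const int depends_on[N_ROLES] = {
    -1,              /* metadata server: no prerequisite */
    METADATA_SERVER, /* agents receive their configuration from the metadata server */
    AGENT,           /* GFSD receives its configuration from its agent */
    AGENT,           /* clients receive their configuration from their agent */
    GFARM_CLIENT     /* assumption: the JuxMem provider acts as a Gfarm client */
};

/* Fill `order` with roles so that each role appears after its prerequisite.
 * Returns 0 on success, -1 if the dependencies contain a cycle. */
int start_order(int order[N_ROLES]) {
    int started[N_ROLES] = {0};
    int n = 0;
    while (n < N_ROLES) {
        int progressed = 0;
        for (int r = 0; r < N_ROLES; r++) {
            if (started[r]) continue;
            int d = depends_on[r];
            if (d == -1 || started[d]) {
                started[r] = 1;
                order[n++] = r;
                progressed = 1;
            }
        }
        if (!progressed) return -1;  /* nothing could start: cyclic dependency */
    }
    return 0;
}
```

A meta-plugin that knows this ordering could remove the manual step mentioned in the limitations, since the Gfarm client hostname and configuration file location would already be known when the JuxMem providers are started.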