
Presentation Transcript


  1. Building Hierarchical Grid Storage Using the GFarm Global File System and the JuxMem Grid Data-Sharing Service Gabriel Antoniu, Loïc Cudennec, Majd Ghareeb INRIA/IRISA Rennes, France Osamu Tatebe University of Tsukuba Japan

  2. Context: Grid Computing • Target architecture: cluster federations (e.g. Grid’5000) • Focus: large-scale data sharing [Figure: example application domains: solid mechanics, optics, dynamics, satellite design, thermodynamics]

  3. Approaches for Data Management on Grids • Use of data catalogs: Globus • GridFTP, Replica Location Service, etc. • Logistical networking of data: IBP • Buffers available across the Internet • Unified access to data: SRB • From file systems to tapes and databases • Limitations • No transparency => increased complexity at large scale • No consistency guarantees for replicated data

  4. Towards Transparent Access to Data • Desirable features • Uniform access to distributed data via global identifiers • Transparent data localization and transfer • Consistency models and protocols for replicated data • Examples of systems taking this approach • On clusters • Memory level: DSM systems (Ivy, TreadMarks, etc.) • File level: NFS-like systems • On grids • Memory level: data sharing services • JuxMem - INRIA Rennes, France • File level: global file systems • Gfarm - AIST/University of Tsukuba, Japan

  5. Idea: Collaborative Research on Memory- and File-Level Data Sharing • Study possible interactions between • The JuxMem grid data-sharing service • The Gfarm global file system • Goal • Enhance global data-sharing functionality • Improve performance and reliability • Build a memory hierarchy for global data sharing by combining the memory level and the file-system level • Approach • Enhance JuxMem with persistent storage using Gfarm • Support • The DISCUSS Sakura bilateral collaboration (2006-2007) • NEGST (2006-2008)

  6. JuxMem: a Grid Data-Sharing Service • Generic grid data-sharing service • Grid-scale: 10³-10⁴ nodes • Transparent data localization • Data consistency • Fault-tolerance • JuxMem ~= DSM + P2P • Implementation • Multiple replication strategies • Configurable consistency protocols • Based on JXTA 2.0 (http://www.jxta.org/) • Integrated into 2 grid programming models • GridRPC (DIET, ENS Lyon) • Component models (CCM & CCA) [Figure: hierarchical group organization: the JuxMem group contains cluster groups A, B and C; data group D spans providers across clusters] http://juxmem.gforge.inria.fr

  7. JuxMem’s Data Group: a Fault-Tolerant, Self-Organizing Group • Data availability despite failures is ensured through replication and fault-tolerant building blocks • Hierarchical self-organizing groups • Cluster level: Local Data Group (LDG) • Grid level: Global Data Group (GDG) [Figure: layered architecture of a data group: self-organizing group, adaptation layer, and fault-tolerant building blocks (group membership, atomic multicast, consensus, failure detectors); a client accesses data group D through an LDG, and the LDGs form the GDG] GDG: Global Data Group, LDG: Local Data Group
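To make the two-level hierarchy concrete, here is a minimal C sketch of how a data group could be laid out: one Local Data Group per cluster hosting replicas, federated into one Global Data Group per piece of shared data. All type and field names are hypothetical and do not come from the JuxMem sources.

    /* Hypothetical layout of the LDG/GDG hierarchy (illustration only). */
    typedef struct {
        char   cluster_name[64];      /* cluster hosting this Local Data Group */
        int    provider_count;        /* number of replicas inside the cluster */
        char (*providers)[256];       /* provider addresses (host:port strings) */
    } local_data_group;

    typedef struct {
        char              data_id[64];  /* global identifier of the shared data */
        int               ldg_count;    /* one LDG per cluster holding a copy */
        local_data_group *ldgs;         /* the LDGs federated into this GDG */
    } global_data_group;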

  8. JuxMem: Memory Model and API • Memory model (currently): entry consistency • Explicit association of data to locks • Multiple Reader Single Writer (MRSW) • juxmem_acquire, acquire_read, release • Explicit lock acquire/release before/after access • API • Allocate memory for JuxMem data • ptr = juxmem_malloc (size, #clusters, #replicas per cluster, &ID…) • Map existing JuxMem data to local memory • ptr = juxmem_mmap (ID), juxmem_unmap (ptr) • Synchronization before/after data access • juxmem_acquire(ptr), juxmem_acquire_read(ptr), juxmem_release(ptr) • Read and write data: direct access through pointers! • int n = *ptr; • *ptr =…
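A minimal usage sketch of this API, for one writer and one reader. Only the call names and the general argument order come from the slide; the header name, the juxmem_id_t identifier type, the exact signatures and the error handling are assumptions.

    /* Sketch only: <juxmem.h> and juxmem_id_t are assumed names. */
    #include <juxmem.h>

    void writer(juxmem_id_t *id_out)
    {
        /* Allocate sizeof(int) bytes, replicated on 2 clusters with
           3 replicas per cluster; the global ID is returned via id_out. */
        int *ptr = juxmem_malloc(sizeof(int), 2, 3, id_out);

        juxmem_acquire(ptr);           /* exclusive lock: single writer (MRSW) */
        *ptr = 42;                     /* direct access through the pointer */
        juxmem_release(ptr);           /* may trigger a flush to Gfarm (slides 11-14) */
    }

    void reader(juxmem_id_t id)
    {
        int *ptr = juxmem_mmap(id);    /* map existing JuxMem data by its global ID */
        juxmem_acquire_read(ptr);      /* shared lock: multiple readers allowed */
        int n = *ptr;                  /* read under the lock */
        juxmem_release(ptr);
        juxmem_unmap(ptr);
        (void)n;
    }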

  9. Gfarm: a Global File System [CCGrid 2002] • Commodity-based distributed file system that federates the storage of each site • It can be mounted from all cluster nodes and clients • It provides scalable I/O performance with respect to the number of parallel processes and users • It avoids access concentration by automatic replica selection [Figure: global namespace mapping and file replica creation in the Gfarm file system, e.g. /gfarm/ggf/jp/{file1, file2} and /gfarm/aist/gtrc/{file1, file2, file3, file4} mapped onto replicas stored on file system nodes]

  10. Gfarm: a Global File System (2) • Files can be shared among all nodes and clients • Physically, a file may be replicated and stored on any file system node • Applications can access it regardless of its location • File system nodes can be distributed [Figure: client PCs and a notebook PC mount /gfarm; the Gfarm file system metadata maps files A, B and C to replicas stored on file system nodes in the US and Japan]
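Since Gfarm can be mounted from all cluster nodes and clients (slide 9), an application can reach shared files with ordinary POSIX I/O. The sketch below assumes a /gfarm mount point as in the figures; the file path is only illustrative, and replica selection is left to Gfarm.

    /* Read a file through a mounted Gfarm namespace (path is illustrative). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/gfarm/ggf/jp/file1", O_RDONLY);  /* Gfarm picks a replica */
        if (fd < 0) { perror("open"); return 1; }

        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t)n, stdout);           /* copy contents to stdout */

        close(fd);
        return 0;
    }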

  11. Our Goal: Build a Memory Hierarchy for Global Data Sharing • Approach • Applications use JuxMem’s API (memory-level sharing) • Applications DO NOT use Gfarm directly • JuxMem uses Gfarm to enhance data persistence • Without Gfarm, JuxMem tolerates some crashes of memory providers thanks to the self-organizing groups • With Gfarm, persistence is further enhanced thanks to secondary storage • How does it work? • Basic principle: on each lock release, data can be flushed to Gfarm (see the sketch below) • The flush frequency can be tuned to trade off efficiency against fault tolerance
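A rough sketch of this flush policy, assuming the data is written back through a Gfarm-mounted path. All names here (flush_period, provider_release, the file handling) are hypothetical; the code only illustrates the release-triggered, tunable-frequency flush, not JuxMem’s actual internals.

    /* Illustration only: flush shared data to Gfarm every N lock releases. */
    #include <stddef.h>
    #include <stdio.h>

    static int release_count = 0;
    static int flush_period  = 4;     /* tunable: 1 = flush on every release */

    static void flush_to_gfarm(const char *gfarm_path, const void *data, size_t size)
    {
        FILE *f = fopen(gfarm_path, "wb");   /* e.g. a file under /gfarm/... */
        if (!f) return;
        fwrite(data, 1, size, f);
        fclose(f);
    }

    /* Hypothetical hook called by a provider when a writer releases the lock. */
    static void provider_release(const char *gfarm_path, const void *data, size_t size)
    {
        if (++release_count % flush_period == 0)
            flush_to_gfarm(gfarm_path, data, size);  /* persistence vs. overhead trade-off */
    }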

  12. Step 1: A Single Flush by One Provider • One particular JuxMem provider (the GDG leader) flushes data to Gfarm • Then, other Gfarm copies can be created using Gfarm’s gfrep command [Figure: within the JuxMem Global Data Group (GDG), only the GDG leader writes to a GFSD; Gfarm federates the GFSDs of clusters #1 and #2]

  13. Step 2: Parallel Flush by LDG Leaders • One particular JuxMem provider in each cluster (the LDG leader) flushes data to Gfarm (parallel copy creation, one copy per cluster) • The copies are registered as the same Gfarm file • Then, extra Gfarm copies can be created using Gfarm’s gfrep command [Figure: the leader of each JuxMem Local Data Group (LDG #1 in cluster #1, LDG #2 in cluster #2) writes to a GFSD in its own cluster]

  14. Step 3: Parallel Flush by All Providers • All JuxMem providers in each cluster (not only the LDG leaders) flush data to Gfarm • All copies are registered as the same Gfarm file • Useful to create multiple copies of the Gfarm file per cluster • No further replication with gfrep is needed (see the sketch below) [Figure: every provider of the JuxMem Global Data Group (GDG) in clusters #1 and #2 writes to a GFSD]
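The three steps differ only in which providers write the data to Gfarm on release. A small sketch of that decision, using a hypothetical enum and role flags:

    /* Who flushes to Gfarm under each strategy (slides 12-14); names are hypothetical. */
    typedef enum { FLUSH_GDG_LEADER, FLUSH_LDG_LEADERS, FLUSH_ALL } flush_strategy;

    static int should_flush(flush_strategy s, int is_gdg_leader, int is_ldg_leader)
    {
        switch (s) {
        case FLUSH_GDG_LEADER:  return is_gdg_leader;  /* step 1: one Gfarm copy, then gfrep */
        case FLUSH_LDG_LEADERS: return is_ldg_leader;  /* step 2: one copy per cluster, in parallel */
        case FLUSH_ALL:         return 1;              /* step 3: several copies per cluster, no gfrep */
        }
        return 0;
    }

With FLUSH_ALL, replication inside Gfarm is obtained directly at flush time, which is why no gfrep pass is needed afterwards.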

  15. Deployment Issues • Application deployment on large-scale infrastructures • Reserve resources • Configure the nodes • Manage dependencies between processes • Start processes • Monitor and clean up the nodes • Mixed deployment of Gfarm and JuxMem • Manage dependencies between the processes of both applications • Make the JuxMem provider able to act as a Gfarm client • Approach: use a generic deployment tool: ADAGE (INRIA, Rennes, France) • Design specific plugins for Gfarm and JuxMem

  16. ADAGE: Automatic Deployment of Applications in a Grid Environment • IRISA/INRIA, PARIS research group • Deploy the same application on different kinds of resources • from clusters to grids • Support for multi-middleware applications • MPI + CORBA + JXTA + Gfarm... • Network topology description • Latency and bandwidth hierarchy • NAT, non-IP networks • Firewalls, asymmetric links • Planner as a plugin • Round-robin & random • Preliminary support for dynamic applications • Some successes • 29,000 JXTA peers on ~400 nodes • 4003 components on 974 processors across 7 sites [Figure: ADAGE workflow: the Gfarm and JuxMem application descriptions are translated into a generic application description, which is combined with a resource description and control parameters for deployment planning, deployment plan execution and application configuration]

  17. Roadmap overview (1) • Design of the common architecture: 2006 • Discussions on possible interactions between JuxMem and Gfarm • May 2006, Singapore (CCGrid 2006) • June 2006, Paris (HPDC 2006 and NEGST workshop) • October 2006: Gabriel Antoniu and Loïc Cudennec visited the Gfarm team • First deployment tests of Gfarm on G5K • Overall Gfarm/JuxMem design • December 2006: Osamu Tatebe visited the JuxMem team • Refinement of the Gfarm/JuxMem design • Implementation of JuxMem on top of Gfarm: 2007 • April 2007: Gabriel Antoniu and Loïc Cudennec visited the Gfarm team • One JuxMem provider (the GDG leader) flushes data to Gfarm after each critical section (step 1 done) • Master internship: Majd Ghareeb • December 2007: Osamu Tatebe visited the JuxMem team • Joint paper at Euro-Par 2008

  18. Read performance [Figure: read bandwidth: worst case 39 MB/s, Gfarm 69 MB/s, usual case 100 MB/s]

  19. Write performance [Figure: write bandwidth: worst case 28.5 MB/s, Gfarm 42 MB/s, usual case 89 MB/s]

  20. Roadmap (2) • Design the Gfarm plugin for ADAGE (April 2007) • Propose a specific application description language for Gfarm • Translate the specific description into a generic description • Start processes while respecting the dependencies • Transfer the Gfarm configuration files from: • The Metadata Server to the Agents • The Agents to their GFSDs and Clients • Deployment of JuxMem on top of Gfarm (May 2007, first prototype running on G5K) • ADAGE deploys Gfarm, then JuxMem (separate deployments) • Limitations: the user still needs to indicate the Gfarm client hostname and the Gfarm configuration file location • Design a meta-plugin for ADAGE that automatically deploys a mixed description of a Gfarm+JuxMem configuration (December 2007) • Gfarm v1 and v2 • Work in progress (2008) • Fault-tolerant, distributed metadata server: Gfarm on top of JuxMem • Master internship: Andre Lage
