Building Fault-Tolerant Consistency Protocols for an Adaptive Grid Data-Sharing Service Gabriel Antoniu, Jean-François Deverge, Sébastien Monnet PARIS Research Group IRISA/INRIA Rennes, France Workshop on Adaptive Grid Middleware - AGridM 2004 September 2004, Juan-les-Pins
Context: Grid Computing • Target architecture: cluster federations (e.g. GRID 5000) • Target applications: distributed numerical simulations (e.g. code coupling) • Problem: what is the right approach for data sharing? (Figure: satellite design as a code-coupling application involving solid mechanics, optics, dynamics and thermodynamics)
Current Approaches: Explicit Data Management • Explicit data localization and transfer • GridFTP [ANL], MPICH-G2 [ANL] • Security, parallel transfer • Internet Backplane Protocol [UTK] • Limitations: • Application complexity at large scale • No consistency guarantees for replicated data
Handling Consistency: Distributed Shared Memory Systems • Features: • Uniform access to data via a global identifier • Transparent data localization and transfer • Consistency models and protocols • But: • Small-scale, static architectures • Challenge on a grid architecture: integrate new hypotheses! • Scalability • Dynamic nature • Fault tolerance (Figure: migration vs. replication of data between Node 0 and Node 1)
Case Study: Building a Fault-Tolerant Consistency Protocol • Starting point: a home-based protocol for entry consistency • Relaxed consistency model • Explicit association of data to locks • MRSW: Multiple Reader Single Writer • acquire(L) • acquireRead(L) • Implemented by a home-based protocol (Figure: a home node serving client copies)
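To make the entry-consistency primitives above concrete, here is a minimal usage sketch in Java. Only acquire(L), acquireRead(L) and the lock/data association come from the slides; the EntryLock and SharedBlock types and their method signatures are hypothetical names introduced for illustration.

```java
// Minimal sketch of entry-consistency usage (hypothetical API names).
// Entry consistency: a piece of data is explicitly associated with a lock L;
// updates become visible to a reader only after it acquires L.
public class EntryConsistencyExample {

    /** Hypothetical lock handle guarding one shared data block. */
    interface EntryLock {
        void acquire();      // exclusive access: single writer (MRSW)
        void acquireRead();  // shared access: multiple readers
        void release();
    }

    /** Hypothetical shared data block associated with the lock. */
    interface SharedBlock {
        int read(int offset);
        void write(int offset, int value);
    }

    static void writer(EntryLock l, SharedBlock x) {
        l.acquire();          // single-writer mode
        x.write(0, 42);       // update propagated lazily, at release/acquire
        l.release();
    }

    static void reader(EntryLock l, SharedBlock x) {
        l.acquireRead();      // multiple readers may hold the lock together
        int v = x.read(0);    // sees all updates released before this acquire
        l.release();
        System.out.println("read " + v);
    }
}
```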
Home-Based Protocol (Figure: message exchange between Client A in Cluster A, Client B in Cluster B and the home node: acquire, lock granted, read x, data x, write w(x), release)
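The exchange in the figure can be summed up by the home node's request handling. The Java sketch below is a simplified single-block MRSW home written under assumptions: the HomeNode and Request types and the message names are illustrative, not the actual protocol implementation.

```java
// Simplified home-node logic for one data block under a home-based
// MRSW entry-consistency protocol (hypothetical types and names).
import java.util.ArrayDeque;
import java.util.Queue;

class HomeNode {
    private byte[] data = new byte[4096];   // reference copy held by the home
    private boolean writerActive = false;
    private int readersActive = 0;
    private final Queue<Request> pending = new ArrayDeque<>();

    record Request(String clientId, boolean exclusive) {}

    /** Client asks for the lock (acquire = exclusive, acquireRead = shared). */
    synchronized void onAcquire(Request r) {
        pending.add(r);
        grantIfPossible();
    }

    /** Client reads the reference copy once the lock has been granted. */
    synchronized byte[] onRead() {
        return data.clone();
    }

    /** On release, a writer sends its modifications back to the home. */
    synchronized void onRelease(String clientId, byte[] update) {
        if (update != null) {
            data = update.clone();   // home integrates the writer's changes
            writerActive = false;
        } else {
            readersActive--;
        }
        grantIfPossible();
    }

    private void grantIfPossible() {
        while (!pending.isEmpty() && !writerActive) {
            Request next = pending.peek();
            if (next.exclusive() && readersActive == 0) {
                writerActive = true;
                pending.poll();
                sendLockGranted(next.clientId());
            } else if (!next.exclusive()) {
                readersActive++;
                pending.poll();
                sendLockGranted(next.clientId());
            } else {
                break; // a writer must wait for current readers to drain
            }
        }
    }

    private void sendLockGranted(String clientId) {
        // Placeholder for the network reply (the "lock" message in the figure).
        System.out.println("lock granted to " + clientId);
    }
}
```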
Going Large Scale: a Hierarchical Consistency Protocol • Inspired by CLRC [LIP6, Paris] and H2BRC [IRISA, Rennes] (Figure: a global home coordinating one local home per cluster, with clients attached to their local home)
Problem: Critical Entities May Crash • How to support home crashes on a grid infrastructure? (Figure: crash of the global home or of a local home in the hierarchical configuration)
Idea: Use Fault-Tolerant Components • Replicate critical entities on a group of nodes • Group of nodes managed using the group membership abstraction • Rely on atomic multicast • Example architecture: A. Schiper [EPFL] (Figure: layered stack, from top to bottom: fault tolerance, adapter, group communication and group membership, atomic multicast, consensus, failure detector, communications)
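One standard way to apply this idea is active replication: every replica of the critical entity delivers client requests in the same total order via atomic multicast and applies them deterministically, so all replicas keep identical state. The sketch below illustrates this under assumptions; AtomicMulticast, GroupMembership and ReplicatedHome are hypothetical stand-ins for the group-communication building blocks named above, not a real library API.

```java
// Sketch of active replication on top of atomic multicast
// (hypothetical interfaces standing in for the group-communication stack).
import java.util.function.Consumer;

interface AtomicMulticast {
    /** Deliver msg to all live group members in the same total order. */
    void abcast(byte[] msg);
    void onDeliver(Consumer<byte[]> handler);
}

interface GroupMembership {
    /** Invoked when the failure detector excludes or adds a member. */
    void onViewChange(Consumer<java.util.List<String>> handler);
}

/** A replicated "home" entity: each replica applies requests in delivery order. */
class ReplicatedHome {
    private final AtomicMulticast abcast;

    ReplicatedHome(AtomicMulticast abcast, GroupMembership membership) {
        this.abcast = abcast;
        // Same total order on every replica => identical state on every replica.
        abcast.onDeliver(this::apply);
        membership.onViewChange(view ->
            System.out.println("new group view: " + view));
    }

    /** Clients submit requests through atomic multicast, not to one node. */
    void submit(byte[] request) {
        abcast.abcast(request);
    }

    private void apply(byte[] request) {
        // Deterministic state machine: e.g. grant locks, update the reference copy.
        System.out.println("applying request of " + request.length + " bytes");
    }
}
```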
Approach: Decoupled Design • Consistency protocol layer and fault-tolerance layer are separated • Interaction defined by a junction layer (Figure: the consistency layer sits on top of the junction layer, which sits on top of the fault-tolerance layer)
Consistency/Fault-Tolerance Interaction • Critical consistency protocol entities implemented as fault-tolerant node groups • Group management using traditional group membership and group communication protocols • Junction layer handles: • Group self-organization • Configuration of new group members (Figure: layer stack: consistency protocol (CP), dynamic CP configuration, group self-organization, fault-tolerant building blocks)
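The junction layer's two responsibilities can be expressed as a narrow contract between the consistency protocol and the fault-tolerance layer. The interface below is only a sketch of that contract; the names and signatures are assumptions.

```java
// Hypothetical sketch of the junction layer between the consistency
// protocol (CP) and the fault-tolerant building blocks.
import java.util.List;

interface JunctionLayer {

    /**
     * Group self-organization: react to membership changes reported by the
     * fault-tolerance layer (e.g. replace a crashed replica to keep the
     * replication degree constant).
     */
    void onGroupViewChange(List<String> newView);

    /**
     * Configuration of a new group member: transfer the current protocol
     * state (data copy, lock state, pending requests) to the joining node
     * so that it can act as a full replica of the critical entity.
     */
    void configureNewMember(String nodeId, byte[] protocolStateSnapshot);
}
```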
Replicate Critical Entities Using Fault-Tolerant Components • Rely on replication techniques and group communication protocols used in fault-tolerant distributed systems (Figure: the global home and each local home are to be replicated; clients remain attached to their local home)
Replicate Critical Entities Using Fault-Tolerant Components • Rely on replication techniques and group communication protocols used in fault-tolerant distributed systems • GDG: Global Data Group • LDG: Local Data Group (Figure: the global home becomes a GDG and each local home becomes an LDG; clients attach to the LDG of their cluster)
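Read as two levels of replicated homes, the hierarchy works as follows: clients talk to the LDG of their cluster, and LDGs synchronize with the GDG. Below is a rough sketch under hypothetical Group and LocalDataGroup abstractions, not the actual JuxMem classes.

```java
// Rough sketch of the two-level hierarchy: clients -> LDG -> GDG
// (hypothetical Group abstraction; not the actual JuxMem classes).
interface Group {
    void multicast(byte[] msg);     // atomic multicast inside the group
    byte[] query(byte[] request);   // synchronous request to the group
}

class LocalDataGroup {
    private final Group ldg;  // replicas of the local home (this cluster)
    private final Group gdg;  // replicas of the global home (all clusters)

    LocalDataGroup(Group ldg, Group gdg) {
        this.ldg = ldg;
        this.gdg = gdg;
    }

    /** Local read acquire: served inside the cluster by the LDG. */
    byte[] acquireRead(String clientId) {
        return ldg.query(("acquireRead:" + clientId).getBytes());
    }

    /** On release of an exclusive lock, push the update up to the GDG so that
        other clusters' LDGs can later fetch a consistent copy. */
    void releaseWrite(String clientId, byte[] update) {
        ldg.multicast(update);  // keep the LDG replicas consistent
        gdg.multicast(update);  // propagate the new reference copy globally
    }
}
```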
The JuxMem Framework • DSM systems: consistency and transparent access • P2P systems: scalability and high dynamicity • Based on JXTA, a P2P framework [Sun Microsystems] (Figure: virtual architecture: a JuxMem group containing a data group and cluster groups A, B and C, mapped onto the physical architecture)
Implementation in JuxMem • Data group ≈ GDG + LDG (Figure: the data group is realized as one GDG spanning cluster groups A, B and C, with one LDG per cluster group; virtual architecture mapped onto the physical architecture)
Preliminary Evaluation • Experiments: • Allocation cost depending on replication degree • Cost of the basic data access operations (read/update) • Testbed: paraci cluster (IRISA) • Dual Pentium IV 2.4 GHz, 1 GB RAM, 100 Mb/s Ethernet • Emulation of 6 clusters of 8 nodes (Figure: emulated JuxMem group with cluster groups A to F)
Allocation Process • Discover n providers according to the specified replication degree • Send an allocation request to the n discovered providers • On each provider receiving an allocation request: • Instantiate the protocol layer and the fault-tolerant building blocks
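The steps above can be written down as a short driver. The sketch below is illustrative only; DiscoveryService, Provider and their methods are made-up names, not the JuxMem implementation.

```java
// Sketch of the allocation process (hypothetical names, not the JuxMem API).
import java.util.List;

class Allocator {

    interface Provider {
        /** Step 3: the provider instantiates the consistency protocol layer
            and the fault-tolerant building blocks for the new data block. */
        void allocate(int sizeInBytes, int replicationDegree);
    }

    interface DiscoveryService {
        /** Step 1: find n providers matching the requested replication degree. */
        List<Provider> discoverProviders(int n);
    }

    static void allocate(DiscoveryService discovery, int size, int replicationDegree) {
        // Step 1: discover n providers.
        List<Provider> providers = discovery.discoverProviders(replicationDegree);

        // Step 2: send an allocation request to each discovered provider.
        for (Provider p : providers) {
            p.allocate(size, replicationDegree);
        }
    }
}
```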
Conclusion • Handling consistency of mutable, replicated data in a volatile environment • Experimental platform for studying the interaction between fault tolerance and consistency protocols (Figure: layer stack: consistency protocol (CP), dynamic CP configuration, proactive group membership, fault-tolerant building blocks)
Future Work (AGridM 2003) • Consistency protocols in a dynamic environment • Replication strategies for fault tolerance • Co-scheduling computation and data distribution • Integrate high-speed networks: Myrinet, SCI.
Future Work (AGridM 2004) • Goal: build a Grid Data Service • Experiment with various implementations of fault-tolerant building blocks (atomic multicast, failure detectors, …) • Parameterizable replication techniques • Experiment with various consistency protocols combined with various replication techniques • Experiment with realistic grid applications at large scales • GDS (Grid Data Service) project of ACI MD: http://www.irisa.fr/GDS
Transparent Data Localization • DSM Systems • Small scale • No volatility support • Mutable data • P2P Systems • Large scale • Volatility support • Immutable data
Goal • Data-sharing service for the grid • Uniform access to data via a global identifier • Transparent data localization and transfer • Consistency models and protocols • Volatility support • Hybrid system • DSM systems • P2P systems
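From the application's point of view, such a service reduces to a small API: allocate a block with a requested replication degree, obtain a global identifier, and access the block through that identifier under the consistency protocol. The interface below is a hypothetical sketch of such an API, not the actual JuxMem interface.

```java
// Hypothetical sketch of a grid data-sharing service API.
interface GridDataService {

    /** Allocate a shared block and return its global identifier. */
    String alloc(int sizeInBytes, int replicationDegree);

    /** Attach to an existing block from any node, using only its identifier. */
    SharedData map(String globalId);

    interface SharedData {
        void acquire();       // exclusive access (single writer)
        void acquireRead();   // shared access (multiple readers)
        byte[] read();
        void write(byte[] newContent);
        void release();
    }
}
```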
How to Address Fault-Tolerance and Data Consistency at the Same Time? • Both use replication • Two main approaches • Integrated design • Copies created for locality serve as backup and vice versa • Complex architecture and protocols • Decoupled design • Separate roles, cleaner architecture • Leverage existing consistency protocols and existing fault-tolerance replication techniques • Limited interaction via a well-specified interface