Hierarchical Synchronization and Consistency in GDS
Sébastien Monnet, IRISA, Rennes
GDS meeting, LIP6, Paris
JuxMem Consistency Protocol: Currently Home-Based
• A home node is responsible for each piece of data
• Any action on the piece of data implies communication with its home node
[Figure: a client interacting with the home node of a piece of data]
Replicated Home
• The home node is replicated to tolerate failures
• Thanks to active replication, all replicas are kept up to date (see the sketch below)
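A minimal sketch of a home-based protocol with an actively replicated home, assuming every write is applied to all replicas so any replica can serve a read. All names (HomeNode, ReplicatedHome, read, write) are illustrative, not JuxMem's actual API.

    import java.util.ArrayList;
    import java.util.List;

    class HomeNode {
        private byte[] data = new byte[0];
        // Each replica applies every write: this is active replication,
        // so all copies stay up to date and any replica can serve a read.
        void apply(byte[] newData) { data = newData.clone(); }
        byte[] read() { return data.clone(); }
    }

    class ReplicatedHome {
        private final List<HomeNode> replicas = new ArrayList<>();
        ReplicatedHome(int degree) {
            for (int i = 0; i < degree; i++) replicas.add(new HomeNode());
        }
        // In the real system this would be an atomic multicast, so that
        // all replicas apply the same writes in the same order.
        void write(byte[] newData) {
            for (HomeNode r : replicas) r.apply(newData);
        }
        // Any replica is up to date, so reading from the first one suffices.
        byte[] read() { return replicas.get(0).read(); }
    }

    public class ActiveReplicationDemo {
        public static void main(String[] args) {
            ReplicatedHome home = new ReplicatedHome(3);
            home.write("hello".getBytes());
            System.out.println(new String(home.read())); // prints "hello"
        }
    }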
Replication Consistency
• Two-layer architecture (see the interface sketch below)
• Replication based on classical fault-tolerant distributed algorithms
• Implies a consensus among all nodes
• Need for replicas in several clusters (locality)
[Figure: layer stack — junction layer; fault-tolerance adapter; group communication and group membership; atomic multicast; consensus; failure detector; communications]
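One way to read the layer stack in the figure is as a set of interfaces, each layer built on the one below it. This is only a sketch of the decomposition; none of these interface names come from the actual adapter code.

    import java.util.function.Consumer;

    // Bottom layer: point-to-point communications.
    interface Communications {
        void send(String node, byte[] msg);
        void onReceive(Consumer<byte[]> handler);
    }

    // Failure detector: reports which nodes are currently suspected.
    interface FailureDetector {
        boolean isSuspected(String node);
    }

    // Consensus: all correct nodes agree on one of the proposed values.
    interface Consensus {
        byte[] decide(byte[] proposal) throws InterruptedException;
    }

    // Atomic multicast: delivers messages to all group members in the
    // same total order (built on consensus).
    interface AtomicMulticast {
        void multicast(byte[] msg);
        void onDeliver(Consumer<byte[]> handler);
    }

    // Group membership: maintains the current set of members and
    // replaces faulty ones (proactive membership).
    interface GroupMembership {
        java.util.Set<String> currentView();
        void replace(String faultyMember, String newMember);
    }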
Hierarchical
• GDG: Global Data Group
• LDG: Local Data Group
[Figure: a GDG composed of several LDGs, with an attached client]
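A minimal sketch of the two-level group structure, using the cardinalities described later in the deck (the GDG cardinality is the number of clusters, the LDG cardinality the number of replicas per cluster). Class and field names are illustrative.

    import java.util.ArrayList;
    import java.util.List;

    // Local Data Group: the replicas of a piece of data inside one cluster.
    class LocalDataGroup {
        final String cluster;
        final List<String> replicas = new ArrayList<>();
        LocalDataGroup(String cluster, int nbReplicas) {
            this.cluster = cluster;
            for (int i = 0; i < nbReplicas; i++)
                replicas.add(cluster + "-provider-" + i);
        }
    }

    // Global Data Group: one LDG per cluster hosting the piece of data.
    class GlobalDataGroup {
        final List<LocalDataGroup> ldgs = new ArrayList<>();
        GlobalDataGroup(String[] clusters, int replicasPerCluster) {
            for (String c : clusters)
                ldgs.add(new LocalDataGroup(c, replicasPerCluster));
        }
    }

    public class HierarchyDemo {
        public static void main(String[] args) {
            // 2 clusters (GDG cardinality), 3 replicas each (LDG cardinality).
            GlobalDataGroup gdg =
                new GlobalDataGroup(new String[] { "rennes", "paris" }, 3);
            for (LocalDataGroup ldg : gdg.ldgs)
                System.out.println(ldg.cluster + ": " + ldg.replicas);
        }
    }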
Synchronization Point of View
• Naturally similar to data management
• One lock per piece of data
• Pieces of data are strongly linked to their locks
[Figure: a client interacting with the synchronization manager (SM)]
Synchronization Point of View
• The synchronization manager is replicated in the same way as the home node (see the sketch below)
[Figure: a client acquiring a lock from the replicated synchronization manager]
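A minimal sketch of a per-data lock manager: one lock per piece of data, keyed by the data identifier. In JuxMem the manager itself would be an actively replicated group like the home node; here a local ReentrantLock stands in for the distributed lock token, and all names are illustrative.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantLock;

    // One lock per piece of data: the synchronization manager's state
    // is strongly linked to the data it protects.
    class SynchronizationManager {
        private final Map<String, ReentrantLock> locks = new HashMap<>();

        private synchronized ReentrantLock lockFor(String dataId) {
            return locks.computeIfAbsent(dataId, id -> new ReentrantLock());
        }

        // Exclusive acquire: blocks until the lock token is available.
        void acquire(String dataId) { lockFor(dataId).lock(); }

        void release(String dataId) { lockFor(dataId).unlock(); }
    }

    public class LockDemo {
        public static void main(String[] args) {
            SynchronizationManager sm = new SynchronizationManager();
            sm.acquire("matrix-42");
            System.out.println("lock held, updating data...");
            sm.release("matrix-42");
        }
    }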
In Case of Failure
• Failure of a provider (group member)
  • Handled by the proactive group membership: the faulty provider is replaced by a new one
• Failure of a client (see the sketch below)
  • Holding a lock => regenerate the token
  • Not holding a lock => do nothing
• Failure of a whole local group
  • Very low probability
  • Treated like a client failure (as it is for the global group)
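A minimal sketch of the client-failure rule above: regenerate the lock token only if the failed client actually held it. The registry, its names, and the way failures are reported are all assumptions for illustration.

    import java.util.HashMap;
    import java.util.Map;

    // Tracks which client (if any) holds the token for each piece of data.
    class TokenRegistry {
        private final Map<String, String> holder = new HashMap<>(); // dataId -> clientId

        void grant(String dataId, String clientId) { holder.put(dataId, clientId); }
        void release(String dataId) { holder.remove(dataId); }

        // Called when the failure detector reports a dead client:
        // if it held a lock, regenerate the token; otherwise do nothing.
        void onClientFailure(String clientId) {
            for (Map.Entry<String, String> e : holder.entrySet()) {
                if (clientId.equals(e.getValue())) {
                    System.out.println("regenerating token for " + e.getKey());
                    e.setValue(null); // the token is available again
                }
            }
        }
    }

    public class FailureDemo {
        public static void main(String[] args) {
            TokenRegistry reg = new TokenRegistry();
            reg.grant("matrix-42", "client-A");
            reg.onClientFailure("client-A"); // held the lock: regenerate
            reg.onClientFailure("client-B"); // held nothing: do nothing
        }
    }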
False Detection
• Blocking unlock with a return code
• To be sure that an operation has been performed, a client has to do something like:

    do {
        lock(data);
        process(data);
    } while (unlock(data) != OK);
    // here we are sure that the operation has been taken into account
Current JuxMem Synchronization (Summary)
• Authorization-based
  • Exclusive (acquire)
  • Non-exclusive (acquireR)
• Centralized (active replication)
• Strongly coupled with data management
• Hierarchical and fault tolerant
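As a rough illustration, the primitives above could be exposed to clients through an interface like the following. Only acquire and acquireR are named in the slides; the release method and all signatures are assumptions.

    // Client-side view of the synchronization primitives. acquire()
    // grants exclusive access, acquireR() non-exclusive (read) access.
    interface JuxMemSync {
        void acquire(String dataId);    // exclusive lock
        void acquireR(String dataId);   // non-exclusive (shared) lock
        boolean release(String dataId); // blocking unlock with a return
                                        // code, cf. the false-detection slide
    }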
Data Updates: When?
• Eager (current version):
  • When a lock is released, update all replicas
  • High fault-tolerance level / low performance
Data Updates: When?
• Lazy (possible implementation):
  • Update a local data group when a lock is acquired
Data Updates: When?
• Intermediate (possible implementation):
  • Allow a limited number of local updates before propagating them all to the global level (see the sketch below)
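A minimal sketch contrasting the three policies: eager pushes to the global group on every release (the current version), lazy refreshes the local group on acquire, and intermediate pushes after a bounded number of local updates. The counter threshold and all names are assumptions.

    enum UpdatePolicy { EAGER, LAZY, INTERMEDIATE }

    class UpdatePropagation {
        private final UpdatePolicy policy;
        private final int threshold;  // max local updates before a global push
        private int localUpdates = 0;

        UpdatePropagation(UpdatePolicy policy, int threshold) {
            this.policy = policy;
            this.threshold = threshold;
        }

        // Called when a client releases the lock after writing.
        void onRelease() {
            localUpdates++;
            if (policy == UpdatePolicy.EAGER
                    || (policy == UpdatePolicy.INTERMEDIATE && localUpdates >= threshold)) {
                pushToGlobalGroup();
            }
        }

        // Called when a client acquires the lock.
        void onAcquire() {
            if (policy == UpdatePolicy.LAZY) refreshLocalGroup();
        }

        private void pushToGlobalGroup() {
            System.out.println("propagating " + localUpdates + " update(s) to the GDG");
            localUpdates = 0;
        }

        private void refreshLocalGroup() {
            System.out.println("refreshing the LDG from the GDG");
        }
    }

    public class UpdateDemo {
        public static void main(String[] args) {
            UpdatePropagation p = new UpdatePropagation(UpdatePolicy.INTERMEDIATE, 3);
            for (int i = 0; i < 4; i++) p.onRelease(); // pushes once, after the 3rd release
        }
    }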
Data Updates: When?
• A hierarchical consistency model?
  • Local lock
  • Global lock
Distributed Synchronization Algorithms
• Naïmi-Trehel's algorithm (see the sketch below)
  • Token-based
  • Mutual exclusion
• Extended by REGAL
  • Hierarchical (Marin, Luciana, Pierre)
  • Fault tolerant (Julien)
  • Both?
• A fault-tolerant, grid-aware synchronization module used by JuxMem?
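A minimal sequential simulation of Naïmi-Trehel's token-based mutual exclusion: each node keeps a probable-owner pointer (the path to the token, compressed on each request) and a "next" pointer (to whom the token goes after release). Message passing is replaced by direct method calls, and the REGAL hierarchical and fault-tolerant extensions are not shown.

    class NTNode {
        final int id;
        NTNode owner;    // probable token owner; null if this node is a root
        NTNode next;     // node to hand the token to after release
        boolean hasToken;
        boolean inCriticalSection;

        NTNode(int id) { this.id = id; }

        // Ask for the token (entry to the critical section).
        void request() {
            if (hasToken) { inCriticalSection = true; return; }
            NTNode probable = owner;
            owner = null;                 // we become the new root
            probable.receiveRequest(this);
            // In the distributed algorithm we would now wait for the token
            // message; in this sequential simulation it has either arrived
            // already or we are queued behind the current holder's 'next'.
            if (hasToken) inCriticalSection = true;
        }

        // Handle a (possibly forwarded) request from 'asker'.
        void receiveRequest(NTNode asker) {
            if (owner == null) {              // this node is the current root
                if (hasToken && !inCriticalSection) {
                    hasToken = false;         // pass the token directly
                    asker.hasToken = true;
                } else {
                    next = asker;             // queue the asker behind us
                }
            } else {
                owner.receiveRequest(asker);  // forward along the probable-owner chain
            }
            owner = asker;                    // path compression: asker is the new root
        }

        // Leave the critical section, handing the token to 'next' if any.
        void release() {
            inCriticalSection = false;
            if (next != null) {
                hasToken = false;
                next.hasToken = true;
                next.inCriticalSection = true; // the blocked requester proceeds
                next = null;
            }
        }
    }

    public class NaimiTrehelDemo {
        public static void main(String[] args) {
            NTNode n0 = new NTNode(0), n1 = new NTNode(1), n2 = new NTNode(2);
            n0.hasToken = true;        // n0 starts as token holder (root)
            n1.owner = n0; n2.owner = n0;

            n1.request();
            System.out.println("n1 in CS: " + n1.inCriticalSection); // true
            n2.request();              // queued behind n1
            n1.release();              // token handed to n2
            System.out.println("n2 in CS: " + n2.inCriticalSection); // true
        }
    }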
Open Questions and Future Work
• Interface between JuxMem providers and the synchronization module
  • Providers have to be informed of synchronization operations in order to perform updates
• Future work (Julien & Sébastien)
  • Centralized data / distributed locks?
  • Data may become distributed in JuxMem (epidemic protocols, migratory replication, etc.)
  • Algorithms for token-based non-exclusive locks?
  • May allow more flexibility for replication techniques (passive or quorum-based)
Other Open Issues in JuxMem
Junction Layer
• Decoupled design
• Need to refine the junction layer
[Figure: the consistency layer and the fault-tolerance layer exchanging send/receive calls through the junction layer]
Replication Degree
• Current features: the client specifies
  • The global data group cardinality (i.e. the number of clusters)
  • The local data group cardinality (i.e. the number of replicas in each cluster)
• Desirable features: the client specifies
  • The criticality degree of the piece of data
  • The access needs (model, required performance)
• A monitoring module
  • Integrated with Marin's failure detectors?
  • Current MTBF, message losses, etc.
  • May allow JuxMem to dynamically deduce the replication degree for each piece of data (see the sketch below)
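A minimal sketch of how a monitoring module could map a client-supplied criticality degree and an observed MTBF to the two cardinalities. The mapping and thresholds below are invented for illustration, not a formula from JuxMem.

    // Deduces the replication degree from the data's criticality and
    // the environment observed by a monitoring module.
    class ReplicationAdvisor {
        // criticality in [1..3]; mtbfHours as reported by the monitor.
        static int[] deduce(int criticality, double mtbfHours) {
            // Less stable environment (lower MTBF) => more replicas per cluster.
            int replicasPerCluster = mtbfHours < 24 ? 2 + criticality : 1 + criticality;
            // More critical data => spread over more clusters.
            int nbClusters = Math.max(1, criticality);
            return new int[] { nbClusters, replicasPerCluster };
        }

        public static void main(String[] args) {
            int[] d = deduce(2, 12.0);
            System.out.println("GDG cardinality (clusters): " + d[0]
                    + ", LDG cardinality (replicas/cluster): " + d[1]);
        }
    }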
Application Needs
• Access model
  • Data grain?
  • Access patterns
  • Multiple readers?
  • Locks shared across multiple clusters?
• Data criticality
  • Are there different levels of criticality?
• What kind of advice can the application give concerning these two aspects?
• Duration of the application?
• Traces: latency, crashes, message losses?