Role of Group Communication in BS Architecture (or: Which platform are we going to use?)

Alberto Bartoli, University of Trieste

Group Communication
• BS will certainly use some form of replication
• BS will certainly use some form of GC
• Group Communication (GC)
  • Suite of communication and membership primitives
  • Very useful for implementing replication algorithms
  • In particular, in the presence of failures (host, network)

Options
• JavaGroups: Reliable Broadcast
  • Used in JBoss clustering extensions
  • Implemented in Java (stack of layers)
• Spread: Uniform Broadcast (much more powerful)
  • Used in a variety of environments
  • Implemented in C (Java interface available)
• JBora: Novel idea (?) "Primary Uniform" Broadcast (much simpler to use)
  • Used in my lab only…
  • Thin Java layer on top of Spread

Replication
• Action A executed upon receiving a multicast m
  • Execute a method on a local object
  • Update the serialized state of a local bean
  • Commit a transaction
  • ...
• "Frequent" (informal) requirement: if a process executes A and then crashes, then A must also be executed by all processes that do not crash ("actions must not be lost")

Reliable broadcast (→ JavaGroups)
• Either all correct processes deliver a message or none of them does
• No guarantee on processes that are not "correct"!
• Executing A might not be safe!
(Figure labels: Failure; Membership change)

Uniform broadcast (→ Spread)
• If a process delivers a message, then all correct processes deliver that message
• Executing A is always safe!
(Figure: three scenarios, each marked "NO")
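The difference between the two guarantees can be checked mechanically on a small example. The following is a minimal sketch, not taken from JavaGroups or Spread: the function names and the representation (a dictionary mapping each process to the set of messages it delivered, plus the set of processes that never crash) are assumptions made for illustration.

```python
def reliable_agreement(delivered, correct):
    """Reliable broadcast: all correct processes deliver the same messages.

    `delivered` maps a process name to the set of message ids it delivered;
    `correct` is the set of processes that never crash.
    Processes outside `correct` are ignored entirely.
    """
    sets = [delivered[p] for p in correct]
    return all(s == sets[0] for s in sets) if sets else True


def uniform_agreement(delivered, correct):
    """Uniform broadcast: a message delivered by ANY process (even one that
    crashes afterwards) must be delivered by every correct process."""
    everything = set().union(*delivered.values())
    return all(everything <= delivered[p] for p in correct)


# p1 delivers m, executes action A, then crashes; p2 and p3 never see m.
lost_action = {"p1": {"m"}, "p2": set(), "p3": set()}
survivors = {"p2", "p3"}
print(reliable_agreement(lost_action, survivors))  # reliable broadcast allows this run
print(uniform_agreement(lost_action, survivors))   # uniform broadcast forbids it
```

The run above is exactly the "action lost" case: it satisfies the reliable guarantee because p1 is not a correct process, yet it violates the uniform one.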

In practice
• "Cross-the-fingers" reliability (don't know or don't care)
• "Real" reliability
  • Uniform broadcast
  • Reliable broadcast + Additional measures
    • Replicated Databases: if a replica commits a transaction not committed by other replicas, undo the transaction later (Bettina)
    • Replicated Services: whenever a replica crashes, surviving replicas fetch from all clients the last reply they have received (Karamanolis, Magee, IEEE TSE)
    • Replicated Data: wait for an explicit response from every available replica (myself, Ozalp, JPDC) (JBoss SFSB clustering)
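As an illustration of the third additional measure (wait for an explicit response from every available replica), here is a minimal sketch. The class and method names are hypothetical and do not come from JBoss or JavaGroups; the sketch deliberately ignores the hard cases, such as a replica that executes the operation and crashes before responding.

```python
class PendingOperation:
    """Tracks one multicast operation until every available replica has responded."""

    def __init__(self, view):
        self.view = set(view)   # replicas currently believed available
        self.acked = set()      # replicas that sent an explicit response

    def on_response(self, replica):
        self.acked.add(replica)

    def on_view_change(self, new_view):
        # Replicas that left the view are no longer waited for.
        self.view = set(new_view)

    def is_done(self):
        return self.view <= self.acked


op = PendingOperation(["r1", "r2", "r3"])
op.on_response("r1")
op.on_response("r2")
print(op.is_done())            # still waiting for r3
op.on_view_change(["r1", "r2"])  # r3 is reported as failed
print(op.is_done())            # every available replica has responded
```

Even this toy version needs a view-change hook, which hints at why each such measure tends to be ad hoc.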

"Real reliability"
• Reliable broadcast + Additional measures (→ JavaGroups):
  • Each additional measure is ad hoc
  • Each additional measure is complex (many failure patterns to consider and to cope with)
  • Lot of work above the GC layer
• Uniform broadcast (→ Spread):
  • No additional measures
  • Systematic approach, complexity within the GC layer
  • Very little work above the GC layer

Uniform broadcast: Hmmm…
• If the network can partition, you need additional measures again!
• "The view is about to split; GC can't tell whether the processes that are about to leave have received the messages that follow"
• Figure callouts: "All correct processes will receive this"; "In-doubt message: can't tell who will receive this!"
• Again the same problem!

JBora
• Very simple reasoning for the "common case"
• Processes that leave the primary view deliver a prefix of the sequence of messages in the primary view
• Executing A is always safe!
• No need for additional measures!
(Figure: message sequences 1 to 8 in the non-primary view and 1 to 11 in the primary view)
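The prefix property is easy to state as a check. This is a minimal sketch using the message numbers from the figure; `is_prefix` is a hypothetical helper, not part of JBora.

```python
def is_prefix(seq, full):
    """True if `seq` equals the first len(seq) elements of `full`."""
    seq, full = list(seq), list(full)
    return len(seq) <= len(full) and full[:len(seq)] == seq


primary = list(range(1, 12))     # messages 1..11 delivered in the primary view
non_primary = list(range(1, 9))  # messages 1..8 delivered by a process that left

print(is_prefix(non_primary, primary))  # allowed: nothing is skipped or reordered
print(is_prefix([1, 2, 4], primary))    # not allowed: message 3 was skipped
```

Under this property, any action a leaving process executed was triggered by a message that the primary view also delivers.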

So what?
• JavaGroups
• Spread
• JBora

Scenario 1
We want to rely (almost) completely on (some snapshot of) JBoss clustering
• My suggestion:
  • Forget about uniform multicast (Spread, JBora)
  • Stick with JavaGroups
• Preliminary WP1 Meeting (Bologna, Trieste)
  • Encapsulating Spread within JavaGroups
  • Encapsulating JBora within JavaGroups
  • …too complex, dubious advantages
  • (see meeting slides for details)

Scenario 2
We don't want to run behind JBoss clustering (write our own clustering features)
• My suggestion:
  • We are not interested in uniform multicast → Use JavaGroups
  • We are interested in uniform multicast → Use JBora

My opinion
• If we use JavaGroups →
  • I will ask to restructure WP1 (Task 1.3 "Support for group communication")
  • Month 18: Dear reviewer, Trieste has led Task 1.3. We did almost nothing.
• If we decide to write our own clustering features →
  • I don't see a single reason why we should eliminate uniform multicast from the beginning

Experiments: "Throughput under stress"
• Each sender injects 1000 msg/sec (bursty)
• All details available in a separate document (4 × PIII 800 MHz, Windows 2000, 100 Mbit/s Ethernet)
• Important findings about JavaGroups (configured as in JBoss clustering):
  • Processes may start missing messages, and this occurs silently (no failure notification whatsoever)
  • You cannot start / recover multiple processes simultaneously (they do not discover each other)
  • Does not seem very "reliable" (at least, when stressed)

A few numbers... (very preliminary)

                        Spread    JBora    JavaGroups
1 sender (500 Byte)        640      576           561
2 senders                 1254      871           496
1 sender (5 KBytes)        323      323           165
2 senders                  592      359           275

(Slide callouts, placement not recoverable: "FIFO Reliable"; "Total Uniform"; "≈ 150!"; "Failed!")
• Recall:
  • Spread, JBora: Message throughput ≈ Operation throughput
  • JavaGroups: Message throughput < Operation throughput (N responses for each multicast)

Appendix

Uniform broadcast: How is it implemented?
• Messages within the GC layers for one uniform broadcast (figure labels: m, ack, deliver)
• Uniform broadcast delivered only upon the second broadcast
• In practice, many, many optimizations:
  • The white messages are not separate messages, but fields of other messages required anyway
  • Costly, but not as much as it seems
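The "delivered only upon the second broadcast" rule can be illustrated with a toy, failure-free simulation. This is a sketch under assumptions, not Spread's actual protocol: the slide only names the messages m, ack and deliver, so the coordinator role, the class `Member` and the function `uniform_broadcast` are hypothetical. A real GC layer would react to a missing ack with a membership change; the sketch simply withholds delivery.

```python
class Member:
    def __init__(self, name):
        self.name = name
        self.buffered = {}   # received but not yet delivered
        self.delivered = []  # handed to the application

    def on_message(self, msg_id, payload):
        # First broadcast: buffer the message and acknowledge it.
        self.buffered[msg_id] = payload
        return ("ack", msg_id, self.name)

    def on_deliver(self, msg_id):
        # Second broadcast: only now does the application see the message.
        self.delivered.append(self.buffered.pop(msg_id))


def uniform_broadcast(members, msg_id, payload, crash_before_ack=()):
    """Returns True if the message was delivered, False if it was withheld."""
    acks = set()
    for member in members:
        if member.name in crash_before_ack:
            continue  # simulated crash: this member never receives or acknowledges
        _, _, who = member.on_message(msg_id, payload)
        acks.add(who)
    if acks != {member.name for member in members}:
        return False  # somebody may not hold m, so nobody delivers it
    for member in members:
        member.on_deliver(msg_id)
    return True


group = [Member("p1"), Member("p2"), Member("p3")]
uniform_broadcast(group, 1, "A")
print([m.delivered for m in group])
```

Because no member delivers before every member holds the message, a process cannot execute A on a message that the others will never receive.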

JBoss Clustering (I)
• Messages from the application layer for one operation (figure labels: m, done)
• My belief:
  • Less efficient than uniform multicast
  • The application injects N one-to-one messages into the system

JBoss Clustering (II)
• Devising all possible failure patterns and coping with them correctly is very, very complex
• Difficult to achieve full confidence in the algorithm and its implementation
• I know from our JPDC work that coping with view changes here is VERY complex
• Does JBoss really handle all cases correctly?
(Figure labels: m, done)

Transitional Views: Why can't they be avoided?
• Suppose a network failure during the protocol
• The GC layer may end up with one side of the partition that does not know whether the other side has received the message
• Two approaches:
  • The GC layer waits until the partition recovers (not feasible)
  • The GC layer notifies the application of the new view after a warning
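From the application's point of view, the second approach means keeping track of which messages arrived after the warning. The following is a minimal sketch; the callback names (`on_transitional_warning`, `on_message`, `on_new_view`) are hypothetical and are not the API of Spread or JavaGroups.

```python
class ViewTracker:
    """Separates messages delivered before the warning from in-doubt ones."""

    def __init__(self):
        self.in_transition = False
        self.safe = []      # delivered before any warning
        self.in_doubt = []  # delivered between the warning and the new view

    def on_transitional_warning(self):
        self.in_transition = True

    def on_message(self, msg):
        target = self.in_doubt if self.in_transition else self.safe
        target.append(msg)

    def on_new_view(self, members):
        # The new view ends the transition; hand back the in-doubt messages
        # so the application can apply its additional measures to them.
        self.in_transition = False
        pending, self.in_doubt = self.in_doubt, []
        return pending


tracker = ViewTracker()
tracker.on_message("m1")
tracker.on_transitional_warning()
tracker.on_message("m2")
print(tracker.on_new_view(["p1", "p2"]))  # messages that need extra care
```

What to do with the returned messages is application specific, which is the "additional measures again" point made earlier in the talk.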
