Reliable Multicasting with JGroups
Bela Ban, Jan 2004
belaban@yahoo.com
http://www.jgroups.org

Overview
• API, architecture
• Protocols
• Building Blocks
• Performance
• Future, Conclusion

EBIG, Oakland Jan 21 2004

What Is It?
• Toolkit for reliable multicasting
  • Fragmentation
  • Message retransmission
  • Ordering
  • Group membership, membership change notification
• LAN or WAN based

License
• JGroups is a toolkit (JAR), to be linked against an application
• Open Source under LGPL
  • Commercial products can use JGroups without having to LGPL their code
  • Modifications to JGroups itself need to be LGPL'ed (if distributed)
• Dual licensing in the future

API
• Channel: similar to java.net.MulticastSocket
  • plus group membership, reliability
• Operations:
  • Create a channel with a set of properties
  • Connect to a group X. Everyone that connects to X will see each other
  • Send a message to all members of X
  • Send a message to a single member

API (continued)
  • Receive a message
  • Retrieve membership
  • Be notified when members join, leave (including crashes)
  • Disconnect from the group
  • Close the channel

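API
    import org.jgroups.JChannel;
    import org.jgroups.Message;

    // Create a channel from an XML stack configuration and join the group
    JChannel channel=new JChannel("file:///home/bela/default.xml");
    channel.connect("demo-group");
    System.out.println("members are: " + channel.getView().getMembers());

    // A null destination sends the message to all members of the group
    Message msg=new Message(null, null, "Hello world");
    channel.send(msg);

    // Wait for the next message (0 = no timeout)
    Message m=(Message)channel.receive(0);
    System.out.println("received msg from " + m.getSrc() + ": " + m.getObject());

    // Leave the group and release the channel's resources
    channel.disconnect();
    channel.close();
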
Group topology (diagram)

Demo
• Draw
• ReplicatedTree: shared state

Stats
• JGroups has ~90 KLOC
  • 30 KLOC protocols
  • 45 KLOC main + building blocks
  • 15 KLOC unit tests
• ~90 protocols shipped with JGroups
• Set of well-tested stacks (in XML files)

Available protocols I
• Transport
  • UDP, TCP, TCP_NIO, TUNNEL, JMS, LOOPBACK
• Discovery
  • PING, TCPPING, TCPGOSSIP, UDPPING
• Group membership
• Reliable delivery & FIFO
  • NAKACK, SMACK, UNICAST

Available protocols II
• Failure detection
  • FD, FD_SOCK, FD_PID, FD_SIMPLE, FD_PROB, VERIFY_SUSPECT
• Security
  • ENCRYPT, SSL ConnectionTable (n/a)
• Fragmentation (FRAG)
• State transfer (STATE_TRANSFER)

Available protocols III
• Ordering
  • FIFO, CAUSAL, TOTAL, TOTAL_TOKEN
• Virtual Synchrony
  • FLUSH, QUEUE, VIEW_ENFORCER
• Probabilistic Broadcast
  • PBCAST
• Merging
  • MERGE(2), MERGEFAST

Available protocols IV
• Distributed message garbage collection
  • STABLE
• Debugging
  • PERF, TRACE, PRINTOBJS, SIZE, BSH
• Simulation
  • SHUFFLE, DELAY, DISCARD, DEADLOCK, LOSS, PARTITIONER

Available protocols V
• Dynamic configuration
  • AUTOCONF
• Flow control
  • FLOW_CONTROL, FC
• Misc
  • PIGGYBACK, COMPRESS

Transport
• Task
  • Send messages from above to all members in the group, or to a single member
  • Receive messages from the network, pass them up the stack
• UDP: multicast and multiple UDP unicasts
• TCP: multicast done by multiple TCP unicasts
• TUNNEL: send to external router, e.g. through firewall

Discovery
• Task
  • Initial discovery of members
  • Used by GMS to determine the coordinator to send the JOIN request to
• Each member returns its own address, plus the address of the coordinator
  • Typical response ({A,A}, {B,A}, {C,A})
• Wait for n milliseconds or m responses

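The discovery step above can be sketched in plain Java. This is a minimal illustration, not the JGroups implementation: `PingRsp`, `DiscoverySketch`, `collect` and `determineCoordinator` are hypothetical names, and picking the coordinator named by the most responses is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical discovery response: a member's own address plus the coordinator it knows. */
final class PingRsp {
    final String ownAddr;
    final String coordAddr;
    PingRsp(String ownAddr, String coordAddr) {
        this.ownAddr = ownAddr;
        this.coordAddr = coordAddr;
    }
}

final class DiscoverySketch {
    /** Collects responses until timeoutMs has elapsed or maxResponses have arrived. */
    static List<PingRsp> collect(BlockingQueue<PingRsp> incoming, long timeoutMs, int maxResponses) {
        List<PingRsp> result = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (result.size() < maxResponses) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) break;
                PingRsp rsp = incoming.poll(remaining, TimeUnit.NANOSECONDS);
                if (rsp == null) break; // timed out
                result.add(rsp);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop waiting, keep what we have
        }
        return result;
    }

    /** Returns the coordinator named by most responses, or null if nobody answered. */
    static String determineCoordinator(List<PingRsp> responses) {
        Map<String, Integer> votes = new HashMap<>();
        for (PingRsp rsp : responses) {
            if (rsp.coordAddr != null) votes.merge(rsp.coordAddr, 1, Integer::sum);
        }
        String best = null;
        int bestVotes = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) {
                best = e.getKey();
                bestVotes = e.getValue();
            }
        }
        return best;
    }
}
```

With the responses ({A,A}, {B,A}, {C,A}) the sketch returns A. A null result means no member answered, in which case the caller would become coordinator itself.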
Discovery - UDP
• Multicast discovery request
• Each member responds with a unicast UDP datagram (local-addr, coord-addr), back to the sender

Discovery - TCPGOSSIP
• Can be used by both UDP and TCP
• External GossipServer
  • org.jgroups.stack.GossipServer
  • Maintains table of <group, members>
  • Each member registers (groupname, own addr)
  • Lease based - members have to periodically renew registration
• Multiple GossipServers possible

Discovery - TCPGOSSIP (continued)
• To obtain initial membership for a given group, TCPGOSSIP contacts the GossipServer
• Membership info does not need to be accurate - only goal is to determine coord to send JOIN request to

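A lease-based <group, members> table like the one described for the GossipServer could look roughly like this. It is an illustrative sketch only, not the code of org.jgroups.stack.GossipServer; the class name, method names and the explicit `nowMs` clock parameter are assumptions chosen to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative lease-based registry (hypothetical, not the real GossipServer). */
final class LeaseRegistrySketch {
    private final long leaseMs;
    // group name -> (member address -> lease expiry time in ms)
    private final Map<String, Map<String, Long>> groups = new HashMap<>();

    LeaseRegistrySketch(long leaseMs) {
        this.leaseMs = leaseMs;
    }

    /** Registers a member, or renews its lease if it is already registered. */
    synchronized void register(String group, String member, long nowMs) {
        groups.computeIfAbsent(group, g -> new LinkedHashMap<>()).put(member, nowMs + leaseMs);
    }

    /** Returns the members whose lease has not expired and drops stale entries. */
    synchronized List<String> getMembers(String group, long nowMs) {
        Map<String, Long> members = groups.get(group);
        if (members == null) return new ArrayList<>();
        members.values().removeIf(expiry -> expiry <= nowMs);
        return new ArrayList<>(members.keySet());
    }
}
```

A member that stops renewing simply disappears from the table after its lease runs out, which is enough here because the result only has to lead a joiner to the coordinator.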
Discovery - TCPPING
• Give a set of well known members
• For discovery, those members are pinged
• If at least 1 responds, we can find the coordinator
• Does not require additional process

Group Membership
• Task
  • Maintain a list of members
  • Notify members when a new member joins, or an existing member leaves (or crashes)
• Each member has the same ordered list
  • List can be retrieved by Channel.getView()
• First (= oldest) member is coordinator
  • If coord crashes, 2nd oldest takes over

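The ordering rule above (oldest member first, first member is coordinator) can be sketched as follows. `ViewSketch` is a hypothetical stand-in for illustration and is not the JGroups View class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical ordered membership list; the oldest member comes first. */
final class ViewSketch {
    private final List<String> members = new ArrayList<>();

    /** Appends a new member at the end, so join order is preserved. */
    void join(String addr) {
        if (!members.contains(addr)) members.add(addr);
    }

    /** Removes a member that left or crashed. */
    void remove(String addr) {
        members.remove(addr);
    }

    /** The first (oldest) member acts as coordinator; null for an empty view. */
    String coordinator() {
        return members.isEmpty() ? null : members.get(0);
    }

    List<String> getMembers() {
        return Collections.unmodifiableList(members);
    }
}
```

Because every member holds the same ordered list, removing a crashed coordinator makes the second-oldest member the new coordinator on every node without any extra election step.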
Group Membership - JOIN
• New member uses discovery to find coord
• If first member -> become coord
• Else: sends JOIN to coord
  • Coord adds new member to list, multicasts new view (member list) to all members
• If 2 initial members are started at the same time, MERGE protocol merges them into a single group

Group Membership - LEAVE
• Member sends LEAVE to coord
• Coord multicasts new view to all members

Group Membership - CRASH
• Failure detection protocol sends up SUSPECT event
• VERIFY_SUSPECT double checks
• GMS multicasts new view (not containing crashed member)
• If member resurfaces, it will be shunned
  • Has to leave and rejoin group

Failure detection
• Task
  • Detect if a member has crashed and send SUSPECT event up the stack (to be handled by GMS)
• Logical ring over membership
  • Each member pings its neighbor to the right

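Choosing the neighbor to the right in a logical ring over the membership list can be sketched like this. `RingSketch` and `neighborToPing` are hypothetical names used for illustration only.

```java
import java.util.List;

final class RingSketch {
    /**
     * Returns the member to the right of self in the view, wrapping around at the end.
     * Returns null if self is not in the view or there is nobody else to ping.
     */
    static String neighborToPing(List<String> view, String self) {
        int idx = view.indexOf(self);
        if (idx < 0 || view.size() < 2) return null;
        return view.get((idx + 1) % view.size());
    }
}
```

For the view [A, B, C], A pings B, B pings C and C wraps around to A, so every member is monitored by exactly one other member.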
Failure detection - FD (diagram)

Reliable delivery & FIFO
• Lossless and FIFO delivery for multicast and unicast messages
  • Multicast: NAK and ACK
  • Unicast: ACK
• Missing messages (gaps) are retransmitted
  • Sender resends or
  • Receiver requests retransmission

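Receiver-side gap detection, the basis of a negative-acknowledgment (NAK) scheme, can be sketched as below. This is a simplified illustration under the assumption that each sender numbers its messages 1, 2, 3, ...; `NakReceiverSketch` is a hypothetical class, not the NAKACK protocol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical per-sender receive window with FIFO delivery and gap detection. */
final class NakReceiverSketch {
    private long nextToDeliver = 1; // lowest sequence number not yet delivered
    private final SortedMap<Long, String> buffered = new TreeMap<>();
    final List<String> delivered = new ArrayList<>();

    /** Accepts a message and returns the sequence numbers that are still missing. */
    List<Long> receive(long seqno, String payload) {
        // Ignore duplicates of messages that were already delivered
        if (seqno >= nextToDeliver) buffered.putIfAbsent(seqno, payload);

        // Deliver as many consecutive messages as possible, in order
        while (buffered.containsKey(nextToDeliver)) {
            delivered.add(buffered.remove(nextToDeliver));
            nextToDeliver++;
        }

        // Anything between the delivery point and the highest buffered seqno is a gap
        List<Long> missing = new ArrayList<>();
        if (!buffered.isEmpty()) {
            for (long s = nextToDeliver; s < buffered.lastKey(); s++) {
                if (!buffered.containsKey(s)) missing.add(s);
            }
        }
        return missing;
    }
}
```

Receiving 1, 3, 4 reports 2 as missing and holds back 3 and 4. Once 2 is retransmitted, all four messages are delivered in order.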
Encryption
• Uses public/private key encryption when a new member joins, so that it can obtain the shared group key
• Shared key is used to encrypt all messages
• Group key is recomputed on joins/leaves
• SSL ConnectionTable
  • As alternative, to be used in TCP
  • Uses SSLSocket rather than Socket

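The key hand-over on join can be illustrated with the standard Java cryptography API. The slides do not say which algorithms ENCRYPT uses, so RSA with OAEP padding for wrapping and a 128-bit AES group key are assumptions, and `GroupKeySketch` is a hypothetical class.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

final class GroupKeySketch {
    // Assumed transformation; not taken from the slides
    private static final String WRAP_ALG = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    /** Creates a fresh shared group key (assumed here to be 128-bit AES). */
    static SecretKey newGroupKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    /** Key holder side: encrypts the group key with the joiner's public key. */
    static byte[] wrapGroupKey(SecretKey groupKey, PublicKey joinerPublic) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(WRAP_ALG);
        rsa.init(Cipher.WRAP_MODE, joinerPublic);
        return rsa.wrap(groupKey);
    }

    /** Joiner side: recovers the group key with its private key. */
    static SecretKey unwrapGroupKey(byte[] wrapped, PrivateKey joinerPrivate) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(WRAP_ALG);
        rsa.init(Cipher.UNWRAP_MODE, joinerPrivate);
        return (SecretKey) rsa.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    /** Runs one hand-over and reports whether both sides end up with the same key. */
    static boolean roundTrip() {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
            kpg.initialize(2048);
            KeyPair joiner = kpg.generateKeyPair();
            SecretKey groupKey = newGroupKey();
            byte[] wrapped = wrapGroupKey(groupKey, joiner.getPublic());
            SecretKey received = unwrapGroupKey(wrapped, joiner.getPrivate());
            return Arrays.equals(groupKey.getEncoded(), received.getEncoded());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Recomputing the group key on joins and leaves would amount to calling `newGroupKey()` again and wrapping the result once per remaining member.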
Properties configuration
• Plain string format
    "UDP(mcast_addr=228.8.8.8;mcast_port=45566;ip_ttl=32;" +
    "mcast_send_buf_size=64000;mcast_recv_buf_size=64000):" +
    "PING(timeout=2000;num_initial_members=3):" +
    "MERGE2(min_interval=5000;max_interval=10000):" +
    "FD_SOCK:" +
    "VERIFY_SUSPECT(timeout=1500):" +
    "pbcast.NAKACK(max_xmit_size=8096;gc_lag=50;retransmit_timeout=600,1200,2400):" +
    "UNICAST(timeout=600,1200,2400,4800):" +
    "pbcast.STABLE(desired_avg_gossip=20000):" +
    "FRAG(frag_size=8096;down_thread=false;up_thread=false):" +
    "pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;" +
    "shun=false;print_local_addr=true)"
• URL / XML

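To make the plain string format easier to read, here is a small parser sketch that splits such a string into protocols and their parameters. It is not the JGroups configurator; `StackSpecSketch` is a hypothetical class and the grammar (protocols separated by `:`, parameters by `;` inside parentheses) is inferred from the example string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StackSpecSketch {
    /** Parses "PROT1(k=v;k=v):PROT2:..." into protocol name -> parameters, in stack order. */
    static Map<String, Map<String, String>> parse(String spec) {
        Map<String, Map<String, String>> stack = new LinkedHashMap<>();
        for (String part : splitTopLevel(spec)) {
            int open = part.indexOf('(');
            String name = open < 0 ? part : part.substring(0, open);
            Map<String, String> params = new LinkedHashMap<>();
            if (open >= 0) {
                int close = part.lastIndexOf(')');
                if (close < open) throw new IllegalArgumentException("unbalanced parentheses: " + part);
                for (String kv : part.substring(open + 1, close).split(";")) {
                    int eq = kv.indexOf('=');
                    if (eq > 0) params.put(kv.substring(0, eq).trim(), kv.substring(eq + 1).trim());
                }
            }
            stack.put(name.trim(), params);
        }
        return stack;
    }

    /** Splits on ':' only outside parentheses, so parameter values may contain colons. */
    private static List<String> splitTopLevel(String spec) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < spec.length(); i++) {
            char c = spec.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ':' && depth == 0) {
                parts.add(spec.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(spec.substring(start));
        return parts;
    }
}
```

Parsing the string above would yield UDP first and pbcast.GMS last, matching the bottom-to-top order in which the protocols are listed.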
Advantages of protocol stacks
• Each property is implemented by one protocol
  • Fragmentation, retransmission, ordering
• Protocols are assembled into a stack
• Stack has exactly the properties needed by the application / required by the network
• Can't get this with java.net.Socket, which always comes with full TCP/IP

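The idea of assembling single-purpose protocols into a stack can be sketched with two toy layers. This is a conceptual illustration only: `StackSketch`, `Layer`, `frag` and `seq` are hypothetical, messages are plain strings, and only the downward (sending) direction is modelled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

final class StackSketch {
    /** A layer transforms the messages travelling down the stack. */
    interface Layer extends UnaryOperator<List<String>> {}

    /** Fragmentation: splits each message into chunks of at most fragSize characters. */
    static Layer frag(int fragSize) {
        return msgs -> {
            List<String> out = new ArrayList<>();
            for (String m : msgs) {
                for (int i = 0; i < m.length(); i += fragSize) {
                    out.add(m.substring(i, Math.min(m.length(), i + fragSize)));
                }
            }
            return out;
        };
    }

    /** Sequencing: prefixes each message with a sequence number so gaps can be detected. */
    static Layer seq() {
        return msgs -> {
            List<String> out = new ArrayList<>();
            for (int i = 0; i < msgs.size(); i++) {
                out.add((i + 1) + ":" + msgs.get(i));
            }
            return out;
        };
    }

    /** Sends one application message down through the layers, listed top to bottom. */
    static List<String> sendDown(String msg, List<Layer> layers) {
        List<String> msgs = Collections.singletonList(msg);
        for (Layer layer : layers) {
            msgs = layer.apply(msgs);
        }
        return msgs;
    }
}
```

The same call `sendDown("HelloWorld", ...)` produces three numbered fragments with the stack [frag(4), seq()] and a single numbered message with [seq()] alone, so the application code stays the same while the stack decides what goes on the wire.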
Advantages of protocol stacks (continued)
• Small scope: a protocol does just one job, but does it well
• Protocol stacks are fashionable:
  • Servlet 2.3 filters
  • Interceptors (CORBA, JBoss)
  • AOP: separation of concerns, e.g. fragmentation should not be an application concern

Benefits
• Same application code, different protocol stacks (deployment issue)
• Application requirements reflected in protocol stack specification
• App focuses on domain specific issues

Building Blocks
• Replicated Cache
• NotificationBus
• Group RPC

Replicated Cache
• Shared state across a group
• Any change is replicated to all members
• New members acquire initial state from coord
• Structures supported
  • Tree
  • Hashmap
  • Queues

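The replication and state-transfer behavior of a replicated hashmap can be simulated in a single process. This sketch does not use JGroups at all: `ReplicatedMapSketch` and its nested `Group` are hypothetical, and the loop over members stands in for a multicast.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReplicatedMapSketch {
    /** In-process stand-in for a group; a real implementation would multicast over a channel. */
    static final class Group {
        final List<ReplicatedMapSketch> members = new ArrayList<>(); // oldest first
    }

    private final Map<String, String> local = new HashMap<>();
    private final Group group;

    /** Joins the group and, if it is not empty, copies the coordinator's state. */
    ReplicatedMapSketch(Group group) {
        this.group = group;
        if (!group.members.isEmpty()) {
            // State transfer from the oldest member (the coordinator)
            local.putAll(group.members.get(0).local);
        }
        group.members.add(this);
    }

    /** Applies the change on every member, including this one. */
    void put(String key, String value) {
        for (ReplicatedMapSketch member : group.members) {
            member.local.put(key, value);
        }
    }

    /** Reads are served from the local copy. */
    String get(String key) {
        return local.get(key);
    }
}
```

A member created after some `put` calls sees the earlier entries through state transfer, and later changes made by either member are visible on both.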
NotificationBus
• Thin layer on Channel
• Notifications sent to all members
• Callback when notification is received
• Hook for state sharing

Group RPC
• Invoke a method call in all members
• Get a list of responses
• Wait for all responses, a majority, the first, or no response (use optional timeout)
• Handles crashed members correctly (no blocking)

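How a caller might decide that it has waited long enough under each response mode can be sketched as follows. `RpcModeSketch` and its `Mode` enum are hypothetical names, and defining majority as more than half of the group size is an assumption; the optional timeout is left out.

```java
import java.util.HashSet;
import java.util.Set;

final class RpcModeSketch {
    enum Mode { ALL, MAJORITY, FIRST, NONE }

    /** Number of responses a caller waits for, given the group size. */
    static int requiredResponses(Mode mode, int groupSize) {
        switch (mode) {
            case ALL:
                return groupSize;
            case MAJORITY:
                return groupSize / 2 + 1; // assumed: strictly more than half
            case FIRST:
                return Math.min(1, groupSize);
            default:
                return 0; // NONE: fire and forget
        }
    }

    /** True once enough responses arrived; suspected (crashed) members are not waited for. */
    static boolean done(Mode mode, Set<String> members, Set<String> responded, Set<String> suspected) {
        Set<String> got = new HashSet<>(responded);
        got.retainAll(members); // ignore responses from non-members

        Set<String> possible = new HashSet<>(members);
        possible.removeAll(suspected); // live members can still answer
        possible.addAll(got);          // plus answers already received

        int needed = Math.min(requiredResponses(mode, members.size()), possible.size());
        return got.size() >= needed;
    }
}
```

In a group {A, B, C} with mode ALL and responses from A and B, the call keeps waiting until C either answers or is suspected, which is what keeps a crashed member from blocking the caller.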
Serverless JMS
• JMS based on JGroups
• Peer-to-peer architecture rather than client/server
• Client publishing to a topic
  • Instead of sending the message to the server, which then distributes it to multiple clients, the publisher multicasts the message
• JMS Server just another member
  • Handles persistent messages (DB)

Serverless JMS (diagram)
• Client/server: cost 4 unicasts
• Serverless: cost 1 multicast

Serverless JMS (continued)
• Clients are still able to publish even when server is down
• Caveat: works in scenario where client and server are in the same multicast-reachable network
• Status
  • Topics/Queues available
  • No TX/XA, no durable subscriptions, no persistent messages
  • Download (standalone) beta at jboss.org

Where is JGroups used?
• JBoss
  • Clustering
    • Replication of entity beans, SLSBs and SFSBs
    • HA-JNDI
    • Cache invalidation
    • Session replication (integrated Tomcat, Jetty)
  • Serverless JMS
  • Cache
    • Replicated transactional clustered cache

Where is JGroups used? (continued)
• JOnAS appserver (clustering)
• GroupPac (FT-CORBA impl)
• GCT: port to .NET
• Replicated Caching
  • OpenSymphony OSCache
  • Jakarta Turbine's JCS
  • SwarmCache

Where is JGroups used? (continued)
• Session replication
  • Jetty
  • Tomcat 4.x
  • Work in progress on plugin architecture for Tomcat 5.x
• Unofficial ones...

Performance
• 4 nodes, 1 or 2 senders
• 750 MHz SunBlade 1000, 512 MB, 100 Mbit/s switched Ethernet
• JGroups 2.1
• 8000 messages of 10K each, in 200 bursts of 20 (2 senders), sleep after burst = 5 ms
• 451 msgs/s == 4.5 MB/s throughput
• Resident heap size 35 MB max (-Xmx128m)

Performance (continued)
• 1.4 billion messages total
• 4 nodes, 2 senders
• Message size = 10K
• Average msgs/s: 350
• Max resident mem: 35 MB (-Xmx128m)
• Tests available as part of the JGroups distribution
  • Includes gnuplot scripts to generate graphs

Current and future projects
• JBossCache, Serverless JMS
• Port to J2ME (first version available on www.jgroups-me.org)
• hsqldb (HyperSonic) database replication
• JCache JSR 107 compliant impl (JBoss Cache)
• Potential work on GroupComm JSR
• jcluster project on dev.java.net

Links
• www.jgroups.org
• "Papers and Articles": link to IBM developerWorks