Scalable Group Communication in Heterogeneous Cluster • Filip Hanik • Apache Software Foundation • June 30th, 2006
Who am I • fhanik@apache.org • Tomcat Committer / ASF member • Responsible for session replication and clustering • Been involved with ASF since 2001
What we will cover • Introduction to group communication • Challenges in group/cluster communication • Today’s Solutions • Detailed Tribes overview • Tribes – design/configuration/usage • Problems and their solutions • Q & A
What is Group Communication • 1-to-n communication between software/hardware nodes • Designed to reduce the number of packets sent compared to 1-to-1 (point-to-point) communication • Also referred to as broadcasting and/or multicasting • broadcast != multicast • broadcast – all nodes receive • multicast – interested (subscribed) nodes receive • Popular academic research topic, with plenty of literature available
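To make the broadcast vs. multicast distinction concrete, here is a minimal in-memory sketch. It is not Tribes code; `Node` and `Network` are hypothetical classes. In both cases the sender issues a single send call, but a broadcast reaches every node while a multicast reaches only the nodes subscribed to the group.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node: records the groups it joined and the messages it received.
class Node {
    final String name;
    final Set<String> groups = new HashSet<>();
    final List<String> inbox = new ArrayList<>();
    Node(String name) { this.name = name; }
}

// Hypothetical network: one send call fans out to the matching receivers.
class Network {
    final List<Node> nodes = new ArrayList<>();

    // Broadcast: every node on the network receives the message.
    int broadcast(String msg) {
        for (Node n : nodes) n.inbox.add(msg);
        return nodes.size();
    }

    // Multicast: only nodes subscribed to the group receive the message.
    int multicast(String group, String msg) {
        int delivered = 0;
        for (Node n : nodes) {
            if (n.groups.contains(group)) {
                n.inbox.add(msg);
                delivered++;
            }
        }
        return delivered;
    }
}
```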
Challenges in Group Communication • Multicast is most commonly used • Group consistency and leadership • Delivery guarantee • Group delivery guarantee • Ordering and total ordering • Flow control • Multiple networks
Today’s Solutions • Dozens, if not hundreds, of academic products • Often unmaintained, unsupported or proprietary • Many open source projects • Appia, Spread, Erlang, JGroups… the list goes on • Most are multicast based, to solve the 1-to-n packet reduction problem
What is a uniform group model? • Nodes are identical • All nodes process, send and receive messages in the same way • All nodes have the same applications • Total ordering is based on the complete group • Note: not an official definition of uniformity in a group setting
When isn’t uniformity enough? • When processes on each node are dynamic: activated, passivated, short- and long-lived • Example: Tomcat webapps • Example: heterogeneous hardware environments • Application management vs. application data replication • Messages with different priorities • Example: a session attribute being replicated vs. a 25 MB WAR file being transferred • Need for different guarantee levels • When most messages are 1-to-m, where m < n
Challenges in heterogeneous clusters • Same challenges as in homogeneous environments • Node attributes change at runtime • Nodes carry different responsibilities • Total ordering of messages that are sent 1-to-m, where m < n
What is Tribes? • Tribes is a messaging framework with group communication capabilities • 100% Java, Apache Licensed (2.0) • Born out of the cluster/session replication code from Tomcat 5.0-5.5 in early 2006 • Currently in alpha; will become the communication framework for Tomcat’s next cluster implementation • Ideas date back to 2001
Why Tribes? • Many frameworks are not flexible enough • Not enough features • Messages were guaranteed, but without delivery feedback • Static configurations for message delivery • Based on 1-to-m delivery, where m < n • License, license, license…
Why Tribes? • Research gap - platforms are proprietary and often suggest protocols that are not standard • Opportunities for httpd & Tomcat and other ASF software integration for more advanced and intelligent clusters • Separation of communication layer • Did I say Apache License?
Why not Tribes • TCP is connection based • When you always want to send 1-to-n • Unique scenarios where a highly customized solution might be the best fit • It’s not a one-size-fits-all solution, if such a thing exists
Goals • Simplify peer-to-peer and peer-to-group communication for distributed applications • Flexible enough to support a wide range of applications under one runtime configuration • Provide instant feedback on message delivery • Concurrent message delivery, even between two nodes • Parallel delivery to multiple nodes • Clean, intuitive and easy to use, even for complex tasks • All this with low overhead
Feature Overview • Pluggable Modules • Guaranteed Messaging • Different Guarantee Levels • Per message delivery semantics(!) • Pluggable Interceptors (runtime) • Delivery feedback – even for async • Concurrent and parallel delivery • Fixed node hierarchy
Feature: Pluggable Modules • All major components can be swapped out; simple interfaces are defined • Needed when customization is required for lower-level IO operations • Examples • Multicast not available • Proprietary network protocols • SSL • Goal: the default implementation should be sufficient for 80% of applications that require messaging
Feature: Guaranteed Msg Delivery • Assume 1-to-m delivery (m < n) • Default implementation is TCP based • java.io & java.nio • In most cases, TCP (in Java) will outperform UDP once flow control and ACK/NACK for guaranteed delivery are implemented • java.io support for platforms with poor NIO implementations • java.nio preferred
Feature: Guarantee Levels • By default supports 3 levels • NO_ACK – message was sent • Relies on TCP to deliver without node feedback • ACK – message was received • Remote node replies with an ACK • SYNC_ACK – message was processed • Remote node replies with ACK/FAIL_ACK when message has been processed • Allows for message process feedback
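The three guarantee levels differ in when (and whether) the remote node answers. The sketch below simulates that from the receiver's side; `GuaranteeLevel`, `Feedback` and `SimulatedReceiver` are hypothetical names that mirror the slide, not Tribes' actual constants or classes.

```java
import java.util.function.Predicate;

// Hypothetical names mirroring the three levels on the slide.
enum GuaranteeLevel { NO_ACK, ACK, SYNC_ACK }

enum Feedback { SENT, ACK, FAIL_ACK }

class SimulatedReceiver {
    private final Predicate<String> processor; // returns false if processing fails

    SimulatedReceiver(Predicate<String> processor) { this.processor = processor; }

    // Returns the feedback the sender would observe for a given level.
    Feedback receive(String msg, GuaranteeLevel level) {
        switch (level) {
            case NO_ACK:
                // Fire and forget: rely on TCP, no reply from the remote node.
                processor.test(msg);
                return Feedback.SENT;
            case ACK:
                // ACK confirms receipt only; the processing result is not reported.
                processor.test(msg);
                return Feedback.ACK;
            case SYNC_ACK:
                // Reply only after processing, reporting success or failure.
                return processor.test(msg) ? Feedback.ACK : Feedback.FAIL_ACK;
            default:
                throw new IllegalArgumentException("unknown level " + level);
        }
    }
}
```

Only SYNC_ACK surfaces a processing failure to the sender, which is what makes it the level to use when the sender must know the message was acted on.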
Feature: Per-message delivery semantics • Most unique feature; what makes Tribes really stand out • Allows each message to be delivered differently • Per-message guarantee level • Sync vs. async • Not ordered, ordered, totally ordered • 27 flags, i.e. 2²⁷ possible combinations • Based on the interceptors configured • Each message with its own unique delivery guarantee
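Per-message semantics boil down to a bit mask of options carried with each message. This sketch uses hypothetical bit values (Tribes defines its own constants); it shows how options are combined with `|`, tested with `&`, and why n independent flags give 2ⁿ combinations.

```java
// Hypothetical option bits; not Tribes' actual constant names or values.
final class SendOptions {
    static final int USE_ACK       = 1 << 0;
    static final int SYNC_ACK      = 1 << 1;
    static final int ASYNCHRONOUS  = 1 << 2;
    static final int ORDERED       = 1 << 3;
    static final int TOTAL_ORDERED = 1 << 4;

    private SendOptions() {}

    // True if every bit of 'flag' is set in 'options'.
    static boolean isSet(int options, int flag) {
        return (options & flag) == flag;
    }

    // Number of distinct combinations of 'flagCount' independent on/off flags.
    static long combinations(int flagCount) {
        return 1L << flagCount;
    }
}
```

For example, `SendOptions.SYNC_ACK | SendOptions.ORDERED` describes one message, while the next message on the same channel can use `SendOptions.ASYNCHRONOUS` alone.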
Feature: Pluggable Interceptors • React to message attributes (flags) • If they don’t modify message bytes, they can be inserted at runtime • Intercept any event through defined methods • ChannelInterceptorBase is available to minimize redundant code for non-intercepted methods
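The idea behind a base interceptor class can be sketched as a linked chain where the default behaviour of every method is "forward to the next interceptor". `Msg`, `InterceptorBase`, `FlagLogger` and `Transport` are hypothetical names; this is not the actual `ChannelInterceptorBase` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message: carries option flags and a trace of what touched it.
class Msg {
    final int options;
    final List<String> trace = new ArrayList<>();
    Msg(int options) { this.options = options; }
}

// Base class: forwards every event to the next interceptor, so subclasses
// override only the events they actually intercept.
class InterceptorBase {
    private InterceptorBase next;

    void setNext(InterceptorBase next) { this.next = next; }

    void sendMessage(Msg msg) {
        if (next != null) next.sendMessage(msg);
    }
}

// Example interceptor that reacts only to messages carrying a given flag.
class FlagLogger extends InterceptorBase {
    private final int flag;
    FlagLogger(int flag) { this.flag = flag; }

    @Override
    void sendMessage(Msg msg) {
        if ((msg.options & flag) != 0) msg.trace.add("logged");
        super.sendMessage(msg); // always pass the message down the stack
    }
}

// Terminal stage standing in for the transport layer.
class Transport extends InterceptorBase {
    @Override
    void sendMessage(Msg msg) { msg.trace.add("sent"); }
}
```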
Feature: Delivery Feedback • Tribes aims to deliver feedback for each message and each delivery semantic • NO_ACK, ACK, SYNC_ACK • Synchronous and asynchronous delivery • Asynchronous delivery gets feedback through a callback • Example: recoverable transactions can now be implemented, since we always know whether the remote node received the message
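Asynchronous delivery feedback can be sketched as "return immediately, report the outcome later through a callback". `DeliveryCallback` and `AsyncSender` are hypothetical, and the remote acknowledgement is simulated with a predicate rather than a real network round trip.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

// Hypothetical callback: invoked exactly once per message with its outcome.
interface DeliveryCallback {
    void onComplete(String msg);
    void onError(String msg, Exception cause);
}

class AsyncSender implements AutoCloseable {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final Predicate<String> remote; // simulates "remote node acknowledged"

    AsyncSender(Predicate<String> remote) { this.remote = remote; }

    // Returns at once; the delivery result arrives later via the callback.
    CompletableFuture<Void> send(String msg, DeliveryCallback cb) {
        return CompletableFuture.runAsync(() -> {
            if (remote.test(msg)) {
                cb.onComplete(msg);
            } else {
                cb.onError(msg, new Exception("no ACK for " + msg));
            }
        }, pool);
    }

    @Override
    public void close() { pool.shutdown(); }
}
```

Because the failure path is reported rather than swallowed, an application could, for instance, retry or roll back the affected unit of work.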
Feature: Concurrent & Parallel Delivery • Concurrent • More than one message sent or received at any point in time • No message blocking, i.e. a 10 MB message sent with SYNC_ACK will not hold up a 10 KB NO_ACK message • Parallel • Able to send a message to multiple destinations in parallel using one thread (NIO) • Prioritized • Future feature
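Parallel delivery means one message goes to several destinations at once, with a result per destination. The slide describes doing this from a single thread with NIO; the sketch below swaps in a plain thread pool for brevity, and `ParallelSender` and the `deliver` predicate are hypothetical stand-ins for real network sends.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Sketch: send to all destinations in parallel (thread pool, not NIO)
// and collect a per-destination delivery result.
class ParallelSender {
    static Map<String, Boolean> sendToAll(List<String> destinations, Predicate<String> deliver)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, destinations.size()));
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String dest : destinations) {
                tasks.add(() -> deliver.test(dest)); // one delivery attempt per destination
            }
            // invokeAll waits for every task; futures come back in task order.
            List<Future<Boolean>> futures = pool.invokeAll(tasks);
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < destinations.size(); i++) {
                try {
                    results.put(destinations.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    results.put(destinations.get(i), false); // a thrown error counts as not delivered
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```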
Feature: Fixed Node Hierarchy • Absolute Order Algorithm • Always able to determine leadership • No message exchanges (chat free) • Non-coordinated • Also provides a “Coordination” algorithm • Chatty, but efficient • Auto-merges groups • Enhances node discovery where multicast might glitch • Can connect different subnets when used together with the StaticMembershipInterceptor
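Chat-free leadership works because every node applies the same deterministic ordering to the same member list, so all nodes pick the same leader without exchanging messages. This is a minimal sketch; `MemberInfo`, `AbsoluteOrderSketch` and the choice of comparison keys (address bytes, then port, then a unique id) are assumptions for illustration.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical member description.
class MemberInfo {
    final byte[] host;
    final int port;
    final byte[] uniqueId;
    MemberInfo(byte[] host, int port, byte[] uniqueId) {
        this.host = host; this.port = port; this.uniqueId = uniqueId;
    }
}

class AbsoluteOrderSketch {
    // Lexicographic, unsigned comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    // Total order over members: address, then port, then unique id.
    static final Comparator<MemberInfo> ORDER = (m1, m2) -> {
        int c = compareBytes(m1.host, m2.host);
        if (c != 0) return c;
        c = Integer.compare(m1.port, m2.port);
        if (c != 0) return c;
        return compareBytes(m1.uniqueId, m2.uniqueId);
    };

    // Every node computes the same leader from the same membership view.
    static MemberInfo leader(List<MemberInfo> members) {
        return Collections.min(members, ORDER);
    }
}
```

The result does not depend on the order in which a node happened to discover its peers, which is what lets each node decide locally.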
Feature: Absolute Failure Detection • Simple interceptor: TcpFailureDetector • Instant feedback on member down • No need to wait for timeout • No risk of node pings getting stuck on a busy network • Verifies timeouts against “false positives” • 3 levels • Connect • Send • Read
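The "verify before declaring dead" idea can be sketched as follows: when membership suspects a member, probe it directly and only report it as down if the probe fails. `VerifyingFailureDetector` is hypothetical and covers only the connect level; the send and read levels listed on the slide are not modelled.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch: confirm a suspected failure with a direct probe to filter out false positives.
class VerifyingFailureDetector {
    private final Predicate<InetSocketAddress> probe;
    final List<InetSocketAddress> reportedDown = new ArrayList<>();

    VerifyingFailureDetector(Predicate<InetSocketAddress> probe) { this.probe = probe; }

    // Default probe: can a TCP connection be opened within the timeout?
    static Predicate<InetSocketAddress> tcpProbe(int timeoutMillis) {
        return addr -> {
            try (Socket s = new Socket()) {
                s.connect(addr, timeoutMillis);
                return true;
            } catch (IOException e) {
                return false;
            }
        };
    }

    // Called when the membership layer suspects a member has disappeared.
    void memberSuspected(InetSocketAddress member) {
        if (probe.test(member)) {
            return; // false positive: the member still answers, keep it
        }
        reportedDown.add(member);
    }
}
```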
Feature: RPC Messaging • Ability to collect responses to a message • NO_REPLY, FIRST_REPLY, MAJORITY_REPLY & ALL_REPLY • Absence reply (!) rather than timeout • Callback for leftover replies that arrive after the call has returned • Support for multiple RPC channels on top of one Tribes channel
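Each reply mode determines how many responses the caller waits for before returning. The sketch below spells that out; `ReplyMode` and `RpcReplies` are hypothetical names, and "majority" is assumed to mean a strict majority (more than half) of the destinations.

```java
// Hypothetical enum mirroring the reply modes listed on the slide.
enum ReplyMode { NO_REPLY, FIRST_REPLY, MAJORITY_REPLY, ALL_REPLY }

class RpcReplies {
    // Replies the caller waits for, given how many members the request went to.
    static int required(ReplyMode mode, int destinations) {
        switch (mode) {
            case NO_REPLY:
                return 0;
            case FIRST_REPLY:
                return Math.min(1, destinations);
            case MAJORITY_REPLY:
                // Strict majority: more than half of the destinations.
                return destinations == 0 ? 0 : destinations / 2 + 1;
            case ALL_REPLY:
                return destinations;
            default:
                throw new IllegalArgumentException("unknown mode " + mode);
        }
    }
}
```

With five destinations, MAJORITY_REPLY waits for three responses; with four, it also waits for three.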
Feature – JNDI Channel • Ability to bind a channel into a JNDI tree • Share the channel between objects • Ideal for J2EE messaging • Coming soon: • Ability to download client stub • Out of process invocation • Not yet implemented…
Architecture - Overview • [Diagram, top to bottom: Applications • Tipi and RpcChannel layers • Channel • Interceptors • Coordinator • Membership, Sender (TX) and Receiver (RX)]
Architecture - Channel • 1 instance per Tribes runtime setup • Is the first interceptor • Holds a list of one or more ChannelListeners & MembershipListeners • Serializes and deserializes messages • Supports ByteMessage for transfer of pure byte[] data • RpcChannel is a ChannelListener
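Serialization in the channel means turning a `Serializable` message into a `byte[]` for the wire and back again on the receiving side; a raw byte payload (the ByteMessage case) can skip this step. This sketch uses standard Java object serialization; `WireFormat` is a hypothetical helper, and the wire format Tribes actually uses may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of message serialization with standard Java object streams.
class WireFormat {
    // Object -> bytes, as the sending side would do before transmission.
    static byte[] toBytes(Serializable msg) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(msg);
        }
        return bos.toByteArray();
    }

    // Bytes -> object, as the receiving side would do before notifying listeners.
    static Serializable fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Serializable) in.readObject();
        }
    }
}
```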
Architecture - Interceptors • Linked list invocation • Strongly typed – one method per event • No events need to travel through the stack to coordinate interceptors • Examples • Failure detection • Static membership • Total order or per member order • Throughput measurements and statistics • Leadership election • Message data encryption • Message dispatch – asynchronous messaging • All or none delivery guarantee
Architecture - Interceptors • Trigger on ChannelData.getOptions() • Pass through a ChannelData object • Using XByteBuffer – optimized byte[] handling • Membership & Message interceptions • Threadless
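As an example of an interceptor that triggers on message options, here is a sketch of receiver-side per-member ordering: messages flagged as ordered carry a per-sender sequence number and are released strictly in sequence, while unflagged messages pass straight through. `OrderingInterceptor` and the `ORDERED` bit are hypothetical, and the sketch is not Tribes' own order interceptor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: per-sender ordering, applied only when the ORDERED option is set.
class OrderingInterceptor {
    static final int ORDERED = 1 << 3; // hypothetical option bit

    private final Map<String, Integer> nextExpected = new HashMap<>();
    private final Map<String, Map<Integer, String>> pending = new HashMap<>();
    final List<String> delivered = new ArrayList<>();

    void messageReceived(String sender, int options, int seq, String payload) {
        if ((options & ORDERED) == 0) {
            delivered.add(payload); // not ordered: deliver immediately
            return;
        }
        Map<Integer, String> buffer = pending.computeIfAbsent(sender, s -> new HashMap<>());
        buffer.put(seq, payload);
        int expected = nextExpected.getOrDefault(sender, 0);
        // Release every buffered message that is now contiguous.
        while (buffer.containsKey(expected)) {
            delivered.add(buffer.remove(expected));
            expected++;
        }
        nextExpected.put(sender, expected);
    }
}
```

Because the check is on the option bits, ordered and unordered traffic can share the same interceptor stack without paying for each other's guarantees.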
Architecture - Coordinator • Last interceptor • Coordinates IO components • Sender • Receiver • Membership • Receiver uses a thread pool • Sender piggybacks on the application thread
Code Structure • org.apache.catalina.tribes • Application and Component interfaces • group – default implementation • transport – RX/TX components • membership – membership service • group.interceptors – supplied interceptors • io – protocol utilities and optimizations • tipis – utilities on top of Tribes core
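Quick Start
import java.io.Serializable;
import org.apache.catalina.tribes.Channel;
import org.apache.catalina.tribes.ChannelListener;
import org.apache.catalina.tribes.Member;
import org.apache.catalina.tribes.MembershipListener;
import org.apache.catalina.tribes.group.GroupChannel;

// MyMessageListener, MyMemberListener and MyMessage are application classes
Channel myChannel = new GroupChannel();
ChannelListener msgListener = new MyMessageListener();
MembershipListener mbrListener = new MyMemberListener();
myChannel.addMembershipListener(mbrListener);
myChannel.addChannelListener(msgListener);
myChannel.start(Channel.DEFAULT); // start the channel
Serializable myMsg = new MyMessage();
Member[] group = myChannel.getMembers();
myChannel.send(group, myMsg, Channel.SEND_OPTIONS_DEFAULT); // send to all current members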
Data Replication • ReplicatedMap – one to all replication • LazyReplicatedMap – primary/backup replication • Cookie based replication map • ideal for HTTP session replication • Backup location stored in cookies • Versioned delta replication • Example: org.apache.catalina.ha
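The difference between one-to-all and primary/backup replication is where copies of an entry live. In this sketch of the primary/backup idea, the node that puts an entry keeps the primary copy and picks exactly one other member for the backup, round-robin. `PrimaryBackupMap` and the round-robin choice are assumptions for illustration, not LazyReplicatedMap's actual placement logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: primary copy stays local, one backup copy goes to another member.
class PrimaryBackupMap {
    final String self;
    final Map<String, String> data = new HashMap<>();           // primary copies held here
    final Map<String, String> backupLocation = new HashMap<>(); // key -> member holding the backup
    private int nextBackup = 0;

    PrimaryBackupMap(String self) { this.self = self; }

    // Stores the entry and returns the member chosen for the backup, or null if alone.
    String put(String key, String value, List<String> members) {
        data.put(key, value);
        List<String> others = new ArrayList<>(members);
        others.remove(self);
        if (others.isEmpty()) return null;
        String backup = others.get(nextBackup % others.size());
        nextBackup++;
        backupLocation.put(key, backup);
        return backup;
    }
}
```

Compared with one-to-all replication, each put costs one remote copy instead of n - 1, at the price of having to know where the backup lives if the primary fails.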
Tribes Demos • Demo • Code Example • Discussion around common problems and how Tribes could solve them
Future Work • Security: SSL support and node authentication • Many processes, one channel • Language independent • WAN membership discovery • TCP-based multicaster for large clusters • 2*n packet reduction for the sender, not in total • Intelligent membership broadcasting • httpd as a load balancer
Q & A • fhanik@apache.org • http://people.apache.org/~fhanik/tribes • Tomcat SVN repository • Interested to use? • Interested to help?