FT-ERF: Fault-Tolerance in an Event Rule Framework for Distributed Systems. Hillary Caituiro-Monge, Graduate Student. Advisor: Javier Arroyo-Figueroa, Ph.D. Presentation 3
Presentation Objectives • Understand the Scalable and Fault-Tolerant ERF Architecture • Relate the Challenges of Active Replication • Analyze the core deficiencies among RUBIES replicas that must be resolved to achieve fault tolerance: • Lack of Timing Synchronization of Rule Evaluation Cycles (REC) • Lack of Consistency of Event Sets (ES) • Distributed Agreement Protocol
Presentation Objectives • Introduce the New Research Objective
SCALABLE AND FAULT TOLERANT ERF ARCHITECTURE [Figure: an N × M grid of RUBIES instances. Distribution dimension: groups 1 to N, labeled RUBIES(γ11, δ1), RUBIES(γ21, δ2), ..., RUBIES(γN1, δN). Replication dimension: replicas 1 to M of each group, labeled RUBIES(γi1, δi) through RUBIES(γiM, δi) for group i.]
Challenges of Active Replication • Strong replica consistency • All replicas must have the same state after method invocations • Duplicate invocation detection and suppression
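Duplicate detection and suppression can be illustrated with a minimal sketch. It assumes every logical invocation carries an identifier that is the same across all replicas that issue it; the class name `DuplicateSuppressor` and the `(client, sequence number)` ID format are hypothetical and are not taken from ERF or from any CORBA product.

```python
class DuplicateSuppressor:
    """Lets the first copy of each invocation through and drops the rest."""

    def __init__(self):
        self._seen = set()  # invocation IDs that were already delivered

    def deliver(self, invocation_id, handler):
        """Run handler once per invocation_id; return True if it ran."""
        if invocation_id in self._seen:
            return False  # same logical invocation from another replica
        self._seen.add(invocation_id)
        handler()
        return True


calls = []
suppressor = DuplicateSuppressor()
# Three replicas of one client object issue the same logical invocation.
results = [
    suppressor.deliver(("client-A", 1), lambda: calls.append("post_event"))
    for _ in range(3)
]
```

Only the first copy reaches the handler. A real system would also have to bound the memory used by the set of seen IDs, which this sketch does not attempt.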
Lack of Timing Synchronization of Rule Evaluation Cycles (REC) among RUBIES replicas • It is a source of non-deterministic behavior among RUBIES replicas • The REC is not triggered by a client's method invocation, whether direct or indirect • It is always running • Therefore replica consistency cannot be reached by means of interface-based consistency mechanisms
Lack of Timing Synchronization of Rule Evaluation Cycles (REC) among RUBIES replicas • Each replica in a group has its own independent REC, for which • The starting time differs • The duration differs • As a result, each replica in the group runs each REC over a different set of events.
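The effect can be reproduced with a small simulation. This is a sketch under a simplifying assumption that is not stated in the slides: each REC evaluates exactly the events that arrive during its own time window. The function name and the numeric values are illustrative only.

```python
def events_per_cycle(event_times, start, duration, cycles):
    """Group event timestamps by the REC window in which a replica sees them."""
    result = []
    for k in range(cycles):
        lo = start + k * duration   # this cycle begins here
        hi = lo + duration          # and ends just before here
        result.append([t for t in event_times if lo <= t < hi])
    return result


event_times = [1, 2, 4, 5, 7, 8]
# Same events, but each replica has its own start time and cycle duration.
replica_a = events_per_cycle(event_times, start=0, duration=3, cycles=3)
replica_b = events_per_cycle(event_times, start=1, duration=4, cycles=3)
```

Here `replica_a` evaluates `[1, 2]`, `[4, 5]`, `[7, 8]` while `replica_b` evaluates `[1, 2, 4]`, `[5, 7, 8]`, `[]`, so the cycles with the same ordinal position do not see the same events.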
Lack of Consistency of Event Sets (ES) among RUBIES replicas • It is a source of non-deterministic behavior among RUBIES replicas • The content of the ES changes differently at each replica • The content of the ES changes for two reasons: • Incoming events • Dead events
Lack of Consistency of Event Sets (ES) among RUBIES replicas • The content of the ES changes differently at each replica as a consequence of the communication delay in delivering events to each replica.
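A minimal sketch of how delivery delay alone makes event sets diverge. It assumes, for illustration only, that an event enters the ES when it arrives at the replica and dies after a fixed lifetime; the slides do not specify how events die, and `event_set_at` is a hypothetical helper.

```python
def event_set_at(now, events, delay, lifetime):
    """Return the IDs of the events present in a replica's ES at time `now`."""
    present = set()
    for event_id, sent_at in events:
        arrived_at = sent_at + delay
        # The event is in the ES from its arrival until its lifetime runs out.
        if arrived_at <= now < arrived_at + lifetime:
            present.add(event_id)
    return present


events = [("e1", 0), ("e2", 4), ("e3", 6)]
# Two replicas observe the same events with different delivery delays.
es_fast = event_set_at(now=7, events=events, delay=1, lifetime=5)
es_slow = event_set_at(now=7, events=events, delay=3, lifetime=5)
```

At the same instant the fast replica holds `e2` and `e3` (`e1` has already died), while the slow replica still holds `e1` and `e2` and has not yet received `e3`.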
What is the problem? • Each replica belonging to the same group includes dissimilar events in each consecutive equivalent REC execution. • As a result, each RUBIES replica posts different events at different times and with different state. • Such behavior is a problem for load distribution and/or replication.
What is the issue? • Strong replica consistency • Synchronize rule evaluation cycles among RUBIES replicas • Make event sets consistent among RUBIES replicas
How to do it? • Distributed Agreement or Consensus Protocol (work in progress) • RUBIES replicas must start each REC after an agreement. • Each REC must have a unique ID • RECs with the same ID must run simultaneously
How to do it? • Distributed Agreement or Consensus Protocol (work in progress) • RUBIES replicas must include the same events in RECs with the same ID • The agreement must specify which events will be considered • Sliding window
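The requirement that replicas start each REC only after an agreement can be sketched as a barrier keyed by REC ID. This is an in-process illustration, not the protocol under development: `RecBarrier` is a hypothetical name, and the sketch assumes that no replica crashes, which a real fault-tolerant protocol cannot assume.

```python
class RecBarrier:
    """Releases the REC with a given ID only after every replica voted for it."""

    def __init__(self, replica_ids):
        self._replicas = frozenset(replica_ids)
        self._ready = set()
        self.rec_id = 0  # ID of the next REC to be started

    def ready(self, replica_id, rec_id):
        """Record a vote; return the agreed REC ID once all voted, else None."""
        if replica_id not in self._replicas:
            raise ValueError(f"unknown replica {replica_id!r}")
        if rec_id != self.rec_id:
            raise ValueError(f"vote for REC {rec_id}, expected {self.rec_id}")
        self._ready.add(replica_id)
        if self._ready != self._replicas:
            return None  # still waiting for other replicas
        agreed = self.rec_id
        self._ready.clear()
        self.rec_id += 1  # the next REC gets a fresh, unique ID
        return agreed


barrier = RecBarrier(["r1", "r2", "r3"])
first_round = [barrier.ready(r, 0) for r in ("r1", "r2", "r3")]
second_round = [barrier.ready(r, 1) for r in ("r1", "r2", "r3")]
```

The last vote of each round returns the agreed ID, at which point all replicas would be told to run that REC. Handling a replica that never votes requires failure detection and is left out here.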
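One possible reading of the sliding window is sketched below, as an assumption rather than the design under development: each replica proposes the events it has received, and the REC uses only those events that every replica holds and whose source timestamp falls inside the current window. The function `agree_on_events` and the timestamp scheme are hypothetical.

```python
def agree_on_events(proposals, window_start, window_size):
    """Pick the events all replicas hold whose timestamp lies in the window."""
    window_end = window_start + window_size
    in_window = [
        {eid for eid, ts in seen.items() if window_start <= ts < window_end}
        for seen in proposals.values()
    ]
    # Only events known to every replica are admitted to the REC.
    agreed = set.intersection(*in_window) if in_window else set()
    return sorted(agreed)


# Event ID -> source timestamp, as received so far by each replica.
proposals = {
    "r1": {"e1": 0, "e2": 2, "e3": 5},
    "r2": {"e1": 0, "e2": 2},  # e3 has not reached r2 yet
    "r3": {"e1": 0, "e2": 2, "e3": 5},
}
rec_0 = agree_on_events(proposals, window_start=0, window_size=4)
rec_1 = agree_on_events(proposals, window_start=2, window_size=4)
```

With these values `rec_0` is `['e1', 'e2']` and `rec_1` is `['e2']`: the event `e3` is excluded until it has reached all replicas, so every replica evaluates the same set for a given REC ID.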
New Research Objective • The proposed research will focus on the fault-tolerance problem in ERF. • The main purpose is to design and implement a strong replica consistency mechanism to achieve fault tolerance.
Procedure • Select an Active Replication Software • Must be compatible with Fault-Tolerant CORBA • Must be portable • Must not be intrusive • Must not be commercial • Develop a Distributed Agreement Protocol • As described above
OGS (Object Group Service) • Non-intrusive • Service approach. • Requires no change to the underlying ORB • Compliant with the CORBA specification • Not proprietary. • Designed and implemented as a set of CORBA objects, which makes it interoperable between different ORBs. • There are plans to extend OGS and make it compliant with the FT-CORBA specification. • White box.
Eternal Systems FTORB • Non-intrusive • Interception approach. • CORBA objects above the ORB support the interfaces of the OMG Fault-Tolerant standard specifications • Replication mechanisms below the ORB provide strong replica consistency • Interceptors provide independence from the ORB and the applications.
Others • GMS (Group Communication Service) • IRL • Isis+Orbix • Electra • AQuA
Comparison among Fault-Tolerant CORBA systems. Source: Carlo Marchetti et al., “Architectural Issues on Fault Tolerance in CORBA”, in Proceedings of the SSGRR 2000 Computer & Business Conference, L'Aquila, Italy, 2000
Conclusion • Fault tolerance in ERF requires the design and implementation of an agreement protocol to achieve strong replica consistency. • Strong replica consistency will enable ERF in distributed scenarios such as replication, load distribution, and load balancing.