1 / 15

Dr. Aniruddha S. Gokhale gokhale@dre.vanderbilt (Co-Advisor)

Jaiganesh Balasubramanian jai@dre.vanderbilt.edu http://www.dre.vanderbilt.edu/~jai. FLARe: a F ault-tolerant L ightweight A daptive Re al-time Middleware for Distributed Real-time and Embedded Systems. Dr. Douglas C. Schmidt schmidt@dre.vanderbilt.edu (Advisor).

joy
Download Presentation

Dr. Aniruddha S. Gokhale gokhale@dre.vanderbilt (Co-Advisor)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Jaiganesh Balasubramanian jai@dre.vanderbilt.edu http://www.dre.vanderbilt.edu/~jai FLARe: a Fault-tolerant Lightweight Adaptive Real-time Middleware for Distributed Real-time and Embedded Systems Dr. Douglas C. Schmidt schmidt@dre.vanderbilt.edu (Advisor) Dr. Aniruddha S. Gokhale gokhale@dre.vanderbilt.edu (Co-Advisor) Department of Electrical Engineering and Computer Science Vanderbilt University, Nashville, TN, USA Middleware 2007 Doctoral Symposium (MDS 2007) Newport Beach, CA, USA

  2. Focus: Distributed Real-time Embedded (DRE) Systems • Stringent simultaneous QoS demands, e.g., “never die,” soft real-time, etc. • predominantly stateless, tolerates weaker consistency if stateful • Distributed Object Computing middleware used to design and develop DRE systems • support for highly available systems (e.g., FT-CORBA) • end-to-end predictable behavior for requests (e.g., RT-CORBA) • Goal is to provide real-time fault tolerance to DRE systems • FT uses redundancy; RT assured by resource management

  3. Determining the Replication Scheme for DRE Systems • Active replication • client requests multicast and executed at all the replicas • strong state consistency • deterministic behavior of replicas • very fast recovery • resource-expensive • Passive replication • low resource/execution overhead • better suited for weaker consistency • no restrictions on deterministic behavior • enables making tradeoffs between FT and resource consumption • applies to a class of soft real-time DRE systems • Passive replication better suited for our purpose • Goal is to provide RT+FT for DRE systems using passive replication

  4. Challenges: Using Passive Replication for DRE Systems • Challenge 1: Maintain real-time performance of applications at all times • Focus: Real-time performance after failover • Decision-making algorithms used for electing a new primary • Client response times depend on the loads of the processor hosting the failover target • Task deadlines are met if the CPU utilization is under a threshold • Failure could affect multiple clients – failover to multiple processors

  5. Challenges: Using Passive Replication for DRE Systems • Challenge 2: Fast failover on client side • Focus: Faster and predictable failover • Client-side middleware could maintain static list of references • Round-robin approach of trying out different references • Faster failover – but not appropriate failover • No RT guarantee after failover • Client-side middleware need to be updated with references based on dynamic operating conditions

  6. Challenges: Using Passive Replication for DRE Systems • Challenge 3: FT+RT in spite of resource overloads • Focus: Dynamic reconfigurations and overload management • Long running systems – continued operation through many failures • Periodic loss of resources – simultaneous failures • Graceful degradation of applications • Operate higher priority applications at all times • Overload management – predictable and fast • Alternate degraded and assured functionality

  7. Challenges: Using Passive Replication for DRE Systems • Challenge 4: Resource-aware stateful replication • Focus: State consistency in stateful DRE systems • State transfer requires CPU and network reservations • Support for resource-constrained operations • Different consistency models – strong, weak, and no consistency • Adapt consistency of certain tolerant applications depending on available resources • Utility optimizations – better state consistency for higher priority applications

  8. Our Approach: FLARe RT-FT Middleware • FLARe =Fault-tolerant Lightweight Adaptive Real-time Middleware • Transparent and Fast Failover • Redirection using client-side portable interceptors • catches COMM_FAILURE exceptions and transparently throws LOCATION_FORWARD exceptions • Failure detection can be improved with better protocols – e.g., SCTP

  9. Our Approach: FLARe RT-FT Middleware • Real-time performance after failover • monitor CPU utilizations at hosts where backups are deployed • adaptive failover target selection algorithms operated by a resource manager • failover targets chosen on the least loaded host hosting the backups • better chance to provide RT performance

  10. Our Approach: FLARe RT-FT Middleware • Predictable failover • failover target decisions computed periodically by the resource manager • conveyed to client-side middleware agents – forwarding agents • agents work in tandem with portable interceptors • redirect clients quickly and predictably to appropriate targets • agents periodically/proactively updated when targets change

  11. Current Progress • Current Progress • Initial prototype of FLARe developed using The ACE ORB (TAO) • Stateless FT using passive replication • Implemented a resource-aware adaptive failover target selection algorithm • Compared and contrasted the performance of the FT middleware when using static failover strategies versus adaptive failover strategies • Significant reduction in client response times and system utilization FLARe is open-source and available at www.dre.vanderbilt.edu

  12. Proposed Research and Expected Milestones • Overload management • investigate overload management algorithms that do not degrade application QoS • minimum client disturbance • implemented within the resource manager • extreme resource constrained operating conditions – investigate opportunities to change implementations and reduce overloads • Utility optimizations – when to degrade QoS and when not to • RT/FT trade-offs Deadline : March 2008

  13. Proposed Research and Expected Milestones • State Synchronization • View state synchronization as an aperiodic scheduling problem • more slack available – more time available to synchronize state • slack devoted for higher priority applications always • availability of slack – support for different consistency management schemes (e.g., weak, strong, none) • application informs middleware when to synchronize state Deadline : July 2008

  14. Proposed Research and Expected Milestones • Network Reservations • View real-time fault-tolerance as an end-to-end scheduling problem • network reservations are required for state transfers • without reservations, no predictability • middleware-mediated mechanisms to use external network QoS mechanisms such as DiffServ • network monitors for alternate routes in the presence of failures (leverage existing network research) Deadline : September 2008

  15. Concluding Remarks • Passive replication – a promising approach for DRE systems • Resource-aware adaptive fault-tolerance – required for adapting passive replication for DRE system requirements • Adaptive algorithms required for trading off RT versus FT requirements • Middleware transparently supports FT for applications – works in conjunction with adaptive algorithms to take care of RT requirements as well FLARe is open-source and available at www.dre.vanderbilt.edu

More Related