Improving System Availability in Distributed Environments

Sam Malek malek@usc.edu with Marija Mikic-Rakic marija@usc.edu Nels Beckman beckman@usc.edu Nenad Medvidovic neno@usc.edu

Motivation How good is this deployment architecture? What are its properties? How should it be modified to ensure higher availability?

Effect of Deployment on Availability • Redeployment to maximize the availability • Frequency and volume of interactions, reliability and capacity of network links • Hard to determine a good deployment in large scale distributed systems • In the small example above, there are 3^10 = 59049 possible deployments • Bad deployment → Low availability • Redeployment: Better deployment → Higher availability
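The count on this slide can be checked with a short sketch. It assumes the example has 10 components and 3 hosts (the figure is not reproduced here, so this split is inferred from the 3^10 figure and the later "n components and k hosts" formula). The function names are illustrative only.

```python
from itertools import product


def count_deployments(num_components: int, num_hosts: int) -> int:
    """Each component may be placed on any host, so there are k^n mappings."""
    return num_hosts ** num_components


def enumerate_deployments(num_components: int, num_hosts: int):
    """Yield every deployment as a tuple: position = component, value = host."""
    return product(range(num_hosts), repeat=num_components)


# Assumed example size: 10 components on 3 hosts.
total = count_deployments(10, 3)
enumerated = sum(1 for _ in enumerate_deployments(10, 3))
print(total, enumerated)
```

Enumerating all mappings is still feasible at this size, but each extra component multiplies the count by the number of hosts.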

Availability Definition The degree to which the system is operational and accessible when required for use

System Model Parameters • Software component properties • Memory requirements • Frequency of interaction • Size of the exchanged data • Hardware host properties • Memory capacity • Network reliability • Network bandwidth • Constraints • Location • Co-location

Problem Definition • Find a system deployment architecture such that: • It adheres to the system model parameters and constraints • It has the greatest availability
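One possible way to make this objective concrete is sketched below. The slides do not give a formula, so the sketch assumes availability is measured as the fraction of component interactions expected to succeed: each interaction frequency is weighted by the reliability of the link between the two hosts involved. Only the memory constraint is checked; location and co-location constraints are omitted. All names and the tiny data set are hypothetical.

```python
from itertools import product


def availability(deployment, freq, rel):
    """Assumed metric: frequency-weighted share of interactions that succeed.

    deployment[c] is the host of component c, freq[(a, b)] is the interaction
    frequency between components a and b, and rel[h1][h2] is the reliability
    (0..1) of the link between hosts h1 and h2, with rel[h][h] == 1.0.
    """
    total = sum(freq.values())
    if total == 0:
        return 1.0
    delivered = sum(f * rel[deployment[a]][deployment[b]]
                    for (a, b), f in freq.items())
    return delivered / total


def fits_memory(deployment, comp_mem, host_mem):
    """Check that the components placed on each host fit in its memory."""
    used = [0] * len(host_mem)
    for comp, host in enumerate(deployment):
        used[host] += comp_mem[comp]
    return all(u <= cap for u, cap in zip(used, host_mem))


def best_deployment_exact(freq, rel, comp_mem, host_mem):
    """Exhaustive search over all k^n deployments (exponential time)."""
    n, k = len(comp_mem), len(host_mem)
    best, best_score = None, -1.0
    for candidate in product(range(k), repeat=n):
        if not fits_memory(candidate, comp_mem, host_mem):
            continue
        score = availability(candidate, freq, rel)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical system: 3 components, 2 hosts, one unreliable link.
freq = {(0, 1): 10, (1, 2): 1}
rel = [[1.0, 0.5], [0.5, 1.0]]
comp_mem = [2, 2, 2]
host_mem = [4, 4]
best, score = best_deployment_exact(freq, rel, comp_mem, host_mem)
print(best, round(score, 4))
```

In this example the two components that interact most often end up on the same host, because each host can hold only two components.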

Problem Break Down • Lack of knowledge about runtime system parameters • System model parameters not known at the time of initial deployment • System model parameters change at runtime • Reliability of links, frequencies of interaction, etc. • Prism-MW monitoring support • Exponentially complex problem • n components and k hosts = k^n possible deployments • DeSi’s polynomial time approximating algorithms • Solution analysis • Comparison of different solutions and algorithms • Centralized vs. Decentralized, performance vs. complexity, etc. • DeSi’s visualization and comparison utilities • Effecting the selected solution • Redeploying components • Requires an automated solution • Prism-MW deployment support

Approach • 1) Monitor (Prism-MW) • 2) Monitoring Data (Prism-MW to DeSi) • 3) Analyze (DeSi) • 4) Redeployment Data (DeSi to Prism-MW)

Prism-MW • An architectural middleware that enables efficient implementation, deployment, and execution of distributed systems in terms of their architectural elements: components, connectors, configurations, etc. • Support for monitoring • Support for redeployment • [Figure: Simplified Class Diagram of Prism-MW]

Prism-MW’s Role • Supports: • Step 1 (Monitor) by monitoring events in the system and calculating the system parameters • Step 4 (Redeployment Data) by providing an API for the redeployment of components and meta-level components to automate the tasks
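How monitored events might be turned into system parameters can be sketched as follows. This is not Prism-MW's API, which the slides do not show. The event tuple layout and the function name are hypothetical, and link reliability is assumed to be estimated as the share of monitored messages that were delivered.

```python
from collections import defaultdict


def estimate_parameters(events):
    """Derive interaction frequencies and link reliabilities from an event log.

    Each event is a hypothetical tuple:
    (src_component, dst_component, src_host, dst_host, delivered).
    """
    freq = defaultdict(int)       # messages seen per component pair
    attempts = defaultdict(int)   # messages attempted per host link
    successes = defaultdict(int)  # messages delivered per host link
    for src_c, dst_c, src_h, dst_h, delivered in events:
        freq[(src_c, dst_c)] += 1
        link = tuple(sorted((src_h, dst_h)))  # treat links as undirected
        attempts[link] += 1
        successes[link] += int(delivered)
    reliability = {link: successes[link] / attempts[link] for link in attempts}
    return dict(freq), reliability


# Hypothetical monitoring log.
log = [
    ("A", "B", "h1", "h2", True),
    ("A", "B", "h1", "h2", False),
    ("B", "A", "h2", "h1", True),
    ("B", "C", "h2", "h2", True),
]
freq_est, rel_est = estimate_parameters(log)
print(freq_est, rel_est)
```

Because these estimates change as the log grows, the analysis step would need to be rerun whenever they drift, in line with the slide on parameters changing at runtime.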

Maximizing Availability • A family of centralized algorithms • Exact – exponential • Stochastic – quadratic • Adaptive greedy – cubic • A family of decentralized algorithms • DecAp: Auction-based – cubic • A set of clustering techniques • Reduce complexity • Improve performance
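The trade-off between exact and approximate search can be illustrated with a simple random-sampling heuristic. This is a stand-in for illustration, not the stochastic, adaptive greedy, or DecAp algorithms named on the slide, whose details are not given here. It assumes the same frequency-weighted availability metric and memory constraint as before, and all names and data are hypothetical.

```python
import random


def availability(deployment, freq, rel):
    """Assumed metric: frequency-weighted share of interactions that succeed."""
    total = sum(freq.values())
    if total == 0:
        return 1.0
    ok = sum(f * rel[deployment[a]][deployment[b]]
             for (a, b), f in freq.items())
    return ok / total


def fits_memory(deployment, comp_mem, host_mem):
    """Check that the components placed on each host fit in its memory."""
    used = [0] * len(host_mem)
    for comp, host in enumerate(deployment):
        used[host] += comp_mem[comp]
    return all(u <= cap for u, cap in zip(used, host_mem))


def random_sampling_search(freq, rel, comp_mem, host_mem, samples=200, seed=0):
    """Try a fixed number of random deployments and keep the best valid one.

    The cost grows with the sample budget instead of with k^n, at the price
    of possibly missing the optimum.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    n, k = len(comp_mem), len(host_mem)
    best, best_score = None, -1.0
    for _ in range(samples):
        candidate = tuple(rng.randrange(k) for _ in range(n))
        if not fits_memory(candidate, comp_mem, host_mem):
            continue
        score = availability(candidate, freq, rel)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical system: 3 components, 2 hosts, one unreliable link.
freq = {(0, 1): 10, (1, 2): 1}
rel = [[1.0, 0.5], [0.5, 1.0]]
comp_mem = [2, 2, 2]
host_mem = [4, 4]
found, found_score = random_sampling_search(freq, rel, comp_mem, host_mem)
print(found, round(found_score, 4))
```

On a system this small the sample budget far exceeds the 8 possible deployments, so the heuristic is expected to reach the optimum. On larger systems it only bounds the search effort.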

Algorithms’ Results

Assessing the Algorithms • Efficiency • Execution time vs. precision • Applicability • Centralized vs. Decentralized • Effect of system characteristics • Impact of individual parameter changes • Addition of new system parameters • Application to new system properties • Requires “what if” scenario exploration • In comes DeSi!

DeSi’s Architecture • Key properties: • Tailorability • Scalability • Efficiency • Explorability

DeSi’s View (1)

DeSi’s Role • Supports: • Step 3 (Analyze) by providing several redeployment algorithms and various visualization utilities • Steps 2 (Monitoring Data) and 4 (Redeployment Data) by providing the appropriate middleware adapter

Conclusion • Suite of automated tools and techniques for improving the availability of a distributed system • Currently extending the tools to model, analyze, and improve other non-functional aspects of a distributed system: security, latency, etc.

Questions?
