Explore how self-tuning systems optimize resource allocation in hosting platforms to meet performance goals in dynamic workloads. Learn about proactive and reactive provisioning techniques to ensure service-level agreements are met. Dive into self-healing systems and fault tolerance principles for reliable operation.
Self-* Systems • CSE 598B • Instructor: Bhuvan Urgaonkar • Fall 2005
Introduction • Bhuvan Urgaonkar • Assistant Professor, CSE • Ph.D. Univ. of Mass., Amherst • Research Interests • Distributed systems, operating systems, computer networking, modeling of systems • Office: 338D, Email: bhuvan@cse.psu.edu • Office hours and class timings • Undecided as of now; we will figure this out at the end of the class • If in doubt: just walk in anytime! • Students’ turn to introduce themselves
Self-* systems • Self-*: a regular expression • But not quite • No self-destroying systems • Three themes • Self-tuning systems • Self-healing systems • Self-stabilizing systems • Course Web page: • http://www.cse.psu.edu/~bhuvan/teaching/fall05/self-star.html • To do: Set up a course mailing list
Self-tuning systems [Figure: rocket guidance loop with desired trajectory, guidance model, thrust parameters, rocket thrusters, actual trajectory, and friction/turbulence as external disturbances] • Systems that adapt their behavior on their own to dynamically changing external influences
Internet applications • Proliferation of Internet applications (e.g., auction sites, online games, online retail stores) • Growing significance in personal, business affairs • Focus: Internet server applications
Hosting platforms • Data Centers • Clusters of servers • Storage devices • High-speed interconnect • Hosting platforms: • Rent resources to third-party applications • Performance guarantees in return for revenue • Benefits: • Applications: don’t need to maintain their own infrastructure • Rent server resources, possibly on demand • Platform provider: generates revenue by renting resources
Goals of a hosting platform • Meet service-level agreements • Satisfy application performance guarantees • E.g., average response time, throughput • Maximize revenue • E.g., maximize the number of hosted applications • Question: How should a hosting platform manage its resources to meet these goals?
Challenge: dynamic workloads [Figure: workload traces plotting request rate (req/min) vs. time (hrs) over one day, and arrivals per min vs. time over five days and over 24 hours] • Multi-time-scale variations • Time-of-day, hour-of-day • Overloads • E.g., flash crowds • User threshold for response time: 8-10 s • Key issue: How to provide good response time under varying workloads?
Self-tuning systems [Figure: application performance goals, dynamic workloads, resource inference model, resource shares, resource schedulers, actual performance] • A self-tuning hosting platform
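As a rough illustration of what a resource inference model might compute, the sketch below maps a mean response-time goal to a server count by treating each server as an M/M/1 queue that receives an equal share of the requests. The queueing model, the function name `servers_needed`, and its parameters are assumptions chosen for illustration, not the model used by any particular hosting platform.

```python
import math


def servers_needed(arrival_rate: float, service_rate: float, target_response: float) -> int:
    """Hypothetical inference model: each server is an M/M/1 queue with an equal load share.

    arrival_rate    -- total requests/s arriving at the application
    service_rate    -- requests/s one server can complete (mu)
    target_response -- desired mean response time in seconds
    """
    # M/M/1 mean response time is R = 1 / (mu - lambda_i), so R <= target
    # requires a per-server load lambda_i <= mu - 1/target.
    max_load_per_server = service_rate - 1.0 / target_response
    if max_load_per_server <= 0:
        raise ValueError("target unreachable: even an idle server is too slow")
    # Enough servers to keep every server's share under that load (at least one).
    return max(1, math.ceil(arrival_rate / max_load_per_server))


print(servers_needed(1200, 100, 0.1))  # → 14
```

With 1200 requests/s, servers that each complete 100 requests/s, and a 0.1 s target, each server may take at most 90 requests/s, so the model asks for 14 servers. A real platform would likely need a richer model (multi-tier applications, non-exponential service times).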
Dynamic provisioning [Figure: loop: monitor workload → compute current/future demand → adjust allocation] • Key idea: increase or decrease allocated servers to handle workload fluctuations • Monitor incoming workload • Compute current or future demand • Match number of allocated servers to demand
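The monitor, compute, adjust loop can be sketched as follows. This is a minimal illustration that assumes a known, fixed per-server capacity and a fixed headroom fraction; the class `Provisioner` and its parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Provisioner:
    """Minimal sketch of the monitor -> compute demand -> adjust allocation loop."""

    capacity_per_server: float  # requests/min one server can handle (assumed known)
    headroom: float = 0.2       # spare capacity kept for short bursts (assumed value)
    allocated: int = 1          # servers currently allocated

    def compute_demand(self, observed_rate: float) -> int:
        # Servers needed to serve the observed rate plus headroom (at least one).
        return max(1, math.ceil(observed_rate * (1 + self.headroom) / self.capacity_per_server))

    def step(self, observed_rate: float) -> int:
        # One pass of the loop: take the monitored rate, compute demand,
        # and match the allocation to it.
        self.allocated = self.compute_demand(observed_rate)
        return self.allocated


p = Provisioner(capacity_per_server=10_000)
print([p.step(rate) for rate in [20_000, 80_000, 140_000, 30_000]])  # → [3, 10, 17, 4]
```

The input rates here are made up. Allocating exactly what the latest measurement implies can cause oscillation; a practical design might smooth measurements or scale down more slowly than it scales up.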
Dynamic provisioning at multiple time-scales • Predictive provisioning • Certain Internet workload patterns can be predicted • E.g., time-of-day effects, increased workload during Thanksgiving • Design a good application model • Provision using the model at a time-scale of hours or days • Reactive provisioning • Applications may see unpredictable fluctuations • E.g., increased workload to news sites after an earthquake • Detect such anomalies and react fast (minutes) • Question: How to put these together? • When to invoke the predictor and the reactor?
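One possible answer to "when to invoke the predictor and the reactor" is sketched below: the predictor provisions from the average rate previously seen at the same hour of day, and the reactor overrides it only when the observed rate exceeds the prediction by a threshold factor. The hour-of-day average, the 1.25 threshold, and the class `HybridProvisioner` are all assumptions made for illustration, not the approach presented in the course.

```python
import math
from statistics import mean


class HybridProvisioner:
    """Sketch combining a time-of-day predictor with a threshold-triggered reactor."""

    def __init__(self, capacity_per_server: float, threshold: float = 1.25):
        self.capacity = capacity_per_server     # requests/min per server (assumed known)
        self.threshold = threshold              # overload factor that triggers the reactor
        self.history: dict[int, list[float]] = {}  # hour of day -> past observed rates

    def record(self, hour: int, rate: float) -> None:
        # Store an observation for the predictor to learn from.
        self.history.setdefault(hour % 24, []).append(rate)

    def predict(self, hour: int) -> float:
        # Predictor (hours/days time-scale): mean rate seen at this hour of day.
        past = self.history.get(hour % 24)
        return mean(past) if past else 0.0

    def servers(self, hour: int, observed_rate: float) -> int:
        demand = self.predict(hour)
        # Reactor (minutes time-scale): trust the measurement instead of the
        # prediction when the workload is well above what was predicted.
        if observed_rate > self.threshold * demand:
            demand = observed_rate
        return max(1, math.ceil(demand / self.capacity))


p = HybridProvisioner(capacity_per_server=1000)
p.record(9, 4000)
p.record(9, 6000)
print(p.servers(9, 5500), p.servers(9, 20000))  # → 5 20
```

At 9:00 the prediction is 5000 requests/min; a measurement of 5500 stays within the threshold and the predicted allocation is kept, while a sudden 20000 triggers the reactor.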
Self-healing systems • Systems that continue to operate on their own despite faults or failures • Distinction between faults and failures • Fault: a sysadmin sets a small concurrency limit for a Web server • Failure: debris from the external fuel tank is thought to have struck Columbia's left wing in 2003 • Failure/fault handling capability built into the system • Graceful degradation • We will study classic fault-tolerance literature as well as papers that apply these principles to modern distributed systems
Self-stabilizing systems • Guaranteed to converge to a desired behavior from any initial state if left alone • Why should one be interested in self-stabilizing algorithms? • Their applicability to distributed systems • Recovering from faults of a space shuttle: faults may cause malfunction for a while, but using a self-stabilizing algorithm for its control causes an automatic recovery and enables the shuttle to continue its task
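The slide does not name a specific algorithm. As a standard illustration (not taken from the course material), the sketch below simulates Dijkstra's K-state token ring: starting from an arbitrary, possibly corrupted state, repeatedly letting one privileged machine move drives the ring to a state with exactly one token, and it stays that way afterward. The helper names, the ring size, and the random scheduler (standing in for a central daemon) are assumptions.

```python
import random


def privileged(x: list[int]) -> list[int]:
    """Indices of machines holding a privilege (token) in Dijkstra's K-state ring."""
    n = len(x)
    tokens = [0] if x[0] == x[n - 1] else []          # bottom machine: equals its left neighbor
    tokens += [i for i in range(1, n) if x[i] != x[i - 1]]  # others: differ from left neighbor
    return tokens


def move(x: list[int], i: int, k: int) -> None:
    """Let privileged machine i make its move."""
    if i == 0:
        x[0] = (x[0] + 1) % k  # bottom machine advances its counter
    else:
        x[i] = x[i - 1]        # every other machine copies its left neighbor


def run(n: int = 5, steps: int = 2000, seed: int = 1) -> list[int]:
    rng = random.Random(seed)
    k = n + 1                                 # K >= n states guarantees stabilization
    x = [rng.randrange(k) for _ in range(n)]  # arbitrary (possibly corrupted) start state
    for _ in range(steps):
        # A scheduler picks one privileged machine; at least one always exists.
        move(x, rng.choice(privileged(x)), k)
    return x


print(len(privileged(run())))  # → 1
```

No machine needs global knowledge: each reads only its left neighbor, yet the ring converges to the legitimate "exactly one token" behavior from any initial state.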
What is a self-stabilizing algorithm? • This question will be answered using the “Stabilizing Orchestra” example • The Problem: • The conductor is unable to participate – harmony is achieved by players listening to their neighboring players • Windy evening – the wind can turn some pages in the score, and the players may not notice the change
The “Stabilizing Orchestra” Example • Our Goal: To guarantee that harmony is achieved at some point following the last undesired page turn • Imagine that the drummer notices that the violinist next to him is on a different page (candidate solutions and their problems): • The drummer turns to his neighbor's new page: what if the violinist noticed the difference as well? • Both the drummer and the violinist start from the beginning: what if the player next to the violinist notices the change only after the other two have synchronized?
The Self-Stabilizing Solution • Every player joins the neighboring player who is playing the earliest page (including himself) • Note that the score has a bounded length. What happens if a player goes to the first page of the score before harmony is achieved? • In every sufficiently long period in which the wind does not turn a page, the orchestra resumes playing in synchrony
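The "join the earliest page" rule can be simulated in a few lines. This is a minimal sketch under stated assumptions: players sit in a single row and hear only their immediate neighbors, all players act in synchronous rounds, and the score is treated as unbounded (page numbers never wrap), so it sidesteps the bounded-length question above. The function names are hypothetical.

```python
def play_round(pages: list[int]) -> list[int]:
    """One synchronous round for players seated in a row (an assumed topology).

    Each player joins the earliest page among itself and its immediate
    neighbors, then plays that page and moves on to the next one.
    """
    new_pages = []
    for i in range(len(pages)):
        neighborhood = pages[max(0, i - 1): i + 2]  # left neighbor, self, right neighbor
        new_pages.append(min(neighborhood) + 1)
    return new_pages


def simulate(pages: list[int], rounds: int) -> list[int]:
    """Run `rounds` wind-free rounds starting from an arbitrary page assignment."""
    for _ in range(rounds):
        pages = play_round(pages)
    return pages


# Arbitrary (made-up) state right after the last gust of wind.
print(simulate([7, 3, 9, 3, 12, 5], 5))  # → [8, 8, 8, 8, 8, 8]
```

The earliest page spreads by one seat per round, so a row of n players agrees after at most n - 1 wind-free rounds; once in agreement, they all advance together and stay in synchrony.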
Discussion: Overlaps and distinctions • Self-tuning vs self-healing vs self-stabilizing systems • Proactive vs reactive
Crosscutting goals and challenges • Removing costly and error-prone humans from administering complex systems • Learning from the past • Modeling systems to render them amenable to analysis • Understanding how robust a system is • Robust = predictable behavior, graceful degradation • Equivalent: Figuring out how to make a system robust
Introspection! • Everyone gives an example of a self-* aspect from their research/experience • Arjun: e-commerce applications • Amitayu: dynamic allocation of servers in a farm • Ross: sensor networks • Huajing: information retrieval/feedback • Young: fault handling by duplication • Krishna: activity migration in a multiprocessor
Goals of the course • Understand classic literature • Identify theory and systems issues/tools common across these diverse domains • Statistical learning, control theory, measurement techniques, data analysis, fault tolerance, modeling • I will try to have some guest lectures • Learn to appreciate how theory translates into and compares with practice • Critically evaluate papers and present them, use these in research
Grading policy • Paper presentations: 30% • Class participation and discussion: 15% • Let's have lots of heated discussions • Don’t be shy! • Paper evaluations due before class: 15% • A conference-style evaluation form • Semester-long project: 30% • May be replaced by a term paper • Apply ideas to your research, master's thesis • Final exam: 10% • Take-home exam
Expected course-load • No intentions of stressing you out! • Round-robin presentation policy • Number of presentations will depend on how many students enroll • Red-teams: To make sure you come prepared • We DON’T want bad presentations! • Mid-term and final presentations for students doing projects • End-of-semester take-home exam • Goal: Find out what we learnt in the course
Presentations • Prepare a talk about 45 minutes long • Rest of the class for discussions • We will accept or reject papers at the end of each class • Red team • Each presenter will practice his/her talk with the assigned red team before the class • You are welcome to talk to me, discuss slides, or ask for help understanding the paper before presenting it • Use the PowerPoint template on the course page • We will try to become good speakers and reviewers!
Paper evaluations • Due by midnight before the class • I will put up an evaluation format that you will adhere to • No long essays needed • Be critical, read the papers carefully • I will anonymize evaluations and put them up after the class so all can read them • Acceptable formats: txt, pdf
Course project • Not compulsory • You may work in groups of up to 2 students • You may replace it with a term paper • Survey of additional reading material • Project may be • A theoretical exercise • Implementation-based • A thought experiment • Project reports and term papers are due at the end of the semester
Final exam • Day-long take-home exam • For students doing projects, I will design questions related to their project • For students doing a survey, I will design questions based on their survey report
Miscellaneous • Please register soon so the course can be offered • At least 5 students need to take the course • Let's figure out course timings suitable to all • Random thoughts • Would you like to solve puzzles? • Would you like to have discussions on systems research in general, hot areas, top conferences …? • Would you like to take turns as scribes? • Hope: We will learn a lot and have lots of fun in this course