Generic Adaptive Control Contact: Joe Hellerstein IBM Thomas J Watson Research Center hellers@us.ibm.com May 16, 2003 http://www.research.ibm.com/PM
Participants • Research • Joe Bigus (ABLE) • Markus Debusman (University of Applied Science, Wiesbaden Germany) • Yixin Diao • Frank Eskesen • Steve Froehlich • Joe Hellerstein • Alexander Keller • Xue Lui (Univ. of Illinois) • Sujay Parekh • Lui Sha (Univ. of Illinois) • Maheswaran Surendra (team lead) • Dawn Tilbury (Univ. of Michigan) • DB2 • Randy Horman • Matt Huras • Ed Lassettre • Sam Lightstone • Kevin Rose • Adam Storm • WebSphere • Carolyn Norton • HVWS • Noshir Wadia • Eric Ye • Server Group • Lisa Spainhower
Example: Configuration & Optimization in WebSphere • Tiers: End Users, Web Servers, Application Servers • Tuning knobs facing the Administrator: URL Cache, EJB threads, JVM heap size, Servlet reload int, MaxClients, Number of Threads, DB Connections, KeepAlive Timeout, Fast response cache, MaxRequestsPerChild, ThreadsPerChild, Max simultan. requests, ListenBackLog • Challenges: • Skill shortage • Multiple vendors, multiple standards • Mapping policies to IT “knobs”
Project Goals • Develop a formal basis for resource management problems with dynamics (especially policy enforcement) • Demonstrate the practical value of the approach • Evangelize the approach • Book, tutorials, classes • Methodology and tools
Agenda • Basics of Control Theory • Regulating concurrent users in Lotus Notes: pole placement design • Regulating utilizations in Apache • Optimizing response times in Apache • Throttling DB2 utilities • DB2 self-tuning memory • Regulating service levels in a multi-tiered eCommerce system (HotRod) • Educational efforts (book, tutorials) • Summary
Control of Lotus Notes eMail Server • [Block diagram: Administrator, Target Queue Length, AutoTune Agent, MaxUsers, Lotus Notes Server, Measured Queue Length; Workload generator sends RPCs to the server] • [Plots: Uncontrolled, K=.1, K=1, K=5; annotations: Slow, Better, Bad]
System Identification: Estimate Transfer Function • Dynamic model: MaxUsers → Notes Server → Actual Queue Length • [Figure: Predicted QL vs. Observed QL, both axes 0 to 100]
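One way to make "estimate transfer function" concrete is a least-squares fit of a difference equation to logged input/output data. The sketch below assumes a first-order model y(k+1) = a·y(k) + b·u(k), where u is the MaxUsers offset and y the queue-length offset from an operating point. The model order, the function names, and the sample data are illustrative assumptions, not taken from the slides.

```python
def simulate(a, b, u, y0=0.0):
    """Generate y[0..len(u)] from y[k+1] = a*y[k] + b*u[k] (noise-free)."""
    y = [y0]
    for uk in u:
        y.append(a * y[-1] + b * uk)
    return y


def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y[k+1] = a*y[k] + b*u[k]."""
    # Accumulate the sums that appear in the 2x2 normal equations.
    syy = syu = suu = sy1y = sy1u = 0.0
    for k in range(len(y) - 1):
        syy += y[k] * y[k]
        syu += y[k] * u[k]
        suu += u[k] * u[k]
        sy1y += y[k + 1] * y[k]
        sy1u += y[k + 1] * u[k]
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("input does not excite the system enough to identify it")
    a = (sy1y * suu - sy1u * syu) / det
    b = (sy1u * syy - sy1y * syu) / det
    return a, b


# Hypothetical experiment: vary the input, log the output, then fit.
u_log = [10, -5, 20, 0, 15, -10, 5, 25, -20, 10]
y_log = simulate(0.43, 0.47, u_log)
a_hat, b_hat = fit_first_order(u_log, y_log)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

With real measurements the fit would not be exact, and a predicted-versus-observed scatter plot like the one on this slide is a common way to judge whether the model order is adequate.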
Controller Design • H(z) = Closed Loop Transfer Function • [Block diagram: Controller G(z), Notes Server N(z), Sensor S(z) in a negative feedback loop] • Simplified Integral Control Law • Design for “poles” of H(z) • [Plots: K=5, K=1]
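As a rough illustration of designing for the poles of H(z), the sketch below assumes a first-order plant N(z) = b/(z − a), an integral controller G(z) = K/(z − 1), and an ideal sensor S(z) = 1. Under those assumptions H(z) = K·b / (z² − (1 + a)·z + (a + K·b)), and the loop is stable when both poles lie inside the unit circle. The plant values and gains are hypothetical and are not on the same scale as the K values shown on the slide.

```python
import cmath


def closed_loop_poles(a, b, K):
    """Roots of z^2 - (1 + a)*z + (a + K*b), the denominator of H(z)."""
    p = -(1.0 + a)
    q = a + K * b
    root = cmath.sqrt(p * p - 4.0 * q)
    return ((-p + root) / 2.0, (-p - root) / 2.0)


def is_stable(a, b, K):
    """A discrete-time loop is stable when every pole has magnitude < 1."""
    return all(abs(z) < 1.0 for z in closed_loop_poles(a, b, K))


if __name__ == "__main__":
    a, b = 0.5, 0.5  # hypothetical plant parameters
    for K in (0.1, 0.5, 5.0):
        mags = [round(abs(z), 3) for z in closed_loop_poles(a, b, K)]
        print(f"K={K}: |poles|={mags}, stable={is_stable(a, b, K)}")
```

Poles close to 1 give a slow response, complex poles give oscillation, and poles outside the unit circle give instability, which is the qualitative trade-off the gain comparison on the slides illustrates.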
Control of Apache Server • Contribution: Multiple Input, Multiple Output • [Block diagram: Administrator, Policies & Reports, AutoTune Agent, control inputs MaxClients and KeepAlive TO, Apache System, measured outputs CPU Utilization and Memory Utilization; Workload generator sends Web Service requests]
Apache Control Enablements • [Diagram: Web Server with Master, Worker Procs, and mod_controller; External Controller with GET/SET, KILL, SPAWN; OS (procfs) with CPU util and Mem util; External RT Probe with RT info] • mod_controller (close-up): Get/Set interface, Internal Controller, MaxClients, KeepAlive, SvcTime stats • Legend: HTTP, Inter-Process, Value flow, Shared Mem, Process
SISO vs. MIMO • Model Structure: The Transfer Function Relationship • Two SISO models: G11 (KA → CPU) and G22 (MC → MEM), with cross terms set to 0 • SISO approach assumes cross terms are negligible • MIMO model: G11, G12, G21, G22 • [Block diagrams of the Apache Server under each model structure]
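Written in matrix form, the relationship this slide depicts could look as follows. The pairing of G11 with KA → CPU and G22 with MC → MEM comes from the slide; placing G12 and G21 on the off-diagonal follows the usual convention and is an assumption here.

```latex
\begin{bmatrix} \mathrm{CPU}(z) \\ \mathrm{MEM}(z) \end{bmatrix}
=
\begin{bmatrix} G_{11}(z) & G_{12}(z) \\ G_{21}(z) & G_{22}(z) \end{bmatrix}
\begin{bmatrix} \mathrm{KA}(z) \\ \mathrm{MC}(z) \end{bmatrix},
\qquad
\text{SISO approximation: } G_{12}(z) = G_{21}(z) = 0 .
```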
Model Comparison: Model Prediction • [Plots vs. Time (s): CPU, MEM, KA, MC, shown for Two SISO Models and for MIMO Model] • CPU: the SISO model fails because MC and KA both affect CPU; the MIMO model is able to capture this relationship • MEM: Both models do a good job of predicting system response
Optimization of Apache Server • [Block diagram: AutoTune Agent, MaxClients, Apache System, Response Time; Workload generator sends Web Service requests]
Apache Operation • [Diagram: New Users, New conn, TCP Accept queue, Apache with MaxClients, Close(), Timeout()] • Heuristic: Find the smallest MaxClients that eliminates TCP queueing
Impact of MaxClients • [Figure: Response Time vs. MaxClients, with Apache Defaults marked]
AutoTune Using Fuzzy Rules • [Block diagram: Fuzzy Controller with d/dt input, Fuzzification, Inference mechanism, Rule base, Defuzzification] • Fuzzification • Convert numeric variables to linguistic variables • Characterized by membership functions • Rule base • IF-THEN rules • Using linguistic variables • Inference mechanism • Activate the fuzzy rules (IF) • Combine the rule actions (THEN) • Defuzzification • Convert linguistic variables to numeric variables
Constructing Fuzzy Rules • Decision making: • Increment direction • Increment size • [Figure: Response Time (RT) vs. MaxClients, with regions labeled Rule 1 through Rule 4] • Rule 1: IF change-in-MaxClients is poslarge and change-in-RT is neglarge THEN next-change-in-MaxClients is poslarge • Rule 2: IF change-in-MaxClients is neglarge and change-in-RT is poslarge THEN next-change-in-MaxClients is poslarge • Rule 3: IF change-in-MaxClients is neglarge and change-in-RT is neglarge THEN next-change-in-MaxClients is neglarge • Rule 4: IF change-in-MaxClients is poslarge and change-in-RT is poslarge THEN next-change-in-MaxClients is neglarge
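The four rules can be sketched in a few lines of code. This is a simplified illustration: the ramp-shaped membership functions, the scale values, and the step size are assumptions, and each rule's output is simply scaled by its firing strength, which stands in for full defuzzification over output membership functions.

```python
def _clamp01(x):
    return max(0.0, min(1.0, x))


def poslarge(x, scale):
    """Degree (0..1) to which x is 'positive large'; saturates at x >= scale."""
    return _clamp01(x / scale)


def neglarge(x, scale):
    """Degree (0..1) to which x is 'negative large'; saturates at x <= -scale."""
    return _clamp01(-x / scale)


def next_change_in_maxclients(d_mc, d_rt, mc_scale=50.0, rt_scale=0.5, step=50.0):
    """Apply Rules 1-4; firing strength is the min of the two antecedents."""
    rules = [
        (min(poslarge(d_mc, mc_scale), neglarge(d_rt, rt_scale)), +step),  # Rule 1
        (min(neglarge(d_mc, mc_scale), poslarge(d_rt, rt_scale)), +step),  # Rule 2
        (min(neglarge(d_mc, mc_scale), neglarge(d_rt, rt_scale)), -step),  # Rule 3
        (min(poslarge(d_mc, mc_scale), poslarge(d_rt, rt_scale)), -step),  # Rule 4
    ]
    # Weight each rule's output by how strongly it fired.
    return sum(weight * output for weight, output in rules)


# Raising MaxClients by 50 cut response time by 0.5 s: keep going up.
print(next_change_in_maxclients(50.0, -0.5))
```

The sign of the result gives the increment direction and its magnitude the increment size; a weaker match to "large" produces a proportionally smaller step.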
Controlling MaxClients on Apache • [Figure: AutoTune trace, with the Apache default and the Optimized setting marked]
Response to a new workload • [Figure: AutoTune trace, with Workload changes, the Old optimized setting, and the New optimized setting marked]
DB2 UDB Utilities Throttling (SMART Project) • [Block diagram: Target Utilization, Sleep Delay, utilities (Backup, Restore, Re-Balance), UDB Engine, Server, measured Disk and CPU Utilizations]
Success Is: • Small Effect on User Throughput • High System Utilization • [Figure: % Utilization vs. Time; annotation: Gap due to reduced utilization in sleep periods] • Note: This is a longer-time averaged value than on slide 5.
Throttling a Single Utility • Standard PI controller tries to reach E=0 • Assume: linear effect of throttling on Y • [Block diagram: Workload and Utility feeding DB2, with signals U and Y and parameters a and b; labels: Parameters characterizing DB2, Control error, Max thruput from utility + workload, Thruput degradation]
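A minimal sketch of a PI loop driving the control error to zero is shown below. It assumes a static linear plant y = a − b·u, where u is the utility's sleep fraction (0 to 1) and y is a hypothetical measure of workload degradation in percent. The plant form, parameter values, and gains are assumptions chosen for illustration, not values from the DB2 work.

```python
def simulate_pi_throttle(target, a=40.0, b=40.0, kp=0.005, ki=0.01, steps=50):
    """Run a velocity-form PI controller against the assumed plant y = a - b*u."""
    u = 0.0            # sleep fraction, kept within [0, 1]
    prev_error = 0.0
    history = []
    for _ in range(steps):
        y = a - b * u                      # assumed linear effect of throttling
        error = y - target                 # control error E
        # Velocity form: proportional term acts on the change in error,
        # integral term on the error itself.
        u += kp * (error - prev_error) + ki * error
        u = max(0.0, min(1.0, u))          # actuator limits
        prev_error = error
        history.append((u, y))
    return history


if __name__ == "__main__":
    final_u, final_y = simulate_pi_throttle(target=10.0)[-1]
    print(f"sleep fraction = {final_u:.3f}, degradation = {final_y:.2f}%")
```

The integral term is what removes steady-state error: the sleep fraction keeps moving until the measured value matches the target, without the controller needing to know a or b exactly.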
Baseline Measurement: idling • [Timeline: processes P1, P2, P3 over Time; Control Points; intervals Start1 to End1 and Start2 to End2; “Loop” Throughput and “Other” (Sleep) Throughput] • “Start” is perf output after all Pi have read new control value. • “End” is from closest output to control change
Baseline Estimation • Over time, record sequence {(ti, pi, si)} • t = Time • p = Perf at time t • s = SleepPct at time t • Fit a “curve” to this data, to get model M • E.g., Over some fixed time interval of the past • [Figure: p vs. s]
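As one possible reading of "fit a curve over a fixed past interval", the sketch below fits a straight line p = c0 + c1·s to the samples inside a sliding time window. The linear model form, the treatment of SleepPct as a fraction between 0 and 1, and the function name are assumptions made for illustration.

```python
def fit_baseline(samples, now, window):
    """Fit p = c0 + c1*s to samples (t, p, s) with now - window <= t <= now."""
    recent = [(p, s) for t, p, s in samples if now - window <= t <= now]
    n = len(recent)
    if n < 2:
        raise ValueError("need at least two samples in the window")
    mean_s = sum(s for _, s in recent) / n
    mean_p = sum(p for p, _ in recent) / n
    var_s = sum((s - mean_s) ** 2 for _, s in recent)
    if var_s == 0.0:
        raise ValueError("SleepPct did not vary inside the window")
    # Ordinary least squares for a single regressor.
    c1 = sum((s - mean_s) * (p - mean_p) for p, s in recent) / var_s
    c0 = mean_p - c1 * mean_s
    return c0, c1


# Hypothetical log: older samples follow a different workload than recent ones.
log = [(t, 50.0, 0.5) for t in range(5)]
log += [(5, 70.0, 0.0), (6, 77.5, 0.25), (7, 85.0, 0.5), (8, 92.5, 0.75), (9, 100.0, 1.0)]
c0, c1 = fit_baseline(log, now=9, window=4)
print(f"model M: p = {c0:.1f} + {c1:.1f} * s")
```

Under this model, evaluating M at s = 1 gives an estimate of performance with the utility fully asleep. Restricting the fit to a recent window lets the estimate follow workload changes, at the cost of fewer data points per fit.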
Control with disturbance • [Plots: Large Disturbance, Small Disturbance] • Baseline estimation needs work • Cannot adjust to large workload change • Controller response still OK
Dynamic Surge Protection • Systems can go from steady state … • to overloaded without warning • [Figure: Internet traffic; caption: Few minutes later…]
Resource Actions With Lead Times • Definition of lead time: • Delay from request to action taking effect • Examples • From provisioning a server to its servicing requests • From de-provisioning a server to its return to the free pool • From increasing the size of a buffer pool to the pool being filled with data
Effect of Lead Times on WAS Provisioning • [Figure; annotation: Leadtime]
Benefits of Proactive Provisioning • [Figure; annotation: Leadtime]
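The idea behind proactive provisioning can be sketched as sizing the server pool for the load expected one lead time from now rather than for the current load. The forecaster below is a plain linear extrapolation of the last two samples, which is only a stand-in for whatever forecasting the project used; all names and numbers are hypothetical.

```python
import math


def forecast_load(history, lead_time):
    """Linearly extrapolate the last two load samples lead_time steps ahead."""
    if len(history) < 2:
        return float(history[-1])
    slope = history[-1] - history[-2]
    return max(0.0, history[-1] + slope * lead_time)


def servers_to_request(history, lead_time, capacity_per_server):
    """Servers needed to carry the load expected when the request takes effect."""
    return math.ceil(forecast_load(history, lead_time) / capacity_per_server)


load = [100, 120]  # requests/s in the last two intervals (made-up numbers)
print("reactive :", servers_to_request(load, lead_time=0, capacity_per_server=50))
print("proactive:", servers_to_request(load, lead_time=3, capacity_per_server=50))
```

With a lead time of zero this reduces to reactive provisioning. The longer the lead time, the further ahead the forecast has to look, and the more the result depends on forecast quality.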
Autonomic Computing: Dynamic Surge Protection • [Architecture diagram: Solution Manager with On-Line Capacity Planning, Adaptive Forecasting, and On-Demand Actions; components HVWS Performance Modeler, Forecaster, Controller, Deployment Manager, Configuration Management, BOPS Monitoring; autonomic loop of Monitor, Analyze, Plan, Execute around Knowledge; Sensors and Effectors on the managed Element; Workload, Application, WAS 5.0 (#WAS 1, 2, 3), DB2 v8.1, RT]
CeBit Press • Reuters: IBM: Software Can Predict Computer Demand • C/Net: IBM offers details on autonomic software • InfoWorld: IBM to show new autonomic suite at CeBIT • IDG News: IBM to show off new autonomic technology • InformationWeek: More Autonomic Capabilities From IBM • InternetNews: IBM Spruces Up Autonomic Computing Offerings • cw360.com: IBM to demo autonomic technology at CeBIT
Control Theory Book • Feedback Control of Computing Systems • Wiley-Interscience • Intended audience • Computer scientists with a minimal math background (geometric series) who want to apply the techniques to practical problems • Control theorists looking for new applications • Status • 10 of 11 chapters at a “beta” level • Expected completion by end of June • Publication in 2004
Table of Contents • Introduction (Qualitative control theory) • Model construction (statistics) • Z-Transforms and transfer functions (component models) • Block diagrams (system models) • First order systems • Higher order systems • State space models (multi-variate models) • Proportional control (feedback basics) • Other classical controllers (PID, tuning controllers) • State space feedback control (MIMO) • Advanced topics
Progress Towards Project Goals • Develop/identify a formal approach • Control theory based • Demonstrate value • Lotus Notes – control w/o instabilities • Apache – simple way to optimize tuning parameters • DB2 Utilities Throttling • HotRod – handling resource actions with dead times • HotRod prototype – resource actions w/lead times • Evangelize • Feedback Control of Computing Systems, Wiley-Interscience • Tutorials: Almaden, Integrated Management, Stanford/Berkeley • Classes: Columbia?, University of Michigan? • AC toolkit integration
"Using Control Theory to Achieve Service Level Objectives in Performance Management," S Parekh, N Gandhi, JL Hellerstein, D Tilbury, TS Jayram, J Bigus, Real Time Systems Journal, 2002. • "Feedback Control of a Lotus Notes Server: Modeling and Control Design," N Gandhi, S Parekh, JL Hellerstein, and DM Tilbury, American Control Conference, 2001. (Best paper in session.) • "An Introduction to Control Theory With Applications to Computer Science," JL Hellerstein and S Parekh, ACM Sigmetrics, 2001. • "Using MIMO Feedback Control to Enforce Policies for Interrelated Metrics With Application to the Apache Web Server," Y Diao, N Gandhi, JL Hellerstein, S Parekh, and DM Tilbury, Network Operations and Management, 2002. (Best paper in conference.) • "MIMO Control of an Apache Web Server: Modeling and Controller Design," Y Diao, N Gandhi, JL Hellerstein, S Parekh, and DM Tilbury, American Control Conference, 2002. (Best paper in session.) • "Using Fuzzy Control to Maximize Profits in Service Level Management," Y Diao, JL Hellerstein, S Parekh. Accepted to the IBM Systems Journal, 2002. • "A First-Principles Approach to Constructing Transfer Functions for Admission Control in Computing Systems," JL Hellerstein, Y Diao, and S Parekh, Conference on Decision and Control, 2002. • "Generic On-Line Discovery of Quantitative Models for Service Level Management," Y Diao, F Eskesen, S Froehlich, JL Hellerstein, A Keller, L Spainhower, and M Surendra, IFIP Symposium on Integrated Management, 2003. • "On-Line Response Time Optimization of An Apache Web Server," Yixin Diao, Xue Lui, Steve Froehlich, Joseph L Hellerstein, Sujay Parekh, and Lui Sha. To appear in International Workshop on Quality of Service, 2003. • http://www.research.ibm.com/PM