
Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Presentation Transcript


  1. Observe-Analyze-Act Paradigm for Storage System Resource Arbitration
  Li Yin (1), Email: yinli@eecs.berkeley.edu
  Joint work with: Sandeep Uttamchandani (2), Guillermo Alvarez (2), John Palmer (2), Randy Katz (1)
  (1) University of California, Berkeley; (2) IBM Almaden Research Center

  2. Outline
  • Observe-analyze-act in storage systems: CHAMELEON
    • Motivation
    • System model and architecture
    • Design details
    • Experimental results
  • Observe-analyze-act in other scenarios
    • Example: network applications
  • Future challenges

  3. Need for Run-time System Management
  [Figure: e-mail, web-server, and data-warehousing applications with SLA goals, mapped through storage virtualization onto storage resources]
  • Static resource allocation is not enough
  • Incomplete information about access characteristics: workload variations; changing goals
  • Exception scenarios: hardware failures; load surges

  4. Approaches for Run-time Storage System Management
  • Today: administrators perform observe-analyze-act manually
  • Automating the observe-analyze-act loop:
    • Rule-based systems: complexity, brittleness
    • Pure feedback-based systems: infeasible for real-world multi-parameter tuning
    • Model-based approaches
  • Challenges:
    • How to represent system details as models?
    • How to create and evolve models?
    • How to use models for decision making?

  5. System Model for Resource Arbitration
  [Figure: hosts 1..n with throttling points, connected through interconnection-fabric switches 1..m to storage controllers; a QoS decision module drives the throttling points]
  • Input:
    • SLAs for workloads
    • Current system status (performance)
  • Output:
    • Resource reallocation actions (throttling decisions)
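
To make the interface concrete, here is a minimal sketch of the decision module's inputs and outputs; the type names (WorkloadSLA, ThrottleDecision, decide) are illustrative, not from the paper:

```python
# Illustrative types for the resource-arbitration interface on this slide.
from dataclasses import dataclass

@dataclass
class WorkloadSLA:
    workload_id: str
    target_latency_ms: float   # latency goal from the SLA
    target_iops: float         # throughput goal from the SLA
    priority: float            # relative importance of the workload

@dataclass
class ThrottleDecision:
    workload_id: str
    token_issue_rate: float    # requests/sec admitted at the host's throttling point

def decide(slas: list[WorkloadSLA],
           observed_latency_ms: dict[str, float]) -> list[ThrottleDecision]:
    """Map current SLAs and observed performance to per-workload throttling."""
    ...  # filled in by the reasoning engine (slides 11-13)
```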

  6. Our Solution: CHAMELEON
  [Figure: observe-analyze-act loop over the managed system. Observe: current states feed a knowledge base of component, workload, and action models. Analyze: a reasoning engine applies piece-wise linear programming over the models and designer-defined policies. Act: a throttling executor applies incremental throttling steps of a given size, feeds current states back, and can re-trigger the reasoning engine]

  7. Knowledge Base: Component Model
  • Objective: predict the service time for a given load at a component (for example, a storage controller):
    Service_time_controller = L(request size, read/write ratio, random/sequential ratio, request rate)
  • An example component model:
    • FAStT900, 30 disks, RAID0
    • Request size 10KB, read/write ratio 0.8, random access

  8. Component Model (cont.)
  • Quadratic fit: S = 3.284, r = 0.838
  • Linear fit: S = 3.8268, r = 0.739
  • Non-saturated case, linear fit: S = 0.0509, r = 0.989
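
As an illustration of how such fits can be produced, the sketch below fits linear and quadratic component models with numpy on invented data; interpreting S as the standard error of estimate and r as the correlation coefficient is an assumption about the slide's notation:

```python
# Hypothetical data: service time (ms) at a controller vs. request rate (IOps).
# The numbers are made up for illustration, not measurements from the paper.
import numpy as np

rate = np.array([100, 200, 400, 800, 1200, 1600, 2000], dtype=float)  # IOps
svc  = np.array([1.1, 1.3, 1.8, 3.0,  5.2,  9.5, 18.0])               # ms

lin  = np.polyfit(rate, svc, deg=1)   # linear fit
quad = np.polyfit(rate, svc, deg=2)   # quadratic fit

def std_err(coeffs, x, y):
    # Standard error of estimate: residual spread around the fitted curve.
    resid = y - np.polyval(coeffs, x)
    return np.sqrt(np.sum(resid**2) / (len(x) - len(coeffs)))

print("linear    S =", std_err(lin, rate, svc))
print("quadratic S =", std_err(quad, rate, svc))
```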

  9. Knowledge Base: Workload Model
  • Objective: predict the load on component i as a function of the request rate of workload j:
    Component_load_(i,j) = W_(i,j)(request rate of workload j)
  • Example: a workload with 20KB request size, 0.642 read/write ratio, and 0.026 sequential-access ratio

  10. Knowledge Base: Action Model
  • Objective: predict the effect of corrective actions on workload requirements
  • Example:
    Workload_j request rate = A_j(token issue rate for workload j)

  11. Analyze Module: Reasoning Engine
  • Formulated as a constraint-solving problem
  • Part 1, predict action behavior: for each candidate throttling decision, predict its performance result based on the knowledge base
  • Part 2, constraint solving: use linear-programming techniques to scan all feasible solutions and choose the optimal one

  12. Reasoning Engine: Predict Result
  [Figure: the action model maps each workload's token issue rate to a request rate; workload models map request rates to component load; the component model maps load to latency]
  • Chain all models together to predict the result of an action
  • Input: token issue rate for each workload
  • Output: expected latency
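
A minimal sketch of this chaining, with made-up coefficients standing in for the fitted A_j, W_(i,j), and L functions from the knowledge base:

```python
# Chain action -> workload -> component models to evaluate one candidate
# throttling decision. All coefficients are invented placeholders.
def action_model(token_rate):            # A_j: token issue rate -> request rate
    return 0.95 * token_rate             # assumed near-linear admission

def workload_model(request_rate):        # W_(i,j): request rate -> component load
    return 0.7 * request_rate            # assumed fitted slope

def component_model(total_load):         # L: component load -> service time (ms)
    return 0.5 + 0.002 * total_load + 1e-6 * total_load**2  # quadratic fit

def predict_latency(token_rates):
    """Expected latency for a candidate vector of token issue rates."""
    req_rates = [action_model(t) for t in token_rates]
    load = sum(workload_model(r) for r in req_rates)  # aggregate component load
    svc = component_model(load)
    return [svc] * len(token_rates)  # single shared component in this sketch

print(predict_latency([500.0, 300.0]))
```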

  13. Reasoning Engine: Constraint Solving
  [Figure: quadrant plot of normalized latency (%) vs. normalized IOps (%), with regions labeled FAILED, EXCEED, MEET, and LUCKY]
  • Formulated using linear programming
  • Formulation:
    • Variables: token issue rate for each workload
    • Constraints: workloads should meet their SLA latency goals
    • Objective function:
      • Minimize the number of workloads violating their SLA goals
      • Keep workloads as close to their SLA IO rates as possible
  • Example objective:
    Minimize Σ_i pa_i · pb_i · [SLA_i − T(current_throughput_i, t_i)] / SLA_i
    where pa_i = workload priority and pb_i = quadrant priority
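
The sketch below shows a simplified version of this formulation using scipy's linprog; the model slopes, SLA numbers, and priorities are invented, and the paper's piece-wise linear models and quadrant priorities are reduced here to a single linear load constraint:

```python
# Simplified linear program in the spirit of this slide (assumed numbers).
from scipy.optimize import linprog

# Two workloads; variable x_j = token issue rate for workload j.
sla_iops = [800.0, 600.0]   # SLA throughput goals
pa       = [2.0, 1.0]       # workload priorities
pb       = [1.0, 1.0]       # quadrant priorities (assumed equal here)

# Objective: minimize sum_j pa_j*pb_j*(SLA_j - x_j)/SLA_j, i.e. keep each
# workload close to its SLA IO rate. Dropping the constant term, this is
# equivalent to maximizing sum_j (pa_j*pb_j/SLA_j) * x_j.
c = [-(pa[j] * pb[j] / sla_iops[j]) for j in range(2)]

# Latency constraint: with linearized models, the aggregate component load
# k1*x1 + k2*x2 must stay within the budget that still meets the latency SLA.
A_ub = [[0.7, 0.9]]         # assumed workload-model slopes
b_ub = [1000.0]             # assumed load budget for the latency goal

bounds = [(0, sla_iops[j]) for j in range(2)]  # never issue above the SLA rate
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)                # optimal token issue rates
```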

  14. Act Module: Throttling Executor
  [Figure: flowchart. When the reasoning engine is invoked, its output is executed with step-wise throttling proportional to the confidence value; if the confidence value is below a threshold, designer policies are executed instead. After each step the system states are analyzed, and the executor either continues throttling or re-triggers the reasoning engine]
  • Hybrid of feedback and prediction
  • Ability to switch to rule-based (designer) policies when the confidence value is low
  • Ability to re-trigger the reasoning engine
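
A sketch of this control loop; the reasoning_engine, policies, executor, and monitor objects, the threshold, and the step-size rule are all assumed interfaces, not the paper's code:

```python
# Hybrid feedback/prediction act loop (assumed interfaces throughout).
CONFIDENCE_THRESHOLD = 0.5  # assumed value

def close_enough(cur, tgt, tol=0.05):
    # Within tol (relative) of the target rate for every workload.
    return all(abs(c - t) <= tol * max(t, 1e-9) for c, t in zip(cur, tgt))

def move_toward(cur, tgt, frac):
    # One incremental throttling step of relative size frac.
    return [c + frac * (t - c) for c, t in zip(cur, tgt)]

def throttle_loop(reasoning_engine, policies, executor, monitor):
    target, confidence = reasoning_engine.solve()
    if confidence < CONFIDENCE_THRESHOLD:
        executor.apply(policies.default_action())  # rule-based fallback
        return
    current = executor.current_rates()
    while not close_enough(current, target):
        step = 0.2 * confidence                    # step size proportional to confidence
        current = move_toward(current, target, step)
        executor.apply(current)
        if monitor.deviates_from_prediction():     # feedback disagrees with models
            target, confidence = reasoning_engine.solve()  # re-trigger
```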

  15. Experimental Results
  • Test-bed configuration:
    • IBM x-series 440 server (2.4GHz, 4-way, 4GB memory, Red Hat server 2.1 kernel)
    • FAStT 900 controller
    • 24 drives (RAID0)
    • 2Gbps Fibre Channel link
  • Tests consist of:
    • Synthetic workloads
    • Real-world trace replay (HP traces and SPC traces)

  16. Experimental Results: Synthetic Workloads
  • Effect of priority values on the output of the constraint solver
    [Figure: (a) equal priority, (b) workload priorities, (c) quadrant priorities]
  • Effect of model errors on the output of the constraint solver
    [Figure: (a) without feedback, (b) with feedback]

  17. Experimental Results: Real-world Trace Replay
  • Real-world block-level traces from HP (cello96 trace) and SPC (web server)
  • A phased synthetic workload acts as the third flow
  • Test goals:
    • Do the workloads converge to their SLAs?
    • How reactive is the system?
    • How does CHAMELEON handle unpredictable variations?

  18. Real-world Trace Replay
  [Figure: workload performance (a) with throttling and (b) without CHAMELEON]

  19. Real-world Trace Replay
  • Handling system changes
  • With periodic un-throttling

  20. Other System Management Scenarios?
  • Automate the observe-analyze-act loop for other self-management scenarios
  • Example: CHAMELEON for network applications, e.g. a proxy in front of a server farm
    [Figure: 1: monitor network/server behaviors; 2: establish/maintain models for system behaviors; 3: analyze and make an action decision; 4: execute or distribute the action decision to execution points]
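
A skeleton of the four-step loop as it might run in such a proxy; all component names are illustrative:

```python
# Generic observe-analyze-act loop (assumed interfaces).
import time

def observe_analyze_act(monitor, models, planner, actuators, period_s=10):
    while True:
        metrics = monitor.sample()        # 1: monitor network/server behaviors
        models.update(metrics)            # 2: establish/maintain models of system behavior
        action = planner.decide(models)   # 3: analyze and make an action decision
        for point in actuators:           # 4: execute/distribute it to execution points
            point.apply(action)
        time.sleep(period_s)
```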

  21. Future Work
  • Better methods to improve model accuracy
  • A more general constraint solver
  • Combining throttling with other actions
  • CHAMELEON in other scenarios
  • CHAMELEON for reliability and failure handling

  22. References
  • L. Yin, S. Uttamchandani, J. Palmer, R. Katz, G. Agha, "AUTOLOOP: Automated Action Selection in the 'Observe-Analyze-Act' Loop for Storage Systems", submitted for publication, 2005.
  • S. Uttamchandani, L. Yin, G. Alvarez, J. Palmer, G. Agha, "CHAMELEON: A Self-Evolving, Fully-Adaptive Resource Arbitrator for Storage Systems", to appear in the USENIX Annual Technical Conference (USENIX '05), 2005.
  • S. Uttamchandani, K. Voruganti, S. Srinivasan, J. Palmer, D. Pease, "Polus: Growing Storage QoS Management Beyond a 4-Year-Old Kid", 3rd USENIX Conference on File and Storage Technologies (FAST '04), 2004.

  23. Questions? Email: yinli@eecs.berkeley.edu
