SLO-aware Hybrid Store
Priya Sehgal, Kaladhar Voruganti, Rajesh Sundaram
University Day 2013, 8th March 2013
MSST 2012 paper: http://storageconference.org/2012/Papers/21.Short.2.SLOAware.pdf
Introduction
• What is an SLO?
  • Service Level Objective
  • Specification of application requirements
  • Technology-independent
• Examples
  • Performance: average I/O latency, throughput
  • Capacity
  • Reliability
  • Security, etc.
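To make this concrete, here is a small illustrative example (not from the slides; all field names are hypothetical) of how a technology-independent SLO for one volume might be written down:

```python
# Hypothetical SLO specification for one volume; the slides only list the
# requirement categories (performance, capacity, reliability, security).
slo_vol1 = {
    "volume": "vol1",
    "performance": {"avg_io_latency_ms": 5, "throughput_iops": 5000},
    "capacity_gb": 600,
    "reliability": {"max_unavailability_pct": 0.01},
    "security": {"encryption_at_rest": True},
}
```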
Problem and Motivation
• Setup: Hybrid Store (HyS1) with an SSD array as a read cache in front of an HDD array as the permanent store; W1 runs on vol1 with SLO = 5 ms, W2 runs on vol2 with SLO = 10 ms
• Assumptions:
  • SLO = latency (ms)
  • W1 and W2 work on different data sets
  • SSD tier used for read caching only
• Problems:
  • SLO inversion
  • SLO violation
  • Sub-optimal SSD utilization
Need: bring SLO-awareness to SSD caching (read) in HyS
Solution – Trailer
Per-workload cache partitioning and dynamic cache sizing
Architecture
• Per-workload controllers (W1 Controller, W2 Controller, …, Wn Controller), each driving a per-workload Eviction Engine inside the Hybrid Store
• Master Controller: communicates with the upper layer and issues SSD partition size increase/decrease decisions to the per-workload controllers
• Monitoring daemon: collects SLO stats (observed latency, expected latency) and HyS SSD stats (SSD hit %, SSD utilization) and feeds them to the controllers
[Architecture diagram: numbered data flow (steps 1-6) between the Store, monitoring daemon, controllers, and eviction engines; not reproducible in text.]
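The slide only names the components, so the following is a minimal Python sketch, under my own assumptions about class and method names, of how the monitoring daemon's stats, the per-workload controllers, the master controller, and the eviction engines could fit together:

```python
from dataclasses import dataclass

@dataclass
class WorkloadStats:
    """Per-workload stats collected by the monitoring daemon."""
    observed_latency_ms: float    # e.g. 75th-percentile observed latency
    expected_latency_ms: float    # SLO target
    ssd_hit_pct: float
    ssd_util_pct: float

class EvictionEngine:
    """Per-workload eviction engine inside the hybrid store (stub)."""
    def resize(self, new_size_gb: float) -> None:
        print(f"resize SSD partition to {new_size_gb:.1f} GB")  # evicts if shrinking

class WorkloadController:
    """One controller per workload; proposes its SSD partition size."""
    def __init__(self, eviction_engine: EvictionEngine):
        self.eviction_engine = eviction_engine

    def desired_size(self, stats: WorkloadStats, current_gb: float) -> float:
        # Placeholder policy: the EAFC described later refines this step.
        if stats.observed_latency_ms > stats.expected_latency_ms:
            return current_gb * 1.1     # latency too high -> ask for more SSD
        return current_gb

class MasterController:
    """Arbitrates per-workload requests against the total SSD capacity."""
    def __init__(self, controllers: dict, total_ssd_gb: float):
        self.controllers = controllers
        self.total_ssd_gb = total_ssd_gb

    def tick(self, stats: dict, sizes: dict) -> None:
        wanted = {w: c.desired_size(stats[w], sizes[w])
                  for w, c in self.controllers.items()}
        total = sum(wanted.values())
        if total > self.total_ssd_gb:   # scale down proportionally if oversubscribed
            wanted = {w: s * self.total_ssd_gb / total for w, s in wanted.items()}
        for w, size_gb in wanted.items():
            self.controllers[w].eviction_engine.resize(size_gb)
```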
Controller Design Space Dimensions
• Un-partitioned vs. partitioned cache
• Static vs. dynamic partitioning
• Cache size decrement decision: single max threshold vs. min-max range
When to decrease the cache size?
• Option 1: decrease the partition size as soon as the SLO is met
  • Causes oscillations due to unnecessary evictions
• Option 2: decrease only when observed latency goes below a minimum threshold
  • Defines a range of operation (min and max thresholds)
  • Helps stabilize cache size and avoid unnecessary evictions
  • The range varies depending upon the SLO and cache size
  • Percentage conformance (see the sketch below)
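A minimal sketch contrasting the two shrink policies above; the threshold names lmin/lmax are taken from the EAFC slide, everything else is illustrative:

```python
def should_shrink_naive(observed_ms: float, slo_ms: float) -> bool:
    """Shrink as soon as the SLO is met: prone to oscillation, since the
    next eviction can push latency right back above the SLO."""
    return observed_ms <= slo_ms

def should_shrink_ranged(observed_ms: float, lmin_ms: float, lmax_ms: float) -> bool:
    """Shrink only when latency drops below the lower end of the operating
    range [lmin, lmax]; anywhere inside the range the size is left alone."""
    assert lmin_ms < lmax_ms
    return observed_ms < lmin_ms
```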
Decrease Partition Size Problem
[Figure: decrease in partition size leading to SLO violations.]
Insights on range of operation
[Figure, four panels: (a) SLO target = 99th percentile, (b) SLO target = 90th percentile, (c) SLO target = 80th percentile, (d) SLO target = 75th percentile.]
Error-aware Feedback Controller (EAFC)
• Inputs: 75th-percentile observed latency (lobs), SLO range (lmin, lmax), SSD utilization %, SSD hit %
• Error computation:
  • SLO violation (lobs above the range): e = lmax - lobs
  • Observed latency below the range: e = lmin - lobs
  • Within the SLO range: e = 0
• Controller output: Kp * e, with Kp = 0, Kpinc, or Kpdec
• New partition size passed to the Eviction Engine: szold * (1 + Kp * e) when growing, szold * (1 - Kp * e) when shrinking, evaluated once per control window (Cwindow)
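A minimal Python sketch of one EAFC step, following the error definitions on this slide; the exact sign handling and the pairing of Kpinc/Kpdec with the grow/shrink branches are my assumptions, not taken verbatim from the paper:

```python
def eafc_step(sz_old_gb: float, lobs_ms: float,
              lmin_ms: float, lmax_ms: float,
              kp_inc: float = 0.05, kp_dec: float = 0.02) -> float:
    """One controller iteration per control window: compare the observed
    75th-percentile latency against the SLO range [lmin, lmax] and resize
    the SSD partition proportionally to the error."""
    if lobs_ms > lmax_ms:                    # SLO violation -> grow the partition
        e = lmax_ms - lobs_ms                # negative error
        return sz_old_gb * (1 - kp_inc * e)  # (1 - kp*e) > 1, size grows
    if lobs_ms < lmin_ms:                    # below the range -> shrink
        e = lmin_ms - lobs_ms                # positive error
        return sz_old_gb * (1 - kp_dec * e)  # (1 - kp*e) < 1, size shrinks
    return sz_old_gb                         # within range: e = 0, no change

# Example: with an SLO range of [2 ms, 3 ms] and 5 ms observed latency,
# the partition grows by kp_inc * 2 = 10%.
```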
Test Description
• Hybrid Store prototype
  • 1 workload per volume (1 workload = all I/Os coming to a volume)
  • HDD space: 1 TB
  • SSD space: 160 GB
  • RAM size: 16 GB
• Test 1 - SPECsfs 2008-like:
  • No. of threads = 20
  • Load/thread = 250 IOPS (5000 IOPS total)
  • Total dataset size = 600 GB
  • Target latency = 8 ms, 3 ms
SPECsfs 2008 (SLO target = 8 ms)
Amount of SSD read cache required to meet 8 ms latency = 15 GB
SPECsfs 2008 (SLO target = 3 ms)
Amount of SSD read cache required to meet 3 ms latency = 35 GB (2.3x more than for the 8 ms target)
SPECsfs 2008 Tests
• Filer setup same as before
• Test 1:
  • No. of threads = 20
  • Load/thread = 250 IOPS (5000 IOPS total)
  • Total dataset size = 600 GB
  • Target latency = 8 ms, 3 ms
• Test 2:
  • No. of threads = 20
  • Load/thread varying across 100, 200, and 250 IOPS (2000, 4000, and 5000 IOPS total), each run for 5 hrs
  • Target latency = 3 ms
SPECsfs 2008 Test 2 (Varying loads)
Conclusion
• Insights
  • It is not necessary to cache the whole working set (WSS) to meet certain latency targets
  • 75th-percentile SLO conformance yields a close-to-optimal SSD size that meets the average SLO almost all the time
  • The cache sizer needs only a few hundred history points to set the appropriate SSD size, so it is light-weight
• Objectives met
  • SLO met close to 100% of the time
  • With a close-to-optimal amount of SSD, improving SSD utilization
  • Without much computation or memory overhead