
Cross-Layer Optimizations between Network and Compute in Online Services

This research overview discusses the challenges and opportunities in cross-layer optimization of network and compute in online services, focusing on the TimeTrader architecture, which leverages per-sub-query slack to save energy. Results show energy savings of 15% at peak load and 40% at 30% load.



Presentation Transcript


  1. Cross-Layer Optimizations between Network and Compute in Online Services Balajee Vamanan

  2. Research Overview • Networking • Packet classification • EffiCuts (SIGCOMM ’10), TreeCAM (CoNEXT ’11) • Datacenter networks • D2TCP (SIGCOMM ’12), FlowBender (CoNEXT ’14) • Software-defined networking • Hydra* • Architecture • Non-volatile memory • MigrantStore* • Energy efficiency • TimeTrader (MICRO ’15)

  3. Searching large data is becoming prevalent • Data is growing exponentially • Online Search (OLS): interactively query and access data • Key component of many online services • e.g., Google Search, Facebook, Twitter, advertisements (Figure: data spans complex/unstructured to relational forms)

  4. OLS software architecture • Highly distributed and hierarchical • Partition-aggregate (tree) • Request-compute-reply • Request: root → leaf • Compute: leaf node • Reply: leaf → root • Root waits until the deadline to generate the overall response • Some replies may take longer in the network or in compute • Network queuing → TCP retransmissions • Imperfect sharding → non-uniform compute • Long tails → tail latency problem (Figure: a query enters at the root, fans out through aggregators Agg 1 … Agg m to leaves Leaf 1 … Leaf n, and the response returns up the tree)
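The request-compute-reply pattern above can be sketched as follows. This is an illustrative toy (the function names, shard count, and simulated latencies are invented for this sketch, not from the talk): the root fans a query out to all leaf shards, waits only until the deadline, and aggregates whatever replies arrived in time.

```python
import concurrent.futures as cf
import random
import time

DEADLINE_S = 0.250  # overall SLA budget (the 250 ms example above)

def leaf_search(shard_id, query):
    # Simulated per-shard compute; a real leaf searches its local index shard.
    time.sleep(random.uniform(0.001, 0.010))
    return f"shard{shard_id}:hits({query})"

def root_aggregate(query, n_leaves=8):
    """Fan out to all leaves, wait only until the deadline, aggregate what arrived."""
    with cf.ThreadPoolExecutor(max_workers=n_leaves) as pool:
        futures = [pool.submit(leaf_search, i, query) for i in range(n_leaves)]
        done, not_done = cf.wait(futures, timeout=DEADLINE_S)
    # Replies later than the deadline are dropped: quality degrades, SLA is met.
    return [f.result() for f in done], len(not_done)

results, missed = root_aggregate("example query")
```

The key property modeled here is that the root never blocks past the deadline on a straggling sub-query; late replies simply reduce answer quality.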

  5. OLS tail latencies and SLA budgets • Missed replies due to tail latency affect quality • e.g., SLA of 1% missed replies (deadlines) • Time budgets based on 99th %-ile latency of sub-queries • To optimize network and compute independently, split the budget into network and compute parts • e.g., total = 250 ms → flow deadlines of 20/30 ms (Figure: the total budget split across the request network, compute, and reply network stages) • OLS applications operate under tight network and compute budgets

  6. Outline • Introduction • Background on OLS • Challenges and opportunities • TimeTrader: key ideas • TimeTrader’s design • Methodology • Results

  7. Challenge #1: Network tail latency • OLS handles large data → high fan-in trees • Children respond around the same time → incast • Incast → packet drops → increased tail latency • Hard to absorb in buffers • Causes long network tails → poor interactivity/quality • Network must prioritize based on deadlines

  8. Solution #1: D2TCP • Deadline-aware datacenter TCP (D2TCP) [SIGCOMM ’12] • A TCP variant optimized for OLS • High-level idea: during congestion, • Near-deadline flows: back off less • Far-deadline flows: back off more • Achieves distributed earliest-deadline-first (EDF) scheduling in the network • Evaluated in Google datacenters • Not the focus of this talk
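A minimal sketch of the deadline-modulated backoff idea, following the gamma-correction rule from the D2TCP paper (the function name and parameter names here are illustrative, not from the talk): the DCTCP-style congestion estimate is raised to a deadline-imminence exponent, so near-deadline flows see a smaller penalty and back off less.

```python
def d2tcp_cwnd(cwnd, alpha, d):
    """One D2TCP-style window adjustment (gamma correction).

    alpha: EWMA fraction of ECN-marked packets (as in DCTCP), in [0, 1].
    d: deadline imminence factor -- roughly the time a flow needs to finish
       divided by the time left to its deadline; d > 1 means near-deadline.
    """
    p = alpha ** d                     # large d (near deadline) shrinks the penalty
    if p > 0:
        return cwnd * (1 - p / 2.0)    # congestion: back off in proportion to p
    return cwnd + 1                    # no congestion: additive increase
```

With alpha = 0.5, a near-deadline flow (d = 2) keeps a larger window after congestion than a far-deadline flow (d = 0.5), which is how the distributed EDF behavior emerges.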

  9. Challenge #2: Energy management • Traditional energy management does not work for OLS • Interactive, strict service-level agreements (SLAs) • Cannot batch • Short response times and inter-arrival times (e.g., 200 ms and ≈1 ms for Web Search) • Low-power modes (e.g., p-states) not applicable • Data sharded over 1000s of servers → each query searches all servers • Cannot consolidate the workload onto fewer servers • Must carefully slow down (not turn off) servers without violating SLAs

  10. Solution #2: TimeTrader • TimeTrader [MICRO ’15] • Idea: carefully modulate the speed of servers per each query’s needs • Critical query → run at full speed • Sub-critical query → run at slower speed • Learn criticality information from both network and compute • Prior work [Pegasus, ISCA ’14] considers only the average datacenter load when slowing down • TimeTrader is the focus of this talk

  11. Outline • Introduction • Background on OLS • Challenges and opportunities • TimeTrader: key ideas • TimeTrader’s design • Methodology • Results

  12. TimeTrader: contributions • Each query leads to 1000s of sub-queries • Each query’s budget is set to accommodate the 99th %-ile sub-query • In both network and compute parts • Key observation: 80% of sub-queries complete in 20% of the budget • TimeTrader exploits both network and compute slack to save energy

  13. TimeTrader: contributions (cont.) • Uses (local) per-sub-query latencies → reshapes the sub-query response time distribution at all loads (Figure: latency distributions of the baseline at peak load, the baseline at low load, Pegasus at low load, and TimeTrader at all loads, against the deadline set at the 99th %-ile of peak load; A: median baseline latency at low load, B: 99th %-ile baseline latency at low load; the gaps mark Pegasus’s and TimeTrader’s respective opportunities)

  14. TimeTrader: contributions (cont.) • Leverages network signals to determine slack • Does not need fine-grained clock synchronization • Employs earliest-deadline-first (EDF) scheduling • Slowing down a sub-query affects other queued sub-queries • Critical sub-queries are affected the most • Allows critical sub-queries to bypass others • TimeTrader is the first to explore cross-layer optimization between the network and architecture layers; it saves 15% energy at peak load and 40% at 30% load

  15. Outline • Introduction • Background on OLS • Challenges and opportunities • TimeTrader: High level ideas • TimeTrader’s design • Methodology • Results

  16. TimeTrader: design • TimeTrader exploits per-sub-query slack • Sources of slack • Network • Request • Reply (after compute, unpredictable) • Compute • Actual compute (query dependent) • Queuing (due to local load variations) • Determine (a) request slack and (b) compute slack • Slow down based on slack and load • Shield critical sub-queries from slowed down sub-queries • EDF implementation • Intel’s Running Avg. Power Limit (RAPL) to set power states
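The last bullet mentions using Intel RAPL to set power states. As a hedged illustration only (the talk does not show TimeTrader's actual mechanism), here is a minimal sketch assuming a Linux host that exposes RAPL through the kernel's powercap sysfs interface; the sysfs path varies by platform and writing it requires root, so the sketch supports a dry run.

```python
# Assumed path: Linux powercap sysfs entry for the package RAPL domain.
# This varies by kernel and platform; it is NOT taken from the talk.
RAPL_LIMIT = "/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw"

def watts_to_uw(watts):
    # The powercap power-limit files take integer microwatts.
    return int(watts * 1_000_000)

def set_power_cap(watts, path=RAPL_LIMIT, dry_run=False):
    """Write a package power cap; dry_run returns the value without touching sysfs."""
    value = str(watts_to_uw(watts))
    if dry_run:
        return value
    with open(path, "w") as f:   # requires root on a real system
        f.write(value)
    return value
```

A controller in the spirit of TimeTrader would call something like `set_power_cap` with a lower cap whenever the current sub-query has slack, and restore the full cap for critical sub-queries.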

  17. 1(a): Determine request slack • Requests traverse the network from root to leaf • Timestamp at parent and leaf? Clock skew ≈ slack, so timestamps are unreliable • Instead, estimate from already-available network signals • e.g., Explicit Congestion Notification (ECN) marks, TCP timeouts • If a leaf sees neither ECN marks nor timeouts: Request Slack = Network Budget − Median Latency; else: Request Slack = 0
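The rule on this slide can be written directly as code. This is a sketch of the stated rule only (parameter names are illustrative); an uncongested request very likely arrived in about the median latency, so the remainder of the network budget is slack, while any congestion signal conservatively zeroes the slack.

```python
def request_slack(saw_ecn, saw_timeout, network_budget_ms, median_latency_ms):
    """Per-sub-query request slack from already-available network signals."""
    if not saw_ecn and not saw_timeout:
        # No congestion observed: assume ~median network latency was spent.
        return network_budget_ms - median_latency_ms
    return 0.0  # congestion signal: conservatively assume no slack
```

For example, with the 25 ms network budget from the methodology slide and a 2 ms median latency, an uncongested request carries 23 ms of slack.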

  18. 1(b): Determine compute-queuing slack • Exploit local load variations in each leaf server • Slack calculation • Details in the paper Total Slack = Request Slack + Compute-Queuing Slack

  19. 2: Slow down based on slack • Given the slack, calculate the slowdown factor • As a fraction of the compute budget • But not all of the slack can be used • Slack calculation assumes no queuing • Need to scale slack based on load • Slowdown factor = Total Slack × Scale / Compute Budget • Scale depends on load • Feedback controller (details in the paper)
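Combining the formulas from this slide and the previous one, the slowdown computation is a one-liner. This sketches only the stated formulas (the load-dependent `scale` comes from a feedback controller described in the paper, not reproduced here):

```python
def slowdown_factor(request_slack_ms, queuing_slack_ms, scale, compute_budget_ms):
    """Total Slack = Request Slack + Compute-Queuing Slack;
    Slowdown factor = Total Slack * Scale / Compute Budget."""
    total_slack = request_slack_ms + queuing_slack_ms
    return total_slack * scale / compute_budget_ms
```

E.g., 23 ms of request slack plus 7 ms of queuing slack, scaled by 0.5 at the current load, against the 75 ms compute budget gives a slowdown factor of 0.2.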

  20. 3: Shield critical sub-queries • A sub-critical request could hurt a critical request • If the critical request is queued behind the sub-critical one • EDF decouples critical and sub-critical requests • We compute the deadline at the leaf node as follows: Deadline = Compute Budget + Request Slack • Key: determining slack enables EDF • Implementation details in the paper
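A minimal sketch of the EDF shielding step, using the deadline rule from this slide (the queue class and its names are invented for illustration; the paper's actual implementation differs): a critical sub-query with zero request slack gets an earlier deadline and bypasses queued sub-critical work.

```python
import heapq
import itertools

class EDFQueue:
    """Earliest-deadline-first service queue for sub-queries at a leaf."""
    def __init__(self, compute_budget_ms):
        self.budget = compute_budget_ms
        self.heap = []
        self.tie = itertools.count()  # FIFO order among equal deadlines

    def enqueue(self, sub_query, request_slack_ms):
        # Deadline rule from the slide: Compute Budget + Request Slack.
        deadline = self.budget + request_slack_ms
        heapq.heappush(self.heap, (deadline, next(self.tie), sub_query))

    def dequeue(self):
        return heapq.heappop(self.heap)[2]  # earliest deadline first
```

With the 75 ms compute budget, a critical sub-query (slack 0, deadline 75 ms) is served before a sub-critical one enqueued earlier (slack 23 ms, deadline 98 ms).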

  21. Outline • Introduction • Background on OLS • Challenges and opportunities • TimeTrader: High level ideas • TimeTrader’s design • Methodology • Results

  22. Methodology • Compute • Real measurements for service time, power • Network • Real measurements at a small scale • Tail effects appear only in large clusters → simulate with ns-3 • Workload: Web Search (Search) from CloudSuite 2.0 • Search index from Wikipedia • 3000 queries/s at peak load → 90% load • Deadline budget: overall 125 ms • Network (request/reply): 25 ms • Compute: 75 ms • SLA of 1% missed deadlines

  23. Idle and Active Energy Savings • TimeTrader saves 15%–40% energy over the baseline (and 17% over Pegasus at 30% load)

  24. Power state distribution • Even at the peak load, TimeTrader slows down about 80% of the time using per-query slack

  25. Response time distribution

  26. Response time distribution

  27. Response time distribution TimeTrader reshapes distribution at all loads; Slows ~80% of requests at all loads

  28. Conclusion TimeTrader… • Exploits sub-query slack • Reshapes response time distribution at all loads • Saves 15% energy at peak, 40% energy at 30% load • Leverages network signals to estimate slack • Employs EDF to decouple critical sub-queries from sub-critical sub-queries TimeTrader converts the performance disadvantage of latency tail into an energy advantage!

  29. Thank you • Collaborators • Jahangir Hasan (Google Inc.) • Hamza Bin Sohail (Apple Inc.) • T. N. Vijaykumar (Purdue University) • Sanjay Rao (Purdue University) • Funding • Google Faculty Research Awards Questions?
