
Heracles: Improving Resource Efficiency at Scale

Heracles: Improving Resource Efficiency at Scale. ISCA’15. Stanford University and Google, Inc. Outline: Introduction, Design, Isolation Mechanisms, Controllers, Evaluation, Conclusion. Motivation: average server utilization in most datacenters is low, ranging from 10% to 50%.

alaughlin

Presentation Transcript


  1. Heracles: Improving Resource Efficiency at Scale ISCA’15 Stanford University Google, Inc.

  2. Outline • Introduction • Design • Isolation Mechanisms • Controllers • Evaluation • Conclusion

  3. Motivation • Average server utilization in most datacenters is low, ranging from 10% to 50%. • It is difficult to consolidate latency-critical services onto a subset of highly utilized servers. • Server utilization can be increased by launching best-effort tasks on the same server as a latency-critical job.

  4. Motivation (Cont.) • Previous work tends to protect latency-critical (LC) workloads, but at the cost of fewer opportunities for higher utilization through co-location.

  5. Goal • Eliminate SLO violations at all levels of load for the LC job while maximizing the throughput for BE tasks.

  6. Heracles • A real-time, feedback-based controller. • Enables the safe co-location of best-effort (BE) tasks alongside a latency-critical (LC) service. • Ensures that LC jobs meet their latency targets while maximizing the resources given to BE tasks.

  7. Heracles (Cont.) • Four isolation mechanisms, two in hardware and two in software. • Hardware: shared cache partitioning, fine-grained power/frequency settings. • Software: core isolation, network traffic control.

  8. Isolation Mechanisms (Software) • Core isolation • Pin each workload to a set of cores using cpuset cgroups. • Speed of (re)allocation: tens of milliseconds. • Network traffic • Limit the outgoing bandwidth of BE tasks using Linux traffic control. • No limit on the LC job. • Takes effect in less than hundreds of milliseconds.
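As a rough illustration of the core isolation bullet above, the sketch below splits a machine's cores into an LC set and a BE set and renders each as a cpuset-style range list. The function names `format_cpuset` and `split_cores`, and the policy of giving BE tasks the highest-numbered cores, are assumptions made for this example and are not taken from the slides. The sketch only computes the strings; it does not write to any cgroup file.

```python
def format_cpuset(cores):
    """Render core ids as a cpuset-style range list, e.g. '0-3,6'."""
    ranges = []
    start = prev = None
    for core in sorted(cores):
        if start is None:
            start = prev = core
        elif core == prev + 1:
            prev = core  # extend the current contiguous run
        else:
            ranges.append((start, prev))
            start = prev = core
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def split_cores(total_cores, be_cores):
    """Hypothetical policy: LC keeps the low core ids, BE gets the top ones."""
    if not 0 <= be_cores < total_cores:
        raise ValueError("the LC job must keep at least one core")
    boundary = total_cores - be_cores
    lc = range(boundary)
    be = range(boundary, total_cores)
    return format_cpuset(lc), format_cpuset(be)


lc_set, be_set = split_cores(total_cores=16, be_cores=4)
print(lc_set, be_set)
```

Because the two sets never overlap, growing or shrinking the BE allocation amounts to moving the boundary and rewriting both strings.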

  9. Isolation Mechanisms (Hardware) • LLC isolation • Cache Allocation Technology (CAT) in recent Intel chips. • Uses way-partitioning to define non-overlapping partitions on the LLC. • Takes effect in a few milliseconds. • A software monitor tracks the memory bandwidth usage of LC and BE jobs. • The number of cores given to BE jobs is scaled down if the LC jobs do not receive sufficient bandwidth.
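To make the way-partitioning bullet concrete, here is a minimal sketch that builds two non-overlapping bitmasks over the LLC ways, one for the LC job and one for BE tasks. The function name `way_masks`, the 20-way cache, and the choice to give BE tasks the low-order ways are assumptions for illustration. The sketch uses contiguous masks because CAT capacity bitmasks are generally required to be contiguous; it only computes the masks and does not program any hardware.

```python
def way_masks(total_ways, be_ways):
    """Return (lc_mask, be_mask): contiguous, non-overlapping way bitmasks."""
    if not 0 < be_ways < total_ways:
        raise ValueError("both partitions need at least one way")
    full = (1 << total_ways) - 1      # every way in the cache
    be_mask = (1 << be_ways) - 1      # low-order ways go to BE (assumption)
    lc_mask = full & ~be_mask         # LC gets all remaining ways
    return lc_mask, be_mask


lc_mask, be_mask = way_masks(total_ways=20, be_ways=4)
print(hex(lc_mask), hex(be_mask))
```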

  10. Isolation Mechanisms (Hardware) (Cont.) • Power isolation • CPU frequency monitoring, Running Average Power Limit (RAPL), and per-core DVFS. • Takes effect within a few milliseconds.

  11. Design Approach • An optimization problem • Maximize utilization subject to the constraint that the SLO must be met. • Heracles • Decomposes the high-dimensional optimization problem into many smaller, independent problems. • Decouples interference sources. • Monitors latency, latency slack, and load. • Adjusts the BE job allocation.
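The monitoring loop described above can be sketched as a single decision step that turns latency, latency slack, and load into an action on the BE allocation. This is a simplified illustration, not the controller from the talk: the definition of slack as the fraction of the SLO still unused, the function names, the action labels, and the thresholds `max_load=0.85` and `low_slack=0.10` are all assumptions chosen for the example.

```python
def latency_slack(latency_ms, slo_ms):
    """Fraction of the SLO target that is still unused (negative = violation)."""
    return (slo_ms - latency_ms) / slo_ms


def decide(latency_ms, slo_ms, load, max_load=0.85, low_slack=0.10):
    """Pick an action for the BE allocation; thresholds are illustrative."""
    slack = latency_slack(latency_ms, slo_ms)
    if slack < 0:
        return "disable_be"   # SLO already violated: stop BE tasks
    if load > max_load:
        return "disable_be"   # LC load too high to co-locate safely
    if slack < low_slack:
        return "hold_be"      # little headroom: do not let BE grow
    return "grow_be"          # enough headroom: give BE more resources


print(decide(latency_ms=5.0, slo_ms=10.0, load=0.5))
```

A real controller would run this step periodically and hand the result to per-resource sub-controllers, which is where the decomposition into smaller problems comes in.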

  12. System Diagram

  13. High-level Controller

  14. Core & Memory Sub-controller

  15. Max Load under SLO

  16. Power and Network Sub-controller

  17. Evaluation • Two sets of experiments • Co-locate LC applications with BE tasks on a single server. • Measure the end-to-end latency of websearch on tens of servers while BE tasks are also running. • Effective Machine Utilization (EMU) • EMU = LC throughput + BE throughput
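A small worked example of the EMU definition above. The sketch assumes each throughput is expressed as a fraction of what that workload would achieve running alone on the same machine; that normalization is an assumption here, since the slide only gives the sum. The function name `emu` and the sample values are hypothetical.

```python
def emu(lc_throughput, be_throughput):
    """Effective Machine Utilization: LC throughput plus BE throughput.

    Both inputs are assumed to be normalized fractions (1.0 = the workload
    running alone at its peak on this machine).
    """
    if lc_throughput < 0 or be_throughput < 0:
        raise ValueError("throughput fractions must be non-negative")
    return lc_throughput + be_throughput


# Hypothetical numbers: LC job at 50% of its peak, BE tasks at 25% of theirs.
print(emu(0.5, 0.25))
```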

  18. Workloads • Three Google production LC workloads: • websearch • ml_cluster: real-time text clustering using machine learning • memkeyval: in-memory key-value store • The LC workloads are run alongside benchmarks that each stress a single shared resource. • stream-LLC, stream-DRAM, cpu_pwr, iperf, brain, and streetview.

  19. Latency of LC Applications

  20. EMU

  21. Shared Resource Utilization

  22. Websearch in Cluster

  23. Conclusion • Heracles is a heuristic, feedback-based system that manages four isolation mechanisms so that a latency-critical workload can be co-located with batch jobs without SLO violations. • Evaluation on real hardware demonstrates an average utilization of 90% across all evaluated scenarios without any SLO violations for the latency-critical job.
