300 likes | 381 Views
Heracles: Improving Resource Efficiency at Scale. ISCA’15 Stanford University Google, Inc. Outline. Introduction Design Isolation Mechanisms Controllers Evaluation Conclusion. Motivation. Average server utilization in most datacenter is low, ranging between 10%~50%.
E N D
Heracles: Improving Resource Efficiency at Scale ISCA’15 Stanford University Google, Inc.
Outline • Introduction • Design • Isolation Mechanisms • Controllers • Evaluation • Conclusion
Motivation • Average server utilization in most datacenter is low, ranging between 10%~50%. • Difficult to consolidate the latency-critical services on a subset of highly utilized servers. • Increase the server utilization by launching best-effort tasks on the same server with a latency-critical job.
Motivation(Cont.) • Previous works tend to protect LC workloads, but reduce the opportunities for higher utilization through co-location.
Goal • Eliminate SLO violations at all levels of load for the LC job while maximizing the throughput for BE tasks.
Heracles • A real-time, feedback-based controller • Enables the safe co-location of best-effort(BE) tasks alongside a latency-critical(LC) service. • Ensures that LC jobs meet their target while maximizing the resources given to BE tasks.
Heracles(Cont.) • Four hardware and software isolation mechanisms. • Hardware: shared cache partitioning, fine-grained power/frequency setting. • Software: core isolation, network traffic control.
Isolation Mechanisms(Soft) • Core isolation • Pin workload to a set of core using cpusetcgroups. • Speed of (re)allocation: tens of milliseconds. • Network traffic • Limit the outgoing bandwidth of BE tasks using Linux traffic control. • No limit on LC job. • Take effect in less than hundreds of milliseconds.
Isolation Mechanisms(Hard) • LLC isolation • Cache Allocation Technology(CAT)in recent Intel chip. • Use way-partitioning to define non-overlapping partitions on LLC. • Take effect in a few milliseconds. • Implement software monitor to track the bandwidth usage of LC and BE jobs. • Scale down the # of cores for BE jobs if LC jobs does not receive sufficient bandwidth.
Isolation Mechanisms(Hard)(Cont.) • Power isolation • CPU frequency monitoring, Running Average Power Limit(RAPL), and per-core DVFS. • Take effect within a few milliseconds.
Design Approach • An optimization problem • Maximize utilization with the constraint that the SLO must be met. • Heracles • decomposes the high-dimensional optimization problem into many smaller and independent problem. • Decoupling interference sources. • Monitors latency, latency slack, and load. • Adjust the BE job allocation.
Evaluation • Two sets of experiments • Co-locates LC applications with BE tasks on a single server. • Measuring end-to-end latency of Websearch on tens of servers. • BE tasks are also running. • Effective Machine Utilization(EMU) • LC throughput + BE throughput
Workloads • Three Google production LC workloads: • websearch • ml_cluster • Real-time text clustering using machine learning • memkeyval • In-memory key-value store • Run LC workloads with benchmarks that stress a single shared resource. • Stream-LLC, Stream-DRAM,cpu-pwr, iperf, brain, and streetview.
Conclusion • Heracles • a heuristic feedback-based system that manages four isolation mechanisms to enable a latency-critical workload to be co-located with batch jobs without SLO violations. • Evaluation on real hardware demonstrates an average utilization of 90% across all evaluated scenarios without any SLO violations for the latency-critical job.