
Microservers and beyond: Pushing the boundaries of efficiency


Presentation Transcript


  1. Microservers and beyond: Pushing the boundaries of efficiency Kevin Lim, Research Scientist HP Labs, Intelligent Infrastructure Lab

  2. Embracing scale-out • Now in the scale-out computing era • Datacenters with 100,000s of servers • Billions of devices • ZBs of data moved and processed • Must always design for scale • Microservers can provide significant efficiency benefits • Total cost of ownership at the datacenter level • Metrics: Performance / W, Performance / $ • [Chart: New Data Center Construction Costs*, $65B vs. $30B] • *IDC 2011: Market Analysis Perspective: Worldwide Datacenter Trends and Strategies 2010

  3. HP Labs and Microservers • Have long viewed microservers as critical building block • Multiple research publications since 2008 • When do they make sense? How to design servers? • Will cover three main works: • µBlades: Initial microserver exploration • Disaggregated memory: Addressing capacity needs • SoC architectures for hyperscale servers: System-level integration • And push microservers to an extreme: nanostores

  4. µBlades for the cloud • Focus on the internet sector: fastest-growing server market • Google, Amazon, MSN's billion-dollar data centers • Extreme scale: millions of users on 100,000s of servers • Infrastructure and power/cooling are among the largest expenses • Needs for system architecture research: • Cloud application benchmarking • Whole-system cost analysis • Holistic system designs with compelling performance/cost

  5. Benchmarks and Metrics • New benchmark suite • Websearch • Unstructured data, large data sets • Webmail • User involvement, scale-out applications • Video sharing • Rich media, large streaming data • MapReduce • Cloud computing, internet as a platform • Key metric: Sustained performance / Total cost of ownership (Perf / TCO-$)
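The key metric on this slide, sustained performance per TCO dollar, can be made concrete with a small sketch. All dollar, power, PUE, and lifetime figures below are illustrative assumptions, not numbers from the talk:

```python
# Hypothetical sketch of the Perf / TCO-$ metric.

def tco_dollars(capex, power_watts, pue, price_per_kwh, years):
    """Total cost of ownership: hardware capex plus lifetime energy,
    with facility overhead folded in via PUE."""
    hours = years * 365 * 24
    energy_cost = (power_watts / 1000.0) * pue * hours * price_per_kwh
    return capex + energy_cost

def perf_per_tco(sustained_perf, capex, power_watts,
                 pue=1.5, price_per_kwh=0.10, years=3):
    """The slide's metric: sustained performance / TCO-$."""
    return sustained_perf / tco_dollars(capex, power_watts, pue,
                                        price_per_kwh, years)

# A microserver may trade peak performance for much lower cost and power:
baseline = perf_per_tco(sustained_perf=100.0, capex=2000.0, power_watts=250.0)
micro    = perf_per_tco(sustained_perf=40.0,  capex=500.0,  power_watts=45.0)
print(micro / baseline)  # relative perf/TCO-$ of the microserver
```

The point of the metric is visible in the example: a design with well under half the raw performance can still win on perf/TCO-$ once capex and lifetime energy are counted.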

  6. Cost analysis of baseline servers: power & cooling and hardware • A holistic approach must be taken to reduce costs

  7. µBlades: attack the inefficiencies • (1) Power-efficient, embedded-class low-cost processors • (2) Compact packaging, aggregate cooling, enclosure optimizations • (3) Remote shared memory blades and flash-based disk caching

  8. µBlades: performance/TCO-$ • 2.0X perf/TCO relative to cost-optimized baseline

  9. Microservers and the memory capacity wall • 30% less GB/core every 2 years • Very small physical space • Making things worse: DRAM scaling slowing, fewer modules close to cores, non-linear $/GB curve [chart: $/GB, June 2008] • Growing need for memory: many-core scaling, workload consolidation, in-memory DB/BI, DISC, interactive web-scale workloads • Performance/cost impact
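The "30% less GB/core every 2 years" trend compounds quickly; a minimal sketch makes the arithmetic concrete (the 2 GB/core starting point is an assumption for illustration, not a number from the talk):

```python
# Compound effect of "30% less GB/core every 2 years".
def gb_per_core(initial, years, decline=0.30, period=2.0):
    """Memory per core after `years`, shrinking by `decline` each `period` years."""
    return initial * (1.0 - decline) ** (years / period)

# With a hypothetical 2 GB/core today, four 2-year steps leave ~24% of it:
for year in (0, 2, 4, 6, 8):
    print(year, round(gb_per_core(2.0, year), 3))
```

Under this trend, capacity per core falls by roughly three quarters over eight years, which is why the slide treats capacity as a wall rather than a nuisance.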

  10. Opportunity: optimizing for the ensemble • [Charts: intra-server variation over time (TPC-H, log scale); inter-server variation (rendering farm)] • Use the same concepts as microblades/servers! • Dynamic provisioning across the ensemble enables cost & power savings

  11. Disaggregated memory contributions Goal: Expand capacity & provision for typical usage • New architectural building block: memory blade • Breaks traditional compute-memory co-location • Architectures for transparent memory expansion • Capacity expansion: • 8x performance over provisioning for median usage • Higher consolidation • Capacity sharing: • Lower power and costs • Better performance / dollar

  12. General memory blade design • [Diagram: memory blade (enlarged) with protocol engine, memory controller, address mapping, and banks of DIMMs, connected over the backplane to compute blades' CPUs] • Perf.: accessed as memory, not swap space • Commodity: connected via PCIe or HT • Cost: handles dynamic memory partitioning; leverages the sweet spot of RAM pricing • Transparency: enforces allocation, isolation, and mapping • Other optimizations: break CPU-memory co-location, leverage fast, shared communication fabrics
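The blade's address-mapping and isolation roles described on this slide might be sketched as follows. This is a toy model, not the actual design: the page size, data structures, and class/method names are all assumptions for illustration.

```python
# Hypothetical sketch of a memory blade's address mapping: each client
# blade's remote pages are translated to (DIMM, offset) on the memory
# blade, enforcing per-client allocation and isolation.

PAGE = 4096  # assumed remote page size for the example

class MemoryBlade:
    def __init__(self, num_dimms, pages_per_dimm):
        # Free pages across all DIMMs, handed out on demand.
        self.free = [(d, p) for d in range(num_dimms)
                     for p in range(pages_per_dimm)]
        self.map = {}  # (client_id, remote_page) -> (dimm, local_page)

    def allocate(self, client_id, remote_page):
        """Dynamic partitioning: assign a free blade page to a client."""
        if not self.free:
            raise MemoryError("memory blade exhausted")
        self.map[(client_id, remote_page)] = self.free.pop()

    def translate(self, client_id, remote_addr):
        """Isolation: a client can only reach pages mapped to it."""
        key = (client_id, remote_addr // PAGE)
        if key not in self.map:
            raise PermissionError("access to unmapped remote page")
        dimm, local_page = self.map[key]
        return dimm, local_page * PAGE + remote_addr % PAGE

blade = MemoryBlade(num_dimms=4, pages_per_dimm=1024)
blade.allocate(client_id=0, remote_page=0)
print(blade.translate(0, 100))  # (dimm, byte offset) on the blade
```

The point of the sketch is the indirection: because translation happens on the blade, capacity can be repartitioned across clients without the compute blades' OS or applications changing, which is the "transparency" property the slide names.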

  13. Disaggregated memory results • Baseline: Mem_median • Trace-based simulation (TPC, search/index, SpecJBB, SpecCPU) • Performance: 8X for memory-limited workloads, slightly worse than ideal • Consolidation: 60% server reduction (web 2.0 company traces)

  14. Case for integration: server inefficiencies today • [Diagram: a typical 2-socket blade server with Xeon CPUs, NB/SB chipsets, DIMMs, disk, NIC, and SerDes] • Three-chip chipset (PCIe: 27W, IO: 7W) • CPU + chipset: > 200W (and ~$2k) • General-purpose, un-optimized parts

  15. Why slow adoption of server SoCs? • Aggressive SoCs are largely absent in general-purpose CPUs • Mix & match one CPU chip with different peripherals • Different development timescales for CPU & peripherals • Integration trend: slower than embedded, but steady over time (and accelerating) • Today: mounting cost & energy pressure in datacenters demands technologies that first and foremost can decrease TCO • Energy → fewer pin crossings, more efficient blocks (heterogeneous) • Cost → fewer sockets, BoM, etc. • Creating a stronger case for SoC-based servers

  16. System-level integration for hyperscale servers • Goal: quantify system-level integration benefits • Energy reduction: remove expensive pin crossings to the I/O subsystem • Cost reduction: use of silicon density, riding the mobile-volume commodity curve • For example, a 100 mm2 SoC at 16nm vs. the best non-SoC: • Reduces total silicon area by ~40%, dynamic chip power by ~30% • Reduces datacenter-level cost by ~20%, energy by ~35%, TCO by ~25% • TCO reduction range: 15-40%, >50% with network port aggregation
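The quoted ~20% cost and ~35% energy reductions are consistent with the ~25% TCO figure under a simple weighted model; the two-thirds capex share below is an assumption used only to check the arithmetic, not a number from the talk:

```python
# Back-of-envelope consistency check of the slide's SoC numbers:
# TCO reduction = capex share * cost cut + energy share * energy cut.
cost_cut, energy_cut = 0.20, 0.35   # from the slide
capex_share = 2 / 3                  # assumed split of TCO (capex vs. energy)

tco_cut = capex_share * cost_cut + (1 - capex_share) * energy_cut
print(f"{tco_cut:.0%}")  # prints "25%"
```

In other words, the three percentages hang together if roughly two thirds of datacenter TCO is cost and one third is energy, which is a plausible split at this level of abstraction.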

  17. Design space exploration complexity • Huge design space with multi-level design points: • Chip level: architecture, core, SoC components, resource provisioning, chip cost • Board and server level: socket planning, inter-chip aggregation • Datacenter level: intra-rack aggregation • Workload variability: computation-intensive vs. I/O-intensive (memory, disk, network) • Significant development of tools: extended McPAT, paired with TCO models • [Diagram: partially integrated SoC]
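The multi-level sweep the slide describes could be sketched like this. The option lists, cost numbers, and scoring function are stand-ins for illustration, not McPAT or the actual TCO models:

```python
from itertools import product

# Toy design-space sweep across the three levels named on the slide.
cores      = [4, 8, 16]                    # chip level: core count
soc_blocks = ["none", "nic", "nic+sata"]   # chip level: integration degree
sockets    = [1, 2, 4]                     # board/server level

def cost_per_core(n_cores, blocks, n_sockets):
    """Stand-in TCO proxy: chip cost plus amortized board cost, per core."""
    chip = n_cores * 10.0 + {"none": 60.0, "nic": 45.0, "nic+sata": 35.0}[blocks]
    board = 80.0 / n_sockets   # shared board cost amortized over sockets
    return (chip + board) / n_cores

# Exhaustively score every design point and keep the cheapest per core.
best = min(product(cores, soc_blocks, sockets), key=lambda p: cost_per_core(*p))
print(best)  # → (16, 'nic+sata', 4)
```

Even this toy version shows why tooling matters: the real space multiplies workload mixes and datacenter-level aggregation choices into the product, so exhaustive enumeration with fast analytical models (the extended-McPAT-plus-TCO approach) is what makes the exploration tractable.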

  18. SoC scaling trends • [Chart: core vs. periphery] • On-chip components scale differently

  19. Chip size TCO analysis • Analysis using small cores

  20. Nanostores: highly integrated building block

  21. Continuing microserver challenges • How to apply to more workloads? • Perhaps accelerators or heterogeneity • How to minimize unnecessary duplication at scale? • Perhaps content-based page sharing, common boot images • How to ensure balanced systems? • Perhaps further disaggregation/composition of resources • How to reduce # of network ports? • Perhaps more intelligent local routing, aggregation

  22. Please see HP’s Project Moonshot!

  23. Thank you! kevin.lim@hp.com • Sheng Li • John Byrne • Laura Ramirez • Alvin AuYoung • Naveen Muralimanohar • Chandrakant Patel • Special thanks to: • Partha Ranganathan • Norm Jouppi • Jichuan Chang • Paolo Faraboschi • Yoshio Turner • Jose Renato Santos
