Microservers and beyond: Pushing the boundaries of efficiency
Kevin Lim, Research Scientist, HP Labs, Intelligent Infrastructure Lab
Embracing scale-out
• Now in the scale-out computing era
• Datacenters with 100,000s of servers
• Billions of devices
• ZBs of data moved and processed
• Must always design for scale
• Microservers can provide significant efficiency benefits
• Total cost of ownership at the datacenter level
• Metrics: Performance / W, Performance / $
[Chart: New Data Center Construction Costs* — values shown: $30B, $65B]
*IDC 2011: Market Analysis Perspective: Worldwide Datacenter Trends and Strategies 2010
HP Labs and Microservers
• Have long viewed microservers as a critical building block
• Multiple research publications since 2008
• When do they make sense? How should servers be designed?
• This talk covers three main works:
• µBlades: initial microserver exploration
• Disaggregated memory: addressing capacity needs
• SoC architectures for hyperscale servers: system-level integration
• And pushes microservers to an extreme: nanostores
µBlades for the cloud
• Focus on the internet sector: the fastest-growing server market
• Google, Amazon, and MSN's billion-dollar data centers
• Extreme scale: millions of users on 100,000s of servers
• Infrastructure and power/cooling are among the largest expenses
• Needs for system architecture research:
• Cloud application benchmarking
• Whole-system cost analysis
• Holistic system designs with compelling performance/cost
Benchmarks and Metrics
• New benchmark suite:
• Websearch: unstructured data, large data sets
• Webmail: user involvement, scale-out applications
• Video sharing: rich media, large streaming data
• MapReduce: cloud computing, internet as a platform
• Key metric: sustained performance / total cost of ownership (Perf / TCO-$)
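The Perf / TCO-$ metric above can be sketched in a few lines. This is an illustrative model, not the talk's actual methodology: the cooling overhead, electricity price, lifetime, and all server numbers are assumptions chosen only to show how the metric composes.

```python
# Hypothetical sketch of the Perf / TCO-$ metric; all numbers are
# illustrative assumptions, not figures from the talk.

def tco_dollars(hw_cost, power_w, cooling_overhead=0.8,
                electricity_per_kwh=0.10, years=3):
    """Total cost of ownership: hardware plus lifetime energy.

    cooling_overhead approximates the power-delivery and cooling
    burden added on top of each watt of IT power (assumed value).
    """
    hours = years * 365 * 24
    energy_kwh = power_w * (1 + cooling_overhead) * hours / 1000.0
    return hw_cost + energy_kwh * electricity_per_kwh

def perf_per_tco(sustained_perf, hw_cost, power_w):
    """The talk's key metric: sustained performance / TCO-$."""
    return sustained_perf / tco_dollars(hw_cost, power_w)

# Compare a cost-optimized baseline server against a microserver
# (made-up performance, cost, and power figures).
baseline = perf_per_tco(sustained_perf=100.0, hw_cost=2000.0, power_w=250.0)
micro    = perf_per_tco(sustained_perf=40.0,  hw_cost=500.0,  power_w=45.0)
print(f"microserver advantage: {micro / baseline:.2f}x")
```

The point of dividing by TCO rather than hardware cost alone is that a lower-power server keeps winning even after its lower peak performance is accounted for, because energy dominates over the lifetime.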
Cost analysis of baseline servers
[Chart: cost breakdown across hardware and power & cooling]
• A holistic approach must be taken to reduce costs
µBlades: attack the inefficiencies
(1) Power-efficient, embedded-class low-cost processors
(2) Compact packaging, aggregate cooling, enclosure optimizations
(3) Remote shared memory blades and flash-based disk caching
µBlades: performance/TCO-$
• 2.0x perf/TCO relative to the cost-optimized baseline
Microservers and the memory capacity wall
• Trend: 30% less GB/core every 2 years
• Very small physical space
• Making things worse:
• DRAM scaling slowing
• Fewer modules close to cores
• Non-linear $/GB curve [chart: $/GB, June 2008]
• Growing need for memory:
• Many-core scaling
• Workload consolidation
• In-memory DB/BI, DISC
• Interactive web-scale workloads
• Performance/cost impact
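The "30% less GB/core every 2 years" trend compounds quickly; a tiny sketch makes the decline concrete. The 4 GB/core starting point is an assumption for illustration only.

```python
# Compounding the slide's trend: 30% less GB/core every two years.
# The starting value of 4 GB/core is an assumed illustration.
gb_per_core = 4.0
for year in range(0, 12, 2):
    print(f"year {year:2d}: {gb_per_core:.2f} GB/core")
    gb_per_core *= 0.70  # 30% less each 2-year step
```

By year 8 the ratio is 0.7^4 ≈ 0.24 of the starting point, i.e. under a quarter of today's memory per core within a decade, which is why capacity expansion becomes a first-order design problem.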
Opportunity: optimizing for the ensemble
[Charts: intra-server variation over time (TPC-H, log scale); inter-server variation (rendering farm)]
• Use the same concepts as microblades/servers!
• Dynamic provisioning across the ensemble enables cost & power savings
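The savings from ensemble-level provisioning come from servers peaking at different times: shared capacity only needs to cover the peak of the aggregate demand, not the sum of every server's individual peak. A toy example with made-up demand traces shows the difference.

```python
# Toy illustration of ensemble provisioning; the demand traces are
# invented, not measured.

traces = [
    [8, 2, 1, 2],   # server A: memory demand (GB) over four intervals
    [1, 7, 2, 1],   # server B
    [2, 1, 9, 2],   # server C
]

# Per-server provisioning: each server sized for its own peak.
sum_of_peaks = sum(max(t) for t in traces)

# Ensemble provisioning: shared capacity sized for the peak of the
# aggregate demand across servers.
aggregate = [sum(col) for col in zip(*traces)]
peak_of_sum = max(aggregate)

print(f"per-server provisioning: {sum_of_peaks} GB")  # 24 GB
print(f"ensemble provisioning:   {peak_of_sum} GB")   # 12 GB
```

Because the three peaks are uncorrelated, shared provisioning halves the capacity here; the same statistical-multiplexing argument underlies the disaggregated-memory savings on the following slides.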
Disaggregated memory contributions Goal: Expand capacity & provision for typical usage • New architectural building block: memory blade • Breaks traditional compute-memory co-location • Architectures for transparent memory expansion • Capacity expansion: • 8x performance over provisioning for median usage • Higher consolidation • Capacity sharing: • Lower power and costs • Better performance / dollar
General memory blade design
[Diagram: compute blades (CPUs) connected over the backplane to a memory blade containing a protocol engine, memory controller, address-mapping logic, and arrays of DIMMs]
• Performance: accessed as memory, not swap space
• Commodity: connected via PCIe or HyperTransport
• Cost: handles dynamic memory partitioning; leverages the sweet spot of RAM pricing
• Transparency: enforces allocation, isolation, and mapping
• Other optimizations
• Breaks CPU-memory co-location; leverages fast, shared communication fabrics
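The address-mapping and isolation roles of the memory blade can be sketched in miniature. This is a hypothetical model, not HP's actual design: the class, page size, and method names are all invented to illustrate how dynamic partitioning and per-client isolation fit together.

```python
# Minimal sketch (hypothetical, not HP's design) of a memory blade's
# address-mapping logic: translate a client blade's remote page to a
# local (DIMM, offset), enforcing allocation and isolation per client.

PAGE = 4096

class MemoryBlade:
    def __init__(self, num_dimms=8, pages_per_dimm=1024):
        # Free list of (dimm, page) slots across all DIMMs.
        self.free = [(d, p) for d in range(num_dimms)
                            for p in range(pages_per_dimm)]
        # Per-client map: remote page number -> (dimm, page).
        self.maps = {}

    def allocate(self, client, remote_page):
        """Dynamic partitioning: grant `client` one page of capacity."""
        slot = self.free.pop()
        self.maps.setdefault(client, {})[remote_page] = slot
        return slot

    def translate(self, client, remote_addr):
        """Address mapping with isolation: a client can only reach
        pages in its own allocation map."""
        entry = self.maps.get(client, {}).get(remote_addr // PAGE)
        if entry is None:
            raise PermissionError("access outside client's allocation")
        dimm, page = entry
        return dimm, page * PAGE + remote_addr % PAGE

blade = MemoryBlade()
blade.allocate("server-0", remote_page=0)
print(blade.translate("server-0", 0x123))
```

Keeping the map on the blade is what makes expansion transparent to the compute blade: the client issues ordinary remote-memory accesses over PCIe or HyperTransport, and the blade decides where they land.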
Disaggregated memory results
• Baseline: Mem_median
• Trace-based simulation (TPC, search/index, SpecJBB, SpecCPU)
• Performance: 8x for memory-limited workloads, slightly worse than ideal
• Consolidation: 60% server reduction (web 2.0 company traces)
Case for integration: server inefficiencies today
[Diagram: a typical 2-socket blade server with Xeon CPUs, DIMMs, northbridge/southbridge chipsets, NIC, disk, and SerDes links]
• Three-chip chipset (PCIe: 27W, I/O: 7W)
• CPU + chipset: > 200W (and ~$2k)
• General-purpose, un-optimized parts
Why slow adoption of server SoCs?
• Aggressive SoCs are largely absent in general-purpose CPUs
• Mix & match one CPU chip with different peripherals
• Different development timescales for CPUs & peripherals
• Integration trend: slower than embedded, but steady over time (and accelerating)
• Today: mounting cost & energy pressure in datacenters demands technologies that first and foremost decrease TCO
• Energy: fewer pin crossings, more efficient (heterogeneous) blocks
• Cost: fewer sockets, lower BoM, etc.
• Creating a stronger case for SoC-based servers
System-level integration for hyperscale servers
• Goal: quantify system-level integration benefits
• Energy reduction: remove expensive pin crossings to the I/O subsystem
• Cost reduction: use of silicon density, riding the mobile volume commodity curve
• For example, a 100 mm² SoC at 16nm vs. the best non-SoC:
• Reduces total silicon area by ~40%, dynamic chip power by ~30%
• Reduces datacenter-level cost by ~20%, energy by ~35%, TCO by ~25%
• TCO reduction range: 15-40%, >50% with network port aggregation
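The slide's ~20% cost and ~35% energy reductions can plausibly compose into the quoted ~25% TCO reduction if TCO is split between capital and energy. The split below is an assumption chosen to show the arithmetic, not a figure from the talk.

```python
# Hedged sketch of how the slide's numbers might compose into TCO.
# The capex/energy split is an assumed weighting, not from the talk.

capex_share  = 0.65          # assumed: fraction of TCO that is cost
energy_share = 1.0 - capex_share

cost_reduction   = 0.20      # from the slide: datacenter-level cost
energy_reduction = 0.35      # from the slide: datacenter-level energy

tco_reduction = (capex_share * cost_reduction
                 + energy_share * energy_reduction)
print(f"estimated TCO reduction: {tco_reduction:.0%}")  # prints "estimated TCO reduction: 25%"
```

The 15-40% range on the slide then corresponds naturally to varying this weighting (and the per-component reductions) across workloads and facility cost structures.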
Design space exploration complexity
• Huge design space with multilevel design points:
• Chip level: architecture, core, SoC components, resource provisioning, chip cost
• Board and server level: socket planning, inter-chip aggregation
• Datacenter level: intra-rack aggregation
• Workload variability: computation-intensive vs. I/O-intensive (memory, disk, network)
• Significant development of tools:
• Extended McPAT, paired with TCO models
[Diagram: partially integrated SoC]
SoC scaling trends
[Chart: core vs. periphery scaling]
• On-chip components scale differently
Chip size TCO analysis • Analysis using small cores
Continuing microserver challenges • How to apply to more workloads? • Perhaps accelerators or heterogeneity • How to minimize unnecessary duplication at scale? • Perhaps content-based page sharing, common boot images • How to ensure balanced systems? • Perhaps further disaggregation/composition of resources • How to reduce # of network ports? • Perhaps more intelligent local routing, aggregation
Thank you! kevin.lim@hp.com • Sheng Li • John Byrne • Laura Ramirez • Alvin AuYoung • Naveen Muralimanohar • Chandrakant Patel • Special thanks to: • Partha Ranganathan • Norm Jouppi • Jichuan Chang • Paolo Faraboschi • Yoshio Turner • Jose Renato Santos