Architecting Warehouse Scale Computers for On-line Data Intensive Services
Thomas F. Wenisch, University of Michigan
NSF SEEDM Workshop, 2 May 2011
Power: A first-class design constraint in Warehouse Scale Computers (WSCs)
• By 2011, data centers consume 2.5% of US energy ($7.4 billion/yr), and the installed base grows 11%/yr [Figure: Lifetime Cost of a Data Center; Source: US EPA 2007]
• Annual data center CO2 emissions equal those of 17 million households [Source: Barroso '10]
• Peak power determines data center capital costs [Source: Mankoff et al., IEEE Computer 2008]
• Improving both energy & capital efficiency is critical
Implications of Current Scaling Trends on Active & Idle Low-Power Modes
• PowerNap: idle low-power mode
• DVFS: active low-power mode [Cho '10]
• Existing low-power modes are becoming less effective!
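To make the contrast concrete, here is a back-of-the-envelope Python model (not from the talk; the power numbers and the cubic dynamic-power scaling are illustrative assumptions) comparing average server power under a PowerNap-style idle mode versus DVFS-style frequency scaling:

```python
# Toy comparison of an idle low-power mode (PowerNap-style) vs. an active
# low-power mode (DVFS-style). All constants are illustrative assumptions.

P_ACTIVE = 300.0  # W at full utilization (assumed)
P_IDLE = 180.0    # W when idle with no low-power mode (assumed)
P_NAP = 15.0      # W in a deep full-system nap state (assumed)

def power_powernap(util):
    """Average power if the server naps whenever it is idle
    (nap transition overheads ignored for simplicity)."""
    return util * P_ACTIVE + (1.0 - util) * P_NAP

def power_dvfs(util, f):
    """Average power at relative frequency f (0 < f <= 1): busy time
    stretches to util/f; dynamic power scales ~f^3, static power is fixed."""
    assert util <= f, "server saturates below this frequency"
    p_active_at_f = P_IDLE + (P_ACTIVE - P_IDLE) * f ** 3
    busy = util / f
    return busy * p_active_at_f + (1.0 - busy) * P_IDLE

for u in (0.1, 0.3, 0.5):
    print(f"util={u:.1f}  PowerNap={power_powernap(u):5.1f} W  "
          f"DVFS@f=0.5={power_dvfs(u, 0.5):5.1f} W")
```

At low utilization the nap-style mode wins decisively because it attacks idle power, which DVFS leaves untouched; this is why the slide treats the two mode families separately.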
Challenge 1: Energy-efficient Platforms for Online Data Intensive (OLDI) Services
• Our term for a new breed of interactive web services
• Process TBs of data with ms request latency
• Tail latencies are critical (e.g., 95th- and 99th-percentile latency; see the sketch after this list)
• Examples: web search, machine translation, online ads
• A challenging workload class for energy management
• Provisioned by index size and latency, not throughput
• Massive memory footprints limit intra-cluster replication
• Scaling cluster size with load is generally inapplicable
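As a quick illustration of the tail-latency point, the hypothetical Python sketch below draws leaf-server latencies from an assumed lognormal distribution and shows that a request fanned out to 100 leaf servers is gated by the 99th-percentile leaf latency, not the median:

```python
import random

# Hypothetical illustration of tail-latency amplification under fan-out:
# an OLDI request completes only when its slowest leaf server responds.
# The lognormal leaf-latency distribution is an assumption.
random.seed(42)

def leaf_latency_ms():
    return random.lognormvariate(1.0, 0.5)  # median ~2.7 ms (assumed)

def percentile(samples, p):
    """p-th percentile (0-100) via the nearest-rank method."""
    ordered = sorted(samples)
    return ordered[max(0, round(p / 100 * len(ordered)) - 1)]

leaves = [leaf_latency_ms() for _ in range(100_000)]
print("leaf median:", round(percentile(leaves, 50), 1), "ms")
print("leaf 99th:  ", round(percentile(leaves, 99), 1), "ms")

# End-to-end latency of a request fanned out to 100 leaves is the max:
requests = [max(leaf_latency_ms() for _ in range(100)) for _ in range(5_000)]
print("request median:", round(percentile(requests, 50), 1), "ms")
```

The median end-to-end latency lands near the 99th percentile of the leaf distribution, which is why per-leaf tail latency, not mean latency, sets the provisioning target.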
Preliminary Work: Case Study of Google Web Search [ISCA '11]
Key Findings:
• Existing CPU idle power modes react fast enough …but may become less effective with future scaling
• Memory bandwidth is frequently under-utilized …a large opportunity for a memory active low-power mode
• Need coordinated full-system active low-power modes …must maintain system balance while scaling down capacity (a toy policy sketch follows)
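One way to read the third finding in code: any policy that slows the CPU should scale memory in tandem so the system stays balanced. Below is a hypothetical coordinated-scaling policy sketch; the frequency levels, headroom factor, and function names are all invented for illustration:

```python
# Hypothetical coordinated full-system scaling policy: pick CPU and memory
# low-power states together so neither side becomes the bottleneck.
# Levels and the headroom factor are invented, not from the paper.

LEVELS = (0.5, 0.75, 1.0)  # available speeds, fraction of peak (ascending)
HEADROOM = 1.25            # keep 25% slack over measured demand (assumed)

def pick_states(cpu_util, mem_bw_util):
    """Return (cpu_freq, mem_freq): the slowest level on each side that
    still leaves headroom over current demand."""
    def slowest_adequate(demand):
        for f in LEVELS:
            if f >= demand * HEADROOM:
                return f
        return LEVELS[-1]  # demand near saturation: run at full speed
    return slowest_adequate(cpu_util), slowest_adequate(mem_bw_util)

print(pick_states(cpu_util=0.30, mem_bw_util=0.15))  # -> (0.5, 0.5)
print(pick_states(cpu_util=0.70, mem_bw_util=0.35))  # -> (1.0, 0.5)
```

The point of coordination is the joint decision: scaling only one component shifts the bottleneck rather than saving energy.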
Ongoing Work: Efficient Design for memcached Workloads
• Trend: massive networked DRAM clusters
• 20 TB across 800 servers @ Facebook
• Flickr caches the last 24 hours of uploads
• "Memory is the new disk" (e.g., RAMCloud)
• Use memcached as a case study (a minimal interaction sketch follows)
• Behavior varies drastically with average object size
• Server level: wimpy vs. beefy cores? Accelerators?
• Cluster level: how to balance capacity, throughput, and latency?
How to reason about design for this new workload?
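For readers unfamiliar with the workload, here is a minimal interaction sketch. It assumes a memcached server on localhost:11211 and the third-party pymemcache client; the key names and object sizes are made up to show why average object size matters:

```python
# Minimal memcached interaction sketch. Assumes a local memcached server
# (localhost:11211) and `pip install pymemcache`. Keys/sizes are made up.
from pymemcache.client.base import Client

client = Client(("localhost", 11211))

# Object size drives server behavior: tiny objects stress the network stack
# and request-processing cores; large objects stress DRAM capacity/bandwidth.
client.set("user:1001", b"x" * 100)        # small object (100 B)
client.set("photo:42", b"x" * 1_000_000)   # large object (~1 MB)

print(len(client.get("user:1001") or b""))  # -> 100
print(len(client.get("photo:42") or b""))   # -> 1000000
```

A wimpy-core design may be ideal in the small-object regime yet bandwidth-starved in the large-object regime, which is why object size is treated as a first-order design parameter.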
Challenge 2: Scalable WSC Evaluation Methodology
Can't have data centers under our desks… so how can academics do WSC research?
• Need to foster community resources
• Traces from production facilities
• Benchmarks of internet service workloads
• Peer-reviewed models of application behavior
• Facility-scale simulation tools
• Shared testbeds (e.g., PlanetLab)
• Exascale Evaluation and Research Techniques (EXERT) Workshop: third edition planned for the next ASPLOS
Preliminary Work: Stochastic Queuing Simulation (SQS) [EXERT '10]
• Traditional architectural simulations do not scale
• Need to model WSC systems at a higher level of abstraction
• SQS integrates workload, power-performance, and queuing models that are otherwise analytically intractable (e.g., G/G/k); a minimal simulation sketch follows
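To make the approach concrete, here is a minimal Python sketch of the kind of model SQS targets: an event-driven FCFS G/G/k queue, for which no closed-form solution exists. This is not the SQS tool itself, and both distributions are illustrative assumptions:

```python
import heapq
import random

# Minimal event-driven simulation of a FCFS G/G/k queue. Illustrative
# sketch only (not SQS); interarrival and service distributions are assumed.
random.seed(1)

def simulate_ggk(k=4, n_jobs=200_000,
                 interarrival=lambda: random.lognormvariate(0.0, 1.0),
                 service=lambda: random.lognormvariate(1.4, 0.8)):
    """Return (mean, 99th-percentile) job sojourn time."""
    free_at = [0.0] * k  # earliest time each of the k servers is free (heap)
    t = 0.0
    sojourn = []
    for _ in range(n_jobs):
        t += interarrival()                     # next job arrives
        start = max(t, heapq.heappop(free_at))  # FCFS: earliest-free server
        finish = start + service()
        heapq.heappush(free_at, finish)
        sojourn.append(finish - t)
    sojourn.sort()
    return sum(sojourn) / n_jobs, sojourn[int(0.99 * n_jobs)]

mean, p99 = simulate_ggk()
print(f"mean sojourn: {mean:.2f}  99th percentile: {p99:.2f}")
```

Because the simulator tracks only queue events rather than instructions, it can evaluate cluster-scale what-if questions (server counts, low-power states, latency distributions) at speeds an architectural simulator cannot approach.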