360 likes | 378 Views
Virtual Hierarchies to Support Server Consolidation. Mike Marty Mark Hill University of Wisconsin-Madison ISCA 2007. 64-core CMP. L2 Cache. Core. L1. Motivation: Server Consolidation. www server. database server #2. database server #1. middleware server #1. middleware server #1.
E N D
Virtual Hierarchies to Support Server Consolidation Mike Marty Mark Hill University of Wisconsin-Madison ISCA 2007
64-core CMP L2 Cache Core L1 Motivation: Server Consolidation www server database server #2 database server #1 middleware server #1 middleware server #1
64-core CMP Motivation: Server Consolidation www server database server #2 database server #1 middleware server #1 middleware server #1
64-core CMP Motivation: Server Consolidation www server database server #2 data database server #1 middleware server #1 data middleware server #1 Optimize Performance
64-core CMP Motivation: Server Consolidation www server database server #2 database server #1 middleware server #1 middleware server #1 Isolate Performance
Motivation: Server Consolidation www server 64-core CMP database server #2 database server #1 middleware server #1 middleware server #1 Dynamic Partitioning
64-core CMP Motivation: Server Consolidation www server database server #2 data database server #1 VMWare’s Content-based Page Sharing Up to 60% reduced memory middleware server #1 middleware server #1 Inter-VM Sharing
Executive Summary • Motivation: Server Consolidation • Many-core CMPs increase opportunities • Goals of Memory System: • Performance • Performance Isolation between VMs • Dynamic Partitioning (VM Reassignment) • Support Inter-VM Sharing • Hypervisor/OS Simplicity • Proposed Solution: Virtual Hierarchy • Overlay 2-level hierarchy on physically-flat CMP • Harmonize with VM Assignment
Outline • Motivation • Server consolidation • Memory system goals • Non-hierarchical approaches • Virtual Hierarchies • Evaluation • Conclusion
duplicate tag directory TAG-DIRECTORY A Read A
duplicate tag directory TAG-DIRECTORY A A Read A fwd 3 data 1 2 getM A duplicate tag directory
STATIC-BANK-DIRECTORY A Read A data 3 1 fwd 2 getM A A
STATIC-BANK-DIRECTORY with hypervisor-managed cache 2 A fwd getM A A Read A 1 data 3
Optimize Performance Isolate Performance Allow Dynamic Partitioning Support Inter-VM Sharing Hypervisor/OS Simplicity Goals STATIC-BANK-DIRECTORY w/ hypervisor-managed cache STATIC-BANK-DIRECTORY TAG-DIRECTORY • No • No • Yes • Yes • Yes • Yes • Yes • ? • Yes • No • No • No • Yes • Yes • Yes
Outline • Motivation • Virtual Hierarchies • Evaluation • Conclusion
Virtual Hierarchies • Key Idea: Overlay 2-level Coherence Hierarchy on CMP • - First level harmonizes with VM/Workload • - Second level allows inter-VM sharing, migration, reconfig
VH: First-Level Protocol • Intra-VM Directory Protocol w/ interleaved directories • Questions: • How to name directories? • How to name sharers? • Dynamichome tile selected by VM Config Table • Hardware VM Config Table at each tile • Set by hypervisor during scheduling • Full bit-vector to track any possible sharer • Intra-VM broadcast also possible getM INV
L2 Cache Core L1 p12 p13 p14 0 p12 1 p13 2 p14 Address 3 p12 ……000101 offset 4 p13 Home Tile: p14 5 p14 6 63 p12 VH: First-Level Protocol • Example: • Hypervisor/OS can freely change VM Config Table • No cache flushes • No atomic updates • No explicit movement of directory state per-Tile VM Config Table Dynamic
Virtual Hierarchies Two Solutions for Global Coherence: VHA and VHB memory controller(s)
Protocol VHA • Directory as Second-level Protocol • Any tile can act as first-level directory • How to track and name first-level directories? • Full bit-vector of sharers to name any tile • State stored in DRAM • Possibly cache on-chip • + Maximum flexibility • - DRAM State • - Complexity
VHA Example 2 A getM A 6 data getM A 1 data 3 5 directory/memory controller Fwd A 4 Fwd data A
Protocol VHB • Broadcast as Second-level Protocol • Attach token count for each block [token coherence] • T tokens for each block. One token to read, all to write • Allows 1-bit at memory per block • Eliminates system-wide ACK • + Minimal DRAM State • + Enables easier optimizations • - Global coherence requires more activity
VHB Example 2 A getM A getM A 1 memory controller global getM A 3 Data+tokens 4 A
Optimize Performance Isolate Performance Allow Dynamic Partitioning Support Inter-VM Sharing Hypervisor/OS Simplicity Memory System Goals VHA and VHB • Yes • Yes • Yes • Yes • Yes
Outline • Motivation • Virtual Hierarchies • Evaluation • Conclusion
Evaluation: Methods • Wisconsin GEMS • Full-system, execution-driven simulation • Based on Virtutech Simics • http://www.cs.wisc.edu/gems • 64-core tiled CMP • In-order SPARC cores • 512 KB, 16-way L2 cache per tile • 2D mesh interconnect, 16-byte links, 5-cycle link latency • Four on-chip memory controllers, 275-cycle DRAM latency
Evaluation: Methods • Workloads: OLTP, SpecJBB, Apache, Zeus • Separate instance of Solaris for each VM • Approximating Virtualization • Multiple Simics checkpoints interleaved onto CMP • Assume workloads map to adjacent cores • Bottomline: No hypervisor simulated
Evaluation: Protocols • TAG-DIRECTORY: • 3-cycle central tag directory (1024 ways!) • STATIC-BANK-DIRECTORY • Home tiles interleaved by frame address • VHA • All data allocates in L2 bank of dynamic home tile • VHB • Unshared data always allocates in local L2 • All Protocols: one L2 copy of block on CMP
Result: Runtime Eight VMs x Eight Cores Each = 64 Cores (e.g. eight instances of Apache) 1.4 1.2 1 0.8 Normalized Runtime 0.6 0.4 0.2 0 VHA VHB VHA VHB VHA VHB VHA VHB TAG-DIR TAG-DIR TAG-DIR TAG-DIR STATIC-BANK-DIR STATIC-BANK-DIR STATIC-BANK-DIR STATIC-BANK-DIR Zeus SpecJBB Apache OLTP
Result: Memory Stall Cycles Eight VMs x Eight Cores Each = 64 cores (e.g. eight instances of Apache) VHA VHB VHA VHB VHA VHB VHA VHB TAG-DIR TAG-DIR TAG-DIR TAG-DIR STATIC-BANK-DIR STATIC-BANK-DIR STATIC-BANK-DIR STATIC-BANK-DIR SpecJBB OLTP Apache Zeus
Executive Summary • Server Consolidation an Emerging Workload • Goals of Memory System: • Performance • Performance Isolation between VMs • Dynamic Partitioning (VM Reassignment) • Support Inter-VM Sharing • Hypervisor/OS Simplicity • Proposed Solution: Virtual Hierarchy • Overlay 2-level hierarchy on physically-flat CMP • Harmonize with Workload Assignment
Omitting 2nd-Level Coherence: Protocol VH0 • Impacts: • Dynamic Partitioning • Inter-VM Sharing (VMWare’s Content-based Page Sharing) • Hypervisor/OS complexity • Example: Steps for VM Migration from Tiles {M} to {N} • Stop all threads on {M} • Flush {M} caches • Update {N} VM Config Tables • Start threads on {N}
Omitting 2nd-Level Coherence: Protocol VH0 • Example: Inter-VM Content-based Page Sharing • Up to 60% reduced memory demand • Is read-only sharing possible with VH0? • VMWare’s Implementation: • Global hash table to store hashes of pages • Guest pages scanned by VMM, hashes computed • Full comparison of pages on hash match • Potential VH0 Implementation: • How does hypervisor scan guest pages? Are they modified in cache? • Even read-only pages must initially be written at some point
P P P P P P P P Shared L2 Shared L2 L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ Shared L2 Shared L2 L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ P P P P P P P P Physical Hierarchy / Clusters
P P P P P P P P L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ L1 $ P P P P P P P P Physical Hierarchy / Clusters middleware server #1 www server Shared L2 Shared L2 database server #1 Shared L2 Shared L2