Virtual Hierarchies to Support Server Consolidation
Michael Marty and Mark Hill, University of Wisconsin - Madison
What is Server Consolidation?
• Multiple server applications are deployed onto virtual machines (VMs) running on a single, more powerful server.
• Feasibility
  – Virtualization technology (VT): hardware and software support
  – Many-core CMPs: Sun's Niagara (32 threads); Intel's Tera-scale project (100s of tiles)
Characteristics
• Isolating the function of VMs
• Isolating the performance of consolidated servers
• Facilitating dynamic reassignment of VM resources (processors, memory)
• Supporting inter-VM memory sharing (e.g., content-based page sharing)
How Should the Memory System Be Optimized?
• Minimize average memory access time (AMAT) by servicing misses within a VM (see the decomposition below)
• Minimize interference among separate VMs to isolate performance
• Facilitate dynamic reassignment of cores, caches, and memory to VMs
• Support inter-VM page sharing
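To make the first goal concrete, here is a standard AMAT decomposition; the symbols are illustrative and not taken from the slides. It separates misses resolved inside the VM from misses that must leave it:

    AMAT = t_{hit} + m_{L1} (t_{intraVM} + m_{escape} \cdot t_{global})

where t_{hit} is the local cache hit time, m_{L1} the local miss rate, t_{intraVM} the latency of a miss serviced by a nearby tile in the same VM, m_{escape} the fraction of misses that leave the VM, and t_{global} the cost of crossing the chip or going to DRAM. A virtual hierarchy attacks both m_{escape} and t_{intraVM}.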
Current CMP Memory Systems
• Global broadcast – not viable for such a large number of tiles
• Global directory – forces memory accesses to cross the chip, failing to minimize AMAT or isolate performance
• Statically distributing the directory among tiles – better, but it complicates memory allocation, VM reassignment, and scheduling, and limits sharing opportunities
DRAM Directory with Directory Cache (DRAM-DIR)
• Main directory in DRAM; directory cache at the memory controller
• Any tile can be a sharer of a block
• Every miss issues a request to the directory
• Problem 1: fails to minimize AMAT
  – Significant latency to reach the directory, even when the data is nearby (see the latency sketch below)
• Problem 2: allows the performance of one VM to affect others
  – Due to interconnect and directory contention
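A minimal latency sketch in C++ of why DRAM-DIR loses locality; the hop counts, cycle costs, and function name are assumptions chosen for illustration, not numbers from the paper:

    #include <cstdio>

    // Even when the owner tile sits next to the requester, DRAM-DIR routes
    // every miss through the directory cache at the memory controller first,
    // so the request pays the cross-chip indirection regardless of where the
    // data actually lives.
    int dram_dir_miss_latency(int req_to_dir_hops, int dir_to_owner_hops,
                              int owner_to_req_hops, int cycles_per_hop,
                              int dir_lookup_cycles) {
        return (req_to_dir_hops + dir_to_owner_hops + owner_to_req_hops) * cycles_per_hop
               + dir_lookup_cycles;
    }

    int main() {
        // Requester and owner are adjacent (1 hop), but the directory is 8 hops
        // away: the three-message indirection still costs 17 hops of network time.
        std::printf("%d cycles\n", dram_dir_miss_latency(8, 8, 1, 3, 10));
        return 0;
    }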
Duplicate Tag Directory (TAG-DIR)
• Centrally located copy of all cache tags
• Fails to minimize AMAT
• Directory contention
• Challenging as the number of cores increases (64 cores with 16-way caches => a 1024-way tag search, worked out below)
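Why the duplicate-tag store becomes unwieldy: it must mirror every tag in every core's cache, so a single directory lookup has to compare against one tag per way per core. With the slide's example sizes:

    64 cores \times 16 ways per cache = 1024 tags searched per lookup

In effect the central tag store behaves like a 1,024-way associative structure, which is difficult to build and to access quickly.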
Static Cache Bank Directory (STATIC-BANK-DIR)
• Home tile chosen statically by block address or page frame number (a sketch of the mapping follows)
• Home tile maintains sharers and coherence state
• A local miss sends a request to the home tile
• Replacing a directory entry at the home tile invalidates all cached copies
• Fails to minimize AMAT or isolate VMs, and the invalidations make it even worse
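A minimal C++ sketch of the kind of static home-tile mapping the slide describes; the constants and names are assumptions for illustration, not the paper's code:

    #include <cstdint>

    constexpr unsigned kTiles     = 64;  // tiles on the CMP
    constexpr unsigned kBlockBits = 6;   // 64-byte cache blocks

    // Static interleaving: low-order block-address bits pick the home tile.
    // The mapping ignores VM boundaries, so a requester's home tile usually
    // lies in some other VM, and every such miss crosses the chip.
    unsigned static_home_tile(uint64_t paddr) {
        uint64_t block = paddr >> kBlockBits;
        return static_cast<unsigned>(block % kTiles);
    }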
Solution: Two-Level Virtual Hierarchy
• Level-one directory for intra-VM coherence
  – Minimizes memory access time
  – Isolates performance
• Two alternative global level-two protocols for inter-VM coherence
  – Allow inter-VM sharing due to migration, reconfiguration, and page sharing
  – VHA and VHB
Level One: Intra-VM Directory Protocol
• Home tile lies within the VM
• Who is the home tile?
  – A VM is not necessarily a power-of-two number of tiles
  – Tiles can be dynamically reassigned
• Dynamic home tiles selected by a 64-entry VM config table (sketched below)
• 64-bit sharer vector for each directory entry
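A sketch, in C++, of how a per-VM 64-entry config table can pick a dynamic home tile, and of what a level-one directory entry might look like; the field names and the exact address bits used to index the table are assumptions, not taken from the paper:

    #include <array>
    #include <bitset>
    #include <cstdint>

    // Hypervisor-managed table, one per VM: maps 6 address bits to one of the
    // tiles currently assigned to the VM. With 64 entries, the VM does not
    // need a power-of-two number of tiles, and reassigning a tile only
    // requires rewriting table entries.
    struct VMConfigTable {
        std::array<uint8_t, 64> home_tile_of;  // entry index -> tile id

        unsigned home_tile(uint64_t paddr) const {
            unsigned idx = (paddr >> 6) & 0x3F;  // 6 bits above the block offset (assumed)
            return home_tile_of[idx];
        }
    };

    // Level-one directory entry kept at the dynamic home tile: a presence bit
    // per tile on the 64-tile chip, plus a coherence state.
    struct L1DirEntry {
        std::bitset<64> sharers;
        enum class State : uint8_t { Invalid, Shared, Modified } state = State::Invalid;
    };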
Level Two, Option 1: VHA
• Directory in DRAM and directory cache at the memory controller
• Each entry contains a full 64-bit vector
• Why not just store a home tile ID? The level-one home can change when VMs are reconfigured, and a block may be cached by tiles in more than one VM.
Brief Summary
• The level-one intra-VM protocol handles most of the coherence traffic
• The level-two protocol is used only for inter-VM sharing and dynamic reconfiguration of VMs
• Can we reduce the complexity of the level-two protocol?
Level Two, Option 2: VHB
• A single bit per memory block tracks whether the block has any cached copies
• If the bit is set, a miss that needs inter-VM sharing triggers a chip-wide broadcast (see the decision sketch below)
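A minimal sketch of the level-two decision VHB makes at memory, assuming the one-bit-per-block state the slide describes; the names are illustrative:

    enum class L2Action { DataFromMemory, BroadcastToAllTiles };

    // One bit per memory block: "might any tile on the chip hold a copy?"
    // Clear -> memory answers the miss directly.
    // Set   -> broadcast to all tiles to find the copy (rare, because the
    //          level-one protocol already resolves most misses inside the VM).
    L2Action vhb_memory_lookup(bool cached_anywhere_bit) {
        return cached_anywhere_bit ? L2Action::BroadcastToAllTiles
                                   : L2Action::DataFromMemory;
    }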
Advantages of the Level-Two Broadcast
• Reduces protocol complexity by eliminating many transient states
• Enables the level-one protocol to be inexact
  – Limited or coarse-grain sharer vectors (a coarse-vector sketch follows this list)
  – Or even no level-one state at all, with broadcast within the VM
• No home-tile tag needed for private data
• A tag can be victimized without invalidating sharers
• Memory can be accessed speculatively, without checking the home tile first
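One way the broadcast backstop lets level one be inexact is a coarse-grain sharer vector. The sketch below (group size and names are assumptions) sets one bit per group of four tiles, so an invalidation may reach non-sharers but can never miss a real sharer:

    #include <bitset>

    // Coarse-grain level-one directory entry: 64 tiles tracked with 16 bits,
    // one bit per group of 4 tiles. False positives (invalidating a non-sharer)
    // are harmless; false negatives cannot occur because a bit is cleared only
    // when the whole group is known to hold no copies.
    struct CoarseDirEntry {
        static constexpr unsigned kTilesPerGroup = 4;
        std::bitset<16> groups;  // 64 tiles / 4 tiles per group

        void add_sharer(unsigned tile)          { groups.set(tile / kTilesPerGroup); }
        bool may_be_sharer(unsigned tile) const { return groups.test(tile / kTilesPerGroup); }
    };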
Normalized Runtime: Homogeneous Consolidation
• STATIC-BANK-DIR and VHA consume tag space at static or dynamic home tiles
• VHB needs no home-tile tags for private data
Cycles per Transaction: Mixed Consolidation
• VHB has the best overall performance (lowest cycles per transaction)
• DRAM-DIR: only a 45%-55% hit rate in the unpartitioned 8 MB directory cache
• STATIC-BANK-DIR: slightly better for OLTP but worse for jbb in mixed1; it allows interference and lets OLTP use other VMs' resources
Conclusion
• Future memory systems should be optimized for workload consolidation as well as single workloads
• Maximize the shared-memory accesses serviced within a VM
• Minimize interference among separate VMs
• Facilitate dynamic reassignment of resources