Virtual Hierarchies to Support Server Consolidation
Michael Marty and Mark Hill, University of Wisconsin - Madison
What is Server Consolidation?
• Multiple server applications are deployed onto virtual machines (VMs) running on a single, more powerful server.
• Feasibility
  – Virtualization technology (VT): hardware and software support
  – Many-core CMPs: Sun's Niagara (32 threads); Intel's Tera-scale project (100s of tiles)
Characteristics
• Isolating the function of VMs
• Isolating the performance of consolidated servers
• Facilitating dynamic reassignment of VM resources (processors, memory)
• Supporting inter-VM memory sharing (e.g., content-based page sharing)
How Should the Memory System Be Optimized?
• Minimize average memory access time (AMAT) by servicing misses within a VM (see the decomposition below)
• Minimize interference among separate VMs to isolate performance
• Facilitate dynamic reassignment of cores, caches, and memory to VMs
• Support inter-VM page sharing
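To make the first goal concrete, here is a standard AMAT decomposition; the symbols are illustrative and not taken from the slides. It separates misses resolved inside the VM from misses that must leave it:

    AMAT = t_{hit} + m_{L1} (t_{intraVM} + m_{escape} \cdot t_{global})

where t_{hit} is the local cache hit time, m_{L1} the local miss rate, t_{intraVM} the latency of a miss serviced by a nearby tile in the same VM, m_{escape} the fraction of misses that leave the VM, and t_{global} the cost of crossing the chip or going to DRAM. A virtual hierarchy attacks both m_{escape} and t_{intraVM}.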
Current CMP Memory Systems
• Global broadcast – not viable for such a large number of tiles
• Global directory – forces memory accesses to cross the chip, failing to minimize AMAT or isolate performance
• Statically distributing the directory among tiles – better, but it complicates memory allocation, VM reassignment, and scheduling, and limits sharing opportunities
DRAM Directory with Directory Cache (DRAM-DIR)
• Main directory in DRAM; directory cache at the memory controller
• Any tile can be a sharer of a block
• Every miss issues a request to the directory
• Problem 1: fails to minimize AMAT
  – Significant latency to reach the directory, even when the data is nearby (see the latency sketch below)
• Problem 2: allows the performance of one VM to affect others
  – Due to interconnect and directory contention
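A minimal latency sketch in C++ of why DRAM-DIR loses locality; the hop counts, cycle costs, and function name are assumptions chosen for illustration, not numbers from the paper:

    #include <cstdio>

    // Even when the owner tile sits next to the requester, DRAM-DIR routes
    // every miss through the directory cache at the memory controller first,
    // so the request pays the cross-chip indirection regardless of where the
    // data actually lives.
    int dram_dir_miss_latency(int req_to_dir_hops, int dir_to_owner_hops,
                              int owner_to_req_hops, int cycles_per_hop,
                              int dir_lookup_cycles) {
        return (req_to_dir_hops + dir_to_owner_hops + owner_to_req_hops) * cycles_per_hop
               + dir_lookup_cycles;
    }

    int main() {
        // Requester and owner are adjacent (1 hop), but the directory is 8 hops
        // away: the three-message indirection still costs 17 hops of network time.
        std::printf("%d cycles\n", dram_dir_miss_latency(8, 8, 1, 3, 10));
        return 0;
    }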
Duplicate Tag Directory (TAG-DIR)
• Centrally located copy of all cache tags
• Fails to minimize AMAT
• Directory contention
• Challenging as the number of cores increases (64 cores with 16-way caches => a 1024-way tag search, worked out below)
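Why the duplicate-tag store becomes unwieldy: it must mirror every tag in every core's cache, so a single directory lookup has to compare against one tag per way per core. With the slide's example sizes:

    64 cores \times 16 ways per cache = 1024 tags searched per lookup

In effect the central tag store behaves like a 1,024-way associative structure, which is difficult to build and to access quickly.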
Static Cache Bank Directory (STATIC-BANK-DIR)
• Home tile chosen statically by block address or page frame number (a sketch of the mapping follows)
• Home tile maintains sharers and coherence state
• A local miss sends a request to the home tile
• Replacing a directory entry at the home tile invalidates all cached copies
• Fails to minimize AMAT or isolate VMs, and the invalidations make it even worse
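A minimal C++ sketch of the kind of static home-tile mapping the slide describes; the constants and names are assumptions for illustration, not the paper's code:

    #include <cstdint>

    constexpr unsigned kTiles     = 64;  // tiles on the CMP
    constexpr unsigned kBlockBits = 6;   // 64-byte cache blocks

    // Static interleaving: low-order block-address bits pick the home tile.
    // The mapping ignores VM boundaries, so a requester's home tile usually
    // lies in some other VM, and every such miss crosses the chip.
    unsigned static_home_tile(uint64_t paddr) {
        uint64_t block = paddr >> kBlockBits;
        return static_cast<unsigned>(block % kTiles);
    }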
Solution: Two-Level Virtual Hierarchy
• Level-one directory for intra-VM coherence
  – Minimizes memory access time
  – Isolates performance
• Two alternative global level-two protocols for inter-VM coherence
  – Allow inter-VM sharing due to migration, reconfiguration, and page sharing
  – VHA and VHB
Level One: Intra-VM Directory Protocol
• Home tile lies within the VM
• Who is the home tile?
  – A VM is not necessarily a power-of-two number of tiles
  – Tiles can be dynamically reassigned
• Dynamic home tiles selected by a 64-entry VM config table (sketched below)
• 64-bit sharer vector for each directory entry
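A sketch, in C++, of how a per-VM 64-entry config table can pick a dynamic home tile, and of what a level-one directory entry might look like; the field names and the exact address bits used to index the table are assumptions, not taken from the paper:

    #include <array>
    #include <bitset>
    #include <cstdint>

    // Hypervisor-managed table, one per VM: maps 6 address bits to one of the
    // tiles currently assigned to the VM. With 64 entries, the VM does not
    // need a power-of-two number of tiles, and reassigning a tile only
    // requires rewriting table entries.
    struct VMConfigTable {
        std::array<uint8_t, 64> home_tile_of;  // entry index -> tile id

        unsigned home_tile(uint64_t paddr) const {
            unsigned idx = (paddr >> 6) & 0x3F;  // 6 bits above the block offset (assumed)
            return home_tile_of[idx];
        }
    };

    // Level-one directory entry kept at the dynamic home tile: a presence bit
    // per tile on the 64-tile chip, plus a coherence state.
    struct L1DirEntry {
        std::bitset<64> sharers;
        enum class State : uint8_t { Invalid, Shared, Modified } state = State::Invalid;
    };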
Level Two, Option 1: VHA
• Directory in DRAM and directory cache at the memory controller
• Each entry contains a full 64-bit vector
• Why not just store a home tile ID? The level-one home can change when VMs are reconfigured, and a block may be cached by tiles in more than one VM.
Brief Summary
• The level-one intra-VM protocol handles most of the coherence traffic
• The level-two protocol is used only for inter-VM sharing and dynamic reconfiguration of VMs
• Can we reduce the complexity of the level-two protocol?
Level Two, Option 2: VHB
• A single bit per memory block tracks whether the block has any cached copies
• If the bit is set, a miss that needs inter-VM sharing triggers a chip-wide broadcast (see the decision sketch below)
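A minimal sketch of the level-two decision VHB makes at memory, assuming the one-bit-per-block state the slide describes; the names are illustrative:

    enum class L2Action { DataFromMemory, BroadcastToAllTiles };

    // One bit per memory block: "might any tile on the chip hold a copy?"
    // Clear -> memory answers the miss directly.
    // Set   -> broadcast to all tiles to find the copy (rare, because the
    //          level-one protocol already resolves most misses inside the VM).
    L2Action vhb_memory_lookup(bool cached_anywhere_bit) {
        return cached_anywhere_bit ? L2Action::BroadcastToAllTiles
                                   : L2Action::DataFromMemory;
    }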
Advantages of the Level-Two Broadcast
• Reduces protocol complexity by eliminating many transient states
• Enables the level-one protocol to be inexact
  – Limited or coarse-grain sharer vectors (a coarse-vector sketch follows this list)
  – Or even no level-one state at all, with broadcast within the VM
• No home-tile tag needed for private data
• A tag can be victimized without invalidating sharers
• Memory can be accessed speculatively, without checking the home tile first
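One way the broadcast backstop lets level one be inexact is a coarse-grain sharer vector. The sketch below (group size and names are assumptions) sets one bit per group of four tiles, so an invalidation may reach non-sharers but can never miss a real sharer:

    #include <bitset>

    // Coarse-grain level-one directory entry: 64 tiles tracked with 16 bits,
    // one bit per group of 4 tiles. False positives (invalidating a non-sharer)
    // are harmless; false negatives cannot occur because a bit is cleared only
    // when the whole group is known to hold no copies.
    struct CoarseDirEntry {
        static constexpr unsigned kTilesPerGroup = 4;
        std::bitset<16> groups;  // 64 tiles / 4 tiles per group

        void add_sharer(unsigned tile)          { groups.set(tile / kTilesPerGroup); }
        bool may_be_sharer(unsigned tile) const { return groups.test(tile / kTilesPerGroup); }
    };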
Normalized Runtime: Homogeneous Consolidation
• STATIC-BANK-DIR and VHA consume tag space at static or dynamic home tiles
• VHB needs no home-tile tags for private data
Cycles per Transaction: Mixed Consolidation
• VHB has the best overall performance (lowest cycles per transaction)
• DRAM-DIR: only a 45%-55% hit rate in the unpartitioned 8 MB directory cache
• STATIC-BANK-DIR: slightly better for OLTP but worse for jbb in mixed1; it allows interference and lets OLTP use other VMs' resources
Conclusion
• Future memory systems should be optimized for workload consolidation as well as single workloads
• Maximize the shared-memory accesses serviced within a VM
• Minimize interference among separate VMs
• Facilitate dynamic reassignment of resources