Virtual Private Caches • ISCA’07 • Kyle J. Nesbit, James Laudon, James E. Smith • Presenter: Yan Li
CMP-based System • Chip-level Multiprocessor (CMP) • Multiple processor cores are implemented on a single chip • Multithreading support • Example: Intel Core 2 Duo E6750
CMP-based System (2) • Resource sharing • Cache capacity/bandwidth, main memory, ... • Pros: higher resource utilization • Cons: inter-thread interference • Unpredictable performance / no QoS! • Many applications running on CMP-based systems require Quality of Service (QoS)
Quality of Service • QoS is required by many applications: • Soft real-time applications • Video games • Fine-grained parallel applications • Scheduling & synchronization • Server consolidation • Hosting services • QoS objective in a CMP-based system • Provide an upper bound on thread execution time regardless of other threads' activity
Outline • Introduction • QoS Framework • Virtual Private Cache - VPC Arbiter • Virtual Private Cache - Capacity Manager • Performance Evaluation • Conclusions
Overview of VPM • Virtual Private Machine (VPM): a set of allocated hardware resources • Processors, bandwidth, memory spaces… • Each thread is allocated a share of the hardware resources based on policies • Policies come from applications & system software • Hardware mechanisms enforce the allocations
(Figure: system hardware and VPM)
Objectives of VPM • Performance isolation • Thread performance is as good as on a real private machine with the same resources • Dynamic distribution of excess resources • Unallocated resources • Allocated but unused resources
Virtual Private Cache • Microarchitecture-level mechanism • Main components • VPC Arbiter: tag & data array bandwidth sharing • VPC Capacity Manager: cache capacity sharing • Advantages • Performance isolation • Improved utilization
Outline • Introduction • QoS Framework • Virtual Private Cache - VPC Arbiter • Virtual Private Cache - Capacity Manager • Performance Evaluation • Conclusions
VPC Arbiter - Implementation (1) • Each data & tag array has an arbiter • Each arbiter has • A FIFO buffer for each thread • One clock register R.clk: determines arrival time • Registers R.Li & R.Si for each thread i: virtual service time / virtual start time
VPC Arbiter - Implementation (2) • R.Li: virtual service time of a request from thread i • Defined in terms of L, the latency of the shared cache, and thread i's fraction of resources • R.Si: virtual start time of the next request of thread i • The time at which the resource is available for the next request of thread i
Fair Queuing Scheduling • Request arrival • Arbiter calculates the virtual finish time Fi • Arbiter selection: • Select the request with the earliest Fi
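A minimal sketch of the virtual-time bookkeeping behind the arbiter slides above, in Python for illustration only. The slides do not give the formulas, so the sketch assumes the textbook fair-queuing rules: service time R.Li = L / share, virtual start = max(R.Si, arrival time), virtual finish Fi = start + R.Li. The names FairQueueArbiter, ThreadState, share, arrive and select are hypothetical, and a hardware arbiter would hold this state in registers rather than objects.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ThreadState:
    share: float                 # assumed: thread's fraction of the bandwidth, 0 < share <= 1
    virtual_start: float = 0.0   # plays the role of R.Si
    queue: deque = field(default_factory=deque)  # per-thread FIFO of (finish, request)


class FairQueueArbiter:
    """Illustrative fair-queuing arbiter; not the hardware design from the slides."""

    def __init__(self, latency, shares):
        self.latency = latency   # L: latency of the shared cache resource
        self.threads = {tid: ThreadState(share) for tid, share in shares.items()}

    def service_time(self, tid):
        # Assumed definition of R.Li: a smaller share gives a longer virtual service time.
        return self.latency / self.threads[tid].share

    def arrive(self, tid, request, now):
        """Queue a request arriving at time `now` (R.clk) and return its finish time Fi."""
        state = self.threads[tid]
        start = max(state.virtual_start, now)
        finish = start + self.service_time(tid)
        state.virtual_start = finish   # the next request of this thread starts no earlier
        state.queue.append((finish, request))
        return finish

    def select(self):
        """Pop and return (tid, request, Fi) for the earliest virtual finish time."""
        heads = [(s.queue[0][0], tid) for tid, s in self.threads.items() if s.queue]
        if not heads:
            return None
        _, tid = min(heads)            # ties are broken by the lower thread id
        finish, request = self.threads[tid].queue.popleft()
        return tid, request, finish
```

With shares of 0.5 and 0.25, a thread holding half the bandwidth accumulates virtual time half as fast as the other, so its queued requests are granted roughly twice as often when both threads are backlogged.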
Arbiter Fairness Policy • Excess bandwidth is distributed to the thread that has received the least excess bandwidth in the past
Outline • Introduction • QoS Framework • Virtual Private Cache - VPC Arbiter • Virtual Private Cache - Capacity Manager • Performance Evaluation • Conclusions
Implementation • Set-associative replacement policy • Each thread receives • The same number of sets as the shared cache • At least its allocated fraction of the ways in the shared cache • Replacement policy (victim selection, in priority order) • LRU line owned by a thread i that owns more than its allocated ways • Otherwise, the LRU line owned by the thread that is requesting the replacement
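The two replacement rules on the slide above can be sketched as a victim-selection function. This is an illustrative Python sketch, not the hardware logic: the function name choose_victim, the quota mapping (thread id to guaranteed ways per set) and the LRU-to-MRU list representation are all assumptions made for the example.

```python
def choose_victim(cache_set, quota, requester):
    """Return the index of the line to evict from one cache set.

    cache_set: list of (owner, tag) tuples ordered from LRU to MRU (assumed layout).
    quota:     dict mapping thread id to its guaranteed number of ways (assumed).
    requester: thread id asking for the replacement.
    """
    # Count how many ways each thread currently owns in this set.
    owned = {}
    for owner, _ in cache_set:
        owned[owner] = owned.get(owner, 0) + 1

    # Rule 1: the LRU line belonging to any thread that exceeds its allocation.
    for index, (owner, _) in enumerate(cache_set):
        if owned[owner] > quota.get(owner, 0):
            return index

    # Rule 2: otherwise the LRU line owned by the requesting thread.
    for index, (owner, _) in enumerate(cache_set):
        if owner == requester:
            return index

    # Neither rule applies (for example, the set is not yet full).
    return None
```

For a 4-way set with two ways guaranteed to each of two threads, a thread holding three ways loses its LRU line first. A thread within its allocation only ever evicts its own lines, which is what keeps another thread's misses from shrinking its share.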
Outline • Introduction • QoS Framework • Virtual Private Cache - VPC Arbiter • Virtual Private Cache - Capacity Manager • Performance Evaluation • Conclusions
Experiment Setup • Two microbenchmarks stress the performance isolation feature • Loads: load operations with continuous read hits • Stores: store operations with continuous write hits • SPEC CPU2000 benchmark suite • QoS performance metrics • IPC • Data array utilization
Other Arbiters • Read over Write • Prioritize reads over writes • Read over Write, First Come First Served • Prioritize reads over writes • Prioritize the oldest requests • Round Robin • Interleave requests uniformly and consistently
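For comparison with the fair-queuing arbiter, the Read over Write, First Come First Served baseline reduces to a two-level sort key. The sketch below is illustrative Python; the function name select_row_fcfs and the (arrival_time, is_write, request) tuple layout are assumptions, and it treats "oldest first" as the tie-breaker among requests of the same kind.

```python
def select_row_fcfs(pending):
    """Pick the next request: reads before writes, then oldest arrival first.

    pending: list of (arrival_time, is_write, request) tuples (assumed layout).
    """
    if not pending:
        return None
    # False sorts before True, so reads win; arrival time breaks ties.
    return min(pending, key=lambda entry: (entry[1], entry[0]))
```

This policy carries no per-thread state, so a thread that issues a steady stream of reads can starve another thread's writes, which is the kind of interference the microbenchmarks are meant to expose.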
Conclusions • VPC: hardware mechanism of the VPM QoS framework • VPC arbiter & capacity manager • VPC can achieve global QoS objectives • Issues: • Local QoS objectives assume performance monotonicity
Thank You! & Questions?