Design of a Custom VEE Core in a Chip Multiprocessor. Dan Upton, Masters Presentation, Oct. 23, 2007. Why a VEE Core? VEEs have become more common. One source of overhead is sharing execution resources (cycles, physical structures). → Move VEE onto a separate core.
Design of a Custom VEE Core in a Chip Multiprocessor Dan Upton Masters Presentation Oct. 23, 2007
Why a VEE Core? • VEEs have become more common • One source of overhead is sharing execution resources (cycles, physical structures) → Move VEE onto a separate core
Why A Heterogeneous Core? • VEE different from other applications • Smaller hardware structures can save on power consumption • Better efficiency from CMPs with application-specific cores
What’s To Come • VEE characterization • Performance counter-based • SimpleScalar-based • Power study • SimpleScalar/Wattch-based • Design space
Background: CMPs • Multiple cores on a single die • Shared resources? • Homogeneous vs. heterogeneous [Figure: four CPUs, each with its own L1 data and L1 instruction cache, and two L2 caches]
Background: VEE Types [Figure: layer stacks, top to bottom. System VEE: APP, OS, VEE, HW. Process VEE: APP, VEE, OS, HW]
System VEEs • Whole system (OS + apps) • Xen, VMware, Transmeta, … • Hardware support: Intel VT, AMD SVM [Figure: layer stack, top to bottom: APP, OS, VEE, HW]
Process VEEs • Single application per instance • Finer-grained policy selection • Pin, Strata, Dynamo, DynamoRIO, … • Hardware support: this work (based on Pin) [Figure: layer stack, top to bottom: APP, VEE, OS, HW]
An Overview Of Process VEEs [Figure: flowchart. Steps: injection; branch target in cache? (yes / no); context switch; compile and instrument code; context switch; code cache. Annotations: "OVERHEAD!", "parallelize with running app?", "no longer necessary"]
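The flow on this slide can be sketched as a toy dispatch loop. This is a minimal illustration under stated assumptions, not Pin's implementation: the `ToyVEE` class, its `translate` stub, and the block-to-successor table are hypothetical names introduced here. The sketch only counts how often control leaves the code cache, which is the overhead the slide points at.

```python
class ToyVEE:
    """Toy model of a process VEE dispatch loop (hypothetical, not Pin's design)."""

    def __init__(self, program):
        # program maps a block address to its successor address (None = exit).
        self.program = program
        self.code_cache = {}       # address -> "translated" block (its successor)
        self.compilations = 0      # times a block was compiled and instrumented
        self.context_switches = 0  # switches between application and VEE

    def translate(self, addr):
        # Stand-in for "compile and instrument code": just look up the successor.
        self.compilations += 1
        return self.program[addr]

    def run(self, entry, max_blocks=1000):
        addr = entry
        executed = 0
        while addr is not None and executed < max_blocks:
            if addr not in self.code_cache:
                # Miss: switch into the VEE, compile, then switch back.
                self.context_switches += 2
                self.code_cache[addr] = self.translate(addr)
            # Hit (or freshly filled entry): execute from the code cache.
            addr = self.code_cache[addr]
            executed += 1
        return executed


if __name__ == "__main__":
    vee = ToyVEE({0: 1, 1: 2, 2: 0})  # a three-block loop
    vee.run(0, max_blocks=9)
    print(vee.compilations, vee.context_switches)
```

In this model each block of a hot loop is compiled once and later iterations stay in the code cache, so the remaining cost is the compile step and the two context switches per miss.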
Environment • Two environments for data collection • Hardware performance-monitoring: fast, gives real data, but can’t modify hardware • Architectural simulation: can modify the simulated hardware, but slow
Environment • Hardware performance counters • perfctr 2.6.25, PAPI 3.5, papiex 0.99rc9 • Xeon with Hyper-Threading, Pentium III • Modified Pin source to start/stop counters on VEE entry/exit
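The idea of starting and stopping counters at VEE entry and exit can be illustrated with a small sketch. It does not call perfctr, PAPI, or Pin; `RegionCounters` and `read_counter` are hypothetical names, with `read_counter` standing in for whatever reads a monotonically increasing hardware event count.

```python
from contextlib import contextmanager


class RegionCounters:
    """Attribute counter deltas to named regions, e.g. 'vee' (hypothetical helper)."""

    def __init__(self, read_counter):
        # read_counter: callable returning a monotonically increasing event count.
        self.read_counter = read_counter
        self.totals = {}

    @contextmanager
    def region(self, name):
        start = self.read_counter()  # "start" on region entry
        try:
            yield
        finally:
            # "stop" on region exit and accumulate the delta for this region.
            delta = self.read_counter() - start
            self.totals[name] = self.totals.get(name, 0) + delta


if __name__ == "__main__":
    samples = iter([0, 10, 10, 25])  # fake counter readings for the demo
    counters = RegionCounters(lambda: next(samples))
    with counters.region("vee"):
        pass  # VEE work would run here
    with counters.region("vee"):
        pass
    print(counters.totals)
```

Accumulating per-region deltas keeps events that occur inside the VEE separate from events that occur in the guest application.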
Environment • Architectural simulation • SimpleScalar-x86 • Allows for modifying architectural characteristics • The guest application is only 8 instructions long, so most of the collected data is representative of the VEE
Characterization • Based on architectural units that are commonly considered for removal, resizing, or sharing: • Floating-point pipeline • Cache hierarchy • Branch prediction hardware
Characterization: Floating Point • At most 0.1% floating-point instructions • VEE core probably doesn’t need dedicated FP hardware • Could use conjoined-core approach and share FP with another nearby core on the die
Characterization: Recap • Floating point: low utilization, so the VEE core can share with another core • L1 caches: smaller caches are sufficient • L2 caches: generally shared between cores • Branch predictor: smaller history table is sufficient
Power Consumption • Smaller structures can lead to a decrease in power consumption • Compare power between modern core and our VEE core design using Wattch
Power Consumption: Summary • Specialized design saves up to 14% power per cycle • Saves up to 5% over the total execution • However, it can lead to higher consumption in some cases
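One way to see how a per-cycle saving can shrink, or reverse, over a whole run is a simple energy model. The model is an assumption made here, not something stated in the slides: total energy is taken as average power per cycle times cycle count, and the cycle ratios are hypothetical inputs.

```python
def total_energy_ratio(power_saving, cycle_ratio):
    """Energy of the specialized core relative to the baseline core.

    power_saving: fractional reduction in power per cycle (0.14 for 14%).
    cycle_ratio:  specialized-core cycles divided by baseline-core cycles.
    Assumes energy = power per cycle * number of cycles (a simplification).
    """
    return (1.0 - power_saving) * cycle_ratio


def break_even_cycle_ratio(power_saving):
    """Cycle ratio at which the specialized core uses the same total energy."""
    return 1.0 / (1.0 - power_saving)


if __name__ == "__main__":
    print(total_energy_ratio(0.14, 1.0))  # same cycle count: full per-cycle saving
    print(total_energy_ratio(0.14, 1.2))  # hypothetical 20% more cycles
    print(break_even_cycle_ratio(0.14))
```

Under this model, a core that saves 14% power per cycle stops saving energy once it needs about 16% more cycles than the baseline, which is one possible reading of why total savings are smaller than per-cycle savings.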
Chip-level Design [Figure: two chip layouts. Stand-alone VEE core: General-Purpose CPU 1, General-Purpose CPU 2, a VEE core, and other specialized cores. Conjoined VEE core: several general-purpose CPUs, each drawn with its own VEE core]
Support Structures • Communication channel between application and VEE • Support for speculative compilation by the VEE • Channels to peek at application core structures • For instance, branch history for easily profiling hot paths
Related Work • Hardware support for VEEs • Trident • Codesigned VMs • Transmeta, DAISY, Kim & Smith • Java in hardware • picoJava, JOP
Future Work • Multicore simulation to measure interaction between multiple VEE instances • Requires multicore sim framework, multithreaded VEE • Investigate other opportunities arising from separating VEE and application
Conclusions • VEE differs from benchmark applications • VEE-specific core design can save power • Potential for reducing overhead by not sharing execution resources, or parallelizing compilation and execution