
Design of a Custom VEE Core in a Chip Multiprocessor


Presentation Transcript


  1. Design of a Custom VEE Core in a Chip Multiprocessor Dan Upton Masters Presentation Oct. 23, 2007

  2. Why a VEE Core? • VEEs have become more common • One source of overhead is sharing execution resources (cycles, physical structures) ⇒ Move VEE onto a separate core

  3. Why A Heterogeneous Core? • VEE different from other applications • Smaller hardware structures can save on power consumption • Better efficiency from CMPs with application-specific cores

  4. What’s To Come • VEE characterization • Performance counter-based • SimpleScalar-based • Power study • SimpleScalar/Wattch-based • Design space

  5. Background: CMPs • Multiple cores on a single die • Shared resources? • Homogeneous vs. heterogeneous (Diagram: four CPUs, each with private L1 data and instruction caches, and shared L2 caches.)

  6. Background: VEE Types (Diagram: two software stacks side by side. System VEE: applications and an OS run on top of the VEE, which runs directly on the hardware. Process VEE: one application runs on top of the VEE, which runs alongside other applications on the OS and hardware.)

  7. System VEEs • Whole system (OS + apps) • Xen, VMware, Transmeta, … • Hardware support: Intel VT, AMD SVM (Diagram: applications and the OS run on top of the VEE, which runs on the hardware.)

  8. Process VEEs • Single application per instance • Finer-grained policy selection • Pin, Strata, Dynamo, DynamoRIO, … • Hardware support: this work (based on Pin) (Diagram: the VEE sits between one application and the OS; other applications run directly on the OS and hardware.)

  9. An Overview Of Process VEEs (Flowchart: the VEE is injected into the application; on a branch, control context-switches to the VEE, which checks whether the branch target is already in the code cache. If it is, control context-switches back and execution continues from the cache; if not, the VEE first compiles and instruments the code and adds it to the cache. The compile/instrument step and the context switches are the marked overhead; with a dedicated VEE core, compilation could be parallelized with the running application and some context switches become unnecessary. A dispatch-loop sketch follows.)
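The flowchart's control flow can be sketched in C. This is a hedged illustration of the general process-VEE dispatch loop, not Pin's or Strata's actual code; lookup_code_cache, translate_and_instrument, and execute_from_cache are hypothetical helpers.

    /* Hedged sketch of a process-VEE dispatch loop.  The three helper
       functions are hypothetical, not part of Pin's API. */
    #include <stddef.h>

    typedef unsigned long addr_t;

    extern void  *lookup_code_cache(addr_t target);         /* NULL if not yet translated */
    extern void  *translate_and_instrument(addr_t target);  /* compile + instrument, fill code cache */
    extern addr_t execute_from_cache(void *code);           /* run until a branch with an unknown target */

    /* Entered after the VEE has been injected into the application. */
    void vee_dispatch(addr_t start)
    {
        addr_t target = start;
        for (;;) {
            /* Context switch into the VEE: is the branch target cached? */
            void *code = lookup_code_cache(target);
            if (code == NULL) {
                /* Miss: compile and instrument the region (the main overhead). */
                code = translate_and_instrument(target);
            }
            /* Context switch back to translated application code. */
            target = execute_from_cache(code);
        }
    }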

  10. Overhead

  11. Environment • Two environments for data collection • Hardware performance-monitoring: fast, gives real data, but can’t modify hardware • Architectural simulation: can modify the simulated hardware, but slow

  12. Environment • Hardware performance counters • perfctr 2.6.25, PAPI 3.5, papiex 0.99rc9 • Xeon with HT, PIII • Modified Pin source to start/stop counters on VEE entry/exit
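As a rough illustration of the counter toggling described above, the following C sketch uses PAPI's high-level API to start counters when control enters the VEE and stop them when control returns to the application. The vee_entry/vee_exit hook names are hypothetical stand-ins for the modified Pin entry and exit points.

    /* Hedged sketch: toggle PAPI counters at VEE entry/exit.
       vee_entry()/vee_exit() are hypothetical hook names; the PAPI
       high-level calls (PAPI 3.x) are real. */
    #include <stdio.h>
    #include <papi.h>

    #define NUM_EVENTS 2
    static int       events[NUM_EVENTS] = { PAPI_TOT_INS, PAPI_TOT_CYC };
    static long_long counts[NUM_EVENTS];

    void vee_entry(void)   /* called when control enters the VEE */
    {
        if (PAPI_start_counters(events, NUM_EVENTS) != PAPI_OK)
            fprintf(stderr, "could not start counters\n");
    }

    void vee_exit(void)    /* called when control returns to the application */
    {
        if (PAPI_stop_counters(counts, NUM_EVENTS) != PAPI_OK)
            fprintf(stderr, "could not stop counters\n");
        else
            printf("VEE: %lld instructions, %lld cycles\n",
                   (long long)counts[0], (long long)counts[1]);
    }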

  13. Environment • Architectural simulation • SimpleScalar-x86 • Allows for modifying architectural characteristics • An 8-instruction guest application means the collected data is dominated by the VEE itself (see the sketch below)
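To make concrete why a tiny guest isolates the VEE's behavior: the guest contributes almost nothing, so nearly everything the simulator counts comes from the VEE running it. A hypothetical stand-in for such a guest (the actual 8-instruction program is not shown in the slides) is simply:

    /* Hypothetical stand-in for the tiny guest; the real 8-instruction
       guest from the thesis is not reproduced here.  The point is only
       that the guest does essentially no work of its own. */
    int main(void)
    {
        return 0;
    }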

  14. Characterization • Based on architectural units that are commonly considered for removal, resizing, or sharing: • Floating-point pipeline • Cache hierarchy • Branch prediction hardware

  15. Characterization: Floating Point

  16. Characterization: Floating Point • At most 0.1% of instructions are floating point • VEE core probably doesn’t need dedicated FP hardware • Could use conjoined-core approach and share FP with another nearby core on the die

  17. Characterization: L1 Data Cache

  18. Characterization: L1 Data Cache

  19. Characterization: L1 Inst. Cache

  20. Characterization: L1 Inst. Cache

  21. Characterization: Branch Predictor

  22. Characterization: Branch Predictor

  23. Characterization: Branch Predictor

  24. Characterization: Recap • Floating point: low utilization, so the VEE core can share with another core • L1 caches: smaller caches are sufficient • L2 caches: generally shared between cores • Branch predictor: smaller history table is sufficient

  25. Power Consumption • Smaller structures can lead to a decrease in power consumption • Compare power between modern core and our VEE core design using Wattch

  26. Power Savings (per cycle)

  27. Power Savings (overall)

  28. Power Consumption: Summary • Specialized design saves up to 14% power per cycle • Saves up to 5% over the total execution • In some cases, however, it leads to higher overall consumption
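The gap between the per-cycle and whole-run savings follows from the extra cycles the smaller structures cost. As an illustration with assumed numbers (not measurements from the thesis): if the VEE core draws 14% less power per cycle but needs 10% more cycles, total energy is roughly 0.86 × 1.10 ≈ 0.95 of the baseline, about a 5% saving; if it needed 20% more cycles, 0.86 × 1.20 ≈ 1.03 and it would consume more than the baseline, which is the pattern behind the cases where consumption goes up.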

  29. Chip-level Design (Diagram: two layouts. Stand-alone VEE core: several general-purpose CPUs plus a single separate VEE core, and possibly other specialized cores. Conjoined VEE core: each general-purpose CPU is paired with its own VEE core.)

  30. Support Structures • Communication channel between application and VEE • Support for speculative compilation by the VEE • Channels to peek at application core structures • For instance, branch history for easily profiling hot paths
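A rough sketch of what such support structures might look like follows. The names, sizes, and fields are illustrative assumptions, not the design from the thesis: a one-way channel the application core uses to post branch targets for speculative translation, and a read-only window onto the application core's branch history for profiling hot paths.

    /* Hedged sketch of hypothetical support structures between the
       application core and the VEE core.  Layout and names are
       illustrative, not taken from the thesis. */
    #include <stdint.h>

    #define CHAN_SLOTS 64

    /* One-way channel: the application core posts events (e.g. branch
       targets to translate speculatively); the VEE core consumes them. */
    struct vee_channel {
        volatile uint32_t head;               /* written by producer (app core) */
        volatile uint32_t tail;               /* written by consumer (VEE core) */
        uint64_t          slots[CHAN_SLOTS];  /* e.g. branch-target addresses */
    };

    /* Snapshot of the application core's branch history, exposed so the
       VEE can profile hot paths without instrumenting the application. */
    struct branch_history_window {
        uint64_t recent_targets[16];          /* most recent taken-branch targets */
        uint32_t global_history;              /* global branch-history register */
    };

    /* Post a branch target for the VEE core to translate speculatively. */
    static inline int channel_post(struct vee_channel *c, uint64_t target)
    {
        uint32_t next = (c->head + 1) % CHAN_SLOTS;
        if (next == c->tail)
            return -1;                        /* channel full: drop the hint */
        c->slots[c->head] = target;
        c->head = next;
        return 0;
    }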

  31. Related Work • Hardware support for VEEs • Trident • Codesigned VMs • Transmeta, DAISY, Kim & Smith • Java in hardware • picoJava, JOP

  32. Future Work • Multicore simulation to measure interaction between multiple VEE instances • Requires multicore sim framework, multithreaded VEE • Investigate other opportunities arising from separating VEE and application

  33. Conclusions • VEE differs from benchmark applications • VEE-specific core design can save power • Potential for reducing overhead by not sharing execution resources, or parallelizing compilation and execution

  34. Questions?

  35. Characterization: L2 Cache
