iGPU: Exception Support and Speculative Execution on GPUs
Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam
Vertical Research Group, University of Wisconsin–Madison
Presented at ISCA 2012
Executive Summary
• Compiler/hardware co-design for efficient, general-purpose GPUs
• Exception support with 1.5% overhead (no more than 4%)
• Demand paging support with 2.5% overhead
• Context switch support (no more than 4% overhead)
• Exploiting speculation provides > 10% energy savings
Outline
• Motivation and Background
• iGPU Mechanisms
  • General exception handling
  • Context switching
  • Speculation support
• iGPU Architecture
  • Software
  • Hardware
• Evaluation
• Conclusion
CPU Evolution Retrospective
• IBM 360 era: precise exceptions treated as a performance trade-off
• However, two key shifts in processor design followed:
  • Virtual memory was no longer optional
  • Speculative execution on ILP processors
Precise exception handling and speculation were key enablers for modern CPUs.
GPU Architectural Trends
• A single unified CPU-GPU address space
• Significant interest in supporting demand paging
• Emerging need to support speculation
  • More “irregular” workloads
  • Handling reliability problems
GPUs need general-purpose exception and speculation support.
Why not just borrow CPU ideas?
• CPUs use buffering to preserve architectural state
  • Future file, history file, reorder buffer, …
• But GPUs have 1000x as many registers
  • Not practical!
Fundamental Challenges
• A well-defined restart point in the program
  • The GPU pipeline and SIMT model make this hard
• Preserving architectural state prior to restart
  • Need to save 1000s of registers
Key Ideas of Our Solution
• Creation of restart points: a well-defined restart point in the program
  • Idempotent code regions
  • Regions that produce the same effect when restarted
• Preservation of necessary state: preserving architectural state prior to restart
  • Regions constructed with small live state: 1 to 3 registers
  • Save only this live state
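Exception Support (creation idea)
[Figure: code regions A and B; an exception in region B invokes the exception handler, and B is then re-executed]
• Implicit checkpoints using idempotence
• Idempotent regions mark restart points
• The register file provides all the required state!
• Idempotence guarantees correctness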
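The restart mechanism can be sketched in Python under simplifying assumptions: a program is a list of straight-line regions over a dictionary standing in for one thread's register file, the hypothetical `run_program` records a restart PC (RPC) whenever a region begins, and a single exception is injected partway through a chosen region. The handler restores nothing and simply resumes at the RPC. Memory, SIMT divergence, and the real iGPU interface are not modeled; all names here are illustrative.

```python
from operator import add, mul

# An instruction is (dst, fn, srcs); a region is a list of instructions.
# Region boundaries play the role of the compiler's region markers.

def run_program(program, regs, fault_region=None, fault_after=0):
    """Run `program` on the register dict `regs` (updated in place).

    If `fault_region` is given, one exception is raised in that region after
    `fault_after` of its instructions have executed. The handler restores no
    registers: it just restarts execution at the recorded RPC.
    """
    faulted = False
    pc = 0  # index of the region being executed
    while pc < len(program):
        rpc = pc  # region marker: remember where to restart
        completed = True
        for i, (dst, fn, srcs) in enumerate(program[pc]):
            if pc == fault_region and not faulted and i == fault_after:
                faulted = True  # exception (e.g. a page fault) occurs here
                completed = False
                break
            regs[dst] = fn(*(regs[s] for s in srcs))
        pc = pc + 1 if completed else rpc  # next region, or restart at the RPC
    return regs


A = [("r2", add, ["r0", "r1"])]                                  # r2 = r0 + r1
B_idem = [("r3", mul, ["r2", "r2"]), ("r4", add, ["r3", "r1"])]  # inputs r1, r2 never overwritten
B_bad = [("r2", mul, ["r2", "r2"]), ("r4", add, ["r2", "r1"])]   # overwrites its own input r2

for name, B in (("idempotent", B_idem), ("non-idempotent", B_bad)):
    clean = run_program([A, B], {"r0": 1, "r1": 2})
    faulty = run_program([A, B], {"r0": 1, "r1": 2}, fault_region=1, fault_after=1)
    print(name, clean["r4"], faulty["r4"])
```

This prints `idempotent 11 11` and `non-idempotent 11 83`: the idempotent region reaches the same result whether or not it is interrupted, using only values already in the register file, while the region that overwrites its own input does not.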
Context Switch
[Figure: code regions A and B; the exception in B is a page fault]
Page-fault handling:
• Cleanly remove process 1 → start another process and execute it
• Get the page from disk concurrently
• Restore process 1 → restart process 1
Context Switch (preserve idea)
• Must save and restore architectural state, but GPUs have megabytes of register state
• Save only live state
• Save it only at points of minimal live state
[Figure: number of live registers at successive program points (2, 4, 9, 23, 2); candidate cut points for region boundaries fall where few registers are live, and the exception handler restarts region B]
• Implicit minimum-live-state checkpoints using idempotence
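Choosing where to cut can be illustrated with a toy Python sketch: backward liveness over straight-line code gives the number of live registers at each program point, and the hypothetical `best_cut` picks the interior point with the fewest. It assumes no control flow, 32 threads per warp, and 4-byte registers; iGPU's actual region formation must also respect idempotence, which this sketch ignores.

```python
def live_counts(instrs, live_out=()):
    """Backward liveness on straight-line code.

    instrs: list of (dst, srcs), with dst a register name or None.
    counts[i] is the number of registers live just before instruction i;
    counts[len(instrs)] is the number live at the end of the code.
    """
    live = set(live_out)
    counts = [len(live)]
    for dst, srcs in reversed(instrs):
        if dst is not None:
            live.discard(dst)  # a definition ends the register's live range
        live.update(srcs)      # uses make registers live above this point
        counts.append(len(live))
    counts.reverse()
    return counts


def best_cut(counts):
    """Interior program point with the fewest live registers (earliest on ties).
    Requires at least two instructions."""
    return min(range(1, len(counts) - 1), key=lambda i: counts[i])


def state_bytes(num_live, warp_size=32, reg_bytes=4):
    """Bytes of register state one warp must save at a cut point."""
    return num_live * warp_size * reg_bytes


instrs = [
    ("r2", ["r0"]),        # r2 = f(r0)
    ("r3", ["r0", "r1"]),  # r3 = g(r0, r1)
    ("r4", ["r2", "r3"]),
    ("r5", ["r4"]),
    ("r6", ["r5", "r1"]),
]
counts = live_counts(instrs, live_out={"r6"})
cut = best_cut(counts)
print(counts, cut, state_bytes(counts[cut]))
```

This prints `[2, 3, 3, 2, 2, 1] 3 256`. Under the same assumptions, cutting at the point in the figure with 23 live registers would cost 2944 bytes per warp instead of 256.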
Speculation (tuning the creation idea)
• Misspeculation generates incorrect state
  • Would need even more buffers
  • Recall: buffers are impractical for GPUs
• Use idempotence!
  • Implicit checkpoints with low re-execution overhead
  • Reduce re-execution cost by subdividing regions
Speculation
[Figure: regions A, B, and C; region B is subdivided into B1 and B2, so a misspeculation in B2 re-executes only B2; # live registers: 2*]
* Region construction details: Idempotent Processing, PLDI ’12
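Why subdividing helps can be shown with a small Python cost model, assuming execution restarts at the start of the region that contains the misspeculation and that regions are described only by their start PCs. The function `reexecuted` and the region layout below are hypothetical and chosen for illustration.

```python
from bisect import bisect_right

def reexecuted(boundaries, fault_pc):
    """Number of instructions redone after a misspeculation detected at fault_pc.

    boundaries: sorted start PCs of the regions, beginning with 0. Execution
    restarts at the start of the region containing fault_pc, so the
    fault_pc - start instructions already executed in that region run again.
    """
    assert boundaries and boundaries[0] == 0 <= fault_pc
    start = boundaries[bisect_right(boundaries, fault_pc) - 1]
    return fault_pc - start


# Hypothetical layout: A = [0, 10), B = [10, 30), C starts at 30.
whole_b = [0, 10, 30]
split_b = [0, 10, 20, 30]  # B subdivided into B1 = [10, 20) and B2 = [20, 30)

print(reexecuted(whole_b, 25), reexecuted(split_b, 25))
```

This prints `15 5`. Extra boundaries reduce re-execution cost, but each one is another restart point, so it makes sense to place them where little state is live.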
iGPU Architecture
[Figure: system layers: Application, Compiler, Hardware]
iGPU Architecture - Software
• Creation idea → region formation: region marker instructions
• Preserve idea → state preservation: register re-assignment, moves, and spills
[Figure: form regions → preserve state, annotated with register pressure]
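The state-preservation step can be sketched in Python under stated assumptions. For registers, a region is idempotent only if it never overwrites a register it reads on entry (a live-in); otherwise re-execution reads a clobbered input. The hypothetical helpers below detect such clobbers and apply a simple form of register re-assignment by renaming the clobbering definition to a spare register. Memory, control flow, and the moves and spills needed under register pressure are not modeled.

```python
from itertools import count

def live_ins(region):
    """Registers the region reads before writing them (its register inputs)."""
    defined, ins = set(), set()
    for dst, srcs in region:
        ins.update(s for s in srcs if s not in defined)
        if dst is not None:
            defined.add(dst)
    return ins


def clobbers(region):
    """Live-in registers that the region overwrites. If non-empty, restarting
    the region would read a modified input, so it is not idempotent."""
    ins = live_ins(region)
    return {dst for dst, _ in region if dst in ins}


def reassign(region, fresh_names):
    """Rename each definition of a live-in register to a fresh register and
    rewrite later uses inside the region. Returns (new_region, renaming);
    code after the region must read the renamed registers."""
    ins = live_ins(region)
    renaming, out = {}, []
    for dst, srcs in region:
        srcs = [renaming.get(s, s) for s in srcs]  # uses see earlier renames
        if dst in ins:
            renaming[dst] = next(fresh_names)
            dst = renaming[dst]
        out.append((dst, srcs))
    return out, renaming


region = [
    ("r1", ["r0"]),        # r1 = f(r0)
    ("r0", ["r0", "r1"]),  # r0 = g(r0, r1): overwrites live-in r0
    ("r2", ["r0"]),        # r2 = h(r0)
]
fresh = (f"r{i}" for i in count(8))  # hypothetical spare registers r8, r9, ...
fixed, renaming = reassign(region, fresh)
print(clobbers(region), clobbers(fixed), renaming)
```

This prints `{'r0'} set() {'r0': 'r8'}`. Each rename needs a free register, which is presumably why moves and spills appear alongside register re-assignment when register pressure is high.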
iGPU Architecture - Software
Kernel source code → Source code compiler → Device code generator (with region formation and state preservation) → Idempotent device code
iGPU Architecture - Hardware (not to scale)
[Figure (creation idea): SIMD processor with fetch unit, decode, cores, general-purpose registers, and restart PC registers (RPCs); L1 cache & TLB; L2 cache]
iGPU Architecture - Hardware (to scale)
[Figure: general-purpose registers vs. restart PC registers, drawn to scale]
• 2 restart PC registers (RPCs) per warp: one each for Sparse and Short regions
• Compare to 1024 GPRs per warp (32 × 32)
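The size comparison can be made concrete with a few lines of Python arithmetic. The 32 × 32 GPRs per warp, the 2 RPCs per warp, and the 1 to 3 live registers come from the slides; 4-byte registers and 4-byte PCs are assumptions.

```python
WARP_SIZE = 32        # threads per warp (the slide's "32 x 32")
REGS_PER_THREAD = 32  # GPRs per thread (the slide's "32 x 32")
REG_BYTES = 4         # assumption: 32-bit registers
PC_BYTES = 4          # assumption: 32-bit restart PCs
RPCS_PER_WARP = 2     # one for Sparse regions, one for Short regions


def gpr_state_bytes(regs_per_thread=REGS_PER_THREAD):
    """Bytes of GPR state in one warp for a given number of registers per thread."""
    return WARP_SIZE * regs_per_thread * REG_BYTES


full = gpr_state_bytes()                        # all 1024 GPRs of a warp
rpc = RPCS_PER_WARP * PC_BYTES                  # restart-PC storage per warp
live = [gpr_state_bytes(n) for n in (1, 2, 3)]  # live state of 1 to 3 registers

print(f"GPRs per warp: {WARP_SIZE * REGS_PER_THREAD}, {full} bytes")
print(f"RPC storage per warp: {rpc} bytes ({rpc / full:.2%} of GPR bytes)")
print(f"Live state to save for 1-3 live registers: {live} bytes")
```

Under these assumptions, the RPCs add 8 bytes per warp (about 0.2% of the 4096 bytes of GPRs), and saving 1 to 3 live registers costs 128 to 384 bytes per warp rather than 4096.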
iGPU Architecture - Hardware (preserve idea)
• State preservation is handled purely by the compiler!
• It is not the hardware’s responsibility
Conclusions
• Exception support for GPUs is practical
  • Enables better integration with CPUs in CPU-GPU architectures
• Speculative execution on GPUs
  • Both for performance and reliability
  • Presents interesting possibilities in the context of “irregular” workloads