iGPU: Exception Support and Speculative Execution on GPUs

Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam • Vertical Research Group, University of Wisconsin-Madison • Presented at ISCA 2012

Executive Summary • Compiler/hardware co-design for efficient, general-purpose GPUs • Exception support with 1.5% overhead (no more than 4%) • Demand paging support with 2.5% overhead • Context switch (no more than 4%) • Exploiting speculation provides > 10% energy savings

Outline • Motivation and Background • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation • Conclusion

CPU Evolution Retrospective • IBM 360 era – precise exceptions as a performance tradeoff • However, two key shifts in processor design – • Virtual memory no longer optional • Speculative execution on ILP processors

Precise exception handling and speculation were key enablers for modern CPUs

GPU Architectural Trends • A single unified CPU-GPU address space • Significant interest in supporting demand paging • Emerging necessity for supporting speculation • More workloads, including "irregular" workloads • Handling reliability problems

Need general purpose exception and speculation support for GPUs

Why not just borrow CPU ideas? • CPUs use buffering to preserve arch. state • Future file, History file, Re-order Buffer … • But GPUs have 1000x as many registers • Not practical!

Fundamental Challenges • Well defined restart point in program • GPU pipeline and SIMT model make this hard • Preserving architecture state prior to restart • Need to save 1000s of registers

Key Ideas of our Solution • Creation of restart points: well-defined restart point in program • Idempotent code regions • Restartable regions producing same effect • Preservation of necessary state: preserving architecture state prior to restart • Regions constructed with small live state: 1 to 3 regs • Save only this live state
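To make "restartable regions producing same effect" concrete, here is a minimal sketch of one common way to test a straight-line region for idempotence: the region must never overwrite a location it read as an input. The `(destination, sources)` instruction format and the function name are assumptions for illustration, not the iGPU compiler's representation.

```python
from typing import List, Tuple

# Hypothetical instruction format: (destination, [sources]).
Instr = Tuple[str, List[str]]


def is_idempotent(region: List[Instr]) -> bool:
    """Return True if no instruction overwrites a value the region read as an input."""
    written = set()   # locations defined inside the region so far
    live_in = set()   # locations read before any write inside the region
    for dst, srcs in region:
        for s in srcs:
            if s not in written:
                live_in.add(s)
        if dst in live_in:
            # Clobbers an input: a second execution would read the new value.
            return False
        written.add(dst)
    return True


# r2 = r0 + r1; r3 = r2 * r2  -> inputs r0, r1 are never overwritten
safe = [("r2", ["r0", "r1"]), ("r3", ["r2"])]
# r0 = r0 + r1                -> overwrites its own input
unsafe = [("r0", ["r0", "r1"])]
```

Under this check `safe` can be re-executed from its first instruction at any time, while `unsafe` cannot; a compiler would have to end a region before such an overwrite or rename the destination.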

Outline • Challenges and Implications • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation • Conclusion

Exception Support (Creation idea) • Implicit checkpoints using idempotence • Idempotent regions mark restart points • Register file provides all the required state! • Idempotence guarantees correctness • [Figure: regions A and B with an exception handler; B is re-executed]
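The restart behaviour on this slide can be sketched as a toy interpreter: on a fault, control returns to the start of the current region and nothing is restored, because the register file already holds every input the region needs. The region encoding, the `run` function, and the fault-injection parameter are all hypothetical and only illustrate the idea.

```python
from operator import add, mul

# Hypothetical encoding: a region is a list of (dst, function, sources).
A = [("r2", add, ("r0", "r1")), ("r3", mul, ("r2", "r2"))]
B = [("r4", add, ("r3", "r0")), ("r5", mul, ("r4", "r1"))]


def run(regions, regs, fault_at=None):
    """Execute regions in order; on a fault, re-execute from the region start."""
    regs = dict(regs)
    restarts = 0
    rpc = 0                    # restart PC: index of the current region
    pending_fault = fault_at   # (region index, instruction index) or None
    while rpc < len(regions):
        try:
            for i, (dst, fn, srcs) in enumerate(regions[rpc]):
                if pending_fault == (rpc, i):
                    pending_fault = None   # the handler services the fault once
                    raise RuntimeError("fault")
                regs[dst] = fn(*(regs[s] for s in srcs))
            rpc += 1
        except RuntimeError:
            # No state is restored: the register file is the implicit checkpoint.
            restarts += 1
    return regs, restarts


clean, clean_restarts = run([A, B], {"r0": 2, "r1": 3})
faulty, faulty_restarts = run([A, B], {"r0": 2, "r1": 3}, fault_at=(1, 1))
```

Both runs end with the same register contents even though the faulting run executed the first instruction of B twice, which is the property the slide attributes to idempotence.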

Outline • Challenges and Implications • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation

Context Switch • Exception is page fault ✓ • Page-fault handling • Cleanly remove process 1 ? • Start another process and execute ✓ • Get page from disk concurrently ✓ • Restore process 1 ? • Restart process 1 ✓ • [Figure: regions A and B; the page fault is raised while executing B]

Context Switch • Must save and restore architectural state • But... GPUs have megabytes of register state • Save only live state • Save only live state at points of minimal live state

Context Switch (Preserve idea) • Must save and restore architectural state • But... GPUs have megabytes of register state • Save only live state • Save state at points of minimal live state • Implicit minimum live state checkpoints using idempotence • [Figure: regions A and B with an exception handler; candidate cut points annotated with # live registers: 2, 4, 9, 23, 2]
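Choosing a cut point with minimal live state can be sketched with a standard backward liveness pass over straight-line code. The instruction format, function names, and the example program below are assumptions for illustration; the slides do not describe the compiler's actual analysis at this level.

```python
def live_before_each(code, live_out):
    """Backward liveness: the set of registers live just before each instruction."""
    live = set(live_out)
    result = [None] * len(code)
    for i in range(len(code) - 1, -1, -1):
        dst, srcs = code[i]
        live.discard(dst)    # the definition kills the old value
        live.update(srcs)    # the uses make their sources live
        result[i] = set(live)
    return result


def best_cut_point(code, live_out, candidates):
    """Pick the candidate index with the fewest live registers to save."""
    live = live_before_each(code, live_out)
    return min(candidates, key=lambda i: len(live[i]))


# Hypothetical program: (destination, [sources])
code = [
    ("r2", ["r0", "r1"]),   # 0: r2 = r0 + r1
    ("r3", ["r2"]),         # 1: r3 = r2 * r2
    ("r4", ["r3"]),         # 2: r4 = r3 + r3
    ("r5", ["r4"]),         # 3: r5 = r4 + r4
]
live_sets = live_before_each(code, live_out={"r5"})
cut = best_cut_point(code, live_out={"r5"}, candidates=[0, 1, 2, 3])
```

Here two registers are live before instruction 0 and one before each later instruction, so a context switch taken at `cut` would save a single register rather than the whole register file.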

Outline • Challenges and Implications • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation • Conclusion

Speculation (Tuning the Creation idea) • Speculation generates state that is wrong • Need even more buffers • Recall: buffers are impractical for GPUs • Use idempotence! • Reduce re-execution cost by sub-dividing regions • Implicit checkpoints with low re-execution overhead using idempotence

Speculation • [Figure: regions A, B, and C; B is sub-divided into B1 and B2, and a misspeculation re-executes B2; # live registers: 2] • * Region construction details: Idempotent Processing, PLDI '12
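The effect of sub-dividing a region on re-execution cost can be illustrated with a small calculation. The instruction indices and region boundaries below are made-up numbers, and the sketch counts only re-executed instructions; it does not model any cost of adding a boundary.

```python
def reexecuted(boundaries, fault_index):
    """Instructions re-run when a misspeculation at fault_index restarts its region.

    boundaries holds the index of the first instruction of each region.
    """
    start = max(b for b in boundaries if b <= fault_index)
    return fault_index - start


# Hypothetical layout: region B spans instructions 10..29, fault detected at 25.
whole_b = reexecuted([0, 10, 30], fault_index=25)       # B kept as one region
split_b = reexecuted([0, 10, 20, 30], fault_index=25)   # B split into B1 and B2
```

With B kept whole the restart repeats 15 instructions; with B split at instruction 20 it repeats 5.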

iGPU Architecture Application Compiler Hardware

iGPU Architecture - Software • Creation idea: region formation (form regions) using region marker instructions • Preserve idea: state preservation (preserve state) using register re-assignment, moves and spills • Reg. pressure

iGPU Architecture - Software • Kernel Source Code → Source Code Compiler → Device Code Generator → Device Code

iGPU Architecture - Software • Kernel Source Code → Source Code Compiler → Device Code Generator (Region formation) → Idempotent Device Code

iGPU Architecture - Software • Kernel Source Code → Source Code Compiler → Device Code Generator (Region formation, State preservation) → Idempotent Device Code

iGPU Architecture - Hardware (not to scale) • Creation idea • [Figure: GPU with multiple cores and an L2 cache; a core is shown with Fetch Unit, Decode, SIMD Processor, General Purpose Registers, RPCs, and L1 cache & TLB]

iGPU Architecture - Hardware (to scale) • General Purpose Registers vs. Restart PC (RPC) Registers • 2 RPCs per warp, one each for Sparse and Short regions • Compare to 1024 GPRs per warp (32 x 32)
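The "to scale" comparison is simple arithmetic on the slide's own numbers. The sketch below counts registers only; it assumes nothing about register widths, and the variable names are illustrative.

```python
# Per-warp register counts taken from the slide.
gprs_per_warp = 32 * 32   # 1024 general purpose registers
rpcs_per_warp = 2         # one RPC each for Sparse and Short regions

# Added restart-PC registers as a fraction of the existing GPR count.
added_fraction = rpcs_per_warp / gprs_per_warp
added_percent = round(100 * added_fraction, 2)
```

By register count, the two RPCs add roughly 0.2% on top of the existing per-warp GPRs.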

iGPU Architecture - Hardware (Preserve idea) • State preservation handled purely by compiler! • Not hardware's responsibility

Evaluation

Evaluation – Voltage Speculation

Executive Summary • Compiler/hardware co-design for efficient, general-purpose GPUs • Exception support with 1.5% overhead (no more than 4%) • Demand paging support with 2.5% overhead • Context switch (no more than 4%) • Exploiting speculation provides > 10% energy savings

Conclusions • Exception support for GPUs is practical • Enables better integration with CPUs in CPU-GPU architectures • Speculative execution on GPUs • Both for performance and reliability • Presents interesting possibilities in the context of "irregular" workloads

Questions
