EPIC Architecture(Explicitly Parallel Instruction Computing) Yangyang Wen CDA5160--Advanced Computer Architecture I University of Central Florida
Outline • What is EPIC? • EPIC Philosophy • Architectural Features Supporting EPIC • Intel’s IA-64 Architectural Features • IA-64’s Key Technologies • Summary and Reference
Traditional Architectures: Limited Parallelism [Figure: original source code → compiler → sequential machine code → hardware → parallelized code → multiple functional units.] • The available execution units are not used efficiently. • Today’s processors are often 60% idle.
EPIC Architecture: Explicit Parallelism [Figure: original source code → EPIC compiler → parallel machine code → hardware with multiple functional units.] • The compiler views a wider scope. • Better parallel machine code increases parallel execution. • Execution resources are used more efficiently.
What is EPIC? EPIC stands for Explicitly Parallel Instruction Computing. An EPIC architecture provides features that let the compiler take a proactive role in enhancing instruction-level parallelism (ILP) without unacceptable hardware complexity.
EPIC Design Philosophy • EPIC gives the compiler advanced features for enhancing ILP, such as predication and speculation. • The compiler designs the plan of execution (POE) at compile time and communicates it to the hardware. • EPIC requires massive hardware resources for parallel execution.
Introducing IA-64 • IA-64 is Intel’s first 64-bit architecture. • It is the first commercially available instance of an EPIC ISA. • It is the first architecture to bring these ILP features to general-purpose microprocessors.
IA-64’s Architectural Basics • Explicit Parallelism • Enhanced ILP • Compiler-oriented • Extremely large physical memory • A huge virtual address space for applications • 64-bit computation • Extremely large register files
IA-64’s Key Technologies • Instruction Bundling • Predication • Control Speculation • Data Speculation • Software Pipelining
Instruction Bundling • Uses a form of VLIW architecture. • Three 41-bit instructions are combined with template bits into one 128-bit bundle. • Parallel instructions are executed in groups. • The template bits help decode and route the instructions and mark the end of each group of parallel instructions.
128-bit bundle layout (bit 127 down to bit 0): Instruction 2 (41 bits) | Instruction 1 (41 bits) | Instruction 0 (41 bits) | Template (remaining 5 bits)
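The bundle layout can be sketched in Python as a pair of bit-packing helpers. This is only an illustration: the names `pack_bundle` and `unpack_bundle` are hypothetical, and the code assumes the template occupies the 5 low-order bits with the three 41-bit slots above it (5 + 3 × 41 = 128).

```python
TEMPLATE_BITS = 5   # assumption: template sits in bits 4..0
SLOT_BITS = 41      # each instruction slot is 41 bits wide
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a template and three instruction slots into one 128-bit integer."""
    assert 0 <= template < (1 << TEMPLATE_BITS)
    assert len(slots) == 3
    bundle = template
    for i, slot in enumerate(slots):
        assert 0 <= slot <= SLOT_MASK
        # Slot 0 sits just above the template, slot 2 ends at bit 127.
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots
```

A decoder would read the template first, since it tells the hardware which execution unit each slot targets and where a group of parallel instructions ends.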
ILP Bottlenecks • Branches: predication is used to deal with them. Branch mispredictions cause a 20% to 30% loss in processor performance. • Memory latency: the time it takes to get data from memory. The longer it takes to fetch code and data from memory, the longer the CPU sits idle. For memory latency, loads are the big problem, not stores.
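Predication • Branching is a major cause of lost performance. • Source: if A>B then S+=A else S+=B end if; *P=S • (a) Traditional branch prediction: the processor predicts one path, for example S+=A. If the prediction is wrong, it throws away S+=A and executes S+=B. • (b) IA-64 predication: S+=A and S+=B are both issued, each guarded by a predicate derived from A>B, and only the instruction whose predicate is true updates S.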
EPIC Predication Process • The compiler identifies a branch candidate. • Instructions are marked with a predicate ID. • The compiler finds which instructions to execute in parallel. • Instructions are packed into bundles. • The processor executes both paths in parallel. • The processor checks the predicates and stores only the correct results.
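The process above can be mimicked in a few lines of Python. This is a behavioral sketch, not IA-64 code: `branchy` and `predicated` are hypothetical names, and the predicate pair `p1`/`p2` stands in for the complementary predicates a compare instruction would produce.

```python
def branchy(a, b, s):
    """Reference version: an ordinary if/else that needs a branch."""
    if a > b:
        s += a
    else:
        s += b
    return s


def predicated(a, b, s):
    """Both arms are issued; only the arm with a true predicate commits."""
    # The compare produces a complementary pair of predicates.
    p1 = a > b
    p2 = not p1
    # Each arm is computed and tagged with its qualifying predicate.
    issued = [(p1, s + a), (p2, s + b)]
    # Results whose predicate is false are discarded.
    committed = [value for pred, value in issued if pred]
    assert len(committed) == 1  # exactly one predicate is true
    return committed[0]
```

Both functions return the same value for every input; the difference is that the predicated form has no control-flow decision to mispredict.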
Predication Benefits • Reduces branches • Reduces misprediction penalties • Shortens critical paths
Control Speculation • Memory latency is a major performance bottleneck.
Traditional architectures: instr 1; instr 2; ...; br; load a[ ]; use
• The branch is a barrier: elevating the load above a branch is not possible.
IA-64: ld.s r8=a[ ]; instr 1; instr 2; br; chk.s r8; use
• Allows elevation of the load, even above a branch.
Introducing the Token Bit
IA-64: ld.s r8=a[ ] (exception detection); instr 1; instr 2 (propagate exception); br; chk.s r8 (exception delivery); use
• An elevated load (ld.s) only detects an exception; it does not raise it. • If the load address is valid, the load completes normally. • If the load address is invalid, the hardware sets the token bit on the target register (the NaT, “Not a Thing”, bit in IA-64) and defers the exception. • If the branch leaves this path, the deferred exception is never delivered. • If execution reaches chk.s and chk.s finds the token bit set, it jumps to fix-up code, which re-executes the load.
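A small Python model makes the deferral visible. It is a sketch under simplifying assumptions: memory is a dictionary, an address missing from it counts as invalid, the sentinel `NAT` stands in for the token bit, and `ld_s`, `ld`, `run` and `LoadFault` are hypothetical names.

```python
NAT = object()  # stands in for the token bit attached to a register


class LoadFault(Exception):
    """Raised by a non-speculative load from an invalid address."""


def ld(memory, addr):
    """Ordinary load: an invalid address faults immediately."""
    if addr not in memory:
        raise LoadFault(addr)
    return memory[addr]


def ld_s(memory, addr):
    """Speculative load: an invalid address sets the token instead of faulting."""
    return memory.get(addr, NAT)


def run(memory, addr, take_path):
    r8 = ld_s(memory, addr)    # load hoisted above the branch
    if not take_path:          # br: this path is not executed
        return None            # the deferred fault is never delivered
    if r8 is NAT:              # chk.s r8: token set, go to fix-up code
        r8 = ld(memory, addr)  # fix-up re-executes the load non-speculatively
    return r8 + 1              # use
```

With a valid address the check falls through at no extra cost; with an invalid address the fault appears only if the path that needs the value is actually taken.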
Data Speculation
Traditional architectures: instr 1; instr 2; ...; store; load; use
• The store is a barrier: the load cannot be elevated above it, which prevents reordering of instructions.
IA-64: ld.a; instr 1; instr 2; store; ld.c (or chk.a); use
• With the ALAT, the compiler can elevate the load even when it is not sure whether the memory references overlap.
Advanced Load Address Table (ALAT) Sequence: ld.a reg# = ...; store; chk.a reg# • Each ALAT entry holds a register number and an address. • An elevated ld.a inserts an entry into the ALAT. • A store removes any ALAT entries whose addresses overlap the store address. • At chk.a, if no entry is found for the register, there was a conflict, and execution jumps to fix-up code that re-executes the load and the code that depends on it.
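The ALAT protocol can be sketched as a small Python class. This is a simplified model, not the hardware design: it assumes an exact address match rather than a true overlap check, and the names `ALAT`, `ld_a`, `store`, `chk_a` and `speculative_sequence` are chosen here for illustration.

```python
class ALAT:
    """Toy advanced load address table: register number -> load address."""

    def __init__(self):
        self.entries = {}

    def ld_a(self, reg, addr, memory):
        """Advanced load: record the address, then return the loaded value."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, addr, value, memory):
        """Store: write memory and drop every entry that matches the address."""
        memory[addr] = value
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def chk_a(self, reg):
        """Return True if the entry survived, i.e. the speculation was safe."""
        return reg in self.entries


def speculative_sequence(memory, load_addr, store_addr, store_value):
    alat = ALAT()
    r8 = alat.ld_a(8, load_addr, memory)         # load hoisted above the store
    alat.store(store_addr, store_value, memory)
    if not alat.chk_a(8):                        # entry gone: the store conflicted
        r8 = memory[load_addr]                   # fix-up code re-executes the load
    return r8
```

When the store goes to a different address the hoisted value is used as is; when it hits the same address the check fails and the reload picks up the stored value.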
Speculation Benefits • Reduces impact of memory latency • Study demonstrates performance improvement of 80% when combined with predication • Greatest improvement to code with many cache accesses • Scheduling flexibility enables new levels of performance headroom
Software Pipelining • Overlaps the execution of different loop iterations • Completes more iterations in the same amount of time
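Software Pipelining Example
for (i = 0; i < 1000; i++) x[i] = x[i] + s;
Original loop:
Loop: Ld   f0,0(r1)     ; load x[i]
      Add  f0,f0,f1     ; x[i] + s
      Sd   f0,0(r1)     ; store x[i]
      Add  r1,r1,8
      Subi r2,r2,1
      Bnez r2,Loop
After software pipelining (loop kernel only; start-up and wind-down code omitted):
Loop: Sd   f2,-16(r1)   ; store result for x[i]
      Add  f2,f0,f1     ; add for x[i+1]
      Ld   f0,0(r1)     ; load x[i+2]
      Add  r1,r1,8
      Subi r2,r2,1
      Bnez r2,Loop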
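The same transformation can be written out in Python to show the start-up (prologue) and wind-down (epilogue) code that surrounds the kernel. This is a sketch for illustration; `plain_loop` and `pipelined_loop` are hypothetical names, and the variables `loaded` and `summed` play the roles of registers f0 and f2.

```python
def plain_loop(x, s):
    """Reference loop: load, add, store for one element per iteration."""
    out = list(x)
    for i in range(len(out)):
        out[i] = out[i] + s
    return out


def pipelined_loop(x, s):
    """Each kernel iteration stores element i, adds for i+1, loads i+2."""
    out = list(x)
    n = len(out)
    if n < 2:
        return plain_loop(x, s)  # too short to fill the pipeline
    # Prologue: start the first two iterations.
    loaded = out[0]
    summed = loaded + s
    loaded = out[1]
    # Kernel: three different iterations are in flight at once.
    for i in range(n - 2):
        out[i] = summed        # store for iteration i
        summed = loaded + s    # add for iteration i + 1
        loaded = out[i + 2]    # load for iteration i + 2
    # Epilogue: drain the two iterations still in flight.
    out[n - 2] = summed
    out[n - 1] = loaded + s
    return out
```

The prologue and epilogue are the extra code that software pipelining adds around the kernel when there is no hardware support for it.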
Software Pipelining Advantages • Overlapping iterations is traditionally done through loop unrolling • Software pipelining needs less code than loop unrolling and has increased regularity • Smaller code means fewer cache misses • Especially useful for integer code with a small number of loop iterations
Software Pipelining Disadvantages • Requires many additional instructions to manage the loop • Without hardware support, the overhead may greatly increase code size • Typically used only in special technical computing applications
IA-64 Features Supporting Software Pipelining • Full predication • Circular Buffer of General and FP Registers • Loop Branches Decrement RRBs (register rename bases)
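Register rotation can be sketched as a circular buffer indexed through a rename base. The model below is a simplification under stated assumptions: `RotatingRegisters` is a hypothetical class, logical register numbers are counted from the start of the rotating region, and a single rename base covers the whole file.

```python
class RotatingRegisters:
    """Toy rotating register file addressed through a rename base (RRB)."""

    def __init__(self, size):
        self.size = size
        self.phys = [None] * size
        self.rrb = 0

    def _index(self, logical):
        # Logical register numbers are offset by the rename base, modulo size.
        return (logical + self.rrb) % self.size

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def read(self, logical):
        return self.phys[self._index(logical)]

    def loop_branch(self):
        """A loop branch decrements the rename base, rotating the registers."""
        self.rrb = (self.rrb - 1) % self.size
```

A value written to logical register 0 in one iteration is read as logical register 1 in the next, so each pipeline stage can use fixed register names without copy instructions.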
Summary • Predication removes branches • Parallel compares increase parallelism • Benefits complex control flow: large databases • Speculation reduces memory latency impact • IA-64 removes recovery from critical path • Benefits applications with poor cache locality: server applications, OS • S/W pipelining support with minimal overhead enables broad usage • Performance for small integer loops with unknown trip counts as well as monster FP loops
Reference • M. S. Schlansker, “EPIC: Explicitly Parallel Instruction Computing”, Computer, vol. ?, no. ?, pp. 37-45, 2000. • J. Huck et al., “Introducing the IA-64 Architecture”, Sept.-Oct. 2000, pp. 12-23. • C. Dulong, “The IA-64 Architecture at Work”, Computing Practices.