This presentation compares the IA-64 and HPL-PD architectures. It surveys the ILP features they share (predication, data speculation, control speculation, software pipelining, and compiler-directed caching), including conditional execution through predicate registers and parallel compare operations. It then covers the few places where IA-64 extends the common base: multimedia instructions, semaphore instructions, and the register stack engine.
NYU Comparing IA-64 and HPL-PD
Overview • IA-64 has a number of novel features for supporting ILP: • Predication • Data Speculation • Control Speculation • Software Pipelining • Compiler-directed Caching • These features all exist in HPL-PD! • The ISAs are also very similar (arithmetic, logic operations, etc.). • IA-64 adds a few extensions • Multimedia Instructions • Semaphore Instructions
Predication Support • IA-64 and Trimaran both support conditional execution of instructions through predicate registers, along with instructions to manipulate those registers. • Both support “parallel” compare operations • i.e. assigning to two predicate registers simultaneously • through a modifier in HPL-PD • through a completer in IA-64 • Both provide wired-and and wired-or compare forms
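The idea of a compare that writes two predicate registers, followed by guarded operations, can be sketched in a few lines of Python. This is a toy model only: `compare_to_predicates` and `predicated_abs_diff` are hypothetical names, and the code does not reproduce the encoding or exact semantics of either architecture.

```python
def compare_to_predicates(a, b):
    """Model a compare that writes a predicate and its complement at once."""
    p_true = a < b
    return p_true, not p_true


def predicated_abs_diff(a, b):
    """If-converted form of: if a < b: r = b - a  else: r = a - b."""
    p1, p2 = compare_to_predicates(a, b)
    r = None
    # Both operations are issued; each one commits only if its guard is true.
    for guard, value in ((p1, b - a), (p2, a - b)):
        if guard:
            r = value
    return r
```

Because the two guards are complementary, exactly one of the two operations commits and the branch disappears from the instruction stream.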
Control Speculation • Control speculation is supported in both IA-64 and HPL-PD with the same semantics • IA-64 • Each GPR includes a 1-bit speculation tag (the NaT bit) • FPRs use a special encoding called NaTVal • No extra bit needed • Only the load instruction has a control-speculative version (ld.s) • A verification instruction (chk.s) is needed for exception handling • HPL-PD • Both GPRs and FPRs have a speculation tag • An extra bit, like NaT in IA-64 • All integer and floating-point instructions have control-speculative versions • Exceptions are tracked automatically by the hardware
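The speculation tag can be illustrated with a small Python model. This is a sketch under simplifying assumptions, not either architecture's actual behavior: `NAT`, `speculative_load`, `add`, `check`, and `guarded` are hypothetical names, and a dictionary stands in for memory.

```python
NAT = object()  # stands in for a register whose speculation tag is set


def speculative_load(memory, addr):
    """Defer a faulting access by returning NAT instead of raising."""
    return memory.get(addr, NAT)


def add(a, b):
    """Ordinary operations propagate the tag rather than faulting."""
    return NAT if a is NAT or b is NAT else a + b


def check(value, recover):
    """Run recovery code only when the tag is set."""
    return recover() if value is NAT else value


def guarded(memory, ptr):
    """Original code: if ptr is not None: return memory[ptr] + 1, else 0."""
    v = speculative_load(memory, ptr)  # hoisted above the guard
    t = add(v, 1)
    if ptr is not None:
        # Recovery re-executes the load non-speculatively, so a real
        # fault would surface here rather than at the hoisted load.
        return check(t, lambda: memory[ptr] + 1)
    return 0
```

When the guard turns out to be false, the tagged value is simply never checked, so the deferred exception never becomes visible.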
Data Speculation • Data speculation is supported in both IA-64 and HPL-PD in a similar manner, • i.e. moving a load above a store that may write to the same address. • IA-64 • Supports advanced loads (ld.a) with load checking (ld.c) as well as checking with recovery • The compiler can move up not only the load but also one or more of its uses (chk.a) • HPL-PD • Also supports recovery in load checking (BRDV)
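A rough Python sketch of the load-above-store idea follows. It is loosely modelled on the address table that IA-64 uses to track advanced loads (the ALAT); the class `SpeculationTable` and its methods are hypothetical names, and the model ignores entry capacity, access sizes, and partial address overlap.

```python
class SpeculationTable:
    """Toy tracker for loads that were hoisted above possibly-aliasing stores."""

    def __init__(self):
        self.entries = {}  # target register name -> address it was loaded from

    def advanced_load(self, memory, reg, addr):
        """Load early and remember which address the register came from."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, memory, addr, value):
        """A store removes every entry whose address it overwrites."""
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check_load(self, memory, reg, addr, value):
        """Keep the early value if the entry survived, otherwise reload."""
        if self.entries.get(reg) == addr:
            return value
        return memory[addr]
```

Usage: call `advanced_load` where the hoisted load sits, route intervening stores through `store`, and call `check_load` at the load's original position. Checking with recovery would additionally re-execute any uses that were hoisted along with the load.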
Data Speculation Examples • IA-64 • HPL-PD
Compiler Directed Cache • The memory hierarchy is visible to the compiler in both HPL-PD and IA-64 • IA-64 • The compiler can supply hints in store, load, and prefetch instructions on where in the cache hierarchy the data will be found or left. • For prefetching, the “lfetch” instruction requests that cache lines be moved between different levels of the memory hierarchy. • lfetch maintains cache coherence • HPL-PD • The compiler can also supply hints in store and load instructions • Prefetching is simply a load to R0
Support for Software Pipelining • Both IA-64 and Trimaran implement rotating registers, loop counters, and epilogue counters in combination with predication. • Used to implement modulo scheduling of loops.
Software Pipelining Example: HPL-PD • Example of software pipelining in Trimaran • “A slice executed as a single VLIW instruction.” • Taken from the Trimaran Tutorial
Software Pipelining IA-64 • Software pipelining on the IA-64 • Taken from the Intel web tutorial
C source:
    for (i=0; i<n; i++) y[i] = x[i] + 1
IA-64 loop:
    loop:
      (p14) ld1 r32 = [r12],1
      (p15) add r34 = 1, r33
      (p16) st1 [r13] = r35,1
      br.ctop loop
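How rotating registers, stage predicates, a loop counter, and an epilogue counter combine can be traced with a small Python simulation of the three-stage loop above (load, add, store). This is a simplified sketch: `pipelined_add_one` is a hypothetical name, list indices 0..3 stand in for r32..r35, and the branch logic only approximates a counted loop-closing branch.

```python
def pipelined_add_one(x):
    """Compute y[i] = x[i] + 1 with a 3-stage software pipeline."""
    n = len(x)
    y = [None] * n
    if n == 0:
        return y
    regs = [None] * 4             # rotating registers, index 0..3 ~ r32..r35
    preds = [True, False, False]  # one guard predicate per pipeline stage
    lc, ec = n - 1, 3             # loop count; epilogue count = stage count
    src = dst = 0
    while True:
        # One kernel iteration: three guarded operations, one per stage.
        if preds[0]:
            regs[0] = x[src]        # stage 0: load into "r32"
            src += 1
        if preds[1]:
            regs[2] = regs[1] + 1   # stage 1: "r34" = "r33" + 1
        if preds[2]:
            y[dst] = regs[3]        # stage 2: store "r35"
            dst += 1
        # Loop-closing branch: keep starting iterations while lc > 0,
        # then drain the pipeline while ec > 1.
        if lc > 0:
            lc -= 1
            new_pred = True
        elif ec > 1:
            ec -= 1
            new_pred = False
        else:
            break
        regs = [regs[-1]] + regs[:-1]     # rotate: value in r32 shows up as r33
        preds = [new_pred] + preds[:-1]   # stage predicates rotate the same way
    return y
```

For n elements the kernel runs n + 2 times: the first two iterations fill the pipeline and the last two drain it, with the predicates switching stages off as needed so no separate prologue or epilogue code is required.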
Differences • Multimedia Instruction • Semaphore Instruction • Register Stack Engine
Register Stack Engine • IA-64 implements a mechanism called a register stack engine (RSE) that manages the dynamic allocation of stack frames using registers gpr32-gpr127. • The operations of the RSE are transparent to the software. • It ensures that contents of registers are always available.
Multimedia Instruction • IA-64 has multimedia instructions that treat a GPR as a concatenation of eight 8-bit, four 16-bit, or two 32-bit elements and operate on each element independently and in parallel. • Inspired by MMX • The instructions include • parallel addition and subtraction • parallel average • parallel shift left and add • parallel compare • parallel multiply right
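The lane-wise behavior of a parallel addition can be sketched in Python on a 64-bit integer. The function name `parallel_add` and its `lane_bits` parameter are hypothetical, and the sketch assumes modular (wrap-around) arithmetic in each lane; saturating variants are not modelled.

```python
def parallel_add(a, b, lane_bits=8):
    """Add two 64-bit values lane by lane, with no carry between lanes."""
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        # Truncate to the lane width so an overflow wraps inside the lane
        # instead of spilling into the neighboring element.
        result |= (lane_sum & mask) << shift
    return result
```

With 8-bit lanes, adding 0x01 to a low lane holding 0xFF wraps that lane to 0x00 and leaves the next lane untouched, whereas the same operands treated as 16-bit lanes produce 0x0100.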
Semaphore Instruction • IA-64 has semaphore instructions that • atomically load a general register from memory, • perform an operation and • then store a result to the same memory location. • The instructions include • exchange • compare and exchange • fetch and add
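The load, operate, and store-back pattern of these three instructions can be mimicked in Python. The class `AtomicMemory` is a hypothetical stand-in: a lock plays the role of the hardware's atomicity guarantee, and memory ordering is not modelled.

```python
import threading


class AtomicMemory:
    """Toy memory whose read-modify-write operations are indivisible."""

    def __init__(self, cells=None):
        self.cells = dict(cells or {})
        self._lock = threading.Lock()

    def exchange(self, addr, new):
        """Store new and return the previous contents."""
        with self._lock:
            old = self.cells[addr]
            self.cells[addr] = new
            return old

    def compare_and_exchange(self, addr, expected, new):
        """Store new only if the cell equals expected; return the old value."""
        with self._lock:
            old = self.cells[addr]
            if old == expected:
                self.cells[addr] = new
            return old

    def fetch_and_add(self, addr, increment):
        """Add increment to the cell and return the value it held before."""
        with self._lock:
            old = self.cells[addr]
            self.cells[addr] = old + increment
            return old
```

A spin lock, for example, can be built by looping on `compare_and_exchange(addr, 0, 1)` until it returns 0, and released by storing 0 again.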