An Instruction Set and Microarchitecture for Instruction Level Distributed Processing

An Instruction Set and Microarchitecture for Instruction Level Distributed Processing (Ho-Seop Kim and James E. Smith) • Haiying Qu, Electrical and Computer Engineering, University of Alberta

Introduction 1 • ILP: Instruction Level Parallelism • Achieved significant performance gains • ILDP: Instruction Level Distributed Processing • Technology trend

Introduction 2 • Proposed microarchitecture • Short pipelines • Distributed processing elements: in-order instruction processing enables out-of-order execution • Strand: dependent instructions • Accumulator • Inter-instruction communication

Instruction Set • 64 general-purpose registers (GPRs): R0-R63, source or destination • 8 accumulators: A0-A7 • Dead accumulator

Load/store Instruction • One accumulator value • One GPR • One parcel • Ai <- mem(Aj) • Ai <- mem(Rj) • mem(Ai) <- Rj • mem(Rj) <- Ai
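The four load/store forms above can be sketched as a tiny interpreter. This is only an illustrative model, not the authors' simulator: the `State` class, the function names, the sparse dictionary memory, and the choice to read unwritten memory as 0 are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical architectural state: 8 accumulators, 64 GPRs, sparse memory."""
    acc: list = field(default_factory=lambda: [0] * 8)    # A0-A7
    gpr: list = field(default_factory=lambda: [0] * 64)   # R0-R63
    mem: dict = field(default_factory=dict)               # address -> value


def load_from_acc_addr(s, i, j):
    """Ai <- mem(Aj)"""
    s.acc[i] = s.mem.get(s.acc[j], 0)


def load_from_gpr_addr(s, i, j):
    """Ai <- mem(Rj)"""
    s.acc[i] = s.mem.get(s.gpr[j], 0)


def store_gpr_to_acc_addr(s, i, j):
    """mem(Ai) <- Rj"""
    s.mem[s.acc[i]] = s.gpr[j]


def store_acc_to_gpr_addr(s, j, i):
    """mem(Rj) <- Ai"""
    s.mem[s.gpr[j]] = s.acc[i]


# Usage: load through a GPR address, then store the accumulator elsewhere.
s = State()
s.gpr[5] = 100
s.mem[100] = 7
load_from_gpr_addr(s, 0, 5)      # A0 <- mem(R5)
s.gpr[6] = 200
store_acc_to_gpr_addr(s, 6, 0)   # mem(R6) <- A0
```

Every form touches exactly one accumulator and at most one GPR, which matches the "one accumulator value, one GPR" bullets on the slide.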

Register Instruction • Operation: accumulator and GPR/immediate • Result: accumulator or GPR • Ai <- Ai op Rj • Ai <- Ai op immed • Ai <- Rj op immed • Rj <- Ai • Rj <- Ai op immed
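As a rough illustration of the five register-instruction forms listed above, the following sketch dispatches on a form string. The form strings, the `exec_register` helper, and the small operation table are hypothetical names chosen for this example; word-size wraparound is deliberately ignored.

```python
import operator

# A small, assumed set of ALU operations; the real ISA's opcode list is not given here.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


def exec_register(form, acc, gpr, i, j=None, op=None, immed=None):
    """Apply one register-instruction form to the accumulator and GPR lists in place."""
    f = OPS[op] if op is not None else None
    if form == "A=A op R":        # Ai <- Ai op Rj
        acc[i] = f(acc[i], gpr[j])
    elif form == "A=A op imm":    # Ai <- Ai op immed
        acc[i] = f(acc[i], immed)
    elif form == "A=R op imm":    # Ai <- Rj op immed
        acc[i] = f(gpr[j], immed)
    elif form == "R=A":           # Rj <- Ai
        gpr[j] = acc[i]
    elif form == "R=A op imm":    # Rj <- Ai op immed
        gpr[j] = f(acc[i], immed)
    else:
        raise ValueError(f"unknown form: {form}")


# Usage: a three-instruction strand on accumulator A0.
acc = [0] * 8
gpr = [0] * 64
gpr[3] = 10
exec_register("A=R op imm", acc, gpr, i=0, j=3, op="add", immed=5)   # A0 = 10 + 5
exec_register("A=A op R", acc, gpr, i=0, j=3, op="sub")              # A0 = 15 - 10
exec_register("R=A op imm", acc, gpr, i=0, j=4, op="add", immed=1)   # R4 = 5 + 1
```

Note that no form reads two GPRs or two accumulators: each operation combines an accumulator with a GPR or an immediate, as the slide states.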

Branch/jump Instruction • Conditional branch: compare Ai with 0 or a GPR (all the usual predicates) • Program counter (P) • Indirect jump: Ai or GPR • Return address: GPR • P <- P + immed; Ai pred Rj • P <- P + immed; Ai pred 0 • P <- Ai • P <- Rj • P <- Ai; Rj <- P++
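The branch and jump forms above amount to a next-P computation. A minimal sketch follows, under the assumptions that P counts instructions (so "P++" is P + 1) and that the predicate names in `PREDS` stand in for "all the usual predicates"; both function names are hypothetical.

```python
import operator

# Assumed predicate names; the slide only says "all usual predicates".
PREDS = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}


def cond_branch(p, immed, pred, a_val, other=0):
    """P <- P + immed if (Ai pred other); otherwise fall through to P + 1.

    `other` is 0 for the compare-with-zero form, or the value of Rj.
    """
    return p + immed if PREDS[pred](a_val, other) else p + 1


def jump_and_link(p, target):
    """P <- Ai; Rj <- P++. Returns (new P, return address to write into Rj)."""
    return target, p + 1


# Usage
taken = cond_branch(10, 5, "eq", a_val=0)                 # Ai == 0, branch taken
not_taken = cond_branch(10, 5, "lt", a_val=9, other=3)    # 9 < 3 is false
new_p, ret_addr = jump_and_link(10, target=40)
```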

Example Code

Strand Figure 3. Types of values and associated registers

Figure 4. Issue timing • Two strands intersect: copy one to a GPR • Output is a static global register • New strand • Strand ends

Stages • Fetch: 4 words, over 4 instructions • Parceling: break into individual instructions • Renaming: GPRs • Steering: into FIFOs according to the accumulators
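The steering stage above can be pictured with a short sketch: instructions that use the same accumulator follow their strand into the same FIFO. The slide does not say how a new strand picks its FIFO, so the least-occupied choice below is an assumption, as are the tuple layout `(name, accumulator, starts_strand)` and the `steer` function name.

```python
from collections import deque


def steer(instrs, num_pes=4):
    """Steer instructions into per-PE FIFOs according to their accumulator.

    instrs: iterable of (name, accumulator_index, starts_strand) tuples.
    A strand-starting instruction is assigned to the least-occupied FIFO
    (an assumed policy); later instructions on that accumulator follow it.
    """
    fifos = [deque() for _ in range(num_pes)]
    acc_to_pe = {}  # which FIFO currently holds each accumulator's strand
    for name, acc, starts_strand in instrs:
        if starts_strand or acc not in acc_to_pe:
            acc_to_pe[acc] = min(range(num_pes), key=lambda k: len(fifos[k]))
        fifos[acc_to_pe[acc]].append(name)
    return fifos


# Usage: two interleaved strands, on A0 and A1, steered to two PEs.
program = [
    ("i0", 0, True),
    ("i1", 1, True),
    ("i2", 0, False),
    ("i3", 1, False),
    ("i4", 0, False),
]
fifos = steer(program, num_pes=2)
```

Each FIFO then issues in order, while different FIFOs proceed independently, which is how in-order processing elements yield out-of-order execution across strands.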

Figure 5 ILDP Processor Block Diagram

Some Concepts • PE: Processing Element • IR: Issue Register (a single reservation station) • ICN: Interconnection Network

Figure 6 Microarchitecture

Table 1 Complexity Comparison • Note: the ILDP figures are based on one PE

Table 2 Benchmark Program Properties

Evaluation 1 Figure 7 type of register values Figure 8 Average strand length

Evaluation 2 Figure 9 Strand end Figure 10 instruction size

Evaluation 3 Figure 11 Cumulative strand re-use Figure 12 IPC

Evaluation 4 Figure 13 Global register rename map read/write bandwidth

Table 3 Simulator Configurations

Discussion
