A Combinatorial Architecture for Instruction-Level Parallelism Prepared by: HongJun Yu
Regulated Elements By Universal Scheme (REBUS)
[Figure: block diagram, labels from top to bottom: Executable Program; Partitioned Instruction Streams; Processing Elements with Replicated Scratchpad Registers; Combinatorial Interconnection Structure; MCU, MCU, MCU; Sliced Memory Hierarchy; Memory System]
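Processing Elements (PE) and Memory Coordination Units (MCU)
[Figure: rows labelled Reg, PE and MCU. Register triples shown: (1, 7, 5), (2, 1, 6), (3, 2, 7), (4, 3, 1), (5, 4, 2), (6, 5, 3), (7, 6, 4). Index sequences shown: 2 3 4 5 6 7 1 …… 2 3 4 5 6 7 1]
(X1, X2, X3, X4, X5, X6, X7) using (7, 7, 3, 3, 1)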
Structure of MCU and its connections
[Figure: block diagram, labels: Processing Element (three instances); To other MCUs; Scratchpad Registers Unit Controller; Cache Memory; To and From Main Memory]
Structure of PE
[Figure: block diagram, labels: Global Signals Management; Processor With Private Memory; Queues of Scratchpad Copies; R1, R2, R3; MCU Interface]
Pairwise-balanced combinatorial interconnection
• X = {x1, x2, x3, x4, x5, x6, x7, x8, x9} with a Balanced Incomplete Block (BIB) design of configuration (b, v, r, k, λ)
• v: number of elements; b: number of k-subsets; r: each element appears in exactly r subsets; λ: each pair of elements appears in exactly λ subsets
• v*r = b*k
• For example, (12, 9, 4, 3, 1) is a BIB: 9*4 = 12*3 = 36
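The BIB conditions on this slide can be checked mechanically. The sketch below is illustrative only and is not taken from the slides: `bib_parameters` and `affine_plane_order3` are hypothetical helper names, and the 12 blocks for the (12, 9, 4, 3, 1) example are built as the lines of the affine plane over Z3, one standard construction for these parameters and not necessarily the one the author had in mind. The seven triples are the ones listed on the PE/MCU slide.

```python
from collections import Counter
from itertools import combinations


def bib_parameters(blocks):
    """Return (b, v, r, k, lam) if `blocks` form a BIB design, else raise ValueError."""
    points = sorted(set().union(*blocks))
    sizes = {len(blk) for blk in blocks}
    if len(sizes) != 1:
        raise ValueError("blocks do not all have the same size k")
    # How often each element, and each unordered pair of elements, occurs.
    point_counts = Counter(p for blk in blocks for p in blk)
    pair_counts = Counter(
        frozenset(pair) for blk in blocks for pair in combinations(sorted(blk), 2)
    )
    reps = {point_counts[p] for p in points}
    lams = {pair_counts[frozenset(pair)] for pair in combinations(points, 2)}
    if len(reps) != 1 or len(lams) != 1:
        raise ValueError("element or pair replication is not constant")
    return len(blocks), len(points), reps.pop(), sizes.pop(), lams.pop()


def affine_plane_order3():
    """Assumed construction: the 12 lines of the affine plane over Z3, points numbered 1..9."""
    def label(x, y):
        return 3 * x + y + 1
    lines = [
        {label(x, (m * x + c) % 3) for x in range(3)}  # lines y = m*x + c
        for m in range(3)
        for c in range(3)
    ]
    lines += [{label(c, y) for y in range(3)} for c in range(3)]  # vertical lines x = c
    return lines


# Register triples as listed on the PE/MCU slide.
REBUS_TRIPLES = [{1, 7, 5}, {2, 1, 6}, {3, 2, 7}, {4, 3, 1}, {5, 4, 2}, {6, 5, 3}, {7, 6, 4}]

if __name__ == "__main__":
    for name, blocks in [("affine plane", affine_plane_order3()), ("REBUS", REBUS_TRIPLES)]:
        b, v, r, k, lam = bib_parameters(blocks)
        print(name, (b, v, r, k, lam), "v*r == b*k:", v * r == b * k)
```

Checking every pair directly, rather than only the counting identity v*r = b*k, matters because the identity is necessary but not sufficient for a set of blocks to be a BIB.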
Cont'd
• A program can be partitioned among the PEs by letting each instruction's operand pair determine the PE to which the instruction is assigned
• ADD R1 R7 → PE #1
• MULT R2 R6 → PE #2
• DIV R4 R5 → PE #5
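The operand-pair rule can be sketched as a lookup. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes PE #i holds the i-th register triple from the PE/MCU slide (an assignment consistent with the three examples above), and `PE_REGISTERS`, `pe_for_operands` and `pe_for_instruction` are hypothetical names. Because λ = 1, two distinct registers share exactly one PE. The slides do not say what happens when both operands are the same register, so the sketch rejects that case.

```python
# Assumed layout: PE #i holds the i-th triple of the (7, 7, 3, 3, 1) design.
PE_REGISTERS = {
    1: {1, 7, 5},
    2: {2, 1, 6},
    3: {3, 2, 7},
    4: {4, 3, 1},
    5: {5, 4, 2},
    6: {6, 5, 3},
    7: {7, 6, 4},
}


def pe_for_operands(reg_a, reg_b):
    """Return the number of the single PE that holds both registers."""
    if reg_a == reg_b:
        # Not covered by the slides: a single register is replicated in several PEs.
        raise ValueError("operands must be two distinct registers")
    holders = [pe for pe, regs in PE_REGISTERS.items() if reg_a in regs and reg_b in regs]
    if len(holders) != 1:
        raise ValueError(f"no unique PE holds R{reg_a} and R{reg_b}")
    return holders[0]


def pe_for_instruction(text):
    """Parse a fixed-format instruction such as 'ADD R1 R7' and return its PE number."""
    _opcode, op_a, op_b = text.split()
    return pe_for_operands(int(op_a.lstrip("R")), int(op_b.lstrip("R")))


if __name__ == "__main__":
    for instr in ("ADD R1 R7", "MULT R2 R6", "DIV R4 R5"):
        print(f"{instr} -> PE #{pe_for_instruction(instr)}")
```

With this layout the three instructions on the slide resolve to PE #1, PE #2 and PE #5 respectively.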
Excellent ideas:
• Implements ultra-parallelism using a balanced incomplete block (BIB) design;
• Extends parallelism from the instruction level to the assembly-code level;
• Parallelism is not restricted to a small "window" of code;
• Supports parallelism among a group of connected processors;
• Compatible with current compiler and superscalar technologies;
• Could benefit both RISC and CISC.
Characteristics:
• Uses a fixed assembly-code format;
• Uses memory coordination units (MCU);
• Requires data replication
Future work
• Apply to multi-threaded processing
• Support various instruction formats