This document describes a novel approach to implementing instruction-level parallelism in processing units, utilizing a Combinatorial Architecture with Replicated Scratchpad Registers and Structured Interconnections. It introduces the concept of Balanced Incomplete Blocks (BIB) for efficient program partitioning among Processing Elements (PEs). The architecture supports ultra-parallelism and extends parallelism from the instruction to assembly code level, enabling compatibility with current technologies. Features include a fixed assembly code format, Memory Coordination Units (MCU), and data replication for enhanced performance. Future work involves applying the architecture to multi-threaded processing and expanding instruction format support.
A Combinatorial Architecture for Instruction-Level Parallelism
Prepared by: HongJun Yu
Regulated Elements By Universal Scheme (REBUS)
[Figure: REBUS block diagram. Labels: Executable Program; Partitioned Instruction Streams; Processing Elements with Replicated Scratchpad Registers; Combinatorial Interconnection Structure; MCU (three shown); Sliced Memory Hierarchy; Memory System.]
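Processing Elements (PE) and Memory Coordination Units (MCU)
[Figure: PEs and MCUs numbered 1 to 7, interconnected over (X1, X2, X3, X4, X5, X6, X7) using the BIB configuration (7, 7, 3, 3, 1). Register (Reg) triples shown: (1, 7, 5), (2, 1, 6), (3, 2, 7), (4, 3, 1), (5, 4, 2), (6, 5, 3), (7, 6, 4).]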
Structure of MCU and its connections
[Figure labels: Processing Element (three shown); To other MCUs; Scratchpad Registers; Unit Controller; Cache Memory; To and From Main Memory.]
Structure of PE
[Figure labels: Global Signals; Management Processor With Private Memory; Queues of Scratchpad Copies; R1, R2, R3; MCU Interface.]
Pairwise-balanced combinatorial interconnection
• X = {X1, X2, X3, X4, X5, X6, X7, X8, X9}: a Balanced Incomplete Block (BIB) on X has configuration (b, v, r, k, λ)
• v: number of elements; b: number of k-subsets; r: number of subsets in which each element appears; λ: number of subsets in which each pair of elements appears
• v·r = b·k
• For example, (12, 9, 4, 3, 1) is a BIB: 9·4 = 12·3 = 36
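The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides: `bib_parameters` is a hypothetical helper that counts block sizes, element replications, and pair occurrences. It is applied to the seven register triples listed in the PE/MCU figure and to one possible (12, 9, 4, 3, 1) design. The slides do not list the twelve blocks, so the affine-plane construction used here is an assumption chosen only because it has those parameters.

```python
from itertools import combinations


def bib_parameters(points, blocks):
    """Return (b, v, r, k, lam) if `blocks` form a BIB on `points`, else None."""
    points = list(points)
    blocks = [frozenset(block) for block in blocks]
    if any(not block <= set(points) for block in blocks):
        return None
    sizes = {len(block) for block in blocks}
    # r: how many blocks contain each element
    reps = {sum(p in block for block in blocks) for p in points}
    # lam: how many blocks contain each pair of elements
    lams = {
        sum(p in block and q in block for block in blocks)
        for p, q in combinations(points, 2)
    }
    if len(sizes) != 1 or len(reps) != 1 or len(lams) != 1:
        return None
    return (len(blocks), len(points), reps.pop(), sizes.pop(), lams.pop())


# Register triples as listed in the PE/MCU figure.
FIGURE_BLOCKS = [(1, 7, 5), (2, 1, 6), (3, 2, 7), (4, 3, 1),
                 (5, 4, 2), (6, 5, 3), (7, 6, 4)]


def affine_plane_order3():
    """Assumed example: the 12 lines of a 3x3 grid, points numbered 1..9."""
    lines = []
    for m in range(3):          # lines y = m*x + c (mod 3)
        for c in range(3):
            lines.append(frozenset(1 + 3 * x + (m * x + c) % 3 for x in range(3)))
    for c in range(3):          # vertical lines x = c
        lines.append(frozenset(1 + 3 * c + y for y in range(3)))
    return lines


if __name__ == "__main__":
    for pts, blks in [(range(1, 8), FIGURE_BLOCKS),
                      (range(1, 10), affine_plane_order3())]:
        b, v, r, k, lam = bib_parameters(pts, blks)
        assert v * r == b * k   # counting identity from the slide
        print((b, v, r, k, lam))
    # prints (7, 7, 3, 3, 1) then (12, 9, 4, 3, 1)
```

Both designs satisfy v·r = b·k, and in both every pair of elements shares exactly one block (λ = 1).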
Cont'd
• A program can be partitioned among the PEs by letting each instruction's operand pair determine the PE to which the instruction is assigned
• ADD R1 R7 → PE #1
• MULT R2 R6 → PE #2
• DIV R4 R5 → PE #5
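The assignment rule can be sketched as a table lookup. This is an illustrative sketch under two assumptions: the register triples are those listed in the (7, 7, 3, 3, 1) PE/MCU figure, with the i-th triple belonging to PE #i, and instructions use the three-token form shown on the slide. Because λ = 1, every pair of distinct registers lies in exactly one triple, so the lookup is unambiguous. The names `PE_REGISTERS`, `assign_pe`, and `partition` are hypothetical.

```python
from itertools import combinations

# Assumed mapping: PE number -> register triple, read from the PE/MCU figure.
PE_REGISTERS = {
    1: {1, 7, 5}, 2: {2, 1, 6}, 3: {3, 2, 7}, 4: {4, 3, 1},
    5: {5, 4, 2}, 6: {6, 5, 3}, 7: {7, 6, 4},
}

# Each unordered register pair maps to the single PE whose triple contains it.
PAIR_TO_PE = {
    frozenset(pair): pe
    for pe, regs in PE_REGISTERS.items()
    for pair in combinations(sorted(regs), 2)
}


def assign_pe(instruction):
    """Return the PE for an instruction such as 'ADD R1 R7'."""
    _opcode, first, second = instruction.split()
    ra, rb = int(first[1:]), int(second[1:])
    if ra == rb:
        # Not covered by the slides: a single register is held by three PEs.
        raise ValueError("operands must be two distinct registers")
    return PAIR_TO_PE[frozenset((ra, rb))]


def partition(program):
    """Split a list of instructions into one stream per PE."""
    streams = {pe: [] for pe in PE_REGISTERS}
    for instruction in program:
        streams[assign_pe(instruction)].append(instruction)
    return streams


if __name__ == "__main__":
    for ins in ["ADD R1 R7", "MULT R2 R6", "DIV R4 R5"]:
        print(ins, "-> PE #%d" % assign_pe(ins))
    # ADD R1 R7 -> PE #1
    # MULT R2 R6 -> PE #2
    # DIV R4 R5 -> PE #5
```

With these triples the sketch reproduces the three assignments given on the slide.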
Excellent ideas:
• Implements ultra-parallelism using balanced incomplete blocks (BIB);
• Extends parallelism from the instruction level to the assembly code level;
• Parallelism is not restricted to a small "window" of code;
• Supports parallelism among a group of connected processors;
• Compatible with current compiler and superscalar technologies;
• Could benefit both RISC and CISC.
Characteristics:
• Uses a fixed assembly code format;
• Uses memory coordination units (MCU);
• Requires data replication.
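The replication requirement follows from the design parameters: with r = 3, each register belongs to three PEs. The sketch below only enumerates which PEs would hold a copy of a given register. It assumes the register triples from the (7, 7, 3, 3, 1) PE/MCU figure and assumes that a write by one PE must reach the other holders; the slides do not describe how updates are actually propagated, and the function names are hypothetical.

```python
# Assumed mapping: PE number -> register triple, read from the PE/MCU figure.
PE_REGISTERS = {
    1: {1, 7, 5}, 2: {2, 1, 6}, 3: {3, 2, 7}, 4: {4, 3, 1},
    5: {5, 4, 2}, 6: {6, 5, 3}, 7: {7, 6, 4},
}


def holders(reg):
    """PEs whose register triple contains `reg` (r = 3 of them)."""
    return sorted(pe for pe, regs in PE_REGISTERS.items() if reg in regs)


def copies_to_update(writer_pe, reg):
    """Other PEs holding a copy of `reg` after `writer_pe` writes it."""
    if reg not in PE_REGISTERS[writer_pe]:
        raise ValueError("PE #%d does not hold R%d" % (writer_pe, reg))
    return [pe for pe in holders(reg) if pe != writer_pe]


if __name__ == "__main__":
    print(holders(1))              # [1, 2, 4]
    print(copies_to_update(1, 7))  # [3, 7]
```

Under these assumptions every write touches r - 1 = 2 remote copies.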
Future work
• Apply the architecture to multi-threaded processing
• Support various instruction formats