Explore the MacroNET project, which focuses on lightweight, fault-tolerant light nodes in a sparse topology. The talk covers methodology, communications, CPU & router, transistor counts, and simulation, including communication with sensors and instruction propagation among nodes.
MacroNET Lightnode Architecture & Network • ELE-580i PRESENTATION-II • 04/29/2003 • Canturk ISCI
OUTLINE • Introduction/Review [1min] • Methodology [2min] • Communications [3min] • CPU & Router [6min] • Transistor Counts [4min] • Simulation [1min] • MacroNET SIM DEMO [3min]
Introduction / Review • MacroNET: • Large area electronics + flexible & deformable <rolled like a window shade> • Distributed processing units (light) • Project Definition & Motivation: • Sparse topology & Lightweight node & Fault tolerant • Transistor Budget • Compute & Route same + buffer=regfile/memory • Some extent of path diversity • Particular design challenges • Tight complexity, fault tolerance • Relaxed Latency, performance • Related Work: • More performance driven • Lightweight networks – for on chip processors focus on reducing latency
Methodology/Experience [Diagram: Transistor Budget, Topology, Routing & Flow Control, Compute Node Model, Instructions & Communication, Traffic Pattern] • Assume a Topology • Now TORUS (as the superset of others) • Define the nature of Communication • With external world / sensors • The Strokes • Instructions among nodes • Zoom around node(r,c) • Dim / brighten • Maketop N/S/W/E • Compute distance • Instructions within nodes • Packet Types • Explicit acks for simplicity [alewife] • Short size [alpha] • Idempotency/delivery • Msg/ack/conf [idempotent] • + Acked & broadcasting • Routing Model • Livelock avoidance • Define compute node • CPU + Router Architecture • T counts < 1000 • 10% for pads
COMMUNICATIONS • Input Method: • From sensors: inputs such as strokes are converted to instr-n packets • Zoom/Dim/Brighten/Maketop: • Click on cell → Processor makes the message → Broadcasts • Compute Distance: • 1st click: 1st cell → Processor makes the message → Broadcasts • 2nd click: 2nd cell → Processor sets status flag, waits for message → computes distance • Packet Types: • 8 bit flits, phits = flits • CPU instr-n: [000|xxxxx] • MSG: [abc|xxxxx][xxxxxxxx][…] • Special Packets: • ACK: [111|11111] • ACKED: [111|10101] • CONFIRM: [111|01010]
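The flit formats above can be sketched as a small classifier. This is only an illustration: the function name `classify_flit` and the "unknown-special" label for other `111` patterns are assumptions, not part of the slides.

```python
# Sketch of the 8-bit flit layout: 3-bit tag | 5-bit payload.
# The names below are hypothetical; only the bit patterns come from the slide.
ACK = 0b111_11111
ACKED = 0b111_10101
CONFIRM = 0b111_01010

SPECIAL = {ACK: "ack", ACKED: "acked", CONFIRM: "confirm"}


def classify_flit(flit):
    """Return (kind, payload) for one 8-bit flit."""
    if not 0 <= flit <= 0xFF:
        raise ValueError("a flit is exactly 8 bits")
    tag = flit >> 5          # upper 3 bits
    payload = flit & 0x1F    # lower 5 bits
    if tag == 0b000:
        return ("cpu", payload)      # local CPU instr-n: [000|xxxxx]
    if tag == 0b111:
        # Assumption: any other 111 pattern is treated as unknown.
        return (SPECIAL.get(flit, "unknown-special"), payload)
    return ("msg", payload)          # network message header: [abc|xxxxx]
```

For example, `classify_flit(0b11110101)` yields the `"acked"` kind, while any tag between `001` and `110` is treated as a message header.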
Instructions • All instr-ns go thru CPU pipe • Local instr-ns: [000|xxxxx] • Network instr-ns: [abc|xxxxx] • Zoom Around: • 1st flit will tell src • 2nd flit will tell color for next or keep • [001|01|10|x][color/keep][bright/keep] • Dim / Brighten: Single flit [010/011|xxxxx] • Maketop N/S/W/E: • Livelock problems when incorporating data exchange • Use Maketop + Exchange (unicast) • Maketop: Single flit tells direction [100|01|xxx] • Exchange: tells dest. [110|01|11|x][color][bright] • Something like dimensional routing • Needs timeout for fault tolerance now • Compute Distance: single flit tells dest. [101|11|00|x] • Can be made unicast but no need
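The opcode assignments listed above can be collected into a lookup table. A minimal sketch, assuming the 3-bit tag alone selects the instruction; the name `decode_opcode` and the flit counts read off the bracketed formats are illustrative, and the meaning of the remaining header bits is left uninterpreted.

```python
# 3-bit tag -> (instruction name, number of flits in the bracketed format).
# Flit counts are read from the slide's notation and are an interpretation.
OPCODES = {
    0b001: ("zoom_around", 3),       # [001|01|10|x][color/keep][bright/keep]
    0b010: ("dim", 1),               # [010|xxxxx]
    0b011: ("brighten", 1),          # [011|xxxxx]
    0b100: ("maketop", 1),           # [100|01|xxx]
    0b101: ("compute_distance", 1),  # [101|11|00|x]
    0b110: ("exchange", 3),          # [110|01|11|x][color][bright]
}


def decode_opcode(header):
    """Map a header flit to (name, flit_count); 000 and 111 are not network instr-ns."""
    tag = (header >> 5) & 0b111
    if tag == 0b000:
        return ("local", 1)
    if tag == 0b111:
        return ("special", 1)
    return OPCODES[tag]
```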
CPU & ROUTER • Router = Processor + AUXILIARY • Packet headers are instr-ns • Tagging tells what is what • Headers also decoded in ID • Messages also walk thru CPU pipe • Maketop → Different packets arrive simultaneously • Need simple arbitration (LRS) • Exchange → Routing • Need simple router within controller • Compute Distance → arithmetic • Need a simple ALU • Status Flags (recall HW March. Presentation) • For each outgoing channel • Global one for CPU
Router Architecture • Mostly for broadcasting with minimal possible HW • Simplistic, but T counts are the problem • Single cycle in order datapath • No buffering • Keeps certain node info • Examples in [NOW][mmr][flex] & [stallion] focus on other issues • We can’t fit an 8 bit 4x4 Xbar ([Ruby]&[bkmrk]) • We start from [SP2] & [Hwpres] and implement: • Mem based buffering/switching (static structure) • Allocation: LRS • ~Credit Based Flow Control (creditMAX=1) • Register Mapped (5 reg) Network interface
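The slides do not expand "LRS". Assuming it means least-recently-served arbitration among the input ports, the allocation step could look like the sketch below; the class name `LRSArbiter` and the port labels are hypothetical.

```python
class LRSArbiter:
    """Grant the requesting port that was served longest ago (assumed meaning of LRS)."""

    def __init__(self, ports):
        # Front of the list = least recently served port.
        self.order = list(ports)

    def grant(self, requests):
        """Return the winning port among `requests`, or None if nobody requests."""
        for port in self.order:
            if port in requests:
                # The winner becomes the most recently served port.
                self.order.remove(port)
                self.order.append(port)
                return port
        return None
```

With ports `["N", "E", "S", "W"]` and ports E and S requesting every cycle, the grants alternate E, S, E, so neither port starves.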
Router Diagram [Figure: ports P0 to P3, each with i/p channel status and o/p channel status; Process Queue; CPU pipe (I$, Dec, EX, D$, RF); Timeout Timer; CPU Status Flags; NODE INFO (Color, Brightness, X, Y, TOP); CONTROLLER (Datapath Related, Allocation, Message Generation, Routing); NODE O/P with (ack), (acked), (confirm/idle)]
Status Fields • CPU Status flags • Such as: Waiting for data from Px • To stall the datapath or block dispense of Reg-s • I/p channel Status Fields [I|R/W|A|Acked] • I: Idle <default> • R/W: Routing or waiting for its turn • A: CPU processing packet or sending downstream • Only 1 port can be active at a time • Acked: Another port already received/processed same msg • O/p channel Status Fields [Ack|Acked|Timeout|W] • Ack: Downstream idle (ACKed last flit) <default> • Acked: Downstream replied ACKED to last packet • don’t send any more • Timeout: Downstream’s last awaited ACK timed out • W: Waiting for ACK
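The o/p channel status fields behave like a four-state machine with at most one flit in flight (creditMAX=1). A rough sketch under stated assumptions: the class `OutChannel`, its method names, and the default timeout of 15 cycles (the largest value a 4-bit timer can hold) are illustrative choices, not taken from the slides.

```python
from enum import Enum


class OutStatus(Enum):
    ACK = "ack"          # downstream idle, last flit acknowledged (default)
    ACKED = "acked"      # downstream already has this packet: stop sending
    TIMEOUT = "timeout"  # the awaited ACK never arrived
    WAITING = "w"        # one flit in flight, waiting for ACK


class OutChannel:
    def __init__(self, timeout_cycles=15):  # assumption: 4-bit timer limit
        self.status = OutStatus.ACK
        self.timeout_cycles = timeout_cycles
        self.timer = 0

    def can_send(self):
        return self.status is OutStatus.ACK

    def send(self):
        if not self.can_send():
            raise RuntimeError("channel has no credit")
        self.status = OutStatus.WAITING
        self.timer = 0

    def receive(self, reply):
        """Handle a reply ('ack' or 'acked') from the downstream node."""
        if self.status is not OutStatus.WAITING:
            return
        if reply == "ack":
            self.status = OutStatus.ACK
        elif reply == "acked":
            self.status = OutStatus.ACKED

    def tick(self):
        """Advance one cycle; flag a timeout if the ACK is overdue."""
        if self.status is OutStatus.WAITING:
            self.timer += 1
            if self.timer >= self.timeout_cycles:
                self.status = OutStatus.TIMEOUT
```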
TRANSISTOR COUNTS • Apart from Datapath: • 13 x 8 bit Regs • 1 x 4 bit Timer • 3 x 2 bit X/Y/TOP • Datapath: • I$: May be none • Decode: comb’l and small • RF: Very few might do • EX: AT LEAST add/sub (compute dist) • D$: May be none • Control: • Few states, at most 4 bit reg • Small NSE & o/p DEC
Process Constraints • 1000 Ts per node • [MIT RAW] ~10% wasted for padding • All the logic < 900 Ts • Only n-channel Ts • No CMOS • Alternatives: • Dynamic logic w/ delay on PCHRG lines • Pseudo NMOS logic w/o the PMOS • Static discharge • Can’t help VT drop • Others we have EMD
Transistor Calculation • Reg-s are the killer • Non-overlapping 2 phase [stallion] • 6Ts with pass gates • Complicated clk generation • C2MOS • 8 Ts & 2 phase • Double True single phase (master-slave) • 12T • True single phase • clk=Hi transparent, but 6T • Controller should take care of clk • 13 x 8bit + 3 x 2bit + 1 x 4bit = 660Ts • EX: only add/sub & can do serial • TG adder simplify to pass gate • Less than 20T • Can also add a barrel shifter • Less than 20Ts [colt][stallion] • Controller: • Even with > 100 states 8 bit Reg sufficient • 50 Ts for state • RF: • 1 reg = 48T → #RF < 3! • Remaining: • Comb’l DEC • Comb’l Cont • I$ • D$ • < ~150T
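The budget arithmetic can be checked with a few lines, assuming the 6T-per-bit storage cell. At 6T per bit, the 13 x 8 bit registers plus the 3 x 2 bit X/Y/TOP fields come to 110 bits, or 660 Ts, which matches the slide's figure; counting the 4 bit timer as well would give 684 Ts. The other entries below are the slide's upper bounds, and the dictionary keys are illustrative names.

```python
T_PER_BIT = 6  # assumption: 6T storage cell, as in the 6T options listed above

reg_bits = 13 * 8 + 3 * 2   # 8-bit regs plus X/Y/TOP fields = 110 bits
timer_bits = 4

regs_t = reg_bits * T_PER_BIT                          # matches the slide's 660
regs_with_timer_t = (reg_bits + timer_bits) * T_PER_BIT

# Upper bounds quoted on the slide for the remaining blocks.
budget = {
    "registers": regs_t,
    "ex_add_sub": 20,
    "barrel_shifter": 20,
    "controller_state": 50,
    "remaining_logic": 150,
}
total_t = sum(budget.values())
one_rf_register_t = 8 * T_PER_BIT   # "1 reg = 48T"

print(regs_t, regs_with_timer_t, total_t, one_rf_register_t)
```

Summing the upper bounds lands exactly on the 900 T logic limit, which is why the register file has to stay at a couple of registers.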
SIMULATION • Target: • Revert to Torus topology (4-ary 2-cube) • Anything else will be a subset • Only consider channel latencies • Will reveal network latency • Provide fault injection • Key to assess other topologies • Also demonstrate fault tolerance • Emulate clicks – simple user input • Obviously, GUI • Try to stay faithful to actual decision making • Restrictions • Cannot do zoom with 4x4 network
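The simulation targets listed above (torus, channel latency only, fault injection) can be approximated by a breadth-first flood. This is a simplified stand-in and not the MacroNET SIM itself: it assumes one hop of latency per channel and models a fault as a broken bidirectional link; `neighbors` and `broadcast_hops` are hypothetical names.

```python
from collections import deque

K = 4  # 4-ary 2-cube: a 4x4 torus


def neighbors(r, c, k=K):
    """N/S/W/E neighbours of node (r, c) with wrap-around links."""
    return {
        "N": ((r - 1) % k, c),
        "S": ((r + 1) % k, c),
        "W": (r, (c - 1) % k),
        "E": (r, (c + 1) % k),
    }


def broadcast_hops(src, k=K, faulty=frozenset()):
    """Hop count at which each node first receives a broadcast from src.

    `faulty` is a set of frozenset({node_a, node_b}) pairs marking dead links.
    """
    hops = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(*node, k).values():
            if frozenset((node, nxt)) in faulty or nxt in hops:
                continue
            hops[nxt] = hops[node] + 1
            queue.append(nxt)
    return hops
```

Injecting a fault on the link between (0,0) and (0,3) still reaches all 16 nodes; node (0,3) is simply reached in 3 hops instead of 1, which illustrates the path diversity of the torus.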
MacroNET SIM • Similar to animation of previous talk • Currently (28/4, 11:00am) dim works on Torus • Easy to add: Brighten, Compute distance, MakeTop • More effort: Fault injection & exchange [Figure: 4x4 grids of nodes labelled (r,c), from (0,0) to (3,3). Legend: Initial Processor, Updated Processor, Sent Instruction, Sent ACK, Sent ACKed, Used Channel]