
MacroNET Lightnode Architecture & Network

MacroNET Lightnode Architecture & Network. ELE-580i PRESENTATION-II 04/29/2003 Canturk ISCI. OUTLINE. Introduction/Review [1min] Methodology [2min] Communications [3min] CPU & Router [6min] Transistor Counts [4min] Simulation [1min] MacroNET SIM DEMO [3min].


Presentation Transcript


  1. MacroNET Lightnode Architecture & Network. ELE-580i PRESENTATION-II, 04/29/2003, Canturk ISCI

  2. OUTLINE • Introduction/Review [1min] • Methodology [2min] • Communications [3min] • CPU & Router [6min] • Transistor Counts [4min] • Simulation [1min] • MacroNET SIM DEMO [3min]

  3. Introduction / Review
     • MacroNET:
       • Large-area electronics, flexible and deformable (can be rolled up like a window shade)
       • Distributed processing units (light)
     • Project definition & motivation:
       • Sparse topology, lightweight node, fault tolerant
       • Transistor budget
       • Compute & route on the same hardware; buffer = regfile/memory
       • Some extent of path diversity
     • Particular design challenges:
       • Tight: complexity, fault tolerance
       • Relaxed: latency, performance
     • Related work:
       • More performance driven
       • Lightweight networks for on-chip processors focus on reducing latency

  4. Methodology/Experience
     [Diagram labels: Transistor Budget, Topology, Routing & Flow Control, Compute Node Model, Instructions & Communication, Traffic Pattern]
     • Assume a topology
       • Now TORUS (as the superset of others)
     • Define the nature of communication
       • With external world / sensors
         • The strokes
       • Instructions among nodes
         • Zoom around node (r,c)
         • Dim / brighten
         • Maketop N/S/W/E
         • Compute distance
       • Instructions within nodes
     • Packet types
       • Explicit acks for simplicity [alewife]
       • Short size [alpha]
       • Idempotency/delivery
         • Msg/ack/conf [idempotent]
         • + Acked & broadcasting
     • Routing model
       • Livelock avoidance
     • Define compute node
       • CPU + router architecture
       • T counts < 1000
       • 10% for pads

  5. COMMUNICATIONS
     • Input method:
       • From sensors, like strokes → converted to instruction packets
     • Zoom/Dim/Brighten/Maketop:
       • Click on a cell → processor makes the message → broadcasts
     • Compute Distance:
       • 1st click: 1st cell's processor makes the message → broadcasts
       • 2nd click: 2nd cell's processor sets status flag, waits for message → computes distance
     • Packet types:
       • 8-bit flits, phits = flits
       • CPU instruction: 000|xxxxx
       • MSG: [abc|xxxxx][xxxxxxxx][…]
       • Special packets:
         • ACK: [111|11111]
         • ACKED: [111|10101]
         • CONFIRM: [111|01010]
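The packet types on this slide can be told apart from a single 8-bit flit. The sketch below is only an illustration: the bit patterns come from the slide, while the function name `classify_flit`, the returned labels, and the handling of a `111` tag with any other payload (reported as `UNDEFINED`) are assumptions.

```python
# Bit patterns from the slide: a 3-bit tag followed by a 5-bit payload.
ACK, ACKED, CONFIRM = 0b111_11111, 0b111_10101, 0b111_01010
SPECIAL = {ACK: "ACK", ACKED: "ACKED", CONFIRM: "CONFIRM"}


def classify_flit(flit: int) -> str:
    """Classify one 8-bit header flit (hypothetical helper, not from the slides)."""
    if not 0 <= flit <= 0xFF:
        raise ValueError("a flit is 8 bits wide")
    if flit in SPECIAL:
        return SPECIAL[flit]
    tag = flit >> 5  # top three bits
    if tag == 0b000:
        return "CPU"  # local CPU instruction: 000|xxxxx
    if tag == 0b111:
        return "UNDEFINED"  # assumption: the slide defines only three 111-tagged flits
    return "MSG"  # network message header: abc|xxxxx


if __name__ == "__main__":
    for f in (0b000_10101, 0b010_00000, ACK, ACKED, CONFIRM):
        print(f"{f:08b} -> {classify_flit(f)}")
```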

  6. Instructions
     • All instructions go through the CPU pipe
       • Local instructions: [000|xxxxx]
       • Network instructions: [abc|xxxxx]
     • Zoom Around:
       • 1st flit tells the source
       • 2nd flit tells the color for next, or keep
       • [001|01|10|x][color/keep][bright/keep]
     • Dim / Brighten: single flit [010/011|xxxxx]
     • Maketop N/S/W/E:
       • Livelock problems when incorporating data exchange
       • Use Maketop + Exchange (unicast)
       • Maketop: single flit tells direction [100|01|xxx]
       • Exchange: tells destination [110|01|11|x][color][bright]
         • Something like dimensional routing
         • Needs a timeout for fault tolerance now
     • Compute Distance: single flit tells destination [101|11|00|x]
       • Can be made unicast, but no need
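A rough sketch of how a header flit could be decoded into one of the network instructions listed above. The 3-bit tags are taken from the slide. Reading the two 2-bit fields after the tag as (row, column) of a 4x4 network, and the 2-bit Maketop field as a direction code, is an assumption based on the example patterns. The names `OPCODES` and `decode_header` are hypothetical.

```python
# 3-bit tags from the slide; the instruction names are labels chosen for this sketch.
OPCODES = {
    0b001: "ZOOM_AROUND",
    0b010: "DIM",
    0b011: "BRIGHTEN",
    0b100: "MAKETOP",
    0b101: "COMPUTE_DISTANCE",
    0b110: "EXCHANGE",
}


def decode_header(flit: int):
    """Split an 8-bit header into (instruction name, decoded fields)."""
    tag, payload = (flit >> 5) & 0b111, flit & 0b11111
    if tag == 0b000:
        return "LOCAL", {"payload": payload}
    if tag == 0b111:
        return "SPECIAL", {"payload": payload}  # ACK / ACKED / CONFIRM family
    name = OPCODES[tag]
    if name in ("ZOOM_AROUND", "EXCHANGE", "COMPUTE_DISTANCE"):
        # Assumed layout: tag | 2-bit row | 2-bit col | 1 unused bit
        return name, {"row": payload >> 3, "col": (payload >> 1) & 0b11}
    if name == "MAKETOP":
        # Assumed layout: tag | 2-bit direction | 3 unused bits
        return name, {"direction": payload >> 3}
    return name, {}  # DIM / BRIGHTEN carry no fields in the header


if __name__ == "__main__":
    print(decode_header(0b101_11_00_0))  # compute-distance example from the slide
    print(decode_header(0b110_01_11_0))  # exchange example from the slide
```

The mapping of the direction code to N/S/W/E is not given on the slide, so the sketch leaves it as a raw number.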

  7. CPU & ROUTER
     • Router = Processor + AUXILIARY
     • Packet headers are instructions
       • Tagging tells what is what
       • Headers are also decoded in ID
       • Messages also walk through the CPU pipe
     • Maketop: different packets arrive simultaneously
       • Needs simple arbitration (LRS)
     • Exchange: routing
       • Needs a simple router within the controller
     • Compute Distance: arithmetic
       • Needs a simple ALU
     • Status flags (recall HW March presentation)
       • For each outgoing channel
       • A global one for the CPU

  8. Router Architecture
     • Mostly for broadcasting, with the minimum possible HW
       • Stupid, but T counts are the problem
       • Single-cycle, in-order datapath
       • No buffering
       • Keeps certain node info
     • Examples in [NOW][mmr][flex] & [stallion] focus on other issues
     • We can't fit an 8-bit 4x4 Xbar ([Ruby] & [bkmrk])
     • We start from [SP2] & [Hwpres] and implement:
       • Memory-based buffering/switching (static structure)
       • Allocation: LRS
       • ~Credit-based flow control (creditMAX=1)
       • Register-mapped (5 reg) network interface
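The LRS allocation mentioned above can be sketched in a few lines, assuming LRS stands for "least recently served": among the input ports requesting service, the one that was served longest ago wins. The class name `LRSArbiter` and the four-port default are assumptions for illustration. The slides do not describe the actual hardware implementation.

```python
class LRSArbiter:
    """Least-recently-served arbiter over a fixed set of ports (illustrative only)."""

    def __init__(self, n_ports: int = 4):
        # Front of the list = least recently served port.
        self.order = list(range(n_ports))

    def grant(self, requests):
        """Return the requesting port served longest ago, or None if nobody requests."""
        for port in self.order:
            if port in requests:
                # The winner becomes the most recently served port.
                self.order.remove(port)
                self.order.append(port)
                return port
        return None


if __name__ == "__main__":
    arb = LRSArbiter()
    # Ports 1 and 3 keep requesting; grants alternate between them.
    print([arb.grant({1, 3}) for _ in range(4)])
```

Because only one port is granted per call, this matches the slide's constraint that only one port can be active at a time.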

  9. Router Diagram
     [Diagram: ports P0 to P3, each with i/p and o/p channel status fields (ack / acked / confirm/idle); process queue; datapath stages I$, Dec, EX, D$, RF; timeout timer; CPU status flags; node info (color, brightness, X, Y, TOP); controller (datapath-related, allocation, message generation, routing); node o/p]

  10. Status Fields
     • CPU status flags
       • Such as: waiting for data from Px
       • Used to stall the datapath or block dispensing of registers
     • I/p channel status fields [I|R/W|A|Acked]
       • I: idle <default>
       • R/W: routing, or waiting for its turn
       • A: CPU processing packet or sending downstream
         • Only 1 port can be active at a time
       • Acked: another port already received/processed the same message
     • O/p channel status fields [Ack|Acked|Timeout|W]
       • Ack: downstream idle (ACKed last flit) <default>
       • Acked: downstream sent ACKED for the last packet
         • Don't send any more
       • Timeout: downstream's last awaited ACK timed out
       • W: waiting for ACK
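The output-channel status field behaves like a one-credit flow-control state machine, which could be modelled roughly as follows. The four states come from the slide. The transition rules, the use of a 4-bit counter per channel, and the names `OutState` and `OutputChannel` are assumptions made for this sketch.

```python
from enum import Enum


class OutState(Enum):
    ACK = "Ack"          # downstream idle, last flit was ACKed (default)
    W = "W"              # flit sent, waiting for ACK
    ACKED = "Acked"      # downstream already has this packet: stop sending
    TIMEOUT = "Timeout"  # awaited ACK never arrived


class OutputChannel:
    TIMER_MAX = 15  # assumption: a 4-bit timeout counter per channel

    def __init__(self):
        self.state = OutState.ACK
        self.timer = 0

    def can_send(self) -> bool:
        # One outstanding flit at most, i.e. a single credit.
        return self.state is OutState.ACK

    def send(self):
        if not self.can_send():
            raise RuntimeError("no credit: previous flit not yet ACKed")
        self.state, self.timer = OutState.W, 0

    def receive(self, special: str):
        """Handle a special flit ('ACK' or 'ACKED') coming back from downstream."""
        if special == "ACK" and self.state is OutState.W:
            self.state = OutState.ACK
        elif special == "ACKED":
            self.state = OutState.ACKED

    def tick(self):
        """Advance one cycle; give up once the timer overflows while waiting."""
        if self.state is OutState.W:
            self.timer += 1
            if self.timer > self.TIMER_MAX:
                self.state = OutState.TIMEOUT


if __name__ == "__main__":
    ch = OutputChannel()
    ch.send()
    ch.receive("ACK")
    print(ch.state)
```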

  11. TRANSISTOR COUNTS
     • Apart from the datapath:
       • 13 x 8-bit regs
       • 1 x 4-bit timer
       • 3 x 2-bit X/Y/TOP
     • Datapath:
       • I$: may be none
       • Decode: combinational and small
       • RF: very few registers might do
       • EX: AT LEAST add/sub (compute distance)
       • D$: may be none
     • Control:
       • Few states, at most a 4-bit reg
       • Small NSE & o/p DEC

  12. Process Constraints
     • 1000 Ts per node
       • [MIT RAW] → ~10% wasted for padding
       • All the logic < 900 Ts
     • Only n-channel Ts
       • No CMOS
     • Alternatives:
       • Dynamic logic w/ delay on PCHRG lines
       • Pseudo-NMOS logic w/o the PMOS
         • Static discharge
         • Can't help VT drop
       • Others → we have EMD

  13. Transistor Calculation
     • Regs are the killer
       • Non-overlapping 2-phase [stallion]
         • 6 Ts with pass gates
         • Complicated clk generation
       • C2MOS
         • 8 Ts & 2-phase
       • Double true single-phase (master-slave)
         • 12 Ts
       • True single-phase
         • clk=Hi → transparent, but 6 Ts
         • Controller should take care of clk
       • 13 x 8-bit + 3 x 2-bit + 1 x 4-bit = 660 Ts
     • EX: only add/sub & can do serial
       • TG adder → simplify to pass gate
         • Less than 20 Ts
       • Can also add a barrel shifter
         • Less than 20 Ts [colt][stallion]
     • Controller:
       • Even with > 100 states, an 8-bit reg is sufficient
       • 50 Ts for state
     • RF:
       • 1 reg = 48 Ts → #RF < 3!
     • Remaining:
       • Comb'l DEC
       • Comb'l Cont
       • I$
       • D$
       • < ~150 Ts
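The budget on this slide can be tallied with a short script. This is a sketch, not the author's own calculation: the per-block figures are the slide's upper bounds, and 6 Ts per register bit is assumed from the true single-phase option. Note that the three register terms as listed add up to 114 bits (684 Ts at 6 Ts per bit), whereas the quoted 660 Ts corresponds to 110 bits. The script keeps the slide's 660 and reports both. Whether the register file is counted inside the 13 registers is not stated, so it is not added separately.

```python
T_PER_BIT = 6          # assumed: 6-transistor latch per register bit
LOGIC_CEILING = 900    # 1000 Ts per node minus ~10% for pads

# Register bits as listed on the slide.
register_bits = 13 * 8 + 3 * 2 + 1 * 4

# Upper bounds quoted on the slide (dictionary keys are labels for this sketch).
budget = {
    "registers": 660,
    "adder": 20,
    "barrel_shifter": 20,
    "controller_state": 50,
    "remaining (DEC, control, I$, D$)": 150,
}

total = sum(budget.values())

if __name__ == "__main__":
    print(f"register bits listed: {register_bits} -> {register_bits * T_PER_BIT} Ts at {T_PER_BIT} Ts/bit")
    print(f"one 8-bit register: {8 * T_PER_BIT} Ts")
    for name, count in budget.items():
        print(f"{name:35s} {count:4d}")
    print(f"{'total (upper bound)':35s} {total:4d} of {LOGIC_CEILING}")
```

With the slide's figures, the upper bounds add up to exactly the 900-transistor logic ceiling, so there is essentially no slack if every block hits its bound.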

  14. SIMULATION
     • Target:
       • Revert to Torus topology (4-ary 2-cube)
         • Anything else will be a subset
       • Only consider channel latencies
         • Will reveal network latency
       • Provide fault injection
         • Key to assessing other topologies
         • Also demonstrates fault tolerance
       • Emulate clicks: simple user input
         • Obviously, a GUI
       • Try to stay faithful to the actual decision making
     • Restrictions
       • Cannot do zoom with a 4x4 network
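A minimal sketch of the kind of model this slide describes: a 4-ary 2-cube (4x4 torus) in which a broadcast floods outward from a source node, each channel costs one time unit, and faulty links can be injected. This is not the MacroNET SIM code. The function names, the unit channel latency, and the representation of a fault as an unordered pair of nodes are assumptions.

```python
from collections import deque

K = 4  # 4-ary 2-cube, i.e. a 4x4 torus


def neighbors(r, c, k=K):
    """North, south, west and east neighbours with wrap-around links."""
    return [((r - 1) % k, c), ((r + 1) % k, c), (r, (c - 1) % k), (r, (c + 1) % k)]


def broadcast_hops(src, faulty_links=frozenset(), k=K):
    """Flood from src; return {node: hop count at which the broadcast first arrives}.

    faulty_links is a set of frozenset({node_a, node_b}) pairs that are skipped.
    """
    hops = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(*node, k):
            if nxt in hops or frozenset((node, nxt)) in faulty_links:
                continue
            hops[nxt] = hops[node] + 1  # one time unit per channel (assumed)
            queue.append(nxt)
    return hops


if __name__ == "__main__":
    healthy = broadcast_hops((0, 0))
    broken = broadcast_hops((0, 0), {frozenset(((0, 0), (0, 1)))})
    print("worst-case hops, no faults:", max(healthy.values()))
    print("hops to (0,1) with link (0,0)-(0,1) broken:", broken[(0, 1)])
```

Restricting `neighbors` to a subset of the four links is one way to model sparser topologies as subsets of the torus, in line with the "anything else will be a subset" remark.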

  15. MacroNET SIM
     • Similar to the animation of the previous talk
     • Currently (28/4, 11:00am) dim works on Torus
     • Easy to add: Brighten, Compute Distance, MakeTop
     • More effort: fault injection & exchange
     [Diagram: two 4x4 grids of nodes (r,c), from (0,0) to (3,3). Legend: Initial Processor, Updated Processor, Sent Instruction, Sent ACK, Sent ACKed, Used Channel]

  16. MacroNET SIM DEMO
