NATURE: Non-Volatile Nanotube RAM based Field-Programmable Gate Arrays. Wei Zhang † , Niraj K. Jha † and Li Shang ‡ † Dept. of Electrical Engineering Princeton University ‡ Dept. of Electrical and Computer Engineering Queen’s University.
A Hybrid CMOS/NAnoTUbe REconfigurable Architecture • Motivation • Background on CNT and NRAM • Architecture of NATURE • Logic Folding • Experimental Results • Conclusions
Motivation • Moore’s Law: What’s Next? • Carbon nanotubes (CNTs) • Nanowires • Single electron devices • ... • Challenges in nano-circuits/architectures • Lack of a mature fabrication process • Defects and run-time failures • Reconfigurable architectures, such as an FPGA, favored • Regular structures ease fabrication • Fault tolerance through reconfiguration
Motivation (Contd.) • Problems of existing reconfigurable architectures • High reconfiguration time overhead • Low area efficiency • Some recent works on programmable nanofabrics • Molecular logic array (Goldstein et al. [ICCAD 2002]) • Nanowire PLA (DeHon et al. [FPGA 2004]) • CMOS/nanowire hybrid architecture CMOL (Strukov et al. [Nanotechnology 2005]) • Fabrication problem not yet solved
Advantages of NATURE • Hybrid design leverages beneficial aspects of both CMOS and CNT technologies • NRAMs are distributed in NATURE to store multi-context reconfiguration bits • Fine-grain reconfiguration (even cycle-by-cycle) • Enables temporal logic folding • Flexibility to perform area-performance trade-offs • One-to-two orders of magnitude increase in logic density [Figure labels: CMOS fabrication compatible; NRAM-based run-time reconfiguration; NATURE; temporal logic folding; logic density; design flexibility]
Background • Carbon nanotube (CNT) • Metallic or semiconducting • Single-wall or multi-wall • Diameter: 1-100nm • Length: up to millimeters • Ballistic transport • Excellent thermal conductivity • Very high current density • High chemical stability • Robust to environment Source: Euronanotrade
Background (Contd.) • Non-volatile nanotube random-access memory (NRAM) • Whether a nanotube is mechanically bent or not determines its bistable on/off state • Fully CMOS-compatible manufacturing process • Prototype chip: 10 Gbit NRAM • Will be ready for the market in the near future Source: Nantero
NRAMs • Properties of NRAMs • Non-volatile • Similar speed to SRAM • Similar density to DRAM • Chemically and mechanically stable • NATURE not tied to NRAMs • Phase change RAM • Magnetoresistive RAM • Ferroelectric RAM
Architecture of NATURE • Island-style logic blocks (LBs) connected by various levels of interconnects • An LB contains a super macroblock (SMB) and a local switch matrix
Architecture of a Super Macroblock (SMB) • n1 macroblocks (MBs) comprise an SMB, here n1 = 4
Architecture of a Macroblock (MB) • n2 logic elements (LEs) comprise an MB, here n2 = 4
Logic Element and Interconnect • An LE implements a computation and contains: • An m-input look-up table (LUT) • A flip-flop • A pass transistor • Interconnect • Mixed wire segment scheme • 25%, 50% and 25% distribution for length-1, length-4 and long wires • Direct links from one LB to its 4 neighbors
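The m-input LUT inside an LE can be pictured as a small truth table indexed by the input bits. A minimal sketch, assuming m = 4; the function name `lut_eval` and the example tables `AND4` and `XOR4` are hypothetical and not part of the NATURE design:

```python
from typing import Sequence


def lut_eval(truth_table: Sequence[int], inputs: Sequence[int]) -> int:
    """Evaluate an m-input LUT: the input bits form an index into a 2**m-entry table."""
    m = len(inputs)
    if len(truth_table) != 2 ** m:
        raise ValueError("truth table must have 2**m entries")
    index = 0
    for bit in inputs:  # inputs[0] is treated as the most significant bit
        index = (index << 1) | (bit & 1)
    return truth_table[index]


# Hypothetical configurations for a 4-input LUT.
AND4 = [0] * 15 + [1]                               # 1 only when all inputs are 1
XOR4 = [bin(i).count("1") & 1 for i in range(16)]   # parity of the four inputs

print(lut_eval(AND4, [1, 1, 1, 1]))
print(lut_eval(XOR4, [1, 0, 1, 1]))
```

Changing the function an LE computes then amounts to loading a different 16-entry table, which is what the configuration bits hold.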
Support for Reconfiguration • Reconfiguration time short: 160 ps • Area overhead of NRAMs • k: no. of reconfiguration sets per NRAM, assume k = 16 • Area overhead: 20.5% per LB, assuming 100 nm technology for CMOS logic and nanotube length • Logic density = k (conf. copies) x area per configuration = 16 x (1 - 0.205) ≈ 12.7 • Appropriate value for k obtained through design space exploration
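The logic-density estimate on this slide can be reproduced directly. A small sketch using the slide's own expression, k x (1 - area overhead); the function name `logic_density_gain` is hypothetical:

```python
def logic_density_gain(k: int, area_overhead: float) -> float:
    """Logic density relative to a single-context LB, per the slide's expression."""
    return k * (1.0 - area_overhead)


# Values from the slide: k = 16 configuration sets, 20.5% NRAM area overhead per LB.
gain = logic_density_gain(16, 0.205)
print(round(gain, 2))
```

With k = 16 and a 20.5% overhead the expression evaluates to about 12.7, i.e. roughly an order of magnitude.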
Temporal Logic Folding • Basic idea: one can use NRAM-enabled run-time reconfiguration to realize different Boolean functions in the same logic element (LE) every few cycles
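The basic idea can be sketched in software: one LE holds several stored configurations and switches between them from cycle to cycle, with its flip-flop carrying the intermediate result. This is an illustrative model only; the class `FoldedLE`, the 2-input LUTs and the example function (a XOR b) AND c are assumptions, not taken from the slides:

```python
class FoldedLE:
    """One logic element with several stored configurations (one truth table each)."""

    def __init__(self, contexts):
        self.contexts = contexts  # list of 4-entry truth tables for a 2-input LUT
        self.ff = 0               # flip-flop holding the last LUT output

    def step(self, cycle, a, b):
        # Select the configuration for this cycle, evaluate the LUT, latch the result.
        table = self.contexts[cycle % len(self.contexts)]
        self.ff = table[(a << 1) | b]
        return self.ff


XOR2 = [0, 1, 1, 0]
AND2 = [0, 0, 0, 1]


def folded_xor_and(a, b, c):
    """Compute (a XOR b) AND c on a single LE over two folding cycles."""
    le = FoldedLE([XOR2, AND2])
    le.step(0, a, b)             # cycle 0: LE is configured as XOR
    return le.step(1, le.ff, c)  # cycle 1: same LE reconfigured as AND


print(folded_xor_and(1, 0, 1))
```

Without folding, the same function would occupy two LEs at once; here the second configuration reuses the first LE one cycle later.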
Example • Without logic folding • Num of LEs = 6 • Delay = 4 LE delays + Interconnect delay • With logic folding • Num of LEs = 2 • Delay = 4 * clock_period • Clock period = LE delay + Reconfiguration + Interconnect delay
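The delay expressions on this slide can be compared numerically. A minimal sketch, reading them as follows: without folding, delay = 4 LE delays + interconnect delay using 6 LEs; with folding, delay = 4 clock periods using 2 LEs, where each clock period = LE delay + reconfiguration + interconnect delay. The 160 ps reconfiguration time comes from the slides; the LE and interconnect delays below are hypothetical placeholders, so the resulting numbers are illustrative only:

```python
RECONFIG_PS = 160       # reconfiguration time quoted in the slides
LE_DELAY_PS = 500       # hypothetical LE (LUT) delay
INTERCONNECT_PS = 300   # hypothetical interconnect delay


def delay_without_folding(levels, le_delay, interconnect):
    """Combinational path: `levels` LE delays plus one interconnect delay."""
    return levels * le_delay + interconnect


def delay_with_folding(cycles, le_delay, reconfig, interconnect):
    """Folded execution: each cycle pays LE delay, reconfiguration and interconnect."""
    clock_period = le_delay + reconfig + interconnect
    return cycles * clock_period


flat = delay_without_folding(4, LE_DELAY_PS, INTERCONNECT_PS)
folded = delay_with_folding(4, LE_DELAY_PS, RECONFIG_PS, INTERCONNECT_PS)

# Area-time product, with area measured in LEs (6 without folding, 2 with folding).
at_flat = 6 * flat
at_folded = 2 * folded
print(flat, folded, at_flat, at_folded)
```

Under these assumed delays the folded version is slower in absolute terms but has the smaller area-time product, which is the trade-off the example illustrates.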
Folding Levels • Logic folding can be performed at different levels of granularity, providing flexibility to perform area-performance trade-offs • A level-p folding implies reconfiguration of the LE after the execution of p LUT computations (a) level-1 folding (b) level-2 folding
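The effect of the folding level on LE count can be sketched for a levelized circuit. The sketch assumes a level-p folding groups every p consecutive LUT levels into one folding stage, that all LUTs of a stage must be resident at once, and it ignores LEs needed only to hold intermediate values in flip-flops. The function name `fold` and the example circuit (2, 2, 1 and 1 LUTs on its four levels) are hypothetical:

```python
def fold(luts_per_level, p):
    """Return (number of folding stages, LEs needed) for level-p folding.

    luts_per_level[i] is the number of LUTs at logic depth i + 1.
    """
    stage_sizes = [
        sum(luts_per_level[i:i + p]) for i in range(0, len(luts_per_level), p)
    ]
    return len(stage_sizes), max(stage_sizes)


circuit = [2, 2, 1, 1]  # hypothetical circuit: 6 LUTs, depth 4

for p in (1, 2, 4):
    stages, les = fold(circuit, p)
    print(f"level-{p} folding: {stages} stages, {les} LEs")
```

For this circuit, level-1 folding needs 4 stages and 2 LEs, level-2 folding needs 2 stages and 4 LEs, and level-4 folding (a single stage) needs all 6 LEs.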
Choosing the Folding Level • Advantages of logic folding • Significant flexibility for performing area-performance trade-offs • Ability to map much larger circuits using the same number of LEs • Significant improvement in the area/circuit delay product • Reduction in the need for global routing • As the folding level increases • Area increases: number of LEs increases • Clock period increases: routing delay increases • Number of clock cycles decreases • Reconfiguration time decreases • Total delay typically decreases
Experimental Setup • Instance of architecture: 4 MBs in an SMB, 4 LEs in an MB, and LEs contain a 4-input LUT • Number of reconfiguration copies k varied in order to compare implementations corresponding to selected folding levels: level-1, level-2, level-4 and no logic folding • Results based on 100nm CMOS technology parameters
Experimental Results • Average area-time product advantage = 2X • Maximum area-time product advantage = 3X
Experimental Results (Contd.) • 16-RCA: 16-bit ripple carry adder • 16-CLA: 16-bit carry lookahead adder • 16-CSA: 16-bit carry select adder • 8-MUL: 8-bit multiplier • Average area-time product advantage = 13X • Maximum area-time product advantage = 35X
Experimental Results (Contd.) • Flexibility in performing area-performance trade-offs • For the area-time (AT) product, the larger the circuit depth, the greater the advantage of level-1 folding relative to no folding • For the 64-bit ripple-carry adder, this advantage is about 35X • LE utilization and logic density are very high, with a reduced need for a deep interconnect hierarchy
Conclusions • NATURE: A novel high-performance run-time reconfigurable architecture • Introduction of NRAMs into the architecture enables cycle-by-cycle reconfiguration and logic folding • Choice of different folding levels allows the flexibility of performing area-performance trade-offs • Logic density and area-time product improved significantly • Can be very useful for cost-conscious embedded systems and future FPGA improvement