
NATURE: Non-Volatile Nanotube RAM based Field-Programmable Gate Arrays

Wei Zhang†, Niraj K. Jha† and Li Shang‡. †Dept. of Electrical Engineering, Princeton University. ‡Dept. of Electrical and Computer Engineering, Queen’s University.


  1. NATURE: Non-Volatile Nanotube RAM based Field-Programmable Gate Arrays • Wei Zhang†, Niraj K. Jha† and Li Shang‡ • †Dept. of Electrical Engineering, Princeton University • ‡Dept. of Electrical and Computer Engineering, Queen’s University

  2. A Hybrid CMOS/NAnoTUbe REconfigurable Architecture • Motivation • Background on CNT and NRAM • Architecture of NATURE • Logic Folding • Experimental Results • Conclusions

  3. Motivation • Moore’s Law: What’s Next? • Carbon nanotubes (CNTs) • Nanowires • Single electron devices • ... • Challenges in nano-circuits/architectures • Lack of a mature fabrication process • Defects and run-time failures • Reconfigurable architectures, such as an FPGA, favored • Regular structures ease fabrication • Fault tolerance through reconfiguration

  4. Motivation (Contd.) • Problems of existing reconfigurable architectures • High reconfiguration time overhead • Low area efficiency • Recent work on programmable nanofabrics • Molecular logic array (Goldstein et al. [ICCAD 2002]) • Nanowire PLA (DeHon et al. [FPGA 2004]) • CMOS/nanowire hybrid architecture CMOL (Strukov et al. [Nanotechnology 2005]) • Fabrication problem not yet solved

  5. Advantages of NATURE • Hybrid design leverages beneficial aspects of both CMOS and CNT technologies • NRAMs are distributed in NATURE to store multi-context reconfiguration bits • Fine-grain reconfiguration (even cycle-by-cycle) • Enables temporal logic folding • Flexibility to perform area-performance trade-offs • One to two orders of magnitude increase in logic density • [Figure labels: NATURE; CMOS fabrication compatible; NRAM-based run-time reconfiguration; temporal logic folding; logic density; design flexibility]

  6. Background • Carbon nanotube (CNT) • Metallic or semiconducting • Single-wall or multi-wall • Diameter: 1-100nm • Length: up to millimeters • Ballistic transport • Excellent thermal conductivity • Very high current density • High chemical stability • Robust to environment Source: Euronanotrade

  7. Background (Contd.) • Non-volatile nanotube random-access memory (NRAM) • Mechanically bent or not: determines bistable on/off states • Fully CMOS-compatible manufacturing process • Prototype chip: 10 Gbit NRAM • Will be ready for the market in the near future Source: Nantero

  8. NRAMs • Properties of NRAMs • Non-volatile • Similar speed to SRAM • Similar density to DRAM • Chemically and mechanically stable • NATURE not tied to NRAMs • Phase change RAM • Magnetoresistive RAM • Ferroelectric RAM

  9. Architecture of NATURE • Island-style logic blocks (LBs) connected by various levels of interconnects • An LB contains a super macroblock (SMB) and a local switch matrix

  10. Architecture of a Super Macroblock (SMB) • An SMB is made up of n1 macroblocks (MBs); here n1 = 4

  11. Architecture of a Macroblock (MB) • An MB is made up of n2 logic elements (LEs); here n2 = 4

  12. Logic Element and Interconnect • An LE implements a computation and contains: • An m-input look-up table (LUT) • A flip-flop • A pass transistor • Interconnect • Mixed wire segment scheme • 25%, 50% and 25% distribution for length-1, length-4 and long wires • Direct links from one LB to its 4 neighbors
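The hierarchy on slides 9 to 12 can be written down as a small parameter record. This is only an illustrative sketch: the class name `NatureParams`, the helper `split_tracks`, and the channel width of 40 tracks are hypothetical, and m = 4 is taken from the experimental setup later in the talk rather than from this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NatureParams:
    """Hypothetical container for the architecture parameters named on the slides."""
    n1: int = 4  # macroblocks (MBs) per super macroblock (SMB)
    n2: int = 4  # logic elements (LEs) per MB
    m: int = 4   # inputs per look-up table (LUT)

    @property
    def les_per_smb(self) -> int:
        # Each SMB holds n1 MBs, and each MB holds n2 LEs.
        return self.n1 * self.n2

    @property
    def lut_bits(self) -> int:
        # An m-input LUT needs one truth-table bit per input combination.
        return 2 ** self.m


def split_tracks(channel_width: int) -> dict:
    """Split a routing channel 25% / 50% / 25% into length-1, length-4 and long wires."""
    length1 = channel_width // 4
    long_wires = channel_width // 4
    # Give any rounding remainder to the length-4 segments.
    length4 = channel_width - length1 - long_wires
    return {"length-1": length1, "length-4": length4, "long": long_wires}


params = NatureParams()
print(params.les_per_smb, params.lut_bits)  # 16 16
print(split_tracks(40))  # assumed channel width, not a figure from the talk
```

With n1 = n2 = 4, one SMB holds 16 LEs, and each 4-input LUT needs 16 truth-table bits per stored configuration.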

  13. Support for Reconfiguration • Reconfiguration time is short: 160 ps • Area overhead of NRAMs • k: number of reconfiguration sets per NRAM; assume k = 16 • Area overhead: 20.5% per LB, assuming 100 nm technology for CMOS logic and nanotube length • Logic density = k (configuration copies) × area per configuration = 16 × (1 − 0.205) ≈ 12.7 • Appropriate value for k obtained through design space exploration
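The logic-density estimate on this slide is a single multiplication, sketched here so it can be re-evaluated for other values of k. The function name `logic_density_gain` is hypothetical, and the model (k configurations sharing the area left after the NRAM overhead) is simply the slide's expression, not a fuller area model.

```python
def logic_density_gain(k: int, area_overhead: float) -> float:
    """Evaluate k * (1 - area_overhead), the expression used on the slide."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= area_overhead < 1.0:
        raise ValueError("area_overhead must be a fraction in [0, 1)")
    return k * (1.0 - area_overhead)


# Slide values: k = 16 reconfiguration sets, 20.5% area overhead per LB.
print(f"{logic_density_gain(16, 0.205):.2f}")  # 12.72
```

Note that the 20.5% overhead is quoted for k = 16, so plugging in other k values with the same overhead is only a rough what-if.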

  14. Temporal Logic Folding • Basic idea: one can use NRAM-enabled run-time reconfiguration to realize different Boolean functions in the same logic element (LE) every few cycles
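A toy software model may help make the idea concrete: one LE holds several stored truth tables and switches between them at run time. The `LogicElement` class and its methods are hypothetical illustrations, not the hardware interface; the list of stored contexts stands in for the NRAM.

```python
class LogicElement:
    """Toy model of an LE: an m-input LUT with up to k stored configurations."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.contexts = []   # stands in for the NRAM holding k truth tables
        self.active = None   # truth table currently loaded into the LUT

    def store(self, truth_table) -> int:
        """Store one configuration and return its index."""
        if len(truth_table) != 2 ** self.m:
            raise ValueError("truth table must have 2**m entries")
        if len(self.contexts) >= self.k:
            raise ValueError("no free reconfiguration set")
        self.contexts.append(list(truth_table))
        return len(self.contexts) - 1

    def reconfigure(self, index: int) -> None:
        """Load a stored configuration into the LUT."""
        self.active = self.contexts[index]

    def evaluate(self, *inputs: int) -> int:
        """Look up the output; the first input is the most significant address bit."""
        if self.active is None:
            raise RuntimeError("LE has not been configured")
        if len(inputs) != self.m:
            raise ValueError("expected m inputs")
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.active[address]


le = LogicElement(m=2, k=16)
and_cfg = le.store([0, 0, 0, 1])  # 2-input AND
xor_cfg = le.store([0, 1, 1, 0])  # 2-input XOR

le.reconfigure(and_cfg)
print(le.evaluate(1, 1))  # 1
le.reconfigure(xor_cfg)
print(le.evaluate(1, 1))  # 0
```

The same physical LE computes AND in one cycle and XOR in a later one, which is the behavior that temporal logic folding relies on.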

  15. Example • Without logic folding • Num of LEs = 6 • Delay = 4 LE delays + interconnect delay • With logic folding • Num of LEs = 2 • Delay = 4 × clock_period • Clock period = LE delay + reconfiguration delay + interconnect delay
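The two delay expressions in this example can be compared numerically. In the sketch below only the 160 ps reconfiguration time comes from the talk; the LE delay and the interconnect delays are made-up placeholders, and the function names are hypothetical, so the printed numbers illustrate the formulas rather than reproduce any reported result.

```python
def delay_without_folding(depth: int, le_delay: float, interconnect_delay: float) -> float:
    """Delay = depth LE delays + total interconnect delay (all values in ps)."""
    return depth * le_delay + interconnect_delay


def delay_with_folding(cycles: int, le_delay: float, reconfig_delay: float,
                       interconnect_delay: float) -> float:
    """Delay = cycles * clock period, where each period covers one LE delay,
    one reconfiguration and the interconnect delay of that cycle (ps)."""
    clock_period = le_delay + reconfig_delay + interconnect_delay
    return cycles * clock_period


LE_DELAY = 500.0           # placeholder value, not from the talk
GLOBAL_INTERCONNECT = 800  # placeholder: total interconnect delay, unfolded
LOCAL_INTERCONNECT = 200   # placeholder: per-cycle interconnect delay, folded
RECONFIG = 160.0           # reconfiguration time quoted in the talk

flat = delay_without_folding(4, LE_DELAY, GLOBAL_INTERCONNECT)
folded = delay_with_folding(4, LE_DELAY, RECONFIG, LOCAL_INTERCONNECT)

# Area-time product, using the LE count as the area measure.
print("no folding:", 6 * flat)
print("folding:   ", 2 * folded)
```

With these placeholder values the folded mapping is slower in absolute delay but uses a third of the LEs, which is the area-performance trade-off the slide describes.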

  16. Folding Levels • Logic folding can be performed at different levels of granularity, providing flexibility to perform area-performance trade-offs • Level-p folding implies reconfiguration of the LE after the execution of p LUT computations • Figure panels: (a) level-1 folding, (b) level-2 folding
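One simplified way to see how the folding level trades LEs for cycles is to group a circuit's LUT levels into stages of p levels each. The function `fold` and the per-level LUT counts are hypothetical, and the model assumes every LUT in a stage occupies its own LE while ignoring flip-flop storage and routing limits.

```python
def fold(luts_per_level, p: int):
    """Group consecutive LUT levels into folding stages of p levels each.

    Returns (number of stages, LEs needed), where the LE count is the size
    of the largest stage because LEs are reused from one stage to the next.
    """
    if p < 1:
        raise ValueError("folding level must be at least 1")
    stages = [sum(luts_per_level[i:i + p])
              for i in range(0, len(luts_per_level), p)]
    return len(stages), max(stages)


# Hypothetical depth-4 circuit with 6 LUTs in total.
circuit = [2, 2, 1, 1]

for level in (1, 2, 4):
    stages, les = fold(circuit, level)
    print(f"level-{level}: {stages} stages, {les} LEs")
```

For this made-up circuit, level-1 folding needs 2 LEs over 4 stages, level-2 folding needs 4 LEs over 2 stages, and level-4 (equivalent to no folding here) needs all 6 LEs in a single stage.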

  17. Choosing the Folding Level • Advantages of logic folding • Significant flexibility for performing area-performance trade-offs • Ability to map much larger circuits using the same number of LEs • Significant improvement in the area/circuit delay product • Reduction in the need for global routing • As the folding level increases • Clock period increases: routing delay increases • Number of clock cycles decreases • Reconfiguration time decreases • Total delay typically decreases • Area increases • Number of LEs increases

  18. Experimental Setup • Instance of architecture: 4 MBs in an SMB, 4 LEs in an MB, and LEs contain a 4-input LUT • Number of reconfiguration copies k varied in order to compare implementations corresponding to selected folding levels: level-1, level-2, level-4 and no logic folding • Results based on 100nm CMOS technology parameters

  19. Experimental Results • Average area-time product advantage = 2X • Maximum area-time product advantage = 3X

  20. Experimental Results (Contd.) • 16-RCA: 16-bit ripple carry adder • 16-CLA: 16-bit carry lookahead adder • 16-CSA: 16-bit carry select adder • 8-MUL: 8-bit multiplier • Average area-time product advantage = 13X • Maximum area-time product advantage = 35X

  21. Experimental Results (Contd.) • Flexibility in performing area-performance trade-offs • For the area-time (AT) product, the larger the circuit depth, the greater the advantage of level-1 folding relative to no folding • For the 64-bit ripple-carry adder, this advantage is about 35X • LE utilization and logic density are very high, with a reduced need for a deep interconnect hierarchy

  22. Conclusions • NATURE: A novel high-performance run-time reconfigurable architecture • Introduction of NRAMs into the architecture enables cycle-by-cycle reconfiguration and logic folding • Choice of different folding levels allows the flexibility of performing area-performance trade-offs • Logic density and area-time product improved significantly • Can be very useful for cost-conscious embedded systems and future FPGA improvement
