Cellular Automata Based Reconfigurable Systems as a Transitional Approach to Gigascale Electronic Architectures 26 September 2000 James C. Lyke
Outline • Introduction • Background • Description of reconfigurable cellular automata (RCA) arrays • Summary of current status
Introduction • Constraints common to all molecular systems • Limited interconnection fan-in/fan-out • No effective lithography approach capable of adequate throughput • Design intolerance to random defects
The smaller the devices, the bigger the problems • Building the devices: very hard • Building the architectures: not easy • How do you harness 10^12 simple devices? • Design capture, synthesis, verification? • How do you wire them together? • How do you assemble and package them? • How do you test finished devices? • How do you address yield issues? • How do you rectify design errors in “gigascale” designs?
Trends in architectures • Interconnection growth • Increased use of programmable logic devices in digital design • Field programmable gate array (FPGA) devices
Factors contributing to explosion in interconnections with diminishing scale • Non-scalability of resistance (R ∝ L/A) • Packing considerations force minimum average length of interconnections to increase • Dimensionality of design • Hierarchy of design
The challenge of interconnect • Rent’s rule establishes the growth of terminals (interface signals) as a function of gate count [1] • An empirical explanation: T = A·G^p, where T is the terminal count, A the terminals per sub-module, G the gate count, and p Rent’s exponent (0 < p < 1) • [1] Bakoglu, Interconnections and Packaging for VLSI, Addison-Wesley, Reading MA, 1990
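Rent's rule as stated above can be evaluated directly. The sketch below is only an illustration: the function name `rent_terminals` and the sample values (A = 4 terminals per sub-module, one million gates) are assumptions chosen for the example, not figures from the slides.

```python
def rent_terminals(gates: float, a: float, p: float) -> float:
    """Rent's rule: T = A * G**p (terminal count for a block of G gates)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("Rent's exponent is expected to lie between 0 and 1")
    return a * gates ** p


if __name__ == "__main__":
    # Illustrative values only: A = 4, G = 1e6.
    for p in (0.5, 0.8):
        print(f"p = {p}: T = {rent_terminals(1e6, 4.0, p):.0f}")
```

The gap between the two printed terminal counts shows how strongly the exponent, rather than the gate count alone, drives interconnect demand.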
[Figure: Rent’s rule curves for p=0, p=0.5, p=0.8, and p=1, with the annotation “Complex integrated circuits usually have” marking part of the plot]
Nanoscale Interconnect • In order to be manageable at large scales of complexity, exponents of Rent’s rule must be consistent with dimensionality • Two-dimensional (planar) systems: p < 1/2 • Three-dimensional systems: p < 2/3 • Rent’s rule is a statistical observation and a guideline, but must be used with care • A complex design may have different Rent’s exponents at different hierarchical levels and regions of design
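The two limits quoted above match the usual surface-to-volume argument: a block of G gates in d dimensions has a boundary that grows as G^((d-1)/d), so its terminals fit on that boundary only if p stays below (d-1)/d. That reasoning is supplied here as an assumption about where the slide's numbers come from; the helper names are hypothetical.

```python
from fractions import Fraction


def rent_exponent_bound(dimensions: int) -> Fraction:
    """Boundary-growth exponent (d - 1) / d for a d-dimensional layout."""
    if dimensions < 1:
        raise ValueError("dimensions must be a positive integer")
    return Fraction(dimensions - 1, dimensions)


def is_manageable(p: float, dimensions: int) -> bool:
    """True if Rent's exponent p stays below the bound for this dimensionality."""
    return p < rent_exponent_bound(dimensions)


if __name__ == "__main__":
    print(rent_exponent_bound(2))  # 1/2
    print(rent_exponent_bound(3))  # 2/3
    print(is_manageable(0.6, 2), is_manageable(0.6, 3))  # False True
```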
Requirements for Complex Digital Design • Ability to form arbitrary arrangement of: • Logic • Memory • Interconnect • Field programmable gate arrays (FPGAs) emulate complex systems and allow these arrangements to be programmed
Structure of RAM-based FPGA • [Figure: array of configurable logic blocks (CLBs) joined by unspecified interconnection]
Adding USER memory to LUT • [Figure: LUT with inputs a, b, c and output f, with a D flip-flop (D, Q); annotation: “Short either, but not both”]
Routing in FPGA Devices • [Figure: pass transistor controlled by a memory bit]
Simple example • Design problem: F = A AND B, G = C AND D • [Figure: implementation of the example, output G labeled]
Typical FPGA (corner of XC3020) • Many details suppressed • Source: Xilinx datasheet
Binary Cellular Automata: A lattice of computing points • Lattice of uniformly spaced point sites in 1, 2, or 3 dimensions • Each point has a value of {0} or {1} • Value of each site updated at discrete time intervals • Updates are computed as a function of local neighborhood only
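The update scheme described above can be sketched for the one-dimensional case with a three-site neighborhood. The wrap-around boundary and the use of an 8-bit rule number (rule 90 in the demo) are assumptions made for the example; the slides do not specify either.

```python
from typing import List


def ca_step(cells: List[int], rule: int) -> List[int]:
    """One synchronous update of a 1-D binary CA with neighborhood 3.

    Bit k of `rule` is the new value for the neighborhood whose
    (left, center, right) bits spell the number k.
    Assumption: the lattice wraps around at its ends.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right
        nxt.append((rule >> index) & 1)
    return nxt


if __name__ == "__main__":
    row = [0, 0, 1, 0, 0]
    for _ in range(3):
        print(row)
        row = ca_step(row, 90)  # rule 90: new value = left XOR right
```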
Conversion of 1-D CA into a 2-D spatial computation structure
Cellular Automata for Molecular Electronics • Cell behavior normally fixed and homogeneous across entire array • Turing complete • Normally perfect (mathematical abstraction) • Not practical for molecular implementation due to defects • Recover from this by relaxing first assumption • Allow rules at any site to be chosen from a set that is “Boolean complete”
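Relaxing the homogeneity assumption amounts to giving every site its own rule. A minimal sketch, assuming a 1-D array with wrap-around and 8-bit rule numbers; the three rules used (3-input AND, OR, and parity) are illustrative choices, not the rule set from the slides.

```python
from typing import List

AND3 = 0x80  # 1 only for neighborhood 111
OR3 = 0xFE   # 0 only for neighborhood 000
XOR3 = 0x96  # 1 when an odd number of inputs are 1


def rca_step(cells: List[int], rules: List[int]) -> List[int]:
    """One update where site i applies its own rule rules[i].

    Assumption: wrap-around boundary, neighborhood (left, center, right).
    """
    if len(cells) != len(rules):
        raise ValueError("one rule per site is required")
    n = len(cells)
    out = []
    for i in range(n):
        index = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        out.append((rules[i] >> index) & 1)
    return out


if __name__ == "__main__":
    print(rca_step([1, 0, 0], [AND3, OR3, XOR3]))  # [0, 1, 1]
```

Because each site is programmed separately, a site known to be defective can in principle be given a rule that keeps it out of the computation, which is the motivation stated on the slide.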
Redefine CA sites as look-up tables (LUTs) • A 3-input LUT (LUT-3) can implement all cellular automata rules of neighborhood 3 • [Figure: LUT3 with inputs A, B, C]
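The claim above can be checked by enumeration: a LUT-3 stores 8 bits, and each of the 2^8 = 256 possible contents yields a different truth table, so every neighborhood-3 rule is covered. The bit ordering (A as the most significant address bit) is an assumption of this sketch.

```python
from itertools import product


def lut3(table: int, a: int, b: int, c: int) -> int:
    """Read an 8-bit look-up table at the address formed by inputs (a, b, c)."""
    return (table >> ((a << 2) | (b << 1) | c)) & 1


def truth_table(table: int) -> tuple:
    """Outputs of the LUT for all eight input combinations."""
    return tuple(lut3(table, a, b, c) for a, b, c in product((0, 1), repeat=3))


if __name__ == "__main__":
    distinct = {truth_table(t) for t in range(256)}
    print(len(distinct))  # 256: one distinct function per LUT content
    print(lut3(0xE8, 1, 1, 0))  # 0xE8 is 3-input majority, so this prints 1
```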
Reconfigurable cellular automata: advantages for molecular architecture • Periodic structure • Amenable to chemical self-assembly • Reconfigurability: logical behavior of each cell independently and repeatedly programmable after fabrication • Low interconnection demand • Defect tolerance
Equivalence (functional isomorphisms) between CA and random logic forms • [Figure: panels (a) through (e) showing 3LUT arrangements with inputs A, B, C, D and output F]
How template choice affects implementation of 4-input majority gate on three different RCA templates • [Figure: three implementations, labeled size: 21, size: 12, size: 20]
3LUT tile of single cell type • [Figure: tile in which every cell is of type A]
Example of more complex architectures using RCA tiles • Equivalent representations • [Figure labels: combinational logic, x, Y(t), Y(x,h), register array, h, h(t)]
Another RCA Architecture (Example) • Creates feedback path necessary for general clock-mode sequential behavior • [Figure: m x n tiles of LUTs alternating with clocked register files (bit arrays)]
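The role of the register feedback can be illustrated with a very small sequential circuit: LUTs compute the next state from the input and the current state, and a register holds the state between clock edges. The 2-bit counter, the enable input x, and the LUT contents 0x5A and 0x6C are illustrative choices for this sketch and do not come from the slides.

```python
from typing import List, Tuple


def lut3(table: int, a: int, b: int, c: int) -> int:
    """Read an 8-bit look-up table at address (a, b, c), a most significant."""
    return (table >> ((a << 2) | (b << 1) | c)) & 1


NEXT_Q0 = 0x5A  # q0' = q0 XOR x
NEXT_Q1 = 0x6C  # q1' = q1 XOR (q0 AND x)


def clock(state: Tuple[int, int], x: int) -> Tuple[int, int]:
    """One clock edge: combinational LUTs feed the register that holds (q1, q0)."""
    q1, q0 = state
    return lut3(NEXT_Q1, x, q1, q0), lut3(NEXT_Q0, x, q1, q0)


def run(cycles: int, x: int = 1) -> List[Tuple[int, int]]:
    """Clock the counter from state (0, 0) and record the state after each edge."""
    state, trace = (0, 0), []
    for _ in range(cycles):
        state = clock(state, x)
        trace.append(state)
    return trace


if __name__ == "__main__":
    print(run(4))  # [(0, 1), (1, 0), (1, 1), (0, 0)]
```

Without the register, the LUT outputs could only depend on the present inputs; feeding the stored state back into the tile is what allows counting.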
Configuration of RCA • LUT memories implemented as shift register (2-phase clock) • Multiple configuration chains for large device • Not a fault-tolerant method of bitstream distribution
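A serial configuration chain, and the reason it is not fault tolerant, can be sketched as follows. The model is deliberately simplified: the two-phase clock is not represented, and the `stuck_at_zero` parameter is a hypothetical way of injecting a single defective cell.

```python
from typing import List, Optional


def shift_in(chain_bits: int, bitstream: List[int],
             stuck_at_zero: Optional[int] = None) -> List[int]:
    """Shift a bitstream into a chain; position 0 is nearest the input.

    If stuck_at_zero is given, that cell always holds 0, so every bit
    that has to pass through it is lost.
    """
    chain = [0] * chain_bits
    for bit in bitstream:
        chain = [bit] + chain[:-1]
        if stuck_at_zero is not None:
            chain[stuck_at_zero] = 0
    return chain


def split_luts(chain: List[int], bits_per_lut: int = 8) -> List[List[int]]:
    """Cut the chain into consecutive LUT memories (8 bits per LUT-3)."""
    return [chain[i:i + bits_per_lut] for i in range(0, len(chain), bits_per_lut)]


if __name__ == "__main__":
    print(shift_in(4, [1, 0, 0, 0]))                   # [0, 0, 0, 1]
    print(shift_in(4, [1, 1, 1, 1], stuck_at_zero=1))  # [1, 0, 0, 0]
```

In the second call, one bad cell blanks everything downstream of it, which is why a single long chain is a weak point and why multiple chains limit, but do not remove, the exposure.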
A side effect: “cones of influence” • A computation result has a limited range of propagation • Dead zones • Inputs: cannot be reached • Outputs: no results can be used
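The limited propagation range can be made concrete under one assumed template: each cell in a row reads the three nearest cells of the row before it, so a signal spreads by at most one column per row. The template and the function names are assumptions of this sketch; other templates give differently shaped cones.

```python
from typing import List, Tuple


def cone_of_influence(width: int, rows: int, input_col: int) -> List[Tuple[int, int]]:
    """Per row, the (first, last) column an input at row 0 can affect."""
    cone = []
    for r in range(rows):
        first = max(0, input_col - r)
        last = min(width - 1, input_col + r)
        cone.append((first, last))
    return cone


def can_reach(input_col: int, row: int, output_col: int) -> bool:
    """True if a cell at (row, output_col) lies inside the input's cone."""
    return abs(output_col - input_col) <= row


if __name__ == "__main__":
    print(cone_of_influence(8, 3, 4))  # [(4, 4), (3, 5), (2, 6)]
    print(can_reach(0, 2, 3))          # False: the cell is in a dead zone
```

Cells outside every input's cone are the dead zones named on the slide: no input can reach them, and outputs placed there carry no usable result.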
Routing heuristics for RCA structures may require simple modifications • (left) netlist example to be routed • (center) results (incorrect) from typical FPGA routing tool (error node in red) • (right) corrected results for RCA • Requires definition of a node resolution function (node in yellow)
Training neural nets to design circuits: Results of experimental neural-net based design tools, demonstrating combined “heuristics” (simultaneous technology mapping, placement, and routing)
An n-input look-up table is adequately modeled by a perceptron network with n neurons in its hidden layer* • *Based on analysis of Vapnik-Chervonenkis dimension • Proved with brute-force simulation for the n = 3 case
How neural nets are trained to design circuits • Abstract a neural net model • Use truth table as the training set (= test set) • Build back-propagation system around tile to train (adjust weights of neurons) • Train / re-train with randomized version of training set until convergence occurs (if it occurs)
Neural network circuit designer • [Figure labels: tally, compare, offset]
NN-produced results for 2-bit multiplier • Designs are not optimal, but they work • Could be improved with post-processing to remove nonsense constructs
Comparison of Conventional FPGAs to Reconfigurable Cellular Arrays (RCAs) • Similarities • Both use LUTs • Both are software configured with serial bitstreams • Differences • RCAs have no programmable routing • RCAs support only nearest-neighbor connections • RCAs have a much simpler (periodic) structure
Benchmark rationale • The design of FPGA architectures and their ability to express architectures is empirical • Benchmark suites exist (e.g. MCNC, PREP) to permit comparison of FPGA architectures and algorithms • Comparative benchmarking is the best currently known way of establishing a yardstick, given that most steps in CAD are NP-complete and optimality cannot be proved in general
Other issue #1 • Departure from regularity • Small world (0<p<1 fraction of connections dislocated from lattice) • Semi-structured (Most LUTs point in the right direction) • Amorphous / random structure • Challenge: find O(N) algorithms for “structure discovery” • Question: Can we establish statistical evidence that semi-structured / amorphous cellular networks are adequate as media for hosting complex designs?
Other issue #2 • Signal delivery from “outside world” • X-Y signal grid of 100 microns • Molecular (fractal?) distribution network may be required to combat signal starvation at nanoscale network level • [Figure: 1 cm² chip with 100 µm grid; labels: “~840 molecular gates”, “Signal terminal from ‘outside world’”]
Summary • Reconfigurable cellular arrays are promising as a molecular-scale architecture • Interconnect, defect tolerance, self-assembly • Templates can be tuned to specific molecular concept • Even as an abstract approach, some important loose ends need to be dealt with • Configuration bitstream • Hierarchical assembly specifics • Proof that the medium is competitive with a standard FPGA approach if it could be scaled to molecular levels