What I do
Mike Niemier (a.k.a. Mike)
mniemier@cc.gatech.edu
Phone: 404-894-1704 (w)
Phone: 404-843-9511 (h)
My research goals…
• Introduce a systems-level research component to computing at the nano-scale
• In my case, with a device called “QCA”
• Advance the coupling between nano-scale devices & computer architecture
• Define operational regions for systems of QCA cells
• Combine advances in nano-scale devices and computer architecture
• (to get to real systems sooner & more productively)
• Make physicists work more with systems people
• Help them understand everything else that’s involved to get to computation
• (and make them LIKE IT)
Architects & nanotechnology
1st devices…
• Quantum transistors, RTDs, SETs, computing w/molecules, nanotube arrays, quantum computing, DNA-based computation, …
Nano-tubes and nano-wires: (Fuhrer, Goldstein, DeHon)
• Applications: interconnect, SETs, levers
• Structures: arrays, crossbars, FPGAs, fabrics
• Challenges: alignment, defects, micro/nano interfacing, gain/signal restoration, customization
• Hot: compile to space
• Not: compile to time
Quantum computing: (Oskin, Chong, Chuang)
• Only small devices built, lots of error correction, dataflow?
An intro. to QCA
• Conceptual Quantum-dot Cellular Automata (QCA)
• A cell with 4 dots
• 2 extra electrons
• Tunneling between dots
• Binary information encoded in charge configuration
QCA, CMOS, and Zuse’s paradigm:
• Bi-stable, nonlinear cell-to-cell response
• Restoration of signal levels
• Robustness against disorder
• Similar properties across implementations!
[Figure: a 4-dot cell; cell-cell response function (Cell 1 vs. Cell 2)]
Paradigm shift to molecular electronics
• QCA: molecules = charge containers, not current switches
Why My Work (Part 1)
• CMOS provides faster devices, clocks, more computation
• …but architects provide smarter computation
• Moore’s Law trends may be continued w/nano-scale devices
• A particular focus: molecular nanoelectronics…
• High functional density: 10^11-10^13 devices/cm^2 (ideally 10^14)
• Ultimate limit of device scaling…
• Most nano-scale devices targeted for computational systems
• Architects understand them best
• To complete the picture, we must answer:
• Can we “compute” within different device paradigms?
• Can system-level research help drive device research?
Why my work? (part 2)
ISCA: International Symposium on Computer Architecture
• Only 2 papers on emergent technologies to date…
• …but that session in 2001
• …and this work was part of it
Bob Colwell’s “3rd Prediction”:
• CMOS-based Moore’s Law ends
• Other technologies will be looked at, needed; ISCA will too
NSF NIRT grantee conference:
• Question: “What’s missing?”
• Answer: “Architecture”
[Figure: ISCA papers by topic. Source: Bob Colwell, ISCA 2002 keynote]
My Old Work
Early work: devices
• Device physics work…
• Custom work sets the stage for buildable designs
• Progressed to simple circuits, architectures
• Molecular device work
• Design rules bridge the gap
• Algorithms to assist w/constraints of QCA routing/layout
• µP, generic architectures next logical step
Systems work…
[Figure: cell-rotation diagram (θ1, θ2 = 0°), wires with clocking zones numbered 1-4, and a layout graph with nodes A-F and positions 1-6]
A bit of background
[Figure: “schematic” of a QCA wire driven by a fixed “driver” cell, drawn as wire position (clocking zones 1-5) against time (time steps 1-5). Each clocking zone is labeled with one of the clock phases Switch, Hold, Release, Relax at each time step.]
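The four clock phases named on this slide (Switch, Hold, Release, Relax) can be tabulated with a short sketch. The exact phase-per-zone assignment in the figure did not survive extraction, so this assumes the usual four-phase QCA scheme in which each clocking zone lags its left neighbor by one phase and zone 1 switches at time step 1. The names `PHASES`, `zone_phase`, and `phase_table` are hypothetical.

```python
# Assumed phase order of the four-phase QCA clock.
PHASES = ["Switch", "Hold", "Release", "Relax"]


def zone_phase(zone, step):
    """Phase of a 1-indexed clocking zone at a 1-indexed time step.

    Assumption: zone 1 is in Switch at step 1, and every zone lags the
    zone to its left by exactly one phase.
    """
    return PHASES[(step - zone) % len(PHASES)]


def phase_table(n_zones=5, n_steps=5):
    """One row per time step, one column per clocking zone."""
    return [[zone_phase(z, t) for z in range(1, n_zones + 1)]
            for t in range(1, n_steps + 1)]


if __name__ == "__main__":
    for t, row in enumerate(phase_table(), start=1):
        print(f"Time step {t}: " + "  ".join(f"{p:<7}" for p in row))
```

Under these assumptions a zone is in Switch exactly when its left neighbor is in Hold, which is what lets a held value drive the next zone along the wire.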
Affecting device development
• This floorplan functionality seen here…
• Device physicists/EEs studying how to build/implement/test/simulate our floorplan functionality
• Logic on top of wires
[Figure courtesy of Craig Lent. Labels: (input), (device), (input), (output), (input); clocking zones numbered 1-4]
A bit more of my old work
Shows consequences “pipelining provides”:
• Computation ballistic!
• Before: processing is what’s possible in 1 time step
• Now, coordinate signal arrival times to ensure processing will occur at all
[Figure: floorplan of a simple processor datapath. Labeled blocks: Memory, Instruction Register (IR), Program Counter (PC), Accumulator (Acc), ALU (Logic/Adder), A mux, B mux, new mux. Labeled paths and signals: data from memory (for LOAD/arith. instruction), memory-to-IR (loads inst. into IR), IR-to-ALU, PC-to-Bmux feedback (loads PC for JMP), Acc-to-ALU feedback, IR-to-memory path (for STORE instruction), Acc-to-memory feedback, PC-to-memory path, Bmux select, select PC/IR as memory addr., memory write enable, read/write (PC, IR, ACC), B-invert (AND/OR), carry-in, zero, JMP, ADD.]
A roadmap for new work…
[Flowchart from “start” to “stop” with boxes numbered 1-12; the box order and the yes/no branch targets did not survive extraction.]
Steps:
• Gather basic information: Ek, required clock strength, etc.
• Do a logical circuit layout in QCA
• Simulate for logical correctness
• Study race cond., calculate critical path length
• Investigate thicker wires, stronger clocks, etc.
• Design CMOS clock structure to produce E-field
• Calculate # of cells allowed per clock window
• Introduce defects into logical layouts (use stats from physical experiments)
• Re-simulate for logical correctness
Decisions (yes/no):
• Is there a “race”?
• Can change in clock help?
• Can clock be built?
• Is environmental quality < Ek?
• Are defects tolerable?
CAD
Buildability Constraints
The building blocks that currently make up our “parts library” are restricted to the DNA-based substrates (Fig. 9a), circuits that use only 1 type of cell (i.e. only 90-degree cells), and circuits that have no wire crossings.
• Rearrange to eliminate crosses: we can rearrange nodes to eliminate crosses
• Duplicate to eliminate crosses: buildability constraints met by duplicating a node
[Figure: graphs with nodes 1-6 and nodes A-D, before and after the crossing is eliminated]
Logical crossings are also possible…
• Majority gate (M) with Input A, Input B, Input C: a fixed input of 0 gives AND, a fixed input of 1 gives OR
• XOR: (A and B’) or (A’ and B) (there is an inherent crossing)
• A “logical” wire crossing: using planar XOR made of NAND gates, circuit at left can be built
[Figure: majority gates (M) and a “logical” wire crossing with signals A, B, A xor B, B xor A]
Minimize clock skew
Because of QCA’s clock, only a certain # of cells are active (able to compute) at any one time. If it takes too long for a value to propagate, the wrong answer will appear at the output. CAD can address this problem by optimizing for path length or, as the clock moves from left to right, by reducing the vertical height of wires (i.e. length x is shorter than length y).
[Figure: window of computation, with wire lengths x and y]
Improve circuit density
This is the first cut of an ALU; it is much less dense than equivalent designs.
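The gate-level ideas on this slide can be checked with a small truth-table sketch: a majority gate with one input fixed to 0 or 1 acts as AND or OR, XOR is built as (A and B’) or (A’ and B), and three XORs swap two signals without a physical crossing. The three-XOR arrangement is a reading of the garbled figure labels (“A xor B”, “B xor A”), not something the slide text spells out, and all function names here are hypothetical.

```python
def majority(a, b, c):
    """Three-input majority vote on bits (0 or 1)."""
    return int(a + b + c >= 2)


def and_gate(a, b):
    # Fixing one majority input to 0 yields AND.
    return majority(a, b, 0)


def or_gate(a, b):
    # Fixing one majority input to 1 yields OR.
    return majority(a, b, 1)


def xor_gate(a, b):
    # XOR: (A and B') or (A' and B); inversion is written as 1 - x.
    return or_gate(and_gate(a, 1 - b), and_gate(1 - a, b))


def logical_crossing(a, b):
    """Swap two signals using XORs only (assumed three-XOR arrangement)."""
    s = xor_gate(a, b)
    # (A xor B) xor A = B, and (A xor B) xor B = A.
    return xor_gate(s, a), xor_gate(s, b)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", logical_crossing(a, b))
```

This models logic values only; it says nothing about cell layout, clocking zones, or whether a given planar XOR fits the window of computation.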
Systolic Architectures…
[Figure: systolic circuit with cells A, B, C holding weights w1 = 1, w2 = 0, w3 = 1; signals xin, Ax, Bx, Cx, Ay, By, Cy, Aout, Bout, Cout]
It’s also possible to design a similar circuit without the requirement that all signals arrive simultaneously. This circuit is shown below. It will take longer to produce the output. Also, x values will have to be asserted for two clock cycles as opposed to one. Thus, an input pattern would be x1, x1, x2, x2, x3, x3, …
[Figure: second circuit with cells A, B, C (w1 = 1, w2 = 0, w3 = 1), based on a cell with ports Xin, Xout, Yin, Yout and weight W]
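As a rough functional illustration only, the sketch below models what a matcher with weights w1 = 1, w2 = 0, w3 = 1 would report on a bit stream, using a plain shift register rather than the cell-level systolic timing of either circuit. It assumes w1 lines up with the oldest bit in the window, which the slide does not state. `match_stream` and `hold_two_cycles` are hypothetical names; the latter just produces the x1, x1, x2, x2, … input pattern mentioned above.

```python
from collections import deque


def match_stream(bits, weights=(1, 0, 1)):
    """Return one flag per input bit: 1 when the last len(weights) bits
    equal the weights (oldest bit compared with the first weight)."""
    window = deque(maxlen=len(weights))  # shift register of recent inputs
    flags = []
    for x in bits:
        window.append(x)
        full = len(window) == len(weights)
        flags.append(int(full and tuple(window) == tuple(weights)))
    return flags


def hold_two_cycles(bits):
    """Repeat every input for two clock cycles: x1, x1, x2, x2, ..."""
    return [x for x in bits for _ in range(2)]


if __name__ == "__main__":
    stream = [1, 0, 1, 0, 1, 1]
    print(match_stream(stream))
    print(hold_two_cycles(stream))
```

A cycle-accurate model would also need the per-cell delays of the x and y paths, which cannot be recovered from the extracted figures.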
Systolic Processing (and errors)
Sources of error
Possible sources of error in systems of molecular QCA cells: missing cells (a), wrong distance between cells (b), off-center cells (c), rotated cells (d), and off-center cells in the “y”-dimension (e).
[Figure: panels a-e illustrating each source of error]
The QCA circuit in terms of logic gates
[Figure: logic-gate view of the circuit with weights w2(0), w3(1) and signals xin, xout]
The top part of this figure shows a DNA tile with four schematic QCA molecules attached to specific sites in the major groove of one DNA helix (a). This DNA tile is one of nine tiles which would form a diamond-shaped raft 60 nm long by 12 nm wide. After ligation to prevent disassembly, six of these rafts would assemble (b) into a functional pattern matching circuit in an area of less than 0.01 square microns. Part (c) shows how the DNA circuit board could self-assemble on a surface with buried clocking wires; the wires are about 25 nm in diameter on a 75 nm pitch. This circuit would be capable of matching a specific string of 1s and 0s to an input stream of 1s and 0s: hardware that could be used in internet search engines to locate items in a database, to find an address in a computer’s memory, etc.
Detailed design rules
Rule 2B: Disorder
How is disorder affected by Ekink?
• $E_{kink} \sim \frac{1}{r^5}\cos(4\theta)$. As $\theta$ increases, Ekink decreases.
• $E_{kink} \sim \frac{1}{r^5}\cos\big(2(\theta_1 + \theta_2)\big)$. As $\theta_1$ or $\theta_2$ increases, Ekink decreases.
• ndisordered = # cells
[Figure: cell pairs showing separation r and angle θ, and rotation angles θ1, θ2 (drawn with θ2 = 0°)]
Why they are important:
• Successful binary value transmission depends on no external energy being greater than the smallest kink energy
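The two proportionalities in Rule 2B can be explored numerically. This sketch reads the slide’s “cos4q” as cos(4θ) and “cos2(q1+ q2)” as cos(2(θ1 + θ2)), which is an interpretation of the garbled symbols. It returns relative values only (the constant of proportionality is not given), and the function names are hypothetical.

```python
import math


def kink_offset(r, theta_deg):
    """Relative kink energy ~ cos(4*theta) / r**5.

    r is in arbitrary units and theta in degrees; the absolute scale
    is unknown, so only ratios between calls are meaningful.
    """
    return math.cos(math.radians(4 * theta_deg)) / r ** 5


def kink_rotated(r, theta1_deg, theta2_deg):
    """Relative kink energy ~ cos(2*(theta1 + theta2)) / r**5."""
    return math.cos(math.radians(2 * (theta1_deg + theta2_deg))) / r ** 5


if __name__ == "__main__":
    # Doubling the spacing cuts the coupling by a factor of 2**5 = 32.
    print(kink_offset(1.0, 0.0) / kink_offset(2.0, 0.0))
    # Small angles reduce the coupling relative to a perfectly aligned pair.
    for theta in (0, 5, 10, 15):
        print(theta, round(kink_offset(1.0, theta), 4))
```

Under this reading, a small misalignment or a modest increase in spacing lowers the smallest kink energy in a wire, which is the quantity the slide says no external energy may exceed.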
Me
• Name: Michael Niemier (a.k.a. “Mike”)
• Contact:
• E-mail: mniemier@cc.gatech.edu
• Phone: (404) 894-1704
• My office: 219