
SiGe HBT BiCMOS Field Programmable Gate Arrays for Fast Reconfigurable Computing

Explore SiGe HBT BiCMOS technology in Field Programmable Gate Arrays for high-speed computing applications. History, process, logic levels, design, and performance results are discussed. Discover reconfigurable logic with low power.


Presentation Transcript


  1. SiGe HBT BiCMOS Field Programmable Gate Arrays for Fast Reconfigurable Computing • Bryan S. Goda • Rensselaer Polytechnic Institute, Troy, New York

  2. Agenda • Introduction • BiCMOS FPGA History • SiGe HBT BiCMOS Process • Current Mode Logic • Xilinx 6200 FPGA Design • Configuration Memory • Performance Results • Conclusions and Future Work

  3. Current Role of SiGe • “More Zip per Chip” • Wireless Phones -> Watch Sized Phone • Direct Broadcast Satellite • Fiber-Optic Lines, Switches, and Routers

  4. Programmable Bipolar Logic • 1983: Fairchild ECL Field Programmable Logic Array • Fuse Based • 4 ns Cycle Rate • High Power • Scaling Problems • 1990: Algotronix 1.2 µm 256 Cell Configurable Logic Array • fT 6 GHz, 200 ps Gate Delay • 4 Transistor Static RAM Memory Cells • ASIC Emulation and Signal Processing • Forerunner of XC6200

  5. US Patent: CMOS Switchable 2 Input Multiplexer • Figure labels: Y1, Y2, a, Vref, EN1, EN2, V+, V-

  6. SiGe Heterojunction Bipolar Transistor • Selectively introduce Ge into the base of a Si BJT • Smaller Base Bandgap increases e- injection, giving higher Beta (100) • Higher Beta allows a more heavily doped base, lowering RB (125 Ohm) • Graded Bandgap decreases base transit time, raising fT

  7. SiGe HBT • 50 GHz Process, 100 GHz process within a year (30 uA at 50 GHz) • 5 layers of metal • Used in RPI VLSI Class • Co-integrated with CMOS process • Can have HBT logic with CMOS memory • Low power and high speed

  8. fT Curves for Various Emitter Lengths

  9. SiGe HBT Layout • Figure labels: Emitter, Base, Collector, Sub-Collector

  10. Band Diagram • ΔEg,Ge(grade) = ΔEg,Ge(x=Wb) - ΔEg,Ge(x=0) = 0.031 eV • Figure labels: n+ Si emitter, p-SiGe base, n- Si collector, p-Si, Ge, Drift Field, e-, h+, EC, EV • Dielectric Constant: Si = 11.7, Ge = 16.2, SiGe (7.5% Ge) = 12.03

  11. CML Branch Current vs. Differential DC Voltage

  12. IBM SiGe and CMOS Load Gate Delays on M1, M2, LM

  13. Current Steering Logic • Vcc 0 V • Level 1: 0 V to -250 mV • Fastest Logic Level • Limited Drive Capability • Level 2: -950 mV to -1.2 V • Inter-block Signal Level • Good Fan-Out (10) • Level 3: -1.90 V to -2.15 V • Clock Signal • Slowest Level • Level 4 Possible • Vee -4.5 V

  14. Current Steering Logic In SiGe • 13 ps Transistor Switching Time (75 GHz) • 6 ps Process Next Year • Small Voltage Swings (250 mV) vs 3.3 or 5 V • Less Power • Smaller Swing = Faster • “Steer” Currents, Use Differential Logic • Less Switch Noise • Fewer Transistors Needed, Complement Signal Present • Flip-Flops and Multiplexers Easy to Implement
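The "steer currents" point can be illustrated with the textbook ideal bipolar differential pair, in which the share of tail current flowing in one branch is 1 / (1 + exp(-Vd / VT)). The following is a minimal sketch under that ideal model (no emitter degeneration, no base current). The function names and the 300 K temperature are assumptions for illustration and do not come from the slides.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C


def thermal_voltage(temp_kelvin: float = 300.0) -> float:
    """Thermal voltage VT = kT/q in volts (about 25.9 mV at 300 K)."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def steered_fraction(v_diff: float, temp_kelvin: float = 300.0) -> float:
    """Fraction of the tail current in the 'on' branch of an ideal
    bipolar differential pair for a differential input v_diff (volts)."""
    return 1.0 / (1.0 + math.exp(-v_diff / thermal_voltage(temp_kelvin)))


if __name__ == "__main__":
    # Compare a few differential swings, including the 250 mV swing on the slide.
    for swing_mv in (0, 50, 100, 200, 250):
        frac = steered_fraction(swing_mv / 1000.0)
        print(f"{swing_mv:4d} mV -> {frac:.6f} of tail current in one branch")
```

Under this idealized model a 250 mV differential swing already steers essentially all of the tail current to one branch, which is consistent with the slide's claim that small swings are sufficient for switching.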

  15. CML XOR Logic Schematic • Truth table (A, B, A XOR B): 0 0 0; 0 1 1; 1 0 1; 1 1 0 • Vcc 0 V • Level 1 (A): 0 to -0.25 V • Level 2 (B): -0.95 to -1.2 V • Vee -4.5 V • Figure labels: A, B, Vref, A XOR B and its complement

  16. General FPGA Structure • I/O Cell • Logic Cell • Routing Network • Configuration Memory

  17. High Speed FPGA Applications • Real Time Image Processing • Radar • Pattern Recognition • Digital Networks • Mobile Subscriber Equipment • Command Information Systems • High Speed Switching Nodes • Control Systems • Guidance Systems • Reprogrammable Survivability • Satellite Systems

  18. Image Correlation • Figure labels: Search Image, Desired Image • (1) Desired Image is programmed into chip (1 pixel = 1 CLB) • (2) Load a section of search image • (3) If enough pixels match, then turn found bit on • (4) Load another section, or reprogram with new desired image
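The four steps on the image correlation slide can be sketched in software as a sliding-window match count. This is only an illustrative model of the procedure, not the hardware mapping. The names `count_matches` and `search` and the `threshold` parameter (what counts as "enough pixels") are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary pixels, one per (modelled) CLB


def count_matches(desired: Image, section: Image) -> int:
    """Count pixel positions where the loaded section equals the desired image."""
    return sum(
        d == s
        for d_row, s_row in zip(desired, section)
        for d, s in zip(d_row, s_row)
    )


def search(desired: Image, image: Image, threshold: int) -> List[Tuple[int, int]]:
    """Slide the desired image over the search image and return the
    top-left corners of sections whose match count reaches the threshold
    (the 'found bit' would be turned on for those sections)."""
    h, w = len(desired), len(desired[0])
    hits = []
    for r in range(len(image) - h + 1):
        for c in range(len(image[0]) - w + 1):
            section = [row[c:c + w] for row in image[r:r + h]]
            if count_matches(desired, section) >= threshold:
                hits.append((r, c))
    return hits


if __name__ == "__main__":
    desired = [[1, 0],
               [0, 1]]
    image = [[0, 0, 0],
             [0, 1, 0],
             [0, 0, 1]]
    print(search(desired, image, threshold=4))  # exact matches only
    print(search(desired, image, threshold=3))  # tolerate one mismatched pixel
```

Lowering the threshold models the "if enough pixels match" wording: an approximate match can still set the found bit.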

  19. Samples From XC6200 CAD Tools • Figure labels: IO Blocks, CLBs, Pins

  20. FPGA Drawbacks • Slowdown • 200 MHz Internal Speed down to 30-60 MHz External • Pass Transistor = Low Pass Filter • Limited Bandwidth • Relatively Long Configuration Times (Seconds) • Vendor Guarded Information • More Expensive than Comparable ASIC

  21. Pass Transistor Interconnect Modeling • Figure: pass transistors between nodes 1, 2, 3, and 4, each controlled by a memory bit (M) • Pass Transistor Interconnect Equivalent Circuit from Node 3 to Node 2

  22. Field Programmable Gate Arrays (FPGA) • Hierarchy Level Organization (Sea of Gates) • Simple Cells (Configurable Logic Blocks) • 4x4, 16x16, 64x64 groupings • Hierarchy of routing resources at each level • I/O Blocks (external interface)

  23. Design Parameters • Logic Swing Levels • Based on Differential Pair Switching • Current Levels • Redesign of the Configurable Logic Block • Take Advantage of Differential Wiring • What Parts Can be Turned off if not Used? • Supply Levels • How Many Levels of Logic? • Routing Resources • CMOS Voltage Levels • Integrate CMOS into Bipolar Current Tree

  24. Current Tree with CMOS Routing • VCC 0 V • Level 1: 0 to -0.25 V • Level 2: -0.95 to -1.2 V • Level 3: -1.9 to -2.15 V • Vee -3.4 V • Figure labels: OUT and its complement, inputs a, b, c, d, selects S1, S2, Vref, annotation "Replace with"

  25. Bipolar vs Bipolar/CMOS Current Trees • Traces: CMOS, Bipolar • Pulse Width: 50 ps, 60 ps, 70 ps, 100 ps

  26. 4:1 Multiplexer • Figure labels: Level 1 Inputs, Level 1 Output (true and complement), Level 2 Input (true and complement), Level 3 Input (true and complement) • CMOS Version W/L 5:1

  27. Sample Logic Using Multiplexers • A AND B: X1 := a (select), X2 := b (drives Y2, the "1" input), X3 := a (drives Y3, the "0" input) • If a=1 then select Y2, output = b • If a=0 then select Y3, output = 0 • A OR B: X1 := a (select), X2 := a (drives Y2), X3 := b (drives Y3) • If a=1 then select Y2, output = 1 • If a=0 then select Y3, output = b
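The multiplexer mappings on this slide can be checked exhaustively with a few lines of code. The sketch below models the 2:1 multiplexer as a plain function; the names `mux2`, `and_via_mux`, and `or_via_mux` are illustrative and not part of the XC6200 tools.

```python
def mux2(select: int, y2: int, y3: int) -> int:
    """2:1 multiplexer: pass Y2 when select is 1, otherwise pass Y3."""
    return y2 if select == 1 else y3


def and_via_mux(a: int, b: int) -> int:
    """A AND B with X1 := a (select), X2 := b (Y2), X3 := a (Y3)."""
    return mux2(a, y2=b, y3=a)


def or_via_mux(a: int, b: int) -> int:
    """A OR B with X1 := a (select), X2 := a (Y2), X3 := b (Y3)."""
    return mux2(a, y2=a, y3=b)


if __name__ == "__main__":
    print("a b | AND OR")
    for a in (0, 1):
        for b in (0, 1):
            print(f"{a} {b} |  {and_via_mux(a, b)}   {or_via_mux(a, b)}")
```

Feeding `a` back into a data input is what supplies the constant: when a=0 the AND case selects Y3 = a = 0, and when a=1 the OR case selects Y2 = a = 1.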

  28. Redesign of XC6200 Logic • Original XC6200 Design • Have to Track Inversions • Figure: X1 := a, X2 := b (Y2), X3 := a (Y3), Inverted Output • Revised Design • Use Differential Pair Logic • Eliminate XC6200 Fast Logic • No Inversion Tracking • Figure: X1 := a, X2 := b (Y2), X3 := a (Y3), Non-Inverted Output

  29. Original XC6200 Architecture vs Redesigned Architecture • Redesigned version: Bipolar with CMOS Routing, Switchable • Figure labels (both versions): X1, X2, X3, Y2, Y3, CS Multiplexer, RP Multiplexer, C, S, F, D, Q, Clk, Clr

  30. 10 GHz Three CLB Simulation

  31. CLB Layout • Figure labels: 4:1 Mux (off switchable), 4:1 Mux (off switchable), Master/Slave Latch (off switchable), High Speed Logic, 2:1 Mux, CMOS Control, CMOS Control Buffer

  32. Sample CLB Test Circuit • Figure labels: Vref, 8:1 Mux, CLB, Vref Buffer, 8/1 Divide, Pad Drivers

  33. Actual Fabricated Test Circuit • Pads (110 µm x 110 µm)

  34. Outgoing CLB Routing and Incoming CLB Routing • Routing sources for each of X1, X2, X3: N, S, E, W, N4, S4, E4, W4 • CLB output: F

  35. 4x4 Block Boundary Routing • N Switches, E Switches, W Switches, S Switches • Length 4 FastLane (4x4) • Length 16 FastLane (16x16) • Chip Length FastLane (64x64) • Local Routing • Magic Routing

  36. Local CLB Routing • Nearest Neighbor Routing • Output (F) or Local Through • Example: Route East Signal Through to Next CLB • Note: Can’t Route Signal Back to Origin at this Level • Figure labels: Nout, Sout, Eout, Wout (each selected from F and the three other directions), CLB inputs X1, X2, X3 (each selected from N, S, E, W, N4, S4, E4, W4)

  37. Normal CMOS Memory-CML Interface • Figure labels: SRAM Bits In Memory Planes, CMOS to CML Buffer, Data, decode, New Configuration, CLB Multiplexer Inputs, VSS, VREF, VEE

  38. Memory Design • D Latch M/S: 40 Transistors • D Latch M/S: 18 Transistors • RAM Cell: 6 Transistors • Parallel Load • Figure labels: D, Q, CLK, Clock, Data, Word, Out

  39. 3-D Chip Stacking • Figure labels: Memory Planes, CLBs • Shorter Wires • More CLBs/Area • Optimize Memory

  40. CLB with Routing and RAM (2) • Figure labels: CLB, CLB Select, RAM1, RAM2, four MUX blocks, Selects

  41. Layout of Configurable Logic Block with 2 sets of RAM • Circuit Elements: 240 nfets, 122 pfets, 36 resistors, 98 npn1 HBTs, 16 npnhb1 HBTs • Figure labels: RAM 2:1 Mux, Master/Slave Latch (memory), 8:1 Mux (routing), CMOS Selects, CLB (logic)

  42. SiGe Performance • Propagation Delay by Circuit Type: Buffer 17 ps; CML XOR, AND, OR 22-25 ps; MUX 23-26 ps; CLB XOR, AND, OR 100 ps • Power Decreasing Ideas (Date: Idea, Power Consumption/CLB) • Dec 98: Original CLB, 73 mW • June 99: CLB Redesign I, 34 mW • Aug 99: CLB Redesign II, 24 mW • Dec 99: Widlar Current Mirror with CMOS Control, CMOS Routing, 10.8 mW • Mar 00: Supply Voltage 4.5 -> 3.3 V, 7 mW • Dec 00*: 7HP Process, 0.3 mW • * Projected Power Levels for 7HP Process: At 50 GHz, 30 uA, 20x+ reduction in power
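The power figures on this slide can be turned into step-by-step reduction factors with simple arithmetic. The sketch below uses only the per-CLB numbers listed on the slide (the Dec 00 value is the slide's projection, not a measurement). The variable and function names are illustrative.

```python
# (date, idea, power per CLB in mW), as listed on the slide
POWER_HISTORY = [
    ("Dec 98", "Original CLB", 73.0),
    ("June 99", "CLB Redesign I", 34.0),
    ("Aug 99", "CLB Redesign II", 24.0),
    ("Dec 99", "Widlar current mirror, CMOS control and routing", 10.8),
    ("Mar 00", "Supply voltage 4.5 -> 3.3 V", 7.0),
    ("Dec 00", "7HP process (projected)", 0.3),
]


def reduction_factor(before_mw: float, after_mw: float) -> float:
    """How many times smaller the power is after a change."""
    return before_mw / after_mw


if __name__ == "__main__":
    # Reduction contributed by each individual step.
    for (_, _, prev), (date, idea, cur) in zip(POWER_HISTORY, POWER_HISTORY[1:]):
        print(f"{date:8s} {idea:50s} {reduction_factor(prev, cur):6.2f}x")
    # Overall reduction from the original CLB to the projected 7HP figure.
    first, last = POWER_HISTORY[0][2], POWER_HISTORY[-1][2]
    print(f"Overall: {reduction_factor(first, last):.1f}x")
```

The last step (7 mW to 0.3 mW) works out to a bit over 23x, which agrees with the slide's "20x+ reduction" footnote for the 7HP process.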

  43. Multiplexer Performance vs Temperature • Normal 250 mV Swing • 200 mV Min Swing

  44. Widlar Current Mirror with CMOS Control • Figure labels: Vcc, Input, Vref, Vee

  45. XC6200 Design Improvements • Developed at the University of Scotland • Inversion of Signal at Every CLB • Taken care of due to differential pair wiring • No Pass Transistors, Use Multiplexers for Routing • Able to turn off unused parts with CMOS controlled current mirror • No CMOS-CML Conversion circuits needed, CMOS in current trees • Handcrafted, dense layouts • Context Switching

  46. Power Delay Product • Y axis: uW/gate/MHz (log scale), 1 down to 0.001 • X axis: Year, 1998 to 2002 • Series: PDP CMOS High, PDP CMOS Low, PDP BiCMOS • BiCMOS process points: 5HP, 7HP, 8HP

  47. Data Dependent Switching • Differential Logic has Complement Switching In Opposite Direction • Could Vary Signals Up to 30% • Setup Time Violations • Bit Line Twisting • Figure labels: differential lines A, B, C with complements, Slow Transition case, Fast Transition case

  48. Future Work • Testing • Overall FPGA Architecture • Scaling • Integrate with Other Systems • Projected Graduation May 2001, work to continue at USMA • Power Reduction • 7HP Process

  49. CLB Context Switch Example • Pattern1: 0001100100 • Pattern2: 1011011100 • 70 ps pulses, ~7.1 GHz • Select alternates AND, OR, AND, OR • AND output: 0001000100 • OR output: 1011111100
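The context switch example can be reproduced in a few lines: the same two input patterns are combined with AND in one configuration and OR in the other. The sketch below also checks the quoted rate under the assumption that 70 ps is the width of one bit, so that a full 0-1 cycle takes 140 ps. That interpretation and the function names are assumptions, not stated on the slide.

```python
PATTERN1 = "0001100100"
PATTERN2 = "1011011100"


def combine(p1: str, p2: str, op: str) -> str:
    """Bitwise AND or OR of two equal-length bit strings."""
    if op == "AND":
        return "".join("1" if a == "1" and b == "1" else "0" for a, b in zip(p1, p2))
    if op == "OR":
        return "".join("1" if a == "1" or b == "1" else "0" for a, b in zip(p1, p2))
    raise ValueError(f"unsupported operation: {op}")


def toggle_frequency_hz(bit_time_s: float) -> float:
    """Frequency of an alternating 0/1 signal whose bits last bit_time_s each
    (assumes one full cycle spans two bit times)."""
    return 1.0 / (2.0 * bit_time_s)


if __name__ == "__main__":
    # Same inputs, two CLB configurations selected by the context switch.
    print("AND:", combine(PATTERN1, PATTERN2, "AND"))
    print("OR: ", combine(PATTERN1, PATTERN2, "OR"))
    print(f"{toggle_frequency_hz(70e-12) / 1e9:.1f} GHz")
```

Both outputs agree with the patterns listed on the slide, and 1 / (2 x 70 ps) is about 7.1 GHz.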
