Explore SiGe HBT BiCMOS technology in Field Programmable Gate Arrays for high-speed computing applications. History, process, logic levels, design, and performance results are discussed. Discover low-power reconfigurable logic.
SiGe HBT BiCMOS Field Programmable Gate Arrays for Fast Reconfigurable Computing Bryan S. Goda Rensselaer Polytechnic Institute Troy, New York
Agenda • Introduction • BiCMOS FPGA History • SiGe HBT BiCMOS Process • Current Mode Logic • Xilinx 6200 FPGA Design • Configuration Memory • Performance Results • Conclusions and Future Work
Current Role of SiGe • “More Zip per Chip” • Wireless Phones -> Watch-Sized Phone • Direct Broadcast Satellite • Fiber-Optic Lines, Switches, and Routers
Programmable Bipolar Logic • 1983: Fairchild ECL Field Programmable Logic Array • Fuse Based • 4 ns Cycle Rate • High Power • Scaling Problems • 1990: Algotronix 1.2 µm 256-Cell Configurable Logic Array • fT = 6 GHz, 200 ps Gate Delay • 4-Transistor Static RAM Memory Cells • ASIC Emulation and Signal Processing • Forerunner of XC6200
US Patent: CMOS Switchable 2-Input Multiplexer (figure: inputs Y1 and Y2, control a, enables EN1 and EN2, reference Vref, supplies V+ and V-)
SiGe Heterojunction Bipolar Transistor • Selectively Introduce Ge into the Base of a Si BJT • Smaller Base Bandgap Increases Electron Injection, Giving Higher Beta (100) • Higher Beta Allows a More Heavily Doped Base, Lowering Base Resistance RB (125 Ω) • Graded Bandgap Decreases Base Transit Time, Raising fT
SiGe HBT • 50 GHz Process; 100 GHz Process Expected Within a Year (30 µA at 50 GHz) • 5 Layers of Metal • Used in RPI VLSI Class • Co-Integrated with CMOS Process • Can Have HBT Logic with CMOS Memory • Low Power and High Speed
fT Curves for Various Emitter Lengths (figure)
SiGe HBT Layout (figure: emitter, base, collector, sub-collector)
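Band Diagram (figure: n+ Si emitter, p-SiGe base with a drift field for e-, n- Si collector; conduction band EC and valence band EV, h+ in the base; p-Si base shown for reference)
$$\Delta E_{g,\mathrm{Ge}}(\mathrm{grade}) = \Delta E_{g,\mathrm{Ge}}(x = W_b) - \Delta E_{g,\mathrm{Ge}}(x = 0) = 0.031\ \mathrm{eV}$$
Dielectric Constant • Si = 11.7 • Ge = 16.2 • SiGe (7.5% Ge) = 12.03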
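The SiGe dielectric constant above is consistent with a simple linear interpolation between the Si and Ge values. The sketch below assumes that linear rule (our assumption; the slide does not say how 12.03 was obtained). It gives about 12.04 for 7.5% Ge, close to the listed 12.03.

```python
EPS_SI = 11.7  # relative dielectric constant of Si (from the slide)
EPS_GE = 16.2  # relative dielectric constant of Ge (from the slide)

def eps_sige(ge_fraction: float) -> float:
    """Relative dielectric constant of Si(1-x)Ge(x).

    Assumption: a linear mix of the Si and Ge values.
    """
    if not 0.0 <= ge_fraction <= 1.0:
        raise ValueError("Ge fraction must be between 0 and 1")
    return EPS_SI + ge_fraction * (EPS_GE - EPS_SI)

print(round(eps_sige(0.075), 4))  # 7.5 % Ge, as on the slide
```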
Current Steering Logic • Vcc: 0 V • Level 1: 0 V to -250 mV; Fastest Logic Level, Limited Drive Capability • Level 2: -950 mV to -1.2 V; Inter-Block Signal Level, Good Fan-Out (10) • Level 3: -1.90 V to -2.15 V; Clock Signal, Slowest Level • Level 4: Possible • Vee: -4.5 V
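A quick check of the level arithmetic: every level has the same 250 mV swing, and each level sits about 0.95 V below the one above it, leaving a 0.70 V gap between one level's low value and the next level's high value. Reading that 0.95 V step as one base-emitter drop through an emitter follower is our interpretation, not something the slide states.

```python
# Logic levels from the slide as (high, low) in volts, relative to Vcc = 0 V.
LEVELS = {
    1: (0.0, -0.25),
    2: (-0.95, -1.20),
    3: (-1.90, -2.15),
}

def swing(level: int) -> float:
    """Voltage swing of one logic level."""
    high, low = LEVELS[level]
    return high - low

def level_shift(upper: int, lower: int) -> float:
    """Drop from the high voltage of one level to the high voltage of the next."""
    return LEVELS[upper][0] - LEVELS[lower][0]

def gap(upper: int, lower: int) -> float:
    """Separation between one level's low voltage and the next level's high voltage."""
    return LEVELS[upper][1] - LEVELS[lower][0]

for lvl in LEVELS:
    print(f"Level {lvl}: swing {swing(lvl):.2f} V")
for upper, lower in ((1, 2), (2, 3)):
    print(f"Level {upper} -> {lower}: shift {level_shift(upper, lower):.2f} V, "
          f"gap {gap(upper, lower):.2f} V")
```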
Current Steering Logic in SiGe • 13 ps Transistor Switching Time (75 GHz) • 6 ps Process Next Year • Small Voltage Swings (250 mV) vs. 3.3 or 5 V • Less Power • Smaller Swing = Faster • “Steer” Currents, Use Differential Logic • Less Switching Noise • Fewer Transistors Needed, Since the Complement Signal Is Present • Flip-Flops and Multiplexers Easy to Implement
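Why a smaller swing is faster: to first order, a constant current I moves a node with capacitance C through a swing ΔV in t = C·ΔV / I. The sketch below uses hypothetical values (20 fF, 1 mA) purely for illustration; only the 250 mV and 3.3 V swings come from the slide. At equal current and load, the 250 mV swing is 13.2x faster than a 3.3 V swing.

```python
def slew_time(capacitance_f: float, swing_v: float, current_a: float) -> float:
    """First-order time for a constant current to move a capacitive node by swing_v."""
    return capacitance_f * swing_v / current_a

C_LOAD = 20e-15  # 20 fF, hypothetical load capacitance
I_DRIVE = 1e-3   # 1 mA, hypothetical switched current

t_cml = slew_time(C_LOAD, 0.25, I_DRIVE)  # 250 mV current-steering swing
t_rail = slew_time(C_LOAD, 3.3, I_DRIVE)  # 3.3 V full-rail swing

print(f"250 mV swing: {t_cml * 1e12:.1f} ps")
print(f"3.3 V swing:  {t_rail * 1e12:.1f} ps")
print(f"ratio: {t_rail / t_cml:.1f}x")
```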
CML XOR Logic • Truth Table (A, B -> A XOR B): 0, 0 -> 0; 0, 1 -> 1; 1, 0 -> 1; 1, 1 -> 0 • Schematic (figure): Vcc = 0 V; input A and its complement on Level 1 (0 to -0.25 V); input B and its complement on Level 2 (-0.95 to -1.2 V); Vref-biased current source; Vee = -4.5 V; differential outputs A XOR B and its complement
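A behavioral sketch of the XOR above, not a circuit simulation. It assumes the usual two-level arrangement: B drives the lower (Level 2) pair, which steers the whole tail current into one of two upper (Level 1) pairs driven by A, and the upper pairs' collectors are cross-connected. The normalized current and the decision rule `to_bit` are illustrative choices.

```python
from itertools import product

LEVEL1_SWING = 0.25  # volts: Level 1 high is 0 V, low is -0.25 V (from the slides)

def cml_xor(a: int, b: int) -> tuple:
    """Return (OUT, OUTB) in volts relative to Vcc for a two-level CML XOR.

    B picks which upper pair gets the (normalized) tail current; A then picks
    which output node that pair pulls down. Cross-connected collectors make
    the right-hand pair invert A's effect.
    """
    if b == 0:
        current_to_out = 1.0 if a == 0 else 0.0
    else:
        current_to_out = 1.0 if a == 1 else 0.0
    current_to_outb = 1.0 - current_to_out  # the tail current is conserved
    # A node whose load resistor carries the current sits one swing below Vcc.
    return -LEVEL1_SWING * current_to_out, -LEVEL1_SWING * current_to_outb

def to_bit(v_out: float, v_outb: float) -> int:
    """Read the differential output: logic 1 when OUT is above OUTB."""
    return 1 if v_out > v_outb else 0

for a, b in product((0, 1), repeat=2):
    print(a, b, to_bit(*cml_xor(a, b)))
```

Because the complement output is produced at the same time, no separate inverter is needed, which is the "complement signal present" point made earlier.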
General FPGA Structure • I/O Cell • Logic Cell • Routing Network • Configuration Memory
High Speed FPGA Applications • Real Time Image Processing • Radar • Pattern Recognition • Digital Networks • Mobile Subscriber Equipment • Command Information Systems • High Speed Switching Nodes • Control Systems • Guidance Systems • Reprogrammable Survivability • Satellite Systems
Image Correlation (figure: search image and desired image) 1. The desired image is programmed into the chip (1 pixel = 1 CLB). 2. Load a section of the search image. 3. If enough pixels match, turn the found bit on. 4. Load another section, or reprogram with a new desired image.
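The four steps can be sketched in software as a sliding-window pixel match with a threshold. In the FPGA each CLB compares its own pixel in parallel, while this sketch does the same comparisons serially. Binary pixels, the threshold value, and the small example images are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary pixels, row-major

def match_count(search: Image, desired: Image, top: int, left: int) -> int:
    """Number of pixels in one window of the search image equal to the desired image."""
    return sum(
        1
        for r, row in enumerate(desired)
        for c, pixel in enumerate(row)
        if search[top + r][left + c] == pixel
    )

def correlate(search: Image, desired: Image, threshold: int) -> Optional[Tuple[int, int]]:
    """Return the (top, left) corner of the first window whose match count
    reaches the threshold (the 'found bit'), or None if no window does."""
    h, w = len(desired), len(desired[0])
    for top in range(len(search) - h + 1):
        for left in range(len(search[0]) - w + 1):
            if match_count(search, desired, top, left) >= threshold:
                return top, left
    return None

desired = [[1, 0],
           [0, 1]]
search = [[0, 0, 0, 0],
          [0, 1, 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 0]]
print(correlate(search, desired, threshold=4))  # (1, 1)
```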
Samples From XC6200 CAD Tools (figure: IO blocks, CLBs, pins)
FPGA Drawbacks • Slowdown: 200 MHz Internal Speed Drops to 30-60 MHz External • Pass Transistor = Low-Pass Filter • Limited Bandwidth • Relatively Long Configuration Times (Seconds) • Vendor-Guarded Information • More Expensive than a Comparable ASIC
Pass Transistor Interconnect Modeling (figure: switch box joining nodes 1 to 4 through pass transistors gated by configuration memory cells M; equivalent circuit from node 3 to node 2 through the switch that is on)
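Why pass-transistor routing acts as a low-pass filter: each switch that is on adds series resistance, and each node adds capacitance, so a route through several switches is an RC ladder. A standard first-order estimate is the Elmore delay, which for N identical segments is RC·N(N+1)/2 and so grows quadratically with the number of switches. The on-resistance and node capacitance below are hypothetical.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    Segment i is a series resistance resistances[i] followed by a node
    capacitance capacitances[i] to ground.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # total resistance between the driver and this node
        delay += upstream_r * c  # each node's capacitance charges through all of it
    return delay

R_ON = 2e3       # 2 kOhm switch on-resistance, hypothetical
C_NODE = 10e-15  # 10 fF per switch node, hypothetical

for n in (1, 2, 4, 8):
    tau = elmore_delay([R_ON] * n, [C_NODE] * n)
    print(f"{n} switches: {tau * 1e12:.0f} ps")
```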
Field Programmable Gate Arrays (FPGA) • Hierarchy Level Organization (Sea of Gates) • Simple Cells (Configurable Logic Blocks) • 4x4, 16x16, 64x64 groupings • Hierarchy of routing resources at each level • I/O Blocks (external interface)
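The 4x4, 16x16, and 64x64 grouping can be illustrated by finding which block at each level contains a given CLB. This sketch assumes a 64x64 CLB array indexed from 0 by (column, row); the function name and indexing are hypothetical.

```python
GROUP_SIZES = (4, 16, 64)  # block edge lengths from the slide, in CLBs
ARRAY_SIZE = 64            # assumed array edge length, in CLBs

def enclosing_blocks(x: int, y: int) -> dict:
    """For each hierarchy level, the (column, row) index of the block containing CLB (x, y)."""
    if not (0 <= x < ARRAY_SIZE and 0 <= y < ARRAY_SIZE):
        raise ValueError("coordinates must lie inside the CLB array")
    return {f"{n}x{n}": (x // n, y // n) for n in GROUP_SIZES}

print(enclosing_blocks(37, 5))  # {'4x4': (9, 1), '16x16': (2, 0), '64x64': (0, 0)}
```

Two CLBs that share a block at some level can use that level's routing resources without going up to the next level.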
Design Parameters • Logic Swing Levels • Based on Differential Pair Switching • Current Levels • Redesign of the Configurable Logic Block • Take Advantage of Differential Wiring • What Parts Can Be Turned Off if Not Used? • Supply Levels • How Many Levels of Logic? • Routing Resources • CMOS Voltage Levels • Integrate CMOS into Bipolar Current Tree
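Current Tree with CMOS Routing (figure: VCC = 0 V; differential output OUT on Level 1 (0 to -0.25 V); data inputs a, b, c, d on Level 1 differential pairs; select S1 on Level 2 (-0.95 to -1.2 V); select S2 on Level 3 (-1.9 to -2.15 V); Vref-biased current source; Vee = -3.4 V)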
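A behavioral sketch of the 4:1 current tree: S2 (Level 3) steers the tail current into one half of the tree, S1 (Level 2) steers it into one Level 1 pair, and only that pair can pull an output node down. The mapping of inputs a, b, c, d to select codes is an assumption made for illustration.

```python
from itertools import product

def active_branch(s1: int, s2: int) -> int:
    """Index (0-3) of the single Level 1 pair that receives the tail current."""
    # Level 3 pair (S2) picks a half of the tree; Level 2 pair (S1) picks a pair in it.
    return 2 * s2 + s1

def mux4(data: tuple, s1: int, s2: int) -> int:
    """Only the branch carrying current can switch the outputs,
    so OUT follows that branch's data input."""
    currents = [0.0] * 4
    currents[active_branch(s1, s2)] = 1.0  # the whole tail current flows in one branch
    return next(bit for bit, i in zip(data, currents) if i > 0)

data = (1, 0, 0, 1)  # hypothetical values on inputs a, b, c, d
for s2, s1 in product((0, 1), repeat=2):
    print(f"S2={s2} S1={s1} -> OUT={mux4(data, s1, s2)}")
```

Because the select levels only route current, replacing those bipolar pairs with CMOS switches (as in the slide title) changes how the current is steered but not the logic.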
Bipolar vs Bipolar/CMOS Current Trees (plot comparing CMOS and bipolar versions at pulse widths of 50 ps, 60 ps, 70 ps, and 100 ps)
4:1 Multiplexer (figure: Level 1 data inputs and differential Level 1 outputs; Level 2 and Level 3 select inputs; CMOS version with W/L = 5:1)
Sample Logic Using Multiplexers (X1 drives the select of a 2:1 multiplexer whose 1 input is Y2 and whose 0 input is Y3) • A AND B: X1 := a, X2 := b, X3 := a. If a = 1, select Y2: output = b. If a = 0, select Y3: output = 0. • A OR B: X1 := a, X2 := a, X3 := b. If a = 1, select Y2: output = 1. If a = 0, select Y3: output = b.
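The two input assignments above can be checked against the AND and OR truth tables with a small model of the 2:1 multiplexer cell. This sketch assumes Y2 = X2 and Y3 = X3 (no optional inversion on those paths); `mux_cell` is a hypothetical name.

```python
def mux_cell(x1: int, x2: int, x3: int) -> int:
    """2:1 multiplexer cell: X1 = 1 selects Y2, X1 = 0 selects Y3.

    Assumption: Y2 = X2 and Y3 = X3 in this sketch.
    """
    y2, y3 = x2, x3
    return y2 if x1 == 1 else y3

def and_gate(a: int, b: int) -> int:
    return mux_cell(x1=a, x2=b, x3=a)  # a = 0 selects Y3 = a = 0

def or_gate(a: int, b: int) -> int:
    return mux_cell(x1=a, x2=a, x3=b)  # a = 1 selects Y2 = a = 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b), or_gate(a, b))
```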
Redesign of XC6200 Logic • Original XC6200 Design (X1 := a, X2 := b, X3 := a; 2:1 multiplexer of Y2 and Y3): Have to Track Inversions; Inverted Output • Revised Design (X1 := a, X2 := b, X3 := a): Use Differential Pair Logic; Eliminate XC6200 Fast Logic; No Inversion Tracking; Non-Inverted Output
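Why differential wiring removes inversion tracking: a differential signal travels on a true wire and a complement wire, so inverting it is just a matter of swapping which wire is treated as "true", with no gate and no added delay. A minimal sketch; the `DiffSignal` class is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DiffSignal:
    """A differential signal: a true wire and a complement wire."""
    true: int
    comp: int

    @classmethod
    def from_bit(cls, bit: int) -> "DiffSignal":
        return cls(true=bit, comp=1 - bit)

    def invert(self) -> "DiffSignal":
        # Inverting swaps the two wires; no logic gate is involved.
        return DiffSignal(true=self.comp, comp=self.true)

    @property
    def value(self) -> int:
        return self.true

x = DiffSignal.from_bit(1)
print(x.invert().value)          # 0
print(x.invert().invert() == x)  # True
```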
Original XC6200 Architecture vs Redesigned Architecture (figure: in both, inputs X1, X2, X3 and signals Y2, Y3 feed a CS multiplexer and a 2:1 multiplexer; an RP multiplexer and a D flip-flop with Clk and Clr lead to the C and S paths of output F; the redesigned version is bipolar with CMOS routing and a switchable flip-flop)
CLB Layout (figure labels) • 4:1 Mux (Can Be Switched Off) • CMOS Control • Master/Slave Latch (Can Be Switched Off) • 4:1 Mux (Can Be Switched Off) • High Speed Logic • 2:1 Mux • CMOS Control • Buffer
Sample CLB Test Circuit (figure: Vref-biased 8:1 mux, CLB, and buffer, followed by a divide-by-8 stage and pad drivers)
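One common way to build a divide-by-8 stage is three cascaded toggle flip-flops, which presumably lets a multi-GHz on-chip signal reach the pads at one eighth of its frequency. Both the ripple structure and that purpose are our assumptions; the slide only labels the block. The sketch tracks the last stage's output after each input edge.

```python
def divide_by_8(input_edges: int) -> list:
    """Output level after each input edge for a 3-stage ripple divider
    built from toggle flip-flops."""
    stages = [0, 0, 0]
    outputs = []
    for _ in range(input_edges):
        # Stage 0 toggles on every input edge; each later stage toggles
        # only when the stage before it falls from 1 to 0.
        for i in range(3):
            stages[i] ^= 1
            if stages[i] == 1:  # this stage rose, so nothing ripples further
                break
        outputs.append(stages[2])
    return outputs

print(divide_by_8(16))  # the output repeats every 8 input edges
```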
Actual Fabricated Test Circuit (figure; pads are 110 µm x 110 µm)
CLB Routing (figure: incoming CLB routing N, S, E, W and N4, S4, E4, W4 feeds CLB inputs X1, X2, X3; outgoing CLB routing carries CLB output F)
4x4 Block Boundary Routing (figure: N, E, W, and S switches on the block boundary) • Length 4 FastLane (4x4) • Length 16 FastLane (16x16) • Chip Length FastLane (64x64) • Local Routing • Magic Routing
Local CLB Routing (figure: incoming N, S, E, W and N4, S4, E4, W4 signals; CLB inputs X1, X2, X3; CLB output F; directional outputs Nout, Sout, Eout, Wout) • Nearest Neighbor Routing • Each Output Carries the CLB Output (F) or a Local Signal Passed Through • Example: Route East Signal Through to Next CLB • Note: Can’t Route Signal Back to Origin at this Level
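The pass-through rule can be sketched as a check on each output multiplexer. We assume inputs are named by the side they arrive from. An output may then carry F or an input from any other side, but not the input from its own side, because that would send the signal back to the CLB it came from. The function names are hypothetical.

```python
DIRECTIONS = ("N", "S", "E", "W")

def allowed_sources(output_dir: str) -> set:
    """Signals the output multiplexer for output_dir may select."""
    if output_dir not in DIRECTIONS:
        raise ValueError(f"unknown direction {output_dir!r}")
    # F, or any input except the one that arrived on this output's own side.
    return {"F"} | {d for d in DIRECTIONS if d != output_dir}

def route(output_dir: str, source: str) -> str:
    """Configure one output multiplexer, rejecting selections that loop back."""
    if source not in allowed_sources(output_dir):
        raise ValueError(f"{output_dir}out cannot select {source}: it would return to its origin")
    return source

print(sorted(allowed_sources("E")))  # ['F', 'N', 'S', 'W']
print(route("E", "W"))               # pass a signal from the west neighbor on to the east
```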
Normal CMOS Memory-CML Interface (figure: SRAM bits in memory planes, decoded for a new configuration, drive a CMOS-to-CML buffer; supplies VSS and VEE, reference VREF; the buffer outputs are the CLB multiplexer inputs)
Memory Design (figure: schematics with D, Q, CLK, Clock, Data, Word, and Out signals) • Master/Slave D Latch: 40 Transistors • Master/Slave D Latch: 18 Transistors • RAM Cell: 6 Transistors • Parallel Load
3-D Chip Stacking (figure: memory planes stacked with the CLBs) • Shorter Wires • More CLBs/Area • Optimize Memory
CLB with Routing and RAM (2) (figure: CLB with two configuration RAMs, RAM1 and RAM2; a CLB select line chooses which RAM supplies the selects for the four multiplexers)
Layout of Configurable Logic Block with 2 Sets of RAM (figure labels: RAM, 2:1 Mux, Master/Slave Latch (memory), 8:1 Mux (routing), CMOS Selects, CLB (logic)) • Circuit Elements: 240 nFETs, 122 pFETs, 36 Resistors, 98 npn1 HBTs, 16 npnhb1 HBTs
SiGe Performance • Circuit Types: Buffer, CML, MUX, CLB, XOR/AND/OR, XOR/AND/OR • Propagation Delays: 17 ps, 22-25 ps, 23-26 ps, 100 ps
Power Decreasing Ideas (Date: Idea, Power Consumption per CLB)
• Dec 98: Original CLB, 73 mW
• June 99: CLB Redesign I, 34 mW
• Aug 99: CLB Redesign II, 24 mW
• Dec 99: Widlar Current Mirror with CMOS Control, CMOS Routing, 10.8 mW
• Mar 00: Supply Voltage 4.5 V -> 3.3 V, 7 mW
• Dec 00*: 7HP Process, 0.3 mW
* Projected power levels for the 7HP process: at 50 GHz, 30 µA, a 20x+ reduction in power
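The power table can be summarized as step-by-step reduction factors. The 7HP step (7 mW to 0.3 mW) is about 23x, matching the "20x+" note, and the overall projection is about 243x below the original CLB. As a rough cross-check, which assumes CML power is about I·V at a fixed current, lowering the supply from 4.5 V to 3.3 V alone would take 10.8 mW to about 7.9 mW, near the listed 7 mW.

```python
# Power per CLB from the slide, in mW; the Dec 00 entry is a projection.
POWER_MW = [
    ("Dec 98", "Original CLB", 73.0),
    ("June 99", "CLB Redesign I", 34.0),
    ("Aug 99", "CLB Redesign II", 24.0),
    ("Dec 99", "Widlar current mirror with CMOS control, CMOS routing", 10.8),
    ("Mar 00", "Supply voltage 4.5 V -> 3.3 V", 7.0),
    ("Dec 00", "7HP process (projected)", 0.3),
]

def step_reductions(rows):
    """Factor by which each entry's power is lower than the previous entry's."""
    return [prev[2] / cur[2] for prev, cur in zip(rows, rows[1:])]

for (date, idea, mw), factor in zip(POWER_MW[1:], step_reductions(POWER_MW)):
    print(f"{date:8} {mw:5.1f} mW  {factor:5.1f}x lower  ({idea})")

print(f"Overall: {POWER_MW[0][2] / POWER_MW[-1][2]:.0f}x lower than the original CLB")

# Assuming power ~ I * V at fixed current, the supply change alone would predict:
print(f"10.8 mW * 3.3 / 4.5 = {10.8 * 3.3 / 4.5:.2f} mW")
```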
Multiplexer Performance vs Temperature (plot: normal 250 mV swing and 200 mV minimum swing)
Widlar Current Mirror with CMOS Control (figure: Vcc, input, Vref, Vee)
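A Widlar mirror adds an emitter resistor R_E to the output transistor so that a small output current can be derived from a larger reference current. With ideal matched transistors and base currents neglected, it obeys I_out·R_E = V_T·ln(I_ref / I_out), which has no closed-form solution. The sketch solves it by bisection. The 1 mA reference and 5 kΩ resistor are hypothetical values that give an output of about 20 µA.

```python
import math

V_T = 0.02585  # thermal voltage kT/q near 300 K, in volts

def widlar_output_current(i_ref: float, r_e: float, tol: float = 1e-15) -> float:
    """Solve I_out * R_E = V_T * ln(I_ref / I_out) for I_out by bisection.

    Ideal-transistor approximation: matched devices, base currents neglected.
    """
    def f(i_out: float) -> float:
        # Increasing in i_out, negative for tiny currents, positive at i_ref.
        return i_out * r_e - V_T * math.log(i_ref / i_out)

    lo, hi = i_ref * 1e-9, i_ref
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

i_out = widlar_output_current(i_ref=1e-3, r_e=5e3)
print(f"I_out = {i_out * 1e6:.2f} uA")
```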
XC6200 Design Improvements • Developed in Scotland (Algotronix, University of Edinburgh) • Inversion of Signal at Every CLB: Handled by Differential Pair Wiring • No Pass Transistors; Multiplexers Used for Routing • Unused Parts Can Be Turned Off with a CMOS-Controlled Current Mirror • No CMOS-CML Conversion Circuits Needed, Since CMOS Is in the Current Trees • Handcrafted, Dense Layouts • Context Switching
Power Delay Product (plot: PDP in µW/gate/MHz on a log scale from 0.001 to 1 vs year, 1998 to 2002; curves labeled PDP CMOS High, PDP CMOS Low, and PDP BiCMOS, with the 5HP, 7HP, and 8HP processes marked)
Data Dependent Switching • Differential Logic Has the Complement Switching in the Opposite Direction • (figure: adjacent differential pairs A, B, C with their complements, showing a slow transition and a fast transition) • Could Vary Signals Up to 30% • Setup Time Violations • Bit Line Twisting
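A common first-order model of this effect multiplies the coupling capacitance between adjacent wires by a Miller factor. The factor is 0 when the neighbor switches in the same direction, 1 when it is quiet, and 2 when it switches in the opposite direction, as a complement wire does. The capacitances below are hypothetical and are not meant to reproduce the slide's 30% figure. The sketch also assumes delay scales with load capacitance at constant drive current.

```python
# Miller factor on a coupling capacitance, by what the neighbor does
# while the victim wire switches.
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}

def effective_capacitance(c_ground: float, c_couple: float, left: str, right: str) -> float:
    """Load seen by a victim wire with one coupling capacitance to each neighbor."""
    return c_ground + c_couple * (MILLER_FACTOR[left] + MILLER_FACTOR[right])

C_G = 10e-15   # hypothetical wire-to-ground capacitance
C_C = 1.5e-15  # hypothetical coupling capacitance to each neighbor

nominal = effective_capacitance(C_G, C_C, "quiet", "quiet")
fast = effective_capacitance(C_G, C_C, "same", "same")
slow = effective_capacitance(C_G, C_C, "opposite", "opposite")

# With a constant drive current, delay scales with the load capacitance.
print(f"fast: {fast / nominal - 1:+.0%}, slow: {slow / nominal - 1:+.0%}")
```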
Future Work • Testing • Overall FPGA Architecture • Scaling • Integrate with Other Systems • Projected Graduation May 2001, work to continue at USMA • Power Reduction • 7HP Process
CLB Context Switch Example • Pattern 1: 0001100100 (70 ps, ~7.1 GHz) • Pattern 2: 1011011100 (70 ps) • Select Switches the CLB Between AND and OR • Pattern 1 AND Pattern 2 = 0001000100 • Pattern 1 OR Pattern 2 = 1011111100
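The example results can be reproduced bit by bit, with the select line modeled as choosing which of two stored configurations (AND or OR) the CLB uses. Which select value maps to which function is an assumption. As an aside, 1/(2 × 70 ps) ≈ 7.1 GHz, so one reading consistent with the slide is that 70 ps is half a clock period; that is our inference.

```python
PATTERN_1 = "0001100100"
PATTERN_2 = "1011011100"

# Two stored configurations, as with the CLB's two RAM sets; 'select' picks one.
CONTEXTS = {
    0: ("AND", lambda a, b: a & b),
    1: ("OR", lambda a, b: a | b),
}

def run(select: int, x: str, y: str) -> str:
    """Apply the function of the selected context bit by bit to two patterns."""
    _, func = CONTEXTS[select]
    return "".join(str(func(int(a), int(b))) for a, b in zip(x, y))

print(run(0, PATTERN_1, PATTERN_2))  # 0001000100
print(run(1, PATTERN_1, PATTERN_2))  # 1011111100
```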