FE-I4 Chip Design Marlon Barbero, Bonn University Vertex 2009, Putten, The Netherlands Feb. 17th 2009
Contents • IBL (sensor damage) / sLHC (increased luminosity). From FE-I3, current ATLAS pixel FE, to FE-I4, new ATLAS pixel FE for IBL & sLHC. • Analog pixel. • Digital pixel and digital Double-Column. • Periphery. • Yield. • Conclusion and milestones.
FE-I4 for IBL & sLHC • sLHC tentative layout (>2017): 4 to 5 pixel layers at small radii / large(r) radii (note: discussion on the pixel / short-strip boundary, …). • IBL (~2014): inserted layer at 3.7 cm in the current pixel detector. • Tentative ID layout for sLHC: all silicon (long strips / short strips / pixels). Pixels: 2 or 3 fixed layers at 'large' radii (large area, at 16 / 20 / 25 cm?) and 2 removable layers at 'small' radii. [Figure labels: ATLAS present Inner Detector (ID); FE-I4; present beam pipe & B-layer; existing B-layer; new beam pipe; r ~ 37 mm; IBL mounted on beam pipe (Tobias Flick).]
Motivation for Redesign of FE: FE-I3 → FE-I4 • Need for a new FE? Smaller b-layer radius + potential luminosity increase → higher hit rate. • FE-I3 column-drain architecture saturated. • FE-I4 new digital architecture: local regional memories; stop moving hits around (unless read out). • FE-I4 has a smaller pixel (reduced cross-section). • New technology (0.25 μm → 130 nm): higher integration density for digital circuits, rad-hard, availability. [Figure: the "inefficiency wall": inefficiency [%] vs. hit probability per DC for FE-I3 at r = 3.7 cm, with LHC, IBL and sLHC marked.]
Future FE-I4-Based Module (& Consequences for FE-I4) • Increased active area: from less than 75 % to ~90 %: reduced periphery; bigger IC; lower cost for sLHC (main driver is flip-chip cost per chip). • No MCC: more digital functionality in the IC. • Power: analog design for reduced currents; decrease of digital activity (digital logic sharing for neighbor pixels); new powering concepts. • 8 metal layers (2 thick aluminum) for power routing. • Numbered in the module figure (flex, sensor, FE chip): 1) Big chip (periphery on one side of module). 2) Reduce size of periphery (2.8 mm → 2 mm). 3) Thin down FE chips (190 μm → 90 μm). 4) Thin down the sensor (250 μm → 200 μm)? 5) Fewer cables (powering scheme)? • Challenging: power (routing, start-up), clock distribution, simulation / management, yield.
Some Target Specs for FE-I4 • Rad.-hardness: >200 Mrad ionizing dose (FE-I3: >50 Mrad). • Minimal guidelines: no ELT, NMOS guard rings for analog & sensitive digital circuitry. • ToT coded on 4 bits. • DC leakage current tolerant to > 100 nA. [Table annotations: "biggest in HEP to date"; "analog / digital power tuned for IBL occupancy".]
Analog Pixel • In FE-I4_proto1 (FE-I4 prototype submitted in 2008): • 2-stage architecture optimized for low power, low noise, fast rise time: regulated cascode preamp with NMOS input; folded cascode 2nd stage with PMOS input. • Additional gain, Cc/Cf2 ~ 6. • 2nd stage decoupled from leakage-related DC potential shift. • Cf1 ~ 17 fF (~4 MIPs dynamic range). • 12 b configuration: FDAC: tuning of feedback current. TDAC: tuning of discriminator threshold. Local charge injection circuitry. [Layout figure, 50 μm × 150 μm: preamp, Amp2, discriminator, TDAC, FDAC, config logic.]
Analog Pixel: Noise & Irradiation (~200 Mrad) • Excellent un-tuned threshold dispersion. • Dose received: ~200 Mrad. • At low current (10 μA), noise increases by ~20 %. • Low current (which is the target value) is 10 μA/pixel for preamp + amp2 + comparator. • Current for minimum noise: 17 μA/pixel. [Figure: distributions labeled (17 μA) and (12 μA), both loaded with ~400 fF; panel labels m ~ 65 e-, m ~ 90 e-, m ~ 65 e-, m ~ 100 e-.]
Digital Pixel: Regional Architecture • 4-pixel unit = 1 digital region: four discriminator inputs (top left, top right, bottom left, bottom right), hit processing (TS / small / big / ToT), 5 ToT memories per pixel, 5 latency counters per region; read, trigger (L1T), token and neighbor signals. • Store hits locally in the region until L1T: local storage, low traffic on the DC bus. • Only 0.25 % of pixel hits are shipped to EoC → DC bus traffic is "low". • Each pixel is tied to its neighbors (time info), following the clustered nature of real hits. Small hits are close to large hits! To record small hits, use space instead of time. Handle on TW. • Consequences: physics simulation shows an efficient architecture; lowers digital power consumption; spatial association of digital hits recovers lower analog performance.
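The regional storage idea can be illustrated with a toy model. This is only a sketch under stated assumptions: the names `Region`, `Slot`, `hit` and `tick` are hypothetical, the latency value is arbitrary, and the hit classification (TS / small / big) is not modeled. Only the counts (4 pixels, 5 buffers, 4-bit ToT) come from the slides.

```python
from dataclasses import dataclass, field

N_BUFFERS = 5   # from the slide: 5 latency counters per region, 5 ToT memories per pixel
N_PIXELS = 4    # 4-pixel unit


@dataclass
class Slot:
    age: int     # bunch crossings since the hit was stored
    tots: list   # one 4-bit ToT per pixel (0 = no hit in that pixel)


@dataclass
class Region:
    latency: int                               # trigger latency in bunch crossings (assumed value)
    slots: list = field(default_factory=list)  # occupied buffers
    lost: int = 0                              # hits dropped because all buffers were busy

    def hit(self, tots):
        """Store the ToTs of all 4 pixels under one shared latency counter."""
        if len(self.slots) >= N_BUFFERS:
            self.lost += 1                     # regional buffer overflow
            return
        self.slots.append(Slot(age=0, tots=[t & 0xF for t in tots]))

    def tick(self, l1_trigger):
        """Advance one bunch crossing; return the ToTs to read out, if any."""
        for s in self.slots:
            s.age += 1
        expired = [s for s in self.slots if s.age == self.latency]
        self.slots = [s for s in self.slots if s.age < self.latency]
        if expired and l1_trigger:
            return expired[0].tots             # only triggered hits leave the region
        return None                            # untriggered hits are simply discarded


region = Region(latency=3)
region.hit([5, 0, 2, 0])
readout = [region.tick(l1_trigger=(bx == 2)) for bx in range(3)]
```

In this model a hit never moves unless a trigger arrives exactly one latency after it was stored, which is the sense in which traffic on the DC bus stays low.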
Performance / Efficiency • IBL: charge sharing in z comparable with r/phi. • At IBL rate, pile-up inefficiency is the dominant source of inefficiency. • Pile-up inefficiency (related to pixel cross-section and return-to-baseline behavior of the analog pixel): ~0.5 %. • Regional buffer overflow: ~0.05 %. • Inefficiency under control for IBL occupancy. [Plot labels: regional buffer overflow; η = 0; 0.6 %; mean ToT = 4.]
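As a quick cross-check of the two contributions, they can be combined under the assumption (not stated on the slide) that they are independent. The function name `combined_inefficiency` is hypothetical.

```python
def combined_inefficiency(*sources):
    """Combine independent inefficiency fractions: 1 - prod(1 - p_i)."""
    efficiency = 1.0
    for p in sources:
        efficiency *= 1.0 - p
    return 1.0 - efficiency


pile_up = 0.005            # ~0.5 % from the slide
buffer_overflow = 0.0005   # ~0.05 % from the slide
total = combined_inefficiency(pile_up, buffer_overflow)
print(f"total inefficiency ~ {100 * total:.2f} %")
```

The result is about 0.55 %, which is compatible with the 0.6 % label on the slide if that label is read as the total.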
Digital Column Architecture • 168 regions + CLK + buffering scheme 1 digital DC
Digital Power Drop on Vdd • Digital power at IBL occupancy: < 10 μW/pixel. [Figure: Vdd drop across 4-pixel regions: < 7 mV for 21 regions.]
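The numbers on these two slides fit together with the 80×336 array size quoted on the periphery slide. A short arithmetic sketch (variable names are illustrative; the power figure is an upper bound for the pixel array only, not a measured chip total):

```python
COLS, ROWS = 80, 336        # pixel array size quoted on the periphery slide
PIXELS_PER_REGION = 4       # 4-pixel digital region
P_DIGITAL_MAX_W = 10e-6     # < 10 uW per pixel at IBL occupancy

n_pixels = COLS * ROWS                            # pixels per chip
n_double_columns = COLS // 2                      # one DC = 2 pixel columns
regions_per_dc = 2 * ROWS // PIXELS_PER_REGION    # regions in one double-column
array_digital_power_max = n_pixels * P_DIGITAL_MAX_W

print(n_pixels, n_double_columns, regions_per_dc)
print(f"array digital power < {array_digital_power_max:.2f} W")
```

This reproduces the 168 regions per digital DC and the 40 DCs, and bounds the digital power of the array at roughly 0.27 W.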
Pixel Layout [Figure, 250 μm × 50 μm: preamp, Amp2, discriminator, TDAC, FDAC, config logic, and synthesized digital region (1/4th).] Note: digital ground is tied to substrate (mixed-signal environment), BUT the digital region is placed in a "T3" deep n-well.
FE-I4 Periphery • Pixel array: 80×336 pixels, plus End of Column (EoC) logic (triple redundant token). Signals into the array: L1T, token, read, …; pixel config. • Periphery: asynchronous FIFO (Hamming code), PLL (40 MHz → 160 MHz), high-speed serializer… • Blocks: readout FIFO, trigger FIFO, data formatting / compression block, digital control block, config. monitoring, global register bank (global config), interface, clock select. • Command decoder: configuration, reset & L1T. • 8b10b encoder unit. • LVDS output, 160 Mb/s, suited for IBL occupancy. • PLL: 40 MHz in, 160 MHz out. • Powering: DC-DC converters, Shunt-LDO. [Other diagram labels: 28 b × 40 DC; aux.]
Yield • Estimated from: • Small analog test chips. • 8 fully tested wafers of Medipix 3 ICs, assuming the same defect density for synthesized logic. • Expect on the order of 39 % digitally perfect chips. • Yield enhancement: • Triple redundant read tokens. • Hamming coded pixel data and address (with minimal # of gates). • Redundant configuration shift register. • Yield of fully functional chips might be as high as 76 % (with isolated dead pixels at the level of <0.1 %).
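To illustrate what Hamming coding buys in terms of fault tolerance, here is a minimal sketch using the textbook Hamming(7,4) code. The code length actually used for the FE-I4 pixel data and address is not given on the slide, so the parameters here are an assumption chosen for brevity.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
code = hamming74_encode(word)
code[4] ^= 1                     # a single stuck or upset bit
recovered = hamming74_decode(code)
```

Any single wrong bit in the stored word is corrected on readout, which is why a chip with an isolated defect in the data path can still count as functional.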
Test Chip Submission [Figure: submitted test chips and blocks. Labels: FE-I4-P1; 3 mm; 4 mm; 61×14 array; control block; DACs; current reference; LDO regulator; ShuLDO + trist; charge pump; capacitance measurement; SEU test IC; LVDS / LDO / 10 b DAC; 4-LVDS Rx/Tx; turboPLL (PLL core + PRBS + 8b10b coder + LVDS driver); low-power discriminator.]
FE-I4 Collaboration • Weekly meetings. • Remote collaboration using the Cliosoft platform. • Participating institutes: Bonn: D. Arutinov, M. Barbero, T. Hemperek, A. Kruth, M. Karagounis. CPPM: D. Fougeron, M. Menouni. Genova: R. Beccherle, G. Darbo. LBNL: S. Dube, D. Elledge, M. Garcia-Sciveres, D. Gnani, A. Mekkaoui. Nikhef: V. Gromov, R. Kluit, J.D. Schipper. • Schedule: submission planned for November 2009 (with submission readiness review 3-4 Nov. 2009). [Figure labels: FE-I3; 18; 160; FE-I4.]
BACKUP SLIDES
[Figure labels: existing B-layer; new beam pipe; IBL mounted on beam pipe; present beam pipe through present B-layer.]
Preamp. & Leakage Compensation • Regulated cascode preamp: high gain, less crosstalk path through biasing voltage. • Triple-well NMOS input: shield from substrate noise. • Feedback capacitor discharged by NMOS feedback transistor (constant feedback current). • Leakage current compensation scheme based on a differential amplifier. [Plots: Vout [V] vs. t [s] for 2 ke- < Qin < 22 ke-; annotation: 100 nA leakage, 10 mV DC shift.]
2nd Stage & Comparator • 2nd stage amplifier: PMOS input folded regulated cascode (straight cascode in future?). • Negative-going output. • Discriminator: classic 2-stage comparator. • ENC = 160 e- at Cd = 0.4 pF and Il = 100 nA. • 20 ns timewalk for 2 ke- < Qin < 52 ke- with threshold at 1.5 ke-. [Plots: ENC [e-] vs. Cd [F] for Il = 0 nA and Il = 100 nA; tLE [s] vs. Qin.]
Clock Distribution • Physical implementation: • Differential logic. • Single-ended. • Possible structures: • simple • H-tree • with skew balancing
Clock Multiplier (I/O choices for ATLAS IBL, ATLAS Pixel System Design Task Force) • For IBL, need to transmit data out at a bandwidth of 160 Mb/s. • 2 options: • Option 1: send an 80 MHz CLK to the FE and use both edges to transmit. Needs modification of BOC / ROD to produce higher-speed TTC. Needs a synchronization protocol on the FE between the 80 MHz clock and the beam crossing. A new DORIC needs to decode CLK at twice the frequency. • Option 2: send a 40 MHz CLK to the FE and multiply the clock on the FE. Needs a clock multiplier on chip. Note: synergy with what the strip MCC needs. • In FE-I4, we will provide both options: • Clock multiplier from the 40 MHz input clock. • AUX: possibility to send the 80 MHz clock to the FE.
8b10b Encoder (I/O choices for ATLAS IBL, ATLAS Pixel System Design Task Force) • For IBL, need to transmit data out at a bandwidth of 160 Mb/s. • At BOC/ROD: • Data rate is 4 times the clock rate. • Phase adjustment. • Use a Clock Data Recovery (CDR) mechanism. • CDR requires an output data stream with good engineering properties. • 8b10b: • adequate for this purpose, enough transitions for reliable CDR • widely used, easy to implement • provides some level of error detection • provides comma for frame identification & synchronization
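The rates involved in the two clocking options and in the 8b10b choice can be checked with a few lines of arithmetic. This is a sketch with illustrative variable names; the only inputs are the 160 Mb/s, 80 MHz and 40 MHz figures from the slides and the fixed 8-in-10 ratio of the 8b10b code.

```python
LINK_RATE_BPS = 160e6    # required output bandwidth for IBL
BX_CLOCK_HZ = 40e6       # 40 MHz input clock
AUX_CLOCK_HZ = 80e6      # optional 80 MHz clock

# Option 1: 80 MHz clock, data sent on both edges
ddr_rate_bps = 2 * AUX_CLOCK_HZ

# Option 2: multiply the 40 MHz clock on chip
multiplier = LINK_RATE_BPS / BX_CLOCK_HZ

# 8b10b sends 10 line bits for every 8 payload bits
payload_rate_bps = LINK_RATE_BPS * 8 / 10

print(ddr_rate_bps, multiplier, payload_rate_bps)
```

Both options reach 160 Mb/s, the multiplier matches the "data rate 4 times the clock rate" seen at the BOC/ROD, and the 8b10b overhead leaves 128 Mb/s for payload.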
SEU-hardened latch • CPPM has studied the influence of various layouts of a DICE latch on the SEU cross-section. • Physical separation of sensitive node pairs. Latch5.1 and latch5.2: area 12 μm × 4 μm = 48 μm²; nMOS separation: 7 μm; pMOS separation: 3 μm. • Triple redundant logic with interleaved layout (1.a 2.a 3.a / 1.b 2.b 3.b). • Cross-section [cm²/bit]: • Standard latch: ~5×10⁻¹⁴. • DICE with improved layout: ~3×10⁻¹⁶. • Cross-section quoted alongside the interleaved layout: < 1×10⁻¹⁷. Ref.: Calin et al., IEEE Trans. Nucl. Sci., vol. 43, no. 6, 1996.
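The principle behind triple redundant logic is a 2-of-3 majority vote. The sketch below shows the vote on integer bit patterns and computes the improvement factor between the two latch cross-sections quoted on the slide; the function name `majority` and the example values are illustrative, not taken from the FE-I4 design.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)


stored = 0b1011_0010
copy_a, copy_b, copy_c = stored, stored ^ 0b0000_1000, stored   # one copy upset
voted = majority(copy_a, copy_b, copy_c)

SIGMA_STANDARD = 5e-14   # cm^2/bit, standard latch (from the slide)
SIGMA_DICE = 3e-16       # cm^2/bit, DICE with improved layout (from the slide)
improvement = SIGMA_STANDARD / SIGMA_DICE
```

A single upset copy is outvoted, and the DICE layout alone already lowers the upset cross-section by a factor of roughly 170.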
PLL Overview [Block diagram: Phase Frequency Detector, Charge Pump, Loop Filter, Voltage Controlled Oscillator (640 MHz), Frequency Divider (40 MHz), Conversion and Buffering.]
Upset detection II [Plot: 3 pC upset charge in fast divider; traces: "feedback too fast" signal, "feedback too slow" signal.]
Settling Behaviour [Plot: 3 pC upset charge in fast divider; traces: control voltage, frequency of single-ended 40 MHz clock, frequency of single-ended 640 MHz clock.]
3×LHC / b-layer replacement • Rates given in [pixel hits·bx⁻¹·cm⁻²], seven values per layer in the order printed: • r = 122.5 mm: 1.41, 1.24, 1.26, 1.26, 1.37, 1.34, 1.33. • r = 88.5 mm: 2.55, 2.56, 2.54, 2.55, 2.64, 2.65, 2.64. • r = 50.5 mm: 6.30, 6.46, 6.03, 5.85, 5.91, 6.46, 6.11. • r = 37 mm: 12.10, 11.53, 12.01, 11.85, 11.72, 12.11, 8.02. [Figure: r [mm] (0 to 200) vs. z [mm] (0 to 600) with η lines from 0.1 to 3.5; legend: FE-I3, 50 μm × 400 μm; FE-I4 simulation, 50 μm × 250 μm.]
Pixel occupancy → Data bandwidth • Pixel hit rate → FE output bandwidth: • # bits / pixel transmitted? Address 7+9 bits, analog info 4+2 bits → 22 b? • Data output protocol? • Reduce data output by taking into account the clustered nature of real physics hits. [Figures (axis: number of pixels): FE-I4, central module, 3.7 cm layer and 21 cm layer, at 3×LHC and 10×LHC.]
Pixel occupancy → Data bandwidth • Preliminary assumption: 100 kHz L1T, 336×80 pixels FE-I4. • Example 3: clustered data out with fixed format. • Compression factor (all at 3×LHC), 3.7 cm (vs. 21 cm), η = 0: • indiv. pixels: 4.09 (0.25) × (7+9+4+2) = 1.00 (1.00) A.U. • static 1×2: 3.45 (0.18) × (7+8+2×4+2) = 0.96 (0.83) A.U. • dynamic 1×2: 3.02 (0.15) × (7+9+2×4+2) = 0.87 (0.74) A.U. • static 1×4: 2.86 (0.17) × (6+8+4×4+4) = 1.08 (1.08) A.U. • dyn. in-DC 1×4: 2.43 (0.15) × (6+9+4×4+4) = 0.95 (0.95) A.U. • dynamic 1×4: 2.13 (0.14) × (7+9+4×4+4) = 0.85 (0.94) A.U. • Disclaimer: no header, trailer, DC-balancing, error correction… [Figure labels: column; row (×336); DC (×40); NL; ToT; axis in 10⁶ counts·FE⁻¹·s⁻¹.]
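The relative bandwidths in the list above are just rate × word length, normalized to the individual-pixel format. The sketch below recomputes the 3.7 cm column from the numbers on the slide; the interpretation of the first factor as a word rate and the name `relative_bandwidth` are assumptions. The 21 cm column is not recomputed because its rates are rounded to two digits on the slide.

```python
# (format, rate at 3.7 cm as printed, bits per word), all taken from the slide
FORMATS = [
    ("indiv pixels",   4.09, 7 + 9 + 4 + 2),
    ("static 1x2",     3.45, 7 + 8 + 2 * 4 + 2),
    ("dynamic 1x2",    3.02, 7 + 9 + 2 * 4 + 2),
    ("static 1x4",     2.86, 6 + 8 + 4 * 4 + 4),
    ("dyn. in-DC 1x4", 2.43, 6 + 9 + 4 * 4 + 4),
    ("dynamic 1x4",    2.13, 7 + 9 + 4 * 4 + 4),
]


def relative_bandwidth(formats):
    """Return rate * bits for each format, normalized to the first entry."""
    reference = formats[0][1] * formats[0][2]
    return {name: round(rate * bits / reference, 2) for name, rate, bits in formats}


result = relative_bandwidth(FORMATS)
for name, value in result.items():
    print(f"{name:15s} {value:.2f} A.U.")
```

The point of the comparison is that wider cluster words only pay off if they reduce the number of words enough: the static 1×4 format sends fewer words but ends up above the individual-pixel bandwidth.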
Shunt Regulator (FE-I3 approach) • Shunt regulator generates a constant output voltage out of the current supply. • Current that is not drawn by the load is shunted by transistor M1. • Very steep voltage-to-current characteristic. • Mismatch & process variation will lead to different Vref and Vout potentials. • Most of the shunt current will flow to the regulator with the lowest Vout potential. • Potential risk of device breakdown at turn-on. • Using an input series resistor reduces the slope of the voltage-to-current characteristic I = f(V). • RSLOPE helps distribute the shunt current between the regulators placed in parallel. • RSLOPE does not contribute to the regulation and consumes additional power. [Plot: Iin [mA] (0 to 750) vs. Vout [V] (0 to 2), without resistor and with resistor.]
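The effect of RSLOPE on current sharing can be seen in a simple idealized model, given here as a sketch only: each regulator is treated as an ideal clamp at its own Vout in series with a slope resistance, conducting only in one direction. The function name `share_current` and all numerical values (0.5 A supply, 20 mV mismatch, resistor values) are illustrative assumptions.

```python
def share_current(i_in, v_outs, r_slope):
    """Split i_in between parallel regulators modeled as V = v_out + r_slope * I, I >= 0."""
    active = sorted(range(len(v_outs)), key=lambda k: v_outs[k])
    while True:
        # Node voltage if all regulators in `active` conduct
        v = (i_in * r_slope + sum(v_outs[k] for k in active)) / len(active)
        if v >= v_outs[active[-1]]:
            break
        active.pop()   # the highest-Vout regulator carries no current
    return [max(0.0, (v - vk) / r_slope) for vk in v_outs]


mismatched = [1.50, 1.52]                                        # volts
steep = share_current(0.5, mismatched, r_slope=0.01)    # almost no series resistance
shallow = share_current(0.5, mismatched, r_slope=1.0)   # with input series resistor
print(steep, shallow)
```

With the steep characteristic the regulator with the lower Vout takes the whole current; with the series resistor the split becomes nearly even, at the price of extra dissipation in the resistors.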
LDO Regulator with Shunt Transistor (ShuLDO): Simplified Schematic • Combination of LDO and shunt transistor. • M4 shunts the current not drawn by the load. • A fraction of the M1 current is mirrored & drained into M5. • Amplifier A2 & M3 improve mirroring accuracy. • Ref. current is defined by resistor R3 & drained into M6. • Comparison of M5 current and ref. current leads to constant current flow in M1. • Ref. current depends on voltage drop VIin, which in turn depends on supply current Iin. • "Shunt-LDO" regulators with completely different output voltages can be placed in parallel without any problem regarding mismatch & shunt current distribution. • Resistor R3 mismatch will lead to some variation of shunt current (10-20 %). • "Shunt-LDO" can cope with an increased supply current if one FE-I4 does not contribute to the shunt current, e.g. disconnected wirebond → ref. current goes up. • Can be used as an ordinary LDO when the shunt is disabled.
Parallel Regulator Operation (Simulation) • 2 regulators placed in parallel with Vout1 = 1.2 V and Vout2 = 1.5 V. • Output voltages settle at different potentials. • Current flowing through the regulators stays the same. [Plot labels: Vout = 1.5; Vout = 1.2.]
LVDS Driver • Standard LVDS architecture for 320 MHz clock rate, adapted to low supply voltage (1.5-1.2 V). • Current is routed to/from the output by four switches M3-M6. • Common-mode voltage is measured by a resistive divider and controlled by a common-mode feedback circuit. • Output current is switchable between 0.6 mA and 3 mA. Ref.: Boni et al., IEEE JSSC, vol. 36, no. 4, 2001.
LVDS Receiver • Parallel PMOS/NMOS comparator input stages allow operation over a wide range of common-mode voltages. • Cross-coupled positive feedback structure allows introduction of hysteresis. • Second stage sums signals from both input stages and converts them to a CMOS output signal. Ref.: Tyhach et al., A 90-nm FPGA I/O Buffer Design With 1.6-Gb/s Data Rate for Source-Synchronous System and 300-MHz Clock Rate for External Memory Interface, IEEE JSSC, Sept. 2005.
LVDS Transceiver, IBM 130 nm • For IBL and outer layers at sLHC, need for a 320 Mb/s bandwidth LVDS I/O. • LVDS transceiver IC irradiated up to ~180 Mrad. No degradation observed. • Tests with differential probe and 100 Ω on-board termination at 1.2 V supply. [Figures: chip photo (1.8 mm, 0.8 mm); TX output; chained Rx/Tx output at 320 MHz clock; clock rates 320 MHz, 160 MHz, 40 MHz; common-mode voltages 1050 mV, 600 mV, 150 mV.]
Output data protocol • 8b10b frame: SOF (K.28.7), followed by 24-bit record word(s), an EOF (K.28.5) and idle (K.28.1). [Table: possible data words.]
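The frame structure can be sketched at the symbol level, before 8b10b encoding. The byte values of the three K-codes follow from the standard K.x.y naming (byte = y·32 + x); the function name `build_frame`, the big-endian byte order of the 24-bit record and the number of idle symbols are assumptions made for illustration.

```python
K28_7, K28_5, K28_1 = 0xFC, 0xBC, 0x3C   # 8-bit values of the K-codes before 8b10b encoding


def build_frame(records, n_idle=1):
    """Return (is_control, byte) symbols: SOF, 3 bytes per 24-bit record, EOF, idles."""
    symbols = [(True, K28_7)]                        # SOF
    for rec in records:
        if not 0 <= rec < 1 << 24:
            raise ValueError("record must fit in 24 bits")
        # Byte order is an assumption, the slide does not specify it
        symbols += [(False, b) for b in rec.to_bytes(3, "big")]
    symbols.append((True, K28_5))                    # EOF
    symbols += [(True, K28_1)] * n_idle              # idle
    return symbols


frame = build_frame([0xABCDEF])
```

Each symbol in the returned list would then be passed to an 8b10b encoder, with the control flag selecting the K-code table.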
Future Directions • Still higher rate capability. • Need smaller, faster pixel. • Yet need more memory per pixel to buffer higher rate. • Two directions to explore: 3D and 65 nm. • FE-I4 region placed on two 130 nm tiers would have 60 % pixel size and 50 % more logic/memory. [Figure labels: FE-I3 pixel: 3.2 bits; FE-I4: 25 bits; goal: ~35 bits.]