A HL-LHC CMS pixel detector ASIC in 65nm technology

Jorgen Christiansen, CERN/PH-ESE; Elia Conti, Perugia

Presentation Transcript


  1. A HL-LHC CMS pixel detector ASIC in 65nm technology • Jorgen Christiansen, CERN/PH-ESE • Elia Conti, Perugia

  2. Outline • Driving requirements • 65nm for HL-LHC pixel • Optimization of pixel regions • Global architecture • Alternatives • Flexibility • Planning • Summary

  3. Driving pixel ASIC requirements • Pixel size • Driven by physics and by what the technology can deliver (readout chip) • Sensor type: 2D Si, 3D Si, diamond • Radiation damage • A decision on this will probably take quite some time -> keep flexibility • Connection to sensor • Capacitance, bump bonding, thinning, polarity, collected charge, cluster size, leakage, etc. • Analog performance • Threshold and threshold variation, noise, ADC/TOT/binary, etc. • Data (hit) rate, data buffering and triggering • Pixel trigger with ROI? • Multiple modes: triggered, triggerless, self-triggered, test, calibration, etc. • This is what technology can “buy” us with “intelligent pixels” (if we do not want pixels that are too small) • Readout • Control • Low power -> material budget • On-chip/on-module power conversion • System (hybrid) integration • Radiation tolerance (TID, neutrons, hadrons, SEU) • Hybrid pixel with a deep sub-micron (65nm) technology for the readout chip

  4. What we can get from 65nm • Radiation tolerance (TID, displacement, SEU): • TID has now been “demonstrated” • No need for ELT (Enclosed Layout Transistors) • SEU to be handled with redundant logic where needed • We need: ~10 MGy, ~10^16 neutrons/cm² (EXTREME!) • Can we use small standard cells? • To be confirmed • Large amount of digital logic/memory: • Vital for small pixels, high data rates, buffering, flexibility • Logic density: 250nm: 1, 130nm: ~4x, 65nm: ~16x • For radiation environments 250nm was in fact 2-4x worse, as ELT was required, so the logic density improvement from 250nm to 65nm is 32-64x • Speed: 250nm: ~1, 130nm: ~2x, 65nm: ~4x (where high speed is not needed, lower power can be obtained by running at lower Vdd) • Low power (digital): • Low supply voltage: P ~ Vdd² • Multiple low power libraries (Vdd, high Vt, ...) • 250nm: 1, 130nm: ½ - ¼, 65nm: 1/8 - 1/16 • Many metal (Cu) layers: • Power distribution, signal distribution, column busses, etc. • [Figures: NMOS Vt shift; NMOS leakage (S. Bonacini, P. Valerio, CERN, TWEPP2011)]
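
The density and power figures on this slide are simple ratios. The Python sketch below reproduces the 32-64x rad-hard density gain and shows how much of the power reduction the P ~ Vdd² term alone provides. The nominal core supplies used (2.5 V at 250nm, 1.2 V at 65nm) are assumptions, not values from the slide.

```python
# Relative logic density quoted on the slide (250nm = 1)
DENSITY = {"250nm": 1.0, "130nm": 4.0, "65nm": 16.0}


def rad_hard_density_gain(elt_penalty):
    """Density gain 250nm -> 65nm when the 250nm design needed enclosed
    layout transistors (ELT), costing a factor `elt_penalty` in area."""
    return DENSITY["65nm"] / DENSITY["250nm"] * elt_penalty


def dynamic_power_ratio(vdd_new, vdd_old):
    """Dynamic power P ~ C * Vdd^2 * f; ratio at equal C and f."""
    return (vdd_new / vdd_old) ** 2


print(rad_hard_density_gain(2), rad_hard_density_gain(4))  # 32.0 64.0
# Assumed nominal core supplies: 1.2 V (65nm) vs 2.5 V (250nm)
print(round(dynamic_power_ratio(1.2, 2.5), 3))  # 0.23
```

The supply term alone gives roughly a factor 4. Reaching the quoted 1/8 - 1/16 would also require smaller switched capacitances and the low power libraries mentioned above.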

  5. 65nm CMOS technology • Mature technology (~10 years): available, very well known and with good technology support (tools, libraries, IPs, MPW, production). Known as a strong technology node that will remain available for many more years (automotive, industrial, ...) • High yield, accurate simulation models, ... • Not an “exotic” 3D technology: availability, density, yield, design tools • (Coarse TSVs can be used at the periphery if available, appropriate, affordable and with sufficient yield) • Analog • Good low noise and low power amplifiers can be made for pixels (small dynamic range, limited linearity) • Improved matching (at same transistor sizes) • Triple well deep implant and high resistivity substrate enabling isolation of critical analog parts from digital within pixels (very good results shown in 130nm) • Details of this still to be better understood/tested for 65nm • But we also get: • “High” NRE costs (~2x 130nm, for 4x higher density and ~½ power) • Lower supply voltage (but this is what gives low power) • Critical power distribution architecture (local DC-DC and/or serial powering) • Higher gate and transistor leakage: can be handled with appropriate design approaches • 12 inch wafers (bump-bonding and testing infrastructure), option of 8 inch? • Details of fine pitch bump bonding to be understood/tested (wafer size, UBM, ...) • Pixel detectors are our IC technology drivers • 65nm technology access, MPW runs, tools, radiation qualification, etc. via CERN (and Europractice) • Or go below 65nm?: • Cost, radiation tolerance, gain (power, functions)?, do we have time?, do we need it?

  6. What could a 65nm pixel ASIC look like? • ~50um x ~50um pixels (same area: 25 x 100, 35 x 70) or 50um x 100um (2x area) • How advantageous are small pixels in a high rate LHC environment? • Resolution limited by multiple scattering in the beam pipe and the pixel detector itself • Low power pixel ASIC critical! • Small pixels: consider binary readout (50um/√12 = ~14um, 35um/√12 = ~10um) • “Large pixels”: use ADC/TOT for interpolation • Large pixels allow extended buffering and flexibility • Also feasible in a 50x50um pixel cell with an appropriate architecture/technology? • When/how will the pixel size be determined? • Major constraint for pixel ASIC design • ASIC design with moving targets is a recipe for long design time and mistakes • Programmable pixel size: dream?, function penalties, power penalties, multiple applications • Pixel array size: > 20 x 20 mm (>200k pixels): • Limited by reticle size (max 24 x 36 mm), no stitching (yet) • ~2 GHz/cm² hits (~500 MHz/cm² track rate and cluster size ~4) • 50 (100) kHz hit rate per pixel -> no fast shaping needed, low power • Digital corrections for analog imperfections (e.g. time walk, threshold variation) • Modes: triggered/triggerless, TOT/binary, testing modes, ... • Trigger: latency <25us (B-ID width: ~10 bit), rate <1MHz (readout limited) • Contribution to trigger? • “In-pixel” digital storage and processing (vital for high rate)
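
Three numbers on this slide follow from short formulas: the pitch/√12 resolution of binary readout, the per-pixel hit rate, and the B-ID width needed to cover the trigger latency. A minimal sketch, assuming a uniform hit flux over the chip and 25 ns bunch spacing:

```python
import math


def binary_resolution_um(pitch_um):
    """RMS position error of binary readout: pitch / sqrt(12)."""
    return pitch_um / math.sqrt(12)


def hit_rate_per_pixel_hz(hits_per_cm2_s, pitch_x_um, pitch_y_um):
    """Mean hit rate in one pixel for a uniform hit flux."""
    area_cm2 = (pitch_x_um * 1e-4) * (pitch_y_um * 1e-4)
    return hits_per_cm2_s * area_cm2


def bxid_bits(latency_s, bx_period_s=25e-9):
    """Bits needed to tag every bunch crossing inside the trigger latency."""
    return math.ceil(math.log2(latency_s / bx_period_s))


print(f"{binary_resolution_um(50):.1f} um, {binary_resolution_um(35):.1f} um")  # 14.4 um, 10.1 um
print(f"{hit_rate_per_pixel_hz(2e9, 50, 50) / 1e3:.0f} kHz")   # 50 kHz (50x50 um)
print(f"{hit_rate_per_pixel_hz(2e9, 50, 100) / 1e3:.0f} kHz")  # 100 kHz (50x100 um)
print(bxid_bits(25e-6), "bit B-ID")                             # 10 bit B-ID
```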

  7. 65nm pixel • Analog FE + adjustment DACs: <~½ of the 50x50um² pixel • First FE/DAC prototypes have been demonstrated (LCD-CERN, ATLAS-LBNL) • ~4 bit TOT amplitude measurement • TOT with basic master clock (40MHz) • Dead time loss: 50kHz x 16 (max) x 25ns = 2% (average ~0.5%) • Shorter dead time, using time interpolation in pixels, required to get MIP dead time < 50ns • Or low power SAR ADC (Torino) or binary with no dead time • Remaining ~½ for digital. A single pixel can contain the following digital: • 1/3 flip-flops/registers (1.8 x 3.8um) = ~45 • 1/3 logic: NAND4 (1.8 x 1.4um) = ~125 • 1/3 SRAM (1.05 x 0.5um) = ~600 bits (in practice much less for small memories) (assuming a small size standard cell library and 75% area utilization, and ignoring the area penalty for separating analog and digital) • This is marginal for implementing buffering, logic, multiple modes, SEU protection, etc. • Local pixel regions (PR, or super pixels, ...) to optimize local clustering and enable efficient local digital processing, storage and readout • [Figure: preamplifier; P. Valerio, CERN/LCD]
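
The per-pixel digital budget and the TOT dead time can be recomputed from the cell footprints and fractions given above. The sketch below assumes the 1/3 registers, 1/3 logic, 1/3 SRAM split is applied literally and uses floor division, so it gives 124 NAND4 where the slide rounds to ~125.

```python
# Standard-cell footprints (um^2) quoted on the slide
CELLS = {"flip-flops": 1.8 * 3.8, "NAND4 gates": 1.8 * 1.4, "SRAM bits": 1.05 * 0.5}


def digital_budget(pixels=1, pitch_um=50.0, digital_fraction=0.5, utilization=0.75):
    """Cells that fit if the digital area is split 1/3 registers,
    1/3 logic and 1/3 SRAM (analog/digital separation ignored)."""
    usable = pixels * pitch_um**2 * digital_fraction * utilization
    share = usable / 3
    return {name: int(share // area) for name, area in CELLS.items()}


def tot_dead_time(hit_rate_hz, tot_counts, clock_period_s=25e-9):
    """Fraction of time a pixel is busy measuring TOT with a 40 MHz clock."""
    return hit_rate_hz * tot_counts * clock_period_s


print(digital_budget())            # one 50x50 um pixel
print(digital_budget(pixels=16))   # one 4x4 pixel region
print(f"{tot_dead_time(50e3, 16):.1%} max, {tot_dead_time(50e3, 4):.1%} avg")
```

For a 4x4 region, the sketch lands close to the ~720 registers, ~2000 NAND4 and <10 Kbit SRAM quoted on slide 19; the small differences come only from rounding.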

  8. 65nm pixel • Hit storage in pixels / pixel regions • High density column data “busses” to get data out of the pixel array • End of column: data merging, checking, formatting, compression, readout • High speed low power serial link(s) for readout • 4-10 pixel chips feeding one (LP)GBT with local 320 - 640Mbits/s links (or on-chip Gbits/s serializer) • Opto part located ~1m from the pixel detector, as opto components most likely cannot survive the extremely harsh radiation environment • Full SEU protection of critical parts • Clear identification of critical and non-critical parts vital • Power density: <1 W/cm² • Major design goal, determining material for services (power, cooling) • Power conversion on-chip?, serial / DC-DC? • Design complexity will come from low power, complex digital logic with SEU protection, interleaved with low power, low noise analog • A potential pixel trigger will have a significant impact on the digital architecture • Required buffering and logic can likely fit in a 65nm chip • System aspects critical (bandwidth, links, architecture, protocol, ...)

  9. Pixel region optimization • Grouping pixels in regions is critical to have enough local digital area for efficient buffering, logic, routing, low power features, different operating modes, ... • HEP pixel hits are naturally clustered • Cluster sizes: • Middle of barrel: ~square clusters, 1 - max 9 (3x3), average ~4 (just an example) • End of barrel: very elongated clusters from track angle • Strong dependence on pixel size, sensor, sensor thickness, radiation damage, track angle, detector angle, Lorentz angle from magnetic field, ... • Local clustering enables data reduction (“compression”) • What is the “optimum” pixel region organization: • Pixel regions hit per cluster • Static / dynamic pixel regions • Required buffers per pixel/pixel region • Buffer size per pixel region and per pixel • Memory bits required and optimization • Data organization and readout bandwidth • Simple statistical “model” to get a basic “feeling” • Architecture / hit simulation for detailed optimization

  10. Middle barrel • Initial cluster assumption: central hit plus at most one ring of neighbouring hits (max 3x3) • Cluster size distribution: average = 4.2 (just one example) • Pixel regions (PR) hit per track: • PR = 1 x 1: 4.2 • PR = 2 x 2: 2.6 (ATLAS FEI4) • PR = 2 x 4: 2.2 • PR = 3 x 3: 2.1 • PR = 2 x 8: 2.0 • PR = 4 x 4: 1.8 • PR = 5 x 5: 1.6 • PR = 4 x 8: 1.6 • PR = 8 x 8: 1.4
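
A quick Monte Carlo in the spirit of the simple statistical “model” can estimate PRs hit per cluster. It relies on a simplifying assumption that does not come from the slides: each of the 8 neighbours of the central hit fires independently with probability 0.4. This gives the same average cluster size of 4.2 but not necessarily the same distribution, so the values will differ somewhat from the table above.

```python
import random


def prs_hit_per_cluster(pr_w, pr_h, p_neighbour=0.4, n_trials=20_000, seed=1):
    """Mean number of pr_w x pr_h pixel regions touched by a 3x3-max cluster.

    Assumed cluster model: central hit plus each of the 8 neighbours hit
    independently with probability p_neighbour (mean size 1 + 8 * p).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        # Central pixel lands at a uniformly random position inside a region
        cx, cy = rng.randrange(pr_w), rng.randrange(pr_h)
        pixels = [(cx, cy)]
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx or dy) and rng.random() < p_neighbour:
                    pixels.append((cx + dx, cy + dy))
        # Floor division maps each pixel (including negative coords) to its region
        total += len({(x // pr_w, y // pr_h) for x, y in pixels})
    return total / n_trials


for w, h in [(1, 1), (2, 2), (2, 4), (4, 4), (8, 8)]:
    print(f"PR = {w} x {h}: {prs_hit_per_cluster(w, h):.2f}")
```

Swapping in a measured cluster-size distribution, or clusters from full simulation, only requires changing how `pixels` is filled.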

  11. End of barrel • Very elongated clusters (M. Swartz, “A Detailed Simulation of the CMS Pixel Sensor,” CMS Note 2002/027) • Simplified 1-dimensional clusters: 1 - 8 hits • Comparison for different cluster size distributions (100 x 150um² pixels) • Approximate formula, with n = PR width and h = number of hits in cluster
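
The slide's approximate formula is not reproduced in this transcript. As one simple reference point, consider a straight 1-dimensional cluster of h consecutive pixels whose start is uniformly distributed relative to PRs of width n. Each of the h - 1 internal pixel boundaries then coincides with a PR boundary with probability 1/n, so the mean number of PRs hit is 1 + (h - 1)/n. The sketch below checks this by enumerating all start offsets. It is not necessarily the expression used on the slide.

```python
from fractions import Fraction


def expected_prs_1d(h, n):
    """Exact mean number of width-n regions touched by h consecutive pixels,
    averaged over the n equally likely start offsets."""
    total = 0
    for start in range(n):
        first_region = start // n
        last_region = (start + h - 1) // n
        total += last_region - first_region + 1
    return Fraction(total, n)


# Enumeration agrees with the closed form 1 + (h - 1) / n
for h in (1, 4, 8):
    for n in (1, 2, 4, 8):
        assert expected_prs_1d(h, n) == 1 + Fraction(h - 1, n)
print(expected_prs_1d(8, 4))  # 11/4
```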

  12. Pixel region and cluster shapes • Cluster shapes: • Square: 3x3 • Elongated: 4x3 • Long linear: 8x1 • Region shapes: • 1x1, 2x2, 4x4, 2x8, 8x2 • Distributions • Single=1, Avg=4.4, Avg=6.6, Max

  13. PRs hit per cluster • For square pixel regions there is no major variation with cluster shape (for the same average number of hits per cluster) • If the pixel region is not square, its shape must (obviously) follow the shape of the cluster • This may not be the case in practice

  14. PRs hit per cluster • Number of pixel regions hit per cluster • Significant reduction from 1x1 to 2x2 • Limited reduction from 2x2 to 4x4 • Above 4x4 ~no reduction

  15. Memory required for latency buffering • Track rate: 500MHz/cm² (hit rate: 2GHz/cm²), latency: 6.4us, buffer overflow < 10^-4 • Having large buffers shared by multiple pixels is much more efficient than independent small dedicated buffers • But each buffer must be wider: it stores data for multiple pixels • [Figure: 4 buffers required per single pixel; 19 buffers required per 8x8 pixel region, instead of 4 x (8x8) = 256!]
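
The buffer counts quoted here can be reproduced with a simple Poisson model. Each hit (or cluster hitting a region) occupies one buffer for one latency period, and the buffer count is chosen so that the probability of more hits being in flight than there are buffers stays below 10^-4. Treating hits as Poisson and ignoring readout time are assumptions of this sketch. The region hit rate uses the ~1.4 PRs hit per track for 8x8 regions from the middle-barrel table.

```python
import math


def poisson_tail(lam, k):
    """P(N > k) for N ~ Poisson(lam)."""
    term = math.exp(-lam)      # P(N = 0)
    cdf = term
    for n in range(1, k + 1):
        term *= lam / n        # P(N = n) from P(N = n - 1)
        cdf += term
    return 1.0 - cdf


def buffers_needed(lam, max_overflow=1e-4):
    """Smallest buffer count k with P(more than k hits in flight) < max_overflow."""
    k = 0
    while poisson_tail(lam, k) >= max_overflow:
        k += 1
    return k


LATENCY_S = 6.4e-6
TRACK_RATE = 500e6   # tracks / cm^2 / s
HIT_RATE = 2e9       # pixel hits / cm^2 / s (cluster size ~4)
PITCH_CM = 50e-4     # 50 um pixels

# Independent buffers in every pixel: mean hits in flight = 0.32
lam_pixel = HIT_RATE * PITCH_CM**2 * LATENCY_S
# Shared buffers per 8x8 region, ~1.4 regions hit per track: mean = 7.168
lam_region = TRACK_RATE * 1.4 * (8 * PITCH_CM) ** 2 * LATENCY_S

print(buffers_needed(lam_pixel), buffers_needed(lam_region))  # 4 19
```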

  16. Relative number of buffers • Reduction in number of buffers is a basic statistical property that does not depend on cluster shape. • Shared buffers more efficient • Plateau reached at 8 (2x4) – 16 (4x4) pixels per pixel region

  17. Memory bits • Fewer buffers required when the pixel region gets bigger • BUT more memory bits needed per buffer: • 10 bit bunch ID (shared between pixels in the pixel region) • 4 bit ADC/TOT per pixel (unless using dynamic allocation of memory “words”) • Pixel regions should not be larger than ~the size of clusters • Optimum reached between 2x2 - 4x4 • Non-square pixel region: pixel region shape must (obviously) be in the same direction as the cluster
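
The trade-off can be made concrete using the buffer counts from the latency-buffering slide (4 buffers for a single pixel, 19 for an 8x8 region) and the word format above: one 10 bit bunch ID per buffer plus 4 bits per pixel. This sketch assumes a fixed-width word that holds a TOT field for every pixel of the region, with no dynamic allocation.

```python
BXID_BITS = 10  # bunch ID, shared by all pixels in a region
TOT_BITS = 4    # ADC/TOT per pixel, stored for every pixel of the region


def bits_per_pixel(pixels_per_region, buffers_per_region):
    """Latency-buffer memory per pixel for fixed-width region words."""
    word_bits = BXID_BITS + TOT_BITS * pixels_per_region
    return buffers_per_region * word_bits / pixels_per_region


print(bits_per_pixel(1, 4))              # 56.0 (independent pixel buffers)
print(round(bits_per_pixel(64, 19), 1))  # 79.0 (8x8 region, shared buffers)
```

The 8x8 region needs about 13x fewer buffers than 64 independent pixels (19 vs 256). However, each word carries 64 TOT fields that are mostly empty, so memory per pixel still grows. This is consistent with the optimum lying between 2x2 and 4x4.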

  18. Pixel region optimization • Sufficient logic area to implement buffering and required functions efficiently • Minimize required number of data buffers • Minimize required number of memory bits • Minimize power consumption • Mixed signal layout considerations • Not sacrificing performance: • Minimize dead time • Minimize data volume to read out • Do not optimize strongly for a certain sensor type and specific application/location: • Cluster size and shape • Hit rate • Sensor type and configuration of the pixel detector may change several times during the R&D for a complex pixel ASIC

  19. Pixel region optimization • 2x2 - 4x4 pixel regions optimal to minimize buffer memory bits • Pixel regions of 4 x 4 allow more/better sharing across pixels • Get sufficient digital resources in each pixel region: • ~½ for analog and required adjustment DACs • ~½ for synthesized digital (75% utilization): 1/3 registers (~720), 1/3 logic (~2000 NAND4), 1/3 SRAM (<10Kbit) • ~100k transistors per pixel region • Sophisticated local pixel processing/storage! • Reduced buffering required • Reduced static and dynamic power? • Architecture should not be too highly optimized for a particular detector configuration, sensor, location, clustering, etc. • Use technology to build a pixel architecture that is flexible, high rate and low power • Initial assumptions may (will) not be correct • How to organize a mixed signal floor plan for 4x4?

  20. Pixel region 4 x 4: A • Maximize effective area for digital • Use of automated synthesis and P&R • Analog low noise islands • Shielded, minimum length, low capacitance connections to bump pads • Analog power distribution • Common bias distribution • Minimize cross talk from digital • Substrate isolation with deep implant • Surround analog islands with quiet logic (configuration, etc.) • Quiet logic and shielding below analog distribution signals: power, biases, global threshold, etc. • Organize digital: • Pixel hit processing and merging • Buffering • Column bus interface • Critical shielding of digital noise to sensor, bump pad and line to bump pad • Global routing optimization: analog power, analog biases, digital power, timing control, configuration, readout, etc.

  21. Pixel region 4 x 4: B • Analog “stripes” • Good for distribution of analog references and analog power • Minimize crosstalk • Digital “stripes” • Good for distribution of timing signals, column busses and digital power • One regular digital zone for digital P&R • Routing from bump pads could be problematic • Similar approach considered for LHCb VELO • High rate, triggerless

  22. General 4 x 4 architecture • Pixels: 4 x 4 x ~128 x ~128 = ~256k (262144) • Chip size = ~50um x 4 x 128 = ~2.6cm x ~3cm (yield maximization required) • Obviously resembles LHCb/ALICE, FEI4, LHCb VeloPix and other high rate pixels • And any other data driven (HEP) chip/system: system on a chip • [Figure: chip floor plan with 128 PR rows x 128 PR columns (512 x 512 pixels); 4 x 4 PRs with configuration, DACs and region processing (B-ID tag, hit processing, TOT, time-walk compensation, etc.); end of column (EOC) with B-ID, trigger matching, column bus interface, control, configuration interface, DACs, monitoring, power and readout interface]

  23. Basic CMS HL-LHC assumptions • Average cluster size: ~4 (worst case?) • Rate: worst case HL-LHC (layer locations as in Phase 1) • Layer 1 (3.0cm): ~500MHz/cm² tracks -> ~2GHz/cm² hits • Layer 2 (6.8cm): ~½ of layer 1 • Layer 3 (10.2cm): ~½ of layer 2 -> ~¼ of layer 1 • Layer 4 (16.0cm): ~½ of layer 3 -> ~1/10 of layer 1 (50MHz/cm² tracks, 200MHz/cm² hits) • End-caps? • Pixel chip: ~6cm² • Pixel size: ~50x50um² = 2500um² (or 25um x 100um) • Or 50um x 100um if better optimized for HL-LHC • Pixel regions: 4 x 4 • Pixels per chip: ~256k (262144) • Tracks/hits per chip per Bx: • Layer 1: 75 tracks per Bx (300 hits/Bx) • Layer 4: 7.5 tracks per Bx (30 hits/Bx) • L1 trigger rate: 100kHz (200kHz) • Option of 1MHz? (see later)
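
The per-chip occupancy numbers follow from the track rate, the ~6cm² chip area and 25 ns bunch spacing. A minimal sketch, using the slide's cluster size of 4 and a 50 um pitch for the pixel count and active width:

```python
BX_PERIOD_S = 25e-9
CHIP_AREA_CM2 = 6.0
CLUSTER_SIZE = 4
PIXELS_PER_SIDE = 4 * 128   # 4x4 regions, 128 x 128 regions
PITCH_CM = 50e-4


def per_chip_per_bx(track_rate_cm2_s):
    """Mean tracks and pixel hits per chip in one bunch crossing."""
    tracks = track_rate_cm2_s * CHIP_AREA_CM2 * BX_PERIOD_S
    return tracks, tracks * CLUSTER_SIZE


print(PIXELS_PER_SIDE**2, "pixels,", round(PIXELS_PER_SIDE * PITCH_CM, 2), "cm active width")
for layer, rate in [("Layer 1", 500e6), ("Layer 4", 50e6)]:
    tracks, hits = per_chip_per_bx(rate)
    print(f"{layer}: {tracks:.1f} tracks/Bx, {hits:.0f} hits/Bx")
```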

  24. Readout (assuming 4x4 PR) • (LP)GBT user bandwidth: 3.2 Gbits/s (6.4 Gbits/s a possible future option) • 10 E-links @ 320Mbits/s • 20 E-links @ 160Mbits/s • (40 E-links @ 80Mbits/s) • (10 E-links @ 640Mbits/s option for future LPGBT) • Event header: 32 bit (B-ID, E-ID, ...) • PR full TOT: 16 x 4b TOT, 14b PR address • Layer 1: ~1060Mbits/s, 2-4 pixel chips per LPGBT • Layer 4: ~109Mbits/s, ~20 pixel chips per LPGBT • PR variable TOT per PR: hits x (4 bit pixel ID + 4 bit TOT), 14b PR address (data reformatting/compression done in EOC) • Layer 1: ~450Mbits/s, 4-8 pixel chips per LPGBT • Layer 4: ~48Mbits/s, >20 pixel chips per LPGBT • Layers 2, 3, 4: pixel module with 2 x 4 pixel ASICs and an LPGBT • 1 E-link at 320Mbits/s per pixel ASIC • 2nd E-link per pixel ASIC for redundancy • Layer 1: pixel module with 1 x 4 pixel ASICs (narrow module required for inner layer) • 2 E-links at 320Mbits/s (or one 640Mbits/s) per pixel ASIC
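
The two data formats can be compared with a small bandwidth model. It assumes 1.8 hit 4x4 PRs per track (from the middle-barrel table), 4 hits per track, a 100 kHz trigger rate and one 32 bit header per event. Under these assumptions it reproduces the ~1060 and ~109 Mbits/s full-TOT figures. It lands within about 5% of the variable-format ones (~450, ~48), since the exact overheads used on the slide are not stated.

```python
TRIGGER_RATE_HZ = 100e3
HEADER_BITS = 32
PR_ADDR_BITS = 14          # 128 x 128 = 16384 pixel regions
PRS_PER_TRACK = 1.8        # 4x4 regions hit per track (middle-barrel table)
HITS_PER_TRACK = 4         # average cluster size
LPGBT_BPS = 3.2e9          # (LP)GBT user bandwidth


def full_tot_bps(tracks_per_bx):
    """Each hit region sends all 16 TOT values plus its address."""
    regions = tracks_per_bx * PRS_PER_TRACK
    return TRIGGER_RATE_HZ * (HEADER_BITS + regions * (16 * 4 + PR_ADDR_BITS))


def variable_tot_bps(tracks_per_bx):
    """Each hit region sends its address plus (4b pixel ID + 4b TOT) per hit."""
    regions = tracks_per_bx * PRS_PER_TRACK
    hits = tracks_per_bx * HITS_PER_TRACK
    return TRIGGER_RATE_HZ * (HEADER_BITS + regions * PR_ADDR_BITS + hits * (4 + 4))


for layer, tracks in [("Layer 1", 75), ("Layer 4", 7.5)]:
    for fmt, fn in [("full TOT", full_tot_bps), ("variable TOT", variable_tot_bps)]:
        bps = fn(tracks)
        print(f"{layer} {fmt}: {bps / 1e6:.0f} Mbit/s, {LPGBT_BPS / bps:.1f} chips per LPGBT")
```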

  25. Pixel hybrid (layer 2 – 4)

  26. Sync pixel trigger data for L0 • Fast coarse “strip” information • Fast OR along pixel region (or pixel) columns • Tracks per bunch crossing: • Layer 1: 75 per chip! • Layer 4: 8 per chip • Clustered “strips” along pixel columns • Pixel region columns: 128b hit map per chip, 128b x 40MHz = 5Gbits/s per chip • Useful if ~50% of bits set in layer 1? • Compression not advantageous (layer 4: 8 out of 128; 8 hits x 7b encoded = 56b) • Pixel columns: 512b hit map, 512b x 40MHz = 20Gbits/s per chip! • Double pixel layer with Pt cut seems difficult • Feasible to put logic within the pixel chip, but very hard to get the data out and make a viable system
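
The hit-map arithmetic can be checked, along with how full the map would be. The occupancy estimate relies on a simplifying assumption not taken from the slide: each track fires exactly one uniformly random PR column, while real clusters can span more.

```python
import math

BX_RATE_HZ = 40e6


def hitmap_bps(columns):
    """Send one fast-OR bit per column every bunch crossing."""
    return columns * BX_RATE_HZ


def encoded_bits(hit_columns, columns):
    """Alternative: send the address of each hit column."""
    return hit_columns * math.ceil(math.log2(columns))


def column_occupancy(tracks, columns):
    """Expected fraction of columns set if each track fires one random column."""
    return 1 - (1 - 1 / columns) ** tracks


print(hitmap_bps(128) / 1e9, "Gbit/s for 128 PR columns")
print(hitmap_bps(512) / 1e9, "Gbit/s for 512 pixel columns")
print(encoded_bits(8, 128), "bits encoded (layer 4) vs 128-bit map")
print(f"{column_occupancy(75, 128):.0%} of PR columns set in layer 1")
```

Under this assumption about 44% of the layer 1 PR columns are set, in line with the “~50% bits set” question. At layer 4 the 56 bit encoding only halves the 128 bit map.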

  27. Pixel trigger with ROI • L0 trigger from calo and muon within ~3us (or more) • L0 rate: 1MHz? (assuming x10 higher rate than now) • ROI percentage: 10%? at ASIC level (just a guess) • ROI rate: 1MHz x 0.1 = 100kHz per pixel ASIC • L1 latency: <100us (same buffer depth for L0 and L1) • ROI data: single pixel address per track: 9 + 9 bit • Local clustering without considering TOT • Local logic within pixel array • Pixel ROI data: relatively low data rate per pixel ASIC • Layer 1: 1MHz x 0.1 x (75 tracks x (9+9) + 16) = ~140Mbits/s • Layer 4: 1MHz x 0.1 x (75/10 tracks x (9+9) + 16) = ~15Mbits/s • Buffering • L0 in pixel cell • L1 in pixel cell (or EOC?) • Shared (or separate?) buffers • Column data busses: shared or separate between ROI and readout? • Readout: shared or separate? • Separate: allocate bandwidth as required by the two paths • 4 links with programmable speed (80/160/320 Mbits/s) and programmable use (L0 ROI data or L1 readout data)
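
The ROI bandwidth estimate written as a function, using the slide's own guesses (1 MHz L0, 10% ROI fraction, 9 + 9 bit pixel address, 16 bit header):

```python
L0_RATE_HZ = 1e6
ROI_FRACTION = 0.1        # guessed share of L0 triggers with an ROI on this ASIC
ADDR_BITS = 9 + 9         # single pixel address per track (512 x 512 array)
HEADER_BITS = 16


def roi_bps(tracks_per_bx):
    """ROI data rate per pixel ASIC: one pixel address per track plus header."""
    return L0_RATE_HZ * ROI_FRACTION * (tracks_per_bx * ADDR_BITS + HEADER_BITS)


print(f"Layer 1: {roi_bps(75) / 1e6:.1f} Mbit/s")    # ~140 on the slide
print(f"Layer 4: {roi_bps(7.5) / 1e6:.1f} Mbit/s")   # ~15 on the slide
```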

  28. Pixel trigger and readout data flow

  29. Option: trigger 1MHz, ? x 10 us • Globally different readout/trigger option for the whole experiment • All front-end electronics to be changed for all detectors. This is a serious upgrade! • Pixel estimates • PR buffering: scales with latency • ~100 bits required per pixel per 10us • Limit with 65nm, assuming 1/3 of the digital pixel region area is occupied by memories (without TMR): 10Kbits/16/100 x 10us = ~60us (possibly 100us) (assuming 50um x 50um pixels) • Readout bandwidth: scales with trigger rate • Layer 1: ~4.5Gbits/s per pixel ASIC -> 1 LPGBT per ASIC; put serializer on pixel ASIC • Layer 4: ~480Mbits/s per pixel ASIC -> 1 LPGBT per 4-8 ASICs • Track rate (500MHz/cm²), cluster size (4): reduction? Defined very conservatively in this study • Data reduction/compression 50%? (TOT bits, header, better on-chip clustering, on-chip cluster center interpolation, ...) • More intelligent EOC logic
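
Both pixel estimates on this slide are linear scalings. This sketch uses the slide's memory figures for the latency limit and takes the variable-TOT rates from the readout slide (~450 and ~48 Mbits/s at 100 kHz) as the baseline for the 1 MHz case.

```python
SRAM_BITS_PER_REGION = 10_000   # ~1/3 of the digital area of a 4x4 region
PIXELS_PER_REGION = 16
BITS_PER_PIXEL_PER_10US = 100   # latency buffering need quoted on the slide


def max_latency_us():
    """Latency that the in-region SRAM can cover (no TMR)."""
    bits_per_pixel = SRAM_BITS_PER_REGION / PIXELS_PER_REGION
    return bits_per_pixel / BITS_PER_PIXEL_PER_10US * 10


def scaled_readout_bps(bps_at_100khz, trigger_rate_hz=1e6):
    """Readout bandwidth scales linearly with trigger rate."""
    return bps_at_100khz * trigger_rate_hz / 100e3


print(f"max latency ~{max_latency_us():.1f} us")
print(f"layer 1 at 1 MHz: {scaled_readout_bps(450e6) / 1e9:.1f} Gbit/s")
print(f"layer 4 at 1 MHz: {scaled_readout_bps(48e6) / 1e6:.0f} Mbit/s")
```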

  30. Flexibilities • Sensor, pixel size, signal, polarity, leakage, , • 25ns or 50ns collision spacing. • Triggered and non triggered modes • Digital corrections (timewalk, etc.) • Latency: 10 - 100us • Trigger rate: 100k or 1M • ROI mode • Test and calibration features • Scalable for different readout data rates • SEU and fault tolerance • Etc. • How many of these can we keep for how long ? • At least until specs ~clear

  31. Optional pixel sizes with same chip • 50um x 50um ASIC • ~50x50um pixel area seems feasible in 65nm • Compatible with available staggered bump-bonding • (55 x 55 to be compatible with Medipix/Timepix sensors?) • Flexible pixel region configuration can support multiple sensor configurations • 50 x 50 (100%), 25 x 100 (100%) • 50 x 100 (50%) • 100 x 100 (25%), 50 x 200 (25%), 100 x 200 (12.5%) • Effective pixel capacitance dominated by bump bonding, pad, shielding, re-routing and input transistor capacitance • No major effect on pre-amp noise • Pre-amp current can be increased for larger pixels • Unused channels obviously powered off • Will inevitably not be the best possible optimization for detectors using only a part of the available channels/resources: • Power • Unused silicon area • Bump-bonding pattern • Considering the cost (~2M) and effort (~20 man-years) required to develop/test/qualify/produce such a pixel ASIC, this can be an acceptable compromise. Total budget: ~5M incl. manpower • Can some “lost” silicon area be recovered?: • Analog: difficult • Power off unused analog channels • Multiple thresholds per pixel? (no gain when having TOT) • Can one parallel pre-amps to get lower noise for high capacitance? • Digital: • Power off parts of digital? • Buffering: increased depth, as full width not required? • Increase TOT dynamic range when using fewer pixel channels • [Figure: pixel mapping for 25 x 100, 50 x 50, 50 x 100, 100 x 100, 50 x 200 and 200 x 200 sensors]
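
The percentages in the sensor list are the fraction of ASIC channels in use when each sensor pixel is bump-bonded to exactly one 50x50um ASIC channel. This one-channel-per-sensor-pixel mapping is an assumption of the sketch; any re-routing needed for the 25 x 100 case is not modelled.

```python
ASIC_PITCH_UM = (50, 50)


def active_channel_fraction(sensor_x_um, sensor_y_um):
    """Share of ASIC channels bonded (and powered) for a given sensor pitch,
    assuming one active channel per sensor pixel."""
    return (ASIC_PITCH_UM[0] * ASIC_PITCH_UM[1]) / (sensor_x_um * sensor_y_um)


for pitch in [(50, 50), (25, 100), (50, 100), (100, 100), (50, 200), (100, 200)]:
    print(f"{pitch[0]} x {pitch[1]} um: {active_channel_fraction(*pitch):.1%}")
```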

  32. Work to be done for the digital part • Architecture • High level SystemVerilog model for architecture optimization • Pixel regions • Latency buffer architecture • ROI extraction • Triggered / non-triggered / other modes • Column bus bandwidth/structure • Etc. • Stimulating hit patterns: • Constrained random hits (within SystemVerilog) • Full Monte Carlo simulations • Scaled hits from current detectors • Provoking cases • To be used for the design verification framework for implementation • Detailed implementation: pixel region logic • Logic and circuit level • Area and power optimization • Logic type: • Static, dynamic • Synchronous, gated clocks, asynchronous • Buffers: latch, RAM • Column busses • Clock distribution (significant part of power consumption) • SEU/yield optimization • Etc. • EOC: readout, control

  33. Time schedule • Lots of work to be done to have the chip ready for the ~2020 upgrade • Final chip must be produced/tested: ~2018 • Final chip prototype working: ~2017 • Full first prototype: ~2016 • Small scale prototype: ~2015 • Basic test structures: ~2014 • Radiation qualification of technology: ~2014 • Pixel chip for sensor development/test? • Dedicated 65nm chip? • LHCb vertex pixel chip (55x55um²) • Timepix3 (55x55um²) • Other?

  34. Summary • 65nm seems very promising for a high rate, flexible pixel readout chip • Pixel regions minimize required digital resources and optimize the mixed signal floor plan • Keep flexibilities/modes as long as possible • Important global/local optimizations to be made: • Digital: architecture (e.g. PR), logic, clocking, circuit, etc. • Analog: preamp, discriminator, etc. • System: latency, trigger rate, ROI, readout, etc. • 2020 is not so far away!
