External scrubber implementation for the ALICE ITS Readout Unit

This paper discusses the implementation of an external scrubber for the ALICE Inner Tracking System (ITS) Readout Unit, addressing radiation challenges and SEU mitigation in the FPGA design.

Presentation Transcript


  1. TWEPP '19, Santiago de Compostela External scrubber implementation for the ALICE ITS Readout Unit Magnus Rentsch Ersdal magnus.ersdal@uib.no

  2. University of Bergen Inner Tracking System (ITS) Upgrade Inner barrel half-layers ITS upgrade cutaway

  3. University of Bergen Readout Electronics

  4. University of Bergen Readout Unit

  5. University of Bergen Radiation environment • (Figure: location of the Readout Units) • Designed for a flux of 1 kHz/cm², about 4 orders of magnitude above normal radiation background • Total Ionizing Dose (TID) and Non-Ionizing Energy Loss (NIEL) are low enough to pose no concern

  6. University of Bergen SEUs and CMOS circuits • Single Event Upsets (SEU) • SEU = LET changing the state of a node (bitflip) • SEUs in configuration cell SRAM

  7. University of Bergen Radiation challenges • SEUs interrupt operations by: • Upsets in configuration memory in SRAM FPGAs (main concern¹) • Upsets in flash memory • Upsets in registers / state machines • Potentially, a disruption of the clock / reset nets can stop all activity on the FPGA • Some space projects use anti-fuse devices; this is not an option in our case. • There is a potential for single event functional interrupts 1: New Developments in Error Detection and Correction Strategies for Critical Applications, Melanie Berg, 2017

  8. University of Bergen Mitigation, generally • In our environment, we can ignore dose effects for our FPGAs because TID will be low enough • The FPGAs tolerate the expected doses • We cannot ignore soft errors • Mitigation techniques are applied to our FPGA designs • Triple Modular Redundancy (TMR) on logic • TMR alone is not sufficient to protect against configuration memory SEUs¹ 1: New Developments in Error Detection and Correction Strategies for Critical Applications, Melanie Berg, 2017

  9. University of Bergen Readout Unit • Additional system component: flash FPGA, ProASIC3 (PA3), for increased radiation tolerance

  10. University of Bergen SEU mitigation for the main FPGA • In FPGA design: TMR (see poster* by M. Lupi) • Scrubbing: • "Scrubbing is the act of simultaneously writing into FPGA configuration memory as the device’s functional logic area is operating with the intent of correcting configuration memory bit errors."¹ • External scrubber that is radiation tolerant • Flash FPGA configuration memory is rad-tolerant 1: New Developments in Error Detection and Correction Strategies for Critical Applications, Melanie Berg, 2017 *https://indico.cern.ch/event/799025/contributions/3486415/

  11. University of Bergen Requirements for External Scrubber • Initial configuration of Xilinx Ultrascale (XKCU - main FPGA) using configuration stored in on-board flash memory • Scrubbing of XKCU configuration memory • Configuration and scrubbing both operate on the SelectMAP bus • Additional requirements: • Scrubbing and initial configuration must be «fast enough» • Scrubbing cycles should have a significantly higher frequency than the SEU rate, rule of thumb: 10x (Xilinx application note xapp216*) • Worst case SEU rate: ~0.04 SEU/s per Readout Unit (8/s for all 192 RUs) • Radiation tolerant • Efficient control interface • Two I2C interfaces are available in hardware • Efficient upload of files *https://www.xilinx.com/support/documentation/application_notes/xapp216.pdf
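
The 10x rule of thumb can be turned into a time budget for one scrub cycle. The following is a minimal sketch using only the worst-case rate and RU count quoted on the slide; the function name `max_scrub_period` is made up for illustration.

```python
def max_scrub_period(seu_rate_hz, margin=10.0):
    """Longest scrub cycle (s) that still runs `margin` times faster than SEUs arrive."""
    return 1.0 / (margin * seu_rate_hz)

seu_rate_per_ru = 0.04  # worst-case SEU/s per Readout Unit (from the slide)
n_readout_units = 192   # number of Readout Units (from the slide)

# Aggregate rate over all RUs; the slide rounds this to 8/s.
total_rate = seu_rate_per_ru * n_readout_units

# Each RU is scrubbed by its own external scrubber, so the budget is per RU.
period_limit = max_scrub_period(seu_rate_per_ru)

print(f"SEU rate, all RUs: {total_rate:.2f} /s")
print(f"Scrub cycle should be shorter than {period_limit:.1f} s")
```

Under these numbers the budget is 2.5 s per cycle, and the 1.7 s scrubbing time listed on the key numbers slide fits inside it.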

  12. University of Bergen Flash FPGA Design

  13. University of Bergen Config and Scrubbing

  14. University of Bergen File upload

  15. University of Bergen Control

  16. University of Bergen Key numbers • Initial config: 2 s (197 Mb) • Scrubbing: 1.7 s (151 Mb) • Writing to flash memory done via scripts • I2C: ~230 kb/s • SWT* (Xilinx FIFO): ~4 Mb/s • Resource utilization • Logic cells: 79% • RAM: 4 of 24 *Single Word Transaction, the slow-control protocol for the main FPGA
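
A rough feel for these numbers comes from converting them into throughput and upload time. The sketch below assumes that "Mb" means 10^6 bits and "kb" means 10^3 bits, and that the I2C and SWT figures are the rates at which a file is written to flash; neither assumption is stated on the slide, and the helper names are hypothetical.

```python
MEGABIT = 1e6  # assumption: "Mb" on the slide means 10**6 bits
KILOBIT = 1e3  # assumption: "kb" on the slide means 10**3 bits

def throughput_mbps(size_mb, seconds):
    """Average throughput in Mb/s for moving `size_mb` megabits in `seconds`."""
    return size_mb / seconds

def transfer_time_s(size_mb, rate_bps):
    """Time in seconds to move `size_mb` megabits at `rate_bps` bits per second."""
    return size_mb * MEGABIT / rate_bps

config_rate = throughput_mbps(197, 2.0)   # initial configuration
scrub_rate = throughput_mbps(151, 1.7)    # one scrubbing cycle

i2c_time = transfer_time_s(197, 230 * KILOBIT)  # upload via I2C
swt_time = transfer_time_s(197, 4 * MEGABIT)    # upload via SWT

print(f"config: {config_rate:.1f} Mb/s, scrub: {scrub_rate:.1f} Mb/s")
print(f"upload of 197 Mb: I2C {i2c_time / 60:.1f} min, SWT {swt_time:.1f} s")
```

Under these assumptions, uploading a full configuration file takes roughly a quarter of an hour over I2C and under a minute over SWT.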

  17. University of Bergen SEU mitigation in the PA3 design • Local TMR on registers • Recommended method for flash-based FPGAs¹ • Needs 3x DFFs and some additional logic cells for voting • (Figure reproduced from ¹) 1: New Developments in Error Detection and Correction Strategies for Critical Applications, Melanie Berg, 2017
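
The voting idea behind local TMR can be shown in software. This is a behavioural sketch only, not the PA3 HDL: the class `TmrRegister` and its methods are hypothetical names, and the three integers stand in for the three flip-flop copies.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)

class TmrRegister:
    """Three copies of one register value, read through a majority voter."""

    def __init__(self, value=0):
        self.copies = [value, value, value]

    def load(self, value):
        self.copies = [value, value, value]

    def read(self):
        return majority(*self.copies)

    def upset(self, copy_index, bit):
        """Model an SEU by flipping one bit in one copy."""
        self.copies[copy_index] ^= 1 << bit

reg = TmrRegister(0b1010)
reg.upset(1, 0)           # one copy is corrupted
single_fault = reg.read() # voter still returns the original value
reg.upset(2, 0)           # the same bit flips in a second copy
double_fault = reg.read() # voter now returns the wrong value
print(bin(single_fault), bin(double_fault))
```

A single upset is masked, while the same bit flipping in two copies defeats the voter.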

  18. University of Bergen SEU mitigation in the Flash memory • Scenario: writing a faulty configuration bit can theoretically stop the Xilinx FPGA from functioning • 1048/1024-bit Hamming error-correcting code (ECC), interleaved with the data before loading the flash (Python 3 software) • Implementation of TN2908* • GitLab CI creates and encodes the files on every commit • Single-bit correction, double-bit detection; behaviour for more than 2 bitflips is undefined • Device has two distinct chips inside the same package; files are written to both in case of a critical error on one *https://www.micron.com/-/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2908_NAND_hamming_ECC_code.pdf
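
Single-bit correction with double-bit detection can be illustrated with a textbook extended Hamming code. The sketch below is not the TN2908 layout used in the presentation; the function names and the bit ordering are assumptions made for illustration.

```python
def hamming_encode(data_bits):
    """Extended Hamming codeword as a list of 0/1; index 0 holds the overall parity."""
    r = 0
    while (1 << r) < len(data_bits) + r + 1:
        r += 1
    n = len(data_bits) + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(bits)
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    for i in range(r):               # set check bits so the syndrome becomes zero
        if (syndrome >> i) & 1:
            code[1 << i] = 1
    code[0] = sum(code) % 2          # overall parity enables double-error detection
    return code

def hamming_decode(code):
    """Return (data_bits, status); status is ok, corrected, double or uncorrectable."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        code[syndrome] ^= 1          # syndrome 0 means the overall parity bit itself
        status = "corrected"
    elif overall == 1:
        status = "uncorrectable"     # odd number of errors, but more than one
    else:
        status = "double"            # even parity with nonzero syndrome
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status

data = [(i * 37 % 11) & 1 for i in range(1024)]
codeword = hamming_encode(data)
damaged = list(codeword)
damaged[500] ^= 1
recovered, status = hamming_decode(damaged)
print(len(codeword), status, recovered == data)
```

For 1024 data bits this construction needs 12 check bits, fewer than the 24 extra bits implied by the 1048/1024 format, so it shows the principle rather than the block format stored in the flash.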

  19. University of Bergen SEU mitigation in the Flash memory • Based on irradiation campaigns, the SEU cross section in the flash memory is estimated at: • (0 → 1): 10⁻¹⁶ cm²/bit • (1 → 0): 10⁻²¹ cm²/bit • A typical scrubbing file has a 1:20 ratio of ones to zeros • A typical programming file has a 1:50 ratio of ones to zeros • given no default values written to BRAM • Because of this, the bits of the files are inverted before they are written to the flash memory • (Figure: Weste, Harris: CMOS VLSI Design, p. 127)
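
The benefit of inverting the file can be estimated by weighting the two cross sections with the share of stored zeros and ones. This is a sketch under the assumption that the effective per-bit cross section is a plain weighted average; that reading reproduces the 4.76E-18 cm²/bit value given on the backup slide. The function name is hypothetical.

```python
CS_0_TO_1 = 1e-16  # cm^2/bit, a stored 0 flipping to 1 (from the slide)
CS_1_TO_0 = 1e-21  # cm^2/bit, a stored 1 flipping to 0 (from the slide)

def effective_cross_section(ones, zeros):
    """Average per-bit cross section for a file with the given ones:zeros ratio."""
    total = ones + zeros
    return (ones * CS_1_TO_0 + zeros * CS_0_TO_1) / total

as_is = effective_cross_section(ones=1, zeros=20)     # scrubbing file stored unchanged
inverted = effective_cross_section(ones=20, zeros=1)  # same file stored inverted

print(f"as-is:    {as_is:.2e} cm^2/bit")
print(f"inverted: {inverted:.2e} cm^2/bit")
print(f"improvement: {as_is / inverted:.1f}x")
```

Because zeros are far more vulnerable than ones, storing the mostly-zero file inverted lowers the average cross section by roughly the ones:zeros ratio.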

  20. University of Bergen SEU mitigation in the Flash memory • Three measures have been implemented: • Storing the programming file inverted • Adding Hamming encoding of the bitstream • Storing two copies of all the files in the flash memory • This gives: P(fatal error) = P(double bitflip in one ECC-encoded block in both copies of the file) • P(fatal error) = 7E-26 during 10h spill

  21. University of Bergen Additional feature for commissioning and design qualification • Fault injection • A tool for tabletop "beam-testing" • To be used for commissioning and design qualification only. • This can be exploited to improve rad tolerance and add design recovery routines.

  22. University of Bergen Fault injection HW top level • Select random number -> count down -> flip bit • 14x faster rate than worst case design SEU rate

  23. University of Bergen PRBS "random" functions • Pseudorandom binary sequence (PRBS) • Linear Feedback Shift Register (LFSR), 32 bits long • Scaled to fit memory layout (4504 pages x 4096 bytes)
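
A 32-bit LFSR that picks a bit to flip can be sketched as follows. The slides do not give the feedback polynomial or the scaling method, so both are assumptions here: the taps (32, 22, 2, 1) are a commonly tabulated choice for 32-bit LFSRs, and the multiply-and-shift mapping onto 4504 pages is one possible way to scale. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
N_PAGES = 4504     # from the slide
PAGE_BYTES = 4096  # from the slide

def lfsr32_step(state):
    """Advance a 32-bit Fibonacci LFSR by one step (assumed taps: 32, 22, 2, 1)."""
    bit = ((state >> 31) ^ (state >> 21) ^ (state >> 1) ^ state) & 1
    return ((state << 1) | bit) & MASK32

def to_target(state):
    """Map a 32-bit LFSR state to a (page, byte, bit) address.

    Low 3 bits pick the bit, the next 12 bits pick the byte within a page,
    and the top 17 bits are scaled onto the page range.
    """
    bit = state & 0x7
    byte = (state >> 3) & (PAGE_BYTES - 1)
    page = ((state >> 15) * N_PAGES) >> 17
    return page, byte, bit

def targets(seed, count):
    """Return `count` pseudorandom flip targets starting from a nonzero seed."""
    if seed & MASK32 == 0:
        raise ValueError("an LFSR seeded with zero stays at zero")
    state = seed & MASK32
    out = []
    for _ in range(count):
        state = lfsr32_step(state)
        out.append(to_target(state))
    return out

for page, byte, bit in targets(seed=0xACE1ACE1, count=3):
    print(f"flip page {page}, byte {byte}, bit {bit}")
```

The sequence is fully determined by the seed, which makes an injected fault pattern repeatable when a test has to be rerun.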

  24. University of Bergen Status • Design is verified and tested; all mandatory features of the FPGA design are ready. • Work in progress: • Finalize fault injection • Remote programming of ProASIC3 • Thank you

  25. University of Bergen

  26. ITS Plenary Meeting 28th Feb - 1st Mar 2018 Probability of fatal error • Combined cross section: • CS(1:20) = 4.76E-18 cm²/bit • Probability of double bitflip in ECC block, flash #0: • P(double#0) ≈ (CS(1:20) * ECC_size * ECC_blocks)² = 1.61E-14 • Probability of double bitflip in same ECC block, flash #1: • P(double#1 | double#0) ≈ P(double#0) / ECC_blocks = 6.33E-22 • Combined probability: • P(double#1 ∩ double#0) = P(double#0) * P(double#1 | double#0) = 1E-35 • → 7E-26 double bitflips in same ECC block in both flash ICs during 10h run • Important numbers: • ECC block size: 1048 bits • # ECC blocks on flash: 2.52E+07 • Est. flux Run 3: 1 kHz/cm² • Fluence 10h spill: 3.6E+07 cm⁻² • Cross section (1 → 0): 1.0E-21 cm²/bit • Cross section (0 → 1): 1.0E-16 cm²/bit • Ratio 1:0 scrub file: 1:20

  27. ITS Plenary Meeting 28th Feb - 1st Mar 2018 Resource usage & timing

  28. University of Bergen How random is prbs