Prospect of SCT Readout Replacement with RCE/ATCA. Mike Huffer, Su Dong. SCT DAQ Upgrade for High-μ, May/16/2012.
Introduction
• Generic DAQ R&D at SLAC since 2007, based on the RCE (Reconfigurable Cluster Element) and CIM (Cluster Interconnect Module) on the ATCA platform, is being adapted to ATLAS DAQ upgrade R&D.
• Mature hardware is already deployed for other experiments/projects, with extensive software infrastructure and support.
• Successful applications on existing ATLAS pixel modules and FE-I4 IBL modules with the first-generation (Gen-I) RCE/CIM.
• New-generation (Gen-II) hardware prototypes with full ATLAS DAQ compatibility are near completion for application development. A further production round by Sep/2012 is intended to serve broader distribution on user request.
• Can serve a broad range of ATLAS upgrade needs from phase-0 onwards, with forward- and backward-compatible configuration flexibility.
Applications of RCE to ATLAS Pixel
• Full suite of pixel calibration code already running on FE-I4 for IBL, and migrating to the standard ATLAS pixel calibration console interface as used at Point-1. The same setup/software runs from single-chip test stand to full system.
• Running ATLAS TDAQ software (IPC, IS, OH, ERS etc.) on RCE under RTEMS.
• Cosmic telescope DAQ; test beam with EUDET and LHCb TimePix telescope (at 15 kHz).
• IBL stave-0 test setup at SR1.
• HSIO interface is common to the Strip upgrade test stand DAQ.
The Gen-II RCE Hardware Infrastructure
• Xilinx Virtex 5 FX70 FPGA with built-in crossbar and user firmware application space
• 4 GByte memory
• 6 channels of data I/O at up to 12.5 Gb/s per channel
• 40 Gb/s Ethernet network output
• 128 DSP tiles
For slow data inputs such as the existing SCT/pixel links at 40-80 Mb/s, an alternative version can use 30 regular I/O ports instead of MGTs.
Cluster On Board (COB) V4
• 4 Data Processing Modules (DPM), each either a dual RCE (6 MGT), a single RCE (6 MGT), or a single RCE (30 slow I/O ports)
• 1 Data Transport Module (DTM)
• 1 control RCE
• 24 port x 40G switching capacity
• 2x40 Gb/s Ethernet port
• Front TTC interface (FTM)
• Rear Transition Module (RTM): user interface via P3
Combined RCE processing (DPM) and Cluster Interconnect (DTM -> Switch + FTM + DTM) implemented on modular mezzanine cards.
Ethernet topology in a 14-slot shelf (crate)
The full-mesh ATCA backplane: every slot has 40 Gb/s of bandwidth to every other slot in the shelf simultaneously. There is an additional independent base network.
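To put numbers on the full-mesh statement, the short sketch below counts the slot-to-slot links in a 14-slot shelf and the nominal fabric bandwidth seen by one slot. It assumes each slot pair is joined by exactly one 40 Gb/s channel and ignores the separate base network; the function and constant names are illustrative only.

```python
from itertools import combinations

SLOTS = 14       # slots in the shelf (from the slide)
LINK_GBPS = 40   # bandwidth between any two slots (from the slide)


def full_mesh_links(slots: int) -> int:
    """Number of distinct slot pairs, i.e. point-to-point links in a full mesh."""
    return sum(1 for _ in combinations(range(slots), 2))


links = full_mesh_links(SLOTS)              # distinct slot-to-slot links
links_per_slot = SLOTS - 1                  # every slot talks to all the others
per_slot_gbps = links_per_slot * LINK_GBPS  # nominal fabric bandwidth of one slot

print(f"{links} links in the shelf, {links_per_slot} per slot, "
      f"{per_slot_gbps} Gb/s nominal per slot")
```

These are nominal figures only: they say nothing about what a given COB can actually source or sink.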
COB (V4) with Gen-I Mezzanine
(Annotated board photo. Labels: DTM; FTM (to be built) for TTC interface; IPMI controller; 24-port 10-GE switch; 1 of 4 DPMs (with Gen-I RCEs); Power/JTAG/I2C; 2x40 GE Ethernet; PICMG 3.8 P3 with 120 pairs of user I/O; 8x12 SNAP12 full-duplex user I/O.)
COB V5
• Move Ethernet ports from RTM to COB front panel and free up DPM pins.
• New version of Fulcrum switch for KR4 40GE Ethernet.
• Integrate IPMI onto DTM.
• Split FTM into a simpler FTM and a Base Interface, and relocate them.
New prototype by July.
(Figure labels: TTC, IPMI, Ethernet)
New RCE Hardware Status (May/14/2012)
To come:
• COB V5
• TTC commissioning
• Gen-II DPM with firmware
• Gen-II S-link and FEE plug-in firmware
• Full L1 speed DAQ
SCT Readout & RCE Implementation
92 existing ROD+BOC pairs, each processing 96+48 RX+TX links. The many slow links require an irreducibly large connector panel space.
Specific treatments for the SCT 1-1 replacement model with COB V5:
• DPM uses a single-FPGA RCE to reduce cost.
• DPM with 30 user channels on regular FPGA pins for slow signals: 24 RX flow through directly; TX are multiplexed (2x) into 6 channels at 80 MHz.
• RTM with 8+4 SNAP12 ports for 96+48 RX+TX. The 4x6 TX at 80 MHz will be fanned out to 4x12=48 channels using a CPLD. This is needed because of the 120-channel-pair limit of the P3 connector.
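The channel bookkeeping behind the 2x TX multiplexing can be checked with a few lines of arithmetic. The sketch below assumes 4 DPMs per COB (as on the COB slide) and one P3 pair per channel; the variable names are illustrative and not taken from any RCE software.

```python
DPMS_PER_COB = 4      # from the COB slide
RX_PER_DPM = 24       # RX channels passed straight through each DPM
TX_PER_DPM = 6        # multiplexed TX channels per DPM at 80 MHz
TX_MUX_FACTOR = 2     # each 80 MHz TX channel carries two 40 MHz streams
SNAP12_CHANNELS = 12  # channels per SNAP12 port
P3_PAIR_LIMIT = 120   # user I/O pairs available on the P3 connector

rx_links = DPMS_PER_COB * RX_PER_DPM          # RX links per COB
tx_muxed = DPMS_PER_COB * TX_PER_DPM          # TX channels crossing P3
tx_links = tx_muxed * TX_MUX_FACTOR           # TX links after CPLD fan-out on the RTM

pairs_with_mux = rx_links + tx_muxed          # P3 pairs used with multiplexing
pairs_without_mux = rx_links + tx_links       # P3 pairs needed without it

print(f"RX {rx_links}, TX {tx_links}")
print(f"P3 pairs with 2x mux: {pairs_with_mux} (limit {P3_PAIR_LIMIT})")
print(f"P3 pairs without mux: {pairs_without_mux} (limit {P3_PAIR_LIMIT})")
print(f"SNAP12 ports: {rx_links // SNAP12_CHANNELS} RX + {tx_links // SNAP12_CHANNELS} TX")
```

Under these assumptions the multiplexed layout uses exactly the 120 available pairs (4 DPMs x 30 channels), whereas routing all 96+48 links individually would need 144.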
Single RCE Gen-II DPM with slow I/O
A single-RCE DPM with 26 user I/O channels on regular FPGA pins for slow signals already exists; it is used by the Heavy Photon Search experiment at JLab. A slight modification regains the 4 additional channels that, with COB V5, are no longer needed for Ethernet to the RTM.
I/O Configuration
FEE optical communications:
• Tests for IBL suggest that modern transceivers can take over the roles of the old TX/RX plug-ins, but at significant cost ($200+ per transceiver); see e.g. the IBL RX study with various SNAP12 transceivers: https://indico.cern.ch/getFile.py/access?contribId=51&sessionId=5&resId=0&materialId=slides&confId=176693
• BPM TX encoding is regularly exercised in the pixel/IBL RCE/HSIO system as firmware in the HSIO Virtex-4.
S-Link output:
• Dependence on the bulky HOLA card is eliminated by using a dedicated FPGA firmware plug-in in the RCE.
• Much more compact S-Link output: 4x12=48 channels per RTM in SNAP12 format.
• The current Gen-I RCE uses the 3.1 Gb/s PGP protocol as its native transmission scheme. 1.28 Gb/s S-Link is a downgraded implementation, which should be relatively straightforward.
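For readers unfamiliar with BPM (bi-phase mark) encoding, the sketch below shows the generic textbook scheme in software: the line level toggles at every bit-cell boundary, and toggles once more in mid-cell when the data bit is 1. This is only an illustration and not the HSIO firmware; the starting level, the polarity, and the function names are assumptions.

```python
def bpm_encode(bits, level=0):
    """Bi-phase mark encode a bit sequence into half-cell line levels.

    Each input bit yields two output samples (first and second half-cell).
    """
    halves = []
    for bit in bits:
        level ^= 1              # transition at every cell boundary (carries the clock)
        halves.append(level)
        if bit:
            level ^= 1          # extra mid-cell transition encodes a 1
        halves.append(level)
    return halves


def bpm_decode(halves):
    """Recover bits: a cell whose two halves differ is a 1, otherwise a 0."""
    return [int(halves[i] != halves[i + 1]) for i in range(0, len(halves), 2)]


data = [1, 0, 1, 1, 0]
line = bpm_encode(data)
print(line)               # half-cell levels on the line
print(bpm_decode(line))   # decoded bits, identical to the input
```

Because a transition occurs at every cell boundary regardless of the data, the receiver can recover the clock from the same line that carries the commands.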
TTC / Clocks & Delays
TTC distribution:
• A distributed model without a central “TIM”; any COB can take over the central distributor role.
• A COB can receive TTC individually via its front-panel FTM, or use the shelf backplane base interface to distribute/aggregate.
• COB has standalone TTC simulation/generation for testing.
• Official TTCrx firmware for FPGAs dropped into the RCE.
Clocks and delays:
• Clocks come from the COB (in the old VME system, from BOC to ROD), and fewer auxiliary components on the RTM need external clocks.
• RCE FPGAs (Virtex-5) and the TX fan-out CPLD (MicroSemi Fusion family) all have per-channel fine delays at ~150 ps precision (existing BOCs have 280 ps delay precision).
• Fine timing delays are already regularly exercised in pixel/IBL calibrations on RCE/HSIO as firmware in the HSIO.
SCT 1-1 ROD Replacement Model
• Per-COB data 3.84 Gb/s << capacity 40 Gb/s
• Per-crate data 46 Gb/s needs only 36 (out of 48) S-links
• Many ways to send data to FTK: a) S-links; b) custom fast optical links; c) per-COB Ethernet; d) crate-level Ethernet
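The quoted rates can be reproduced with simple arithmetic. The sketch below assumes 96 RX links per COB at 40 Mb/s each and 12 COBs per crate; the latter is an inference from 46 / 3.84 and is not stated on the slide. Integer Mb/s units are used to avoid floating-point rounding in the link count.

```python
RX_LINKS_PER_COB = 96   # from the 1-1 replacement model
LINK_MBPS = 40          # assumption: every SCT RX link runs at 40 Mb/s
COBS_PER_CRATE = 12     # assumption inferred from the 46 Gb/s per-crate figure
SLINK_MBPS = 1280       # 1.28 Gb/s S-Link
COB_CAPACITY_MBPS = 40_000

per_cob_mbps = RX_LINKS_PER_COB * LINK_MBPS          # raw input rate per COB
per_crate_mbps = per_cob_mbps * COBS_PER_CRATE       # raw input rate per crate
slinks_needed = -(-per_crate_mbps // SLINK_MBPS)     # ceiling division

print(f"per COB:   {per_cob_mbps / 1000:.2f} Gb/s "
      f"({100 * per_cob_mbps / COB_CAPACITY_MBPS:.1f}% of 40 Gb/s)")
print(f"per crate: {per_crate_mbps / 1000:.2f} Gb/s")
print(f"S-links needed at 1.28 Gb/s: {slinks_needed}")
```

These are raw link rates with no allowance for protocol overhead or occupancy, so they represent an upper bound on the data volume rather than a measured load.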
Advantages of RCE/ATCA Approach
• S-Link formatting is separated out, allowing flexible scaling and easy evolution to advanced data communication schemes.
• Allows more creative use of the data for FTK and HLT, and can even out possible access hot spots on the ROSes.
• The COBs could take over the ROS functionality, giving the HLT more flexible and performant access.
• The COBs are even more natural for pixel/tracker upgrade readout at GBT rate, so this step of system evolution has a natural forward path.
• The work of porting the SCT calibration software onto RCEs will also immediately benefit the tracker upgrade effort, which is likewise migrating to the merged HSIO+RCE system so that test stand and full system software/firmware become the same, avoiding throw-away work.
• DTM RCE trigger pattern generation utility for DAQ tests.
• IPMI utilities for more advanced DCS.
Cost Estimate
See backup appendix for component details.
• Transceivers are a significant part of the overall cost.
• Other solutions, e.g. aggregator + 2 crates, can cost slightly less (by ~$100-200K), at the expense of system complexity.
Tasks
Hardware / System:
• COB Board and Mezzanine Cards
• Output (S-link) RTM
• Frontend RTM
• Shelf/Rack Infrastructure
• Trigger and Timing System Integration
• DCS Integration
Firmware:
• S-Link Plug-in
• TTC Plug-in
• Data FEX Plug-in
• Configuration Plug-in
Tasks
Software:
• Initial test stand setup
• Initial calibration example on RCE
• Calibration code porting from present P1 system
• Configuration migration from present system
• Core DAQ dataflow
• DTM RCE trigger pattern generation for DAQ tests
• DAQ software migration
• DCS software migration and improvements with IPMI
Schedule
Some key milestones:
• Jul/1/2012: Fully functional V4 COB+RTM with Gen-I DPM.
• Jul/1/2012: SCT RCE/ATCA test stand at CERN.
• Sep/1/2012: V5 COB + 30-chan slow I/O DPM.
• Oct/1/2012: First SCT calibration on COB.
• Nov/1/2012: Commissioned TTC interface.
• Dec/1/2012: Gen-II DPM core firmware completion.
• Jan/2013: Preliminary Design Review.
• Apr/1/2013: S-Link protocol plug-in commissioned.
• Jul/1/2013: FEX protocol plug-in commissioned.
• Sep/1/2013: Demonstrate chain test with 2+1 COB crate.
• Oct/1/2013: COB+RTM production.
• Feb/1/2014: DAQ demonstration at 40 kHz with 2 full crates.
• Mar/1/2014: Full suite of calibrations.
• Jun/1/2014: DAQ fully commissioned at 100 kHz.
Additional Information
Many previous communications, e.g.:
• Mike Huffer at ACES Mar/2011: https://indico.cern.ch/materialDisplay.py?contribId=43&sessionId=8&materialId=slides&confId=113796
• Mike Huffer at ACES Mar/09 (& sessions of Mar/09 AUW): http://indico.cern.ch/materialDisplay.py?contribId=51&sessionId=25&materialId=slides&confId=47853
• Rainer Bartoldus at ROD workshop Jun/09: http://indico.cern.ch/materialDisplay.py?contribId=16&sessionId=4&materialId=slides&confId=59209
• RCE training workshop at CERN June/09: http://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=57836 with introductions, instructions and discussions; this is the current source of documentation.
A collaborative R&D open to all:
• Shared RCE test stand at CERN (e-mail Rainer to get an account): https://twiki.cern.ch/twiki/bin/view/Atlas/RCEDevelopmentLab
• E-group for communications (open signup): atlas-highlumi-RCE-development for main announcements.
Everyone is welcome to explore!
Existing VME ROD Readout
Example case: IBL readout with old VME RODs. Reproducing the existing RODs means living with the present bandwidth limitations by deploying a large number of boards. Not viable for GBT-rate inputs.
Appendix: Cost Estimate Components (COB)
• COB board (PCB + loading)
COB components per board:
• Fulcrum switch = $500
• 8xSFP+ for 2x40GE = $1000 for the standard COB, while a slower single SFP at 4 Gb/s = $50 is sufficient for the simplified COB
• Other components = $200
The large-quantity cost of a loaded COB is therefore:
Standard COB with 2x40GE = $1000+$500+$200+$1000 = $2700
Simplified COB with 4 GE = $1000+$500+$200+$50 = $1750
Appendix: Cost Estimate Components (DPM)
• DPM PCB + loading
DPM components per board:
• Each FPGA = $400
• Other components = $100
The complete loaded COB costs:
Regular COB with 2-RCE DPMs = $2700 (COB) + 5x($150+2x$400+$100) = $7950
Simplified COB with 1-RCE DPMs = $1750 (COB) + 5x($150+$400+$100) = $5000
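The two loaded-COB totals can be reproduced as a small roll-up. In the sketch below, the $1000 and $150 terms are read off the sums above and interpreted as PCB + loading for the COB and for each mezzanine; the factor of 5 is taken to mean 4 DPMs plus 1 DTM. Both interpretations are assumptions, and the function names are illustrative.

```python
COB_PCB = 1000        # assumed: COB PCB + loading, first term of the COB sums
SWITCH = 500          # Fulcrum switch
COB_OTHER = 200       # other COB components
SFP_STANDARD = 1000   # 8xSFP+ for 2x40GE
SFP_SIMPLIFIED = 50   # single 4 Gb/s SFP
MEZZ_PCB = 150        # assumed: mezzanine PCB + loading, from the DPM sums
FPGA = 400            # each FPGA
MEZZ_OTHER = 100      # other mezzanine components
MEZZ_PER_COB = 5      # assumed: 4 DPMs + 1 DTM


def bare_cob(sfp_cost: int) -> int:
    """Cost of the COB carrier board with its switch and Ethernet optics."""
    return COB_PCB + SWITCH + COB_OTHER + sfp_cost


def mezzanine(fpgas: int) -> int:
    """Cost of one mezzanine card carrying the given number of FPGAs."""
    return MEZZ_PCB + fpgas * FPGA + MEZZ_OTHER


def loaded_cob(sfp_cost: int, fpgas_per_mezz: int) -> int:
    """Cost of a COB fully populated with mezzanine cards."""
    return bare_cob(sfp_cost) + MEZZ_PER_COB * mezzanine(fpgas_per_mezz)


print("standard COB:  ", bare_cob(SFP_STANDARD), "->", loaded_cob(SFP_STANDARD, 2))
print("simplified COB:", bare_cob(SFP_SIMPLIFIED), "->", loaded_cob(SFP_SIMPLIFIED, 1))
```

The roll-up shows that most of the saving in the simplified COB comes from halving the FPGA count and dropping the 2x40GE optics.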
Appendix: Cost Estimate Components (RTM)
• PCB/loading: quote for 8 was $400. Estimate $200 for large-quantity production, and add $50 for FEE RTM CPLD mounting.
Regular RTM components per board:
• Each AVAGO 12-chan MTP/MPO transceiver (3 Gb/s) ~$200, x8
• Other components ~$50
Simplified RTM components per board:
• Each AVAGO 12-chan MTP/MPO transceiver (3 Gb/s) ~$200, x12
• Other components (including 2 fanout CPLDs) ~$100
=> RTM cost is dominated by the transceivers.
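The claim that transceivers dominate can be quantified from the figures above. The totals below do not appear on the slide; they are derived under the assumptions that the large-quantity PCB/loading estimate of $200 applies to both variants and that the $50 CPLD mounting charge applies only to the simplified (FEE) RTM.

```python
PCB_LOADING = 200        # large-quantity estimate from the slide
CPLD_MOUNTING = 50       # assumed to apply to the simplified (FEE) RTM only
TRANSCEIVER = 200        # approximate cost of one 12-channel transceiver


def rtm_cost(n_transceivers: int, other: int, cpld_mounting: bool):
    """Return (total cost, transceiver share of the total) for one RTM."""
    optics = n_transceivers * TRANSCEIVER
    total = PCB_LOADING + optics + other + (CPLD_MOUNTING if cpld_mounting else 0)
    return total, optics / total


regular_total, regular_share = rtm_cost(8, other=50, cpld_mounting=False)
simplified_total, simplified_share = rtm_cost(12, other=100, cpld_mounting=True)

print(f"regular RTM:    ${regular_total}, transceivers {regular_share:.0%}")
print(f"simplified RTM: ${simplified_total}, transceivers {simplified_share:.0%}")
```

Under these assumptions the optics account for well over four fifths of the RTM cost in both variants.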