This talk summarizes the new developments in the sPHENIX DAQ since 2018 and reviews the ongoing discussions with C-AD on beam optimization. It also provides an overview of the sPHENIX detector and of the software and computing review.
DAQ Overview, sPHENIX Software & Computing Review, Martin L. Purschke, Sep 5-6, 2019, BNL
Software & Computing Review • Are the resources required to transmit and store the data adequately understood? • Are the resources required to reconstruct the data to a form suitable for physics analyses adequately understood? • Is the plan for developing software process and framework adequately understood? This talk will mainly address point 1 and summarize new DAQ developments since 2018 sPHENIX Software & Computing Review
2018 Recommendations sPHENIX Software & Computing Review
Highlights since the last review • We decided to shift event building to an offline process (under discussion last year) • We had another test beam with a TPC and an MVTX prototype that, for the first time, used the full readout electronics chain for both detectors • Both TPC and MVTX are triggered and clocked with our new timing controller ("vGTM" = virtual Granule Timing Module) • The TPC was read out and is analyzed in streaming mode, as it will be in sPHENIX
$ dlist rcdaq-00002343-0000.evt -i
-- Event 1 Run: 2343 length: 5242872 type: 2 (Streaming Data) 1550500750
Packet 3001 5242864 -1 (sPHENIX Packet) 99 (IDTPCFEEV2)
$
sPHENIX Software & Computing Review
Who is Who in the online system • John Haggerty, Project Scientist, sPHENIX Project (former PHENIX DAQ coordinator) • M. Purschke, DAQ Manager • E. Mannel, Calorimeter Electronics • Tom Hemmick, TPC • Jin Huang, TPC DAM/FELIX • Takao Sakaguchi, TPC Frontend • S. Boose, EE • E. Desmond, Software (retiring) • C.Y. Chi, EE • Joe Mead, EE (in NSLS-II) • J. Kuczewski, EE (in Instrumentation Div.) sPHENIX Software & Computing Review
Q2: Discuss beam optimization with C-AD • The functioning of the TPC is impacted by the overall collision rate – the TPC has a 13 ms "memory", which includes collisions outside of the ±10 cm central region • Shown: the vertex distribution measured in PHENIX in 2016 (actual vertex distribution measured by our ZDCs) • Ideally we want only the inner red (±10 cm) part of that distribution sPHENIX Software & Computing Review
Status of ongoing discussions with C-AD • Focus on a beam crossing angle to narrow the vertex distribution • For a calculation of the overall and the desired luminosity as a function of crossing angle, consider a naïve view of the collision of beam bunches: with no crossing angle, collisions can happen anywhere along the overlap; with a small crossing angle, the overlap region is reduced • The two quantities of interest are the total luminosity and the luminosity within ±10 cm (a toy illustration follows below) sPHENIX Software & Computing Review
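As a rough sanity check of that picture, the sketch below models both bunches as Gaussians and uses the standard crossing-angle (Piwinski) luminosity reduction factor, ignoring the hourglass effect, to estimate the total luminosity and the fraction delivered within ±10 cm. The bunch length and transverse beam size are illustrative assumptions, not C-AD numbers.

# Toy model (illustrative only): Gaussian bunches colliding with a half crossing angle phi.
from math import erf, sqrt, tan

sigma_z = 50.0    # rms bunch length [cm] (assumed)
sigma_x = 0.015   # rms horizontal beam size at the IP [cm] (assumed)

def lumi_vs_angle(half_angle_mrad):
    """Relative total luminosity and relative luminosity within +-10 cm."""
    phi = half_angle_mrad * 1e-3                        # half crossing angle [rad]
    piwinski = sigma_z * tan(phi) / sigma_x             # Piwinski parameter
    reduction = 1.0 / sqrt(1.0 + piwinski ** 2)         # total-luminosity reduction factor
    sigma_vtx = (sigma_z / sqrt(2.0)) * reduction       # rms length of the luminous region
    frac_central = erf(10.0 / (sqrt(2.0) * sigma_vtx))  # fraction of collisions inside +-10 cm
    return reduction, reduction * frac_central

for angle in (0.0, 0.5, 1.0, 2.0):
    total, central = lumi_vs_angle(angle)
    print(f"half angle {angle:3.1f} mrad: total {total:.2f}, within +-10 cm {central:.2f}")

The trade-off is visible directly: a larger crossing angle lowers the total luminosity but concentrates a larger fraction of it inside ±10 cm, which is what the "sweet spot" discussion on the next slide is about.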
Finding the sweet spot • A crossing angle comes with a reduced overall collision rate • sPHENIX will be able to ask C-AD to tune the beam as needed • Beam experiments will be conducted to verify the calculations More information: https://indico.bnl.gov/event/5397 sPHENIX Software & Computing Review
sPHENIX Detector Overview • Hadronic Calorimeters • Electromagnetic Calorimeter • Time Projection Chamber (TPC) • Intermediate Tracker (INTT) • Minimum Bias Detector (MBD) • MicroVertex Detector (MVTX) sPHENIX Software & Computing Review
DAQ Architecture [diagram: FEE/FEM front-ends feed DCM/DCM2 modules (calorimeters, MBD) and FELIX cards in EBDC PCs (TPC, MVTX, INTT) in the rack room; the SEBs and EBDCs send the data through a network switch to the buffer boxes and on to the RACF/HPSS] The scope of this review starts after the buffer boxes. They are the only components interfacing with the tape storage system. sPHENIX Software & Computing Review
Hybrid of triggered and streaming readout • Triggered readout: the calorimeters and the MBD re-use the PHENIX "Data Collection Modules" (v2) • Streaming readout: the TPC, the MVTX, and the INTT are read out through the ATLAS "FELIX" card directly into a standard PC [photo: ATLAS FELIX card installed in a PC] sPHENIX Software & Computing Review
Streaming Readout + Triggered Events (concept) Chunks correlated with triggered events sPHENIX Software & Computing Review
Q1: Uncertainty in the TPC data volume for commissioning/ZS • We cannot use the SAMPA in pass-through mode due to E-Link limits; zero-suppression is required at all but the lowest trigger rates. Possible commissioning scenarios (a back-of-the-envelope check follows below): • Initial turn-on: SAMPA triggered non-ZS mode • SAMPA in triggered mode, for each trigger collect 260 ADC samples, O(100 Hz) trigger • Data = 30 MB / TPC event (13 us * 20 MHz * 160e3 ch * 10 bit / 8e6 MB * 60% compression) • Expect O(1M) events in the detector turn-on check, negligible total data volume • TPC time-in calibration: SAMPA ZS mode w/o DAM trigger throttling • SAMPA in ZS mode, but all data pass through the DAM without trigger throttling • Only operate at low collision rate, O(kHz) scenarios • Data rate ~ 8.8 Mb * AuAu collision rate * 60% compression; take 1 low-bunch RHIC fill • Production data: SAMPA ZS mode w/ DAM trigger throttling • Data rate as presented in the last review sPHENIX Software & Computing Review
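A quick back-of-the-envelope check of the 30 MB non-zero-suppressed event size quoted above. This is a sketch only; the sampling window, channel count, ADC width, and compression factor are taken from the bullet, and the 60% figure is interpreted as the compressed-to-raw size ratio.

# Non-ZS TPC event size for the initial turn-on scenario (numbers from the slide).
drift_time_s = 13e-6     # SAMPA readout window per trigger [s]
sampling_hz  = 20e6      # ADC sampling rate [Hz]
channels     = 160e3     # TPC channels
bits_per_adc = 10        # ADC word size [bits]
compression  = 0.60      # assumed compressed/raw size ratio

samples   = drift_time_s * sampling_hz               # 260 samples per channel
raw_bytes = samples * channels * bits_per_adc / 8.0  # ~52 MB uncompressed
event_mb  = raw_bytes / 1e6 * compression            # ~31 MB, i.e. the quoted O(30 MB)
print(f"{samples:.0f} samples/channel, ~{event_mb:.0f} MB per non-ZS TPC event")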
What is the estimated TPC data rate? • The instantaneous TPC data rate depends on the collision rate • Au+Au TPC data rate [Gbps] ~ 70 + 1 * Collision_kHz • Au+Au TPC event size [MB] ~ 0.54 + 0.0085* Collision_kHz sPHENIX Software & Computing Review
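The parametrization above is easy to capture in a small helper. A sketch only: the coefficients are read off the slide, and the interpretation of the linear term as collision-rate-dependent payload on top of a constant floor is my reading of it.

# Linear parametrization of the Au+Au TPC output vs. collision rate (from the slide).
def tpc_rate_gbps(collision_khz):
    """Approximate streaming TPC data rate [Gbit/s] at a given Au+Au collision rate [kHz]."""
    return 70.0 + 1.0 * collision_khz

def tpc_event_size_mb(collision_khz):
    """Approximate TPC contribution per event [MB] at a given Au+Au collision rate [kHz]."""
    return 0.54 + 0.0085 * collision_khz

for rate_khz in (50, 100, 150):
    print(f"{rate_khz:3d} kHz: {tpc_rate_gbps(rate_khz):5.0f} Gbit/s, "
          f"{tpc_event_size_mb(rate_khz):.2f} MB/event")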
Year-5 average data rates • MVTX (MAPS) ~ 20 Gbit/s • Intermediate Silicon Strip Tracker (INTT) ~ 7 Gbit/s • Compact Time Projection Chamber (TPC) ~ 100 Gbit/s • Calorimeters (primarily EMCal, hadronic cal.) ~ 8 Gbit/s • Total ~ 135 Gbit/s The takeaway: max 1.4 PByte on a good day of Au+Au running (arithmetic check below). Ramp-up in Run 1 starting at about 90 Gbit/s (estimate). Instantaneous data rates can be significantly higher. sPHENIX Software & Computing Review
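For orientation, the daily-volume arithmetic behind the "1.4 PByte on a good day" statement. The only assumption in this sketch is that the quoted 135 Gbit/s aggregate is sustained over a full 24 hours.

# Daily raw-data volume at the year-5 aggregate rate.
aggregate_gbps  = 135.0                                    # summed year-5 average rate [Gbit/s]
seconds_per_day = 24 * 3600
pb_per_day = aggregate_gbps / 8 * seconds_per_day / 1e6    # Gbit/s -> GByte/s, then GB -> PB
print(f"{pb_per_day:.2f} PB/day")                          # ~1.46 PB/day, i.e. the quoted ~1.4 PB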
Data Formats • Each front-end card contributes what we call a "Packet" to the overall event structures • A hitformat field identifies the format of the data and ultimately selects the decoding algorithm • We can change/improve the binary format and assign a new hitformat for a packet at any time • This insulates the offline software from changes in the online system [diagram: an Event as a container of Packets delivered by the rack-room front-ends] (a schematic decoder dispatch follows below) sPHENIX Software & Computing Review
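To illustrate the idea, here is a schematic of hitformat-based dispatch. This is a sketch, not the actual sPHENIX event-library API; the function and field names are invented for illustration, and only the hitformat value 99 (IDTPCFEEV2) is taken from the dlist output shown earlier.

# Schematic hitformat dispatch: the decoder is chosen per packet, so a new binary
# format only requires registering a new decoder under a new hitformat id.
from typing import Callable, Dict, List

DECODERS: Dict[int, Callable[[bytes], List[int]]] = {}

def register(hitformat: int):
    """Register a decoder function for a given hitformat id (illustrative helper)."""
    def wrap(func):
        DECODERS[hitformat] = func
        return func
    return wrap

@register(99)  # e.g. the IDTPCFEEV2 format seen in the dlist output above
def decode_tpc_fee_v2(payload: bytes) -> List[int]:
    # placeholder: real decoding would unpack 10-bit ADC samples etc.
    return list(payload)

def decode_packet(hitformat: int, payload: bytes) -> List[int]:
    """Look up the decoder by hitformat; unknown formats fail loudly."""
    try:
        return DECODERS[hitformat](payload)
    except KeyError:
        raise ValueError(f"no decoder registered for hitformat {hitformat}")

print(decode_packet(99, b"\x01\x02\x03"))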
Offline event building • Each SEB and EBDC writes an individual file to storage, containing "sub-events" from the detector in question (hence the name SEB = "Sub-Event Buffer") • This results in about 60 files being written concurrently, about 10 files per buffer box • This eliminates a complicated online event builder component that would need … • … to accommodate the peak data rate • … to work perfectly on Day One • … to perfectly align events all the time • For all detector system prototypes we can already write those individual files • Offline event building gives you a second chance if something goes wrong • A fraction (about 10-50 Hz worth) of events will be assembled near-line to verify the proper alignment and as input for online monitoring (a sketch of the merge step follows below) sPHENIX Software & Computing Review
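A minimal sketch of what the offline event builder has to do: merge the per-detector sub-event streams on a common event key. Here the key is simply an event number; in reality the alignment would use the timing-system counters, and the stream layout and field names below are assumptions for illustration.

# Merge sub-events from several per-detector streams into full events, keyed on a
# common event identifier.  Streams are assumed to be sorted by that identifier.
import heapq
from typing import Dict, Iterator, List, Tuple

SubEvent = Tuple[int, str, bytes]   # (event_id, detector, payload) - illustrative layout

def build_events(streams: List[Iterator[SubEvent]]) -> Iterator[Dict[str, bytes]]:
    """k-way merge of sorted sub-event streams; yields one dict per assembled event."""
    merged = heapq.merge(*streams, key=lambda s: s[0])
    current_id, event = None, {}
    for event_id, detector, payload in merged:
        if current_id is not None and event_id != current_id:
            yield event             # all sub-events of the previous event have been seen
            event = {}
        current_id = event_id
        event[detector] = payload
    if event:
        yield event

# toy usage: two "detectors", three events each
seb  = iter([(1, "emcal", b"a"), (2, "emcal", b"b"), (3, "emcal", b"c")])
ebdc = iter([(1, "tpc",   b"x"), (2, "tpc",   b"y"), (3, "tpc",   b"z")])
for ev in build_events([seb, ebdc]):
    print(sorted(ev.keys()))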
A Central Role: The Buffer Boxes • The buffer boxes level the ebb and flow from the front-end: • Decay of luminosity during a RHIC store • Gaps in data taking while setting up the next store • Other breaks for access, APEX, machine development • The buffer boxes temporarily store the data (about 70-100 hrs of capacity; a rough estimate follows below) • Sending the average rather than the peak rate to the RACF saves expensive WAN costs • They also allow us to ride out short (a day or so) outages of the tape storage system sPHENIX Software & Computing Review
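A rough capacity estimate, purely for orientation. The per-box disk count and size are read off the benchmark slide in the backup, and the number of buffer boxes is an assumption based on the architecture diagram; none of these are quoted specifications.

# How many hours of buffering does the farm provide at the year-5 average rate?
disks_per_box  = 102     # assumed reading of "102 12 TByte disks" (backup slide)
tb_per_disk    = 12.0
n_buffer_boxes = 6       # assumed from the architecture diagram
avg_rate_gbps  = 135.0   # year-5 average aggregate rate [Gbit/s]

capacity_tb   = disks_per_box * tb_per_disk * n_buffer_boxes   # raw farm capacity [TB]
fill_tb_per_h = avg_rate_gbps / 8 * 3600 / 1000                # TB written per hour
print(f"{capacity_tb:.0f} TB raw, ~{capacity_tb / fill_tb_per_h:.0f} h of buffering at the average rate")

With usable capacity somewhat below the raw number, this lands in the same ballpark as the 70-100 hours quoted on the slide.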
Event and total data rates • Run 1: Au+Au: 14.5 weeks ⋅ 60% RHIC uptime ⋅ 60% sPHENIX uptime ⟶ 47 billion events, 75 PB • Run 2: p+p, p+A: 22 weeks ⋅ 60% RHIC uptime ⋅ 80% sPHENIX uptime ⟶ 96 billion events, 143 PB • Run 3: Au+Au: 22 weeks ⋅ 60% RHIC uptime ⋅ 80% sPHENIX uptime ⟶ 96 billion events, 205 PB sPHENIX Software & Computing Review
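The event counts follow directly from the beam time and the uptime factors. The sketch below assumes the 15 kHz recorded event rate quoted in the summary; the data volumes additionally fold in the per-event sizes and are not re-derived here.

# Events recorded = weeks * uptime factors * event rate (15 kHz from the summary slide).
def events_billion(weeks, rhic_uptime, sphenix_uptime, rate_khz=15.0):
    seconds = weeks * 7 * 24 * 3600 * rhic_uptime * sphenix_uptime
    return seconds * rate_khz * 1e3 / 1e9

print(f"Run 1: {events_billion(14.5, 0.60, 0.60):.0f} billion events")   # ~47
print(f"Run 2: {events_billion(22.0, 0.60, 0.80):.0f} billion events")   # ~96
print(f"Run 3: {events_billion(22.0, 0.60, 0.80):.0f} billion events")   # ~96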
sPHENIX site connectivity • Plenty of unused fibers from Bldg 1008 (sPHENIX), more than we will ever need for sPHENIX (72 fibers total, 36 pairs) • Each pair is 100 Gbit-capable; bandwidth in/out is not an issue • We envision two bonded 25G interfaces per Buffer Box [map: fiber path from Bldg 1008 via Bldg 1005 and Bldg 911 to Bldg 725 / Bldg 515] sPHENIX Software & Computing Review
General and Online-Specific Requirements • Offline/Near-line event building. The SEBs and EBDCs write individual data streams each. A fraction (about 10-50Hz worth) of events to be assembled near-line to verify the proper alignment, and as input for online monitoring • Online monitoring. Recognize – • ”big” failures (dead channels, e.g. tripped LV/HV supplies) on a timescale of a minute • More subtle issues (gain drifts, etc) in ~ 30 minutes • “data ok” verification in ~12 hours • Filtering. Make use of the data availability on Buffer Boxes to extract certain event types (periodic laser calibration, etc) for a fast turnaround sPHENIX Software & Computing Review
Milestones • 2019 (Q4) • Demonstrate the ability to read time-aligned data from multiple systems • Deploy 2D & 3D vertex reconstruction • 2020 • Demonstrate the ability to build events offline • First simulation campaign (realistic TPC drift & ExB effects) • Modeling and simulation of TPC space-charge distortions • Implement ACTS tracking, optimize tracking to 5 sec/event • Detailed calorimeter simulation validation • ADC time-series signal fitting optimization • Develop calibration strategy, workflow, simulation tools • Define post-DST data formats • Implement PanDA/Rucio for distributed job submission and data management • Implement multi-threading • Demonstrate the ability to use the OSG • 2021 • Demonstrate the ability to read time-aligned data from multiple systems • Second simulation campaign (with optimized tracker) • Develop/adopt particle flow algorithms and optimize calorimeter clustering • Database integration of alignment, detector parameters, and calibrations • Deploy space-charge correction framework • Mock data challenge(s) from simulated raw data through calibration and reconstruction • 2022 • Full chain test and QA for all subsystems • Cosmic running and analysis • Readiness testing and commissioning sPHENIX Software & Computing Review
Software-Related Milestones (There are more DAQ-centric milestones in the project. This is the one with the most impact on software.) We are working towards a "multiple-system test" where we read out calorimeter prototypes together with a streaming-readout detector (likely a TPC prototype) • Demonstrate the ability to read time-aligned data from multiple systems (end of 2019) • Demonstrate the ability to build events offline (in ~6 months) sPHENIX Software & Computing Review
Summary The sPHENIX data acquisition system will write max 1.4 PB/day for the largest systems (Au+Au). We are aiming for an event rate of 15 kHz and a livetime above 90%. There are a triggered readout component and a streaming readout component, which need to be aligned and combined offline. sPHENIX Software & Computing Review
Back Up sPHENIX Software & Computing Review
Bufferbox throughput benchmarks • Test results of the disk-write capability of a new buffer box machine • Bufferbox capacity is 102 12-TByte disks • 21 local processes, write-only, reach ~69.1 Gbit/s per bufferbox • Simultaneous read/write rated at 40 Gbit/s (a headroom estimate follows below) sPHENIX Software & Computing Review
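For context, the average per-box write rate these benchmarks have to support. This is a sketch only; the number of buffer boxes is an assumption read off the architecture diagram, and instantaneous rates can be significantly higher than the average, as noted earlier.

# Required per-buffer-box write rate vs. the benchmarked capability.
avg_rate_gbps  = 135.0   # year-5 average aggregate rate [Gbit/s]
n_buffer_boxes = 6       # assumed from the architecture diagram

per_box_avg = avg_rate_gbps / n_buffer_boxes   # ~22.5 Gbit/s per box on average
print(f"~{per_box_avg:.1f} Gbit/s per box needed on average, "
      f"vs. 69.1 Gbit/s write-only and 40 Gbit/s read+write benchmarked")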
Late-stage compression • After all data reduction techniques (zero-suppression, bit-packing, etc.) are applied, you typically find that your raw data are still compressible to a significant amount • Our raw data format supports a late-stage compression • Normally a file is a sequence of buffers • Each buffer is run through the LZO algorithm and wrapped with a new buffer header; the new buffer carries the compressed one as its payload, and the file becomes a sequence of such compressed buffers • On readback, the LZO unpack step restores the original uncompressed buffer (a schematic sketch follows below) sPHENIX Software & Computing Review
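A schematic of the wrap/unwrap round trip. This is a sketch only: the real implementation uses LZO inside the sPHENIX buffer structures, while here zlib and a made-up header stand in so the example is self-contained.

# Late-stage compression sketch: wrap a raw buffer in a new header whose payload is
# the compressed original; readback is the inverse and is invisible to higher layers.
import struct, zlib

MAGIC = 0xC0DA  # made-up marker identifying a compressed buffer
HDR   = "<HII"  # marker, compressed length, original length (illustrative layout)

def wrap(buffer: bytes) -> bytes:
    payload = zlib.compress(buffer)                     # stand-in for the LZO step
    return struct.pack(HDR, MAGIC, len(payload), len(buffer)) + payload

def unwrap(blob: bytes) -> bytes:
    magic, comp_len, orig_len = struct.unpack_from(HDR, blob)
    assert magic == MAGIC
    start = struct.calcsize(HDR)
    original = zlib.decompress(blob[start:start + comp_len])
    assert len(original) == orig_len                    # integrity check on readback
    return original

raw = bytes(range(256)) * 100                           # some compressible toy data
assert unwrap(wrap(raw)) == raw
print(f"{len(raw)} -> {len(wrap(raw))} bytes after compression")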
Late-stage compression • We treat the late-stage compression as a catch-all for the data reduction that we have not yet implemented • During commissioning we need some amount of raw data, safety belts, and checksums in the data to verify its integrity • As time goes on and we gain experience, the data become more densely packed and less compressible • This can be seen as a figure of merit – the "information density" • Some systems in PHENIX went through 7 or 8 internal format changes over time • All this is handled completely in the I/O layer; the higher-level routines just receive a buffer as before sPHENIX Software & Computing Review
Data Hierarchy [diagram: data hierarchy] • In the Event Builder, we perform a late-stage lossless compression at the buffer level • The compression reduces the size by 25-50%, depending on the "maturity" of the binary formats • This somewhat mitigates the data size increase from early packet format versions and debug information sPHENIX Software & Computing Review