High Speed Data Acquisition and Trigger
Walter F.J. Müller, GSI
CBM Experiment R&D Coordination Meeting
Front end / DAQ / Trigger Requirements
• The overall architecture of front end electronics, DAQ, and trigger is mainly determined by the trigger requirements
• Main CBM Triggers:
  • J/ψ
  • Open charm
  • Low mass di-leptons
  • Long lived resonances (e.g. )
• Trigger on 2e: needs PID and tracking
• Near threshold: major challenge
• Trigger on D pair:
  • one semi-leptonic decay
  • D K displaced vertex
DAQ Architecture – Antique
[Block diagram: Detector → Front end → Delay → Gate → ADC / TDC → DAQ, with a separate Trigger branch (LAM)]
• Trigger: extra trigger signal path
• Delay: compensates trigger latency; max. latency limited by delay cable length
• Gate, ADC / TDC: digitize after trigger
• DAQ: transport selected events to archive
DAQ Architecture – Collider style
[Block diagram: Detector → Front end → ADC (clocked at f bunch cross) → Pipeline → DAQ, with a separate L1 Trigger branch issuing Accept]
• ADC: digitize each bunch crossing; dead time free
• L1 Trigger: extra trigger data path; often fixed max. latency (CMS: 4 μsec)
• Pipeline: compensates L1 trigger latency; often limited size
• DAQ: transports L1 accepted events to L2 trigger / archive
Triggered Front End Limitations
• Sometimes difficult to build a proper L0
  • E.g. in nuclear decay spectroscopy, where a true minimum bias is needed to see delayed decays
• Buffering in the FE is a strong constraint that is difficult to change (and upgrade)
• Sometimes difficult to build an L1 with a short (and/or fixed) latency, especially when several triggers run in parallel
  • E.g. PANDA: triggers are on channels, there is no obvious fast selection criterion
DAQ Architecture - Future
Self-triggered Data Push Architecture (SDPA)
[Block diagram: Detector → Front end → ADC (with clock) → Buffer memory → Event builder and selector]
• Front end, ADC: self triggered digitization; dead time free
• Clock: provides absolute time scale
• Each hit transported as Address/Timestamp/Value
• Buffer memory: compensates builder/selector latency; practically unlimited size
• Event builder and selector: use time correlation of hits to define events; select and archive
• Max. latency uncritical, avr. latency relevant
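A minimal sketch of the Address/Timestamp/Value hit record and of merging time-ordered hit streams from several front ends onto the common time scale. The field widths, the `Hit` class, and `merge_streams` are illustrative assumptions, not part of the proposal.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Hit:
    address: int    # channel identifier (hypothetical encoding)
    timestamp: int  # clock ticks on the common absolute time scale
    value: int      # digitized amplitude or charge


def merge_streams(*streams):
    """Merge per-front-end hit streams, each already time-ordered, into one."""
    return list(heapq.merge(*streams, key=lambda h: h.timestamp))


# Two hypothetical front ends pushing hits without any trigger
fe_a = [Hit(1, 100, 37), Hit(1, 250, 12)]
fe_b = [Hit(7, 102, 55), Hit(9, 251, 8)]
merged = merge_streams(fe_a, fe_b)
```

Because every hit carries its own timestamp, the merge needs no trigger signal; downstream stages only see one time-ordered stream.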
Advantages of SDPA
• No special trigger data path
• No fixed maximal L1 trigger latency
• Leaner, more versatile front end
• Easier upgrade path of back end
• All signals can contribute to all trigger levels (limit is crunch power, not connectivity)
• Highest flexibility for trigger development
  • Main reason why PANDA is committed to use such an architecture
Planned Experiments with SDPA
• AGATA (Advanced Gamma Tracking Array), DC beam
  • 190 Ge detectors – 6780 channels
  • 300 kHz event rate @ M=30
  • ~ 1 GB/sec into reconstruction farm
• BTeV (Charm & Beauty Decays at FNAL), bunched beam
  • 2.2*10^7 pixel + RICH + ECAL + ….
  • 7.6 MHz bunch crossing
  • ~ 1 TB/sec into L1 buffer memories
  • L1 trigger on displaced vertices
• Completion dates noted on the slide: ~2008, ~2007
CBM Challenges
• Large primary data rate: 1 TB/sec (10^8 ISDN lines, or 5000 * 2 Gbps)
  • 10^7 int/sec (min. bias) * 200 part/int * 50 layers (mostly TRD) * 10 byte/hit
  • 2*10^9 part/sec
• Large trigger decision rate: 10^7/sec
• Large computing power required: assume ~100 ops/byte, so ~100 Tops/sec or ~10^14 ops/sec
  • 10^4 PC’s with 10 GHz ? 1 MW power ?
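The rate estimates on this slide follow from simple multiplication, sketched below. All inputs are the slide's numbers except the 64 kbit/s ISDN channel rate, which is an assumption added here to check the "10^8 ISDN lines" comparison.

```python
# Inputs taken from the slide
INTERACTIONS_PER_SEC = 10**7
PARTICLES_PER_INTERACTION = 200
LAYERS = 50
BYTES_PER_HIT = 10
OPS_PER_BYTE = 100

# Assumption (not on the slide): one ISDN B channel carries 64 kbit/s
ISDN_BITS_PER_SEC = 64_000

particle_rate = INTERACTIONS_PER_SEC * PARTICLES_PER_INTERACTION  # particles/sec
data_rate = particle_rate * LAYERS * BYTES_PER_HIT                # bytes/sec
ops_rate = data_rate * OPS_PER_BYTE                               # ops/sec
isdn_lines = data_rate * 8 // ISDN_BITS_PER_SEC                   # equivalent lines

print(f"{particle_rate:.1e} part/s, {data_rate:.1e} B/s, "
      f"{ops_rate:.1e} ops/s, {isdn_lines:.2e} ISDN lines")
```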
CBM Time Scale
Added on 14.11.02
• SIS 200 completion > Q4 2011
• CBM installation > 2010/2011
• CBM production / test > 2008/2009
• Plan with technology of ~ 2007
• SIA forecast for 2007: 0.065 μm process (0.13 μm today)
  • Logic factor 4 denser and faster
• So expect for example:
  • Connectivity: 10 Gbps serial link speed (optical and Cu at low cost)
  • Speed: ~1 GHz FPGA clock speed
  • Density: >100 kLC FPGA at acceptable cost (today: ~ $5000)
Proposal for CBM DAQ Architecture
• Inspired (and scaled) from BTeV
  • Key difference: DC beam
  • CBM needs explicit event building
    • Evaluate time correlation, event tag hits
  • BTeV (et al.) uses implicit event building
    • Tag with bunch crossing number in front end
• Front end inspired by AGATA
• Meant to demonstrate feasibility
• Don’t take details too seriously!
Front End Electronics
[Block diagram: Detector → Analog front end → Sampling ADC → Hit detector / Parameter estimate → Mux / Cluster finder → Data link interface (DDL) and Timing Link interface (TL); more channels feed the same Mux]
• Low jitter clock ~ 100 MHz: absolute time scale. Good enough for ToF ?
• Hit detector / Parameter estimate: outputs t, q, …; links to neighbor cells
• Mux / Cluster finder: outputs xy, t, q, q -1, q +1, …
Front End Essentials
• Where useful and feasible, determine the time stamp to a fraction of the clock cycle
  • Helps event building, reduces pile-up
• Do cluster finding
  • Helps to reduce data volume, otherwise dominated by address and time stamp
• Enough bandwidth
  • DDL should never saturate, and there is no trigger to throttle down data flow
CBM Radiation Hardness Requirements I
• Total dose (TID) based on CDR numbers:
  • Assume 10^7 int/sec and 5*10^7 sec on-time (1.5 yr design luminosity)
  • Assume 2*10^7 h cm^-2 corresponds to 1 rad (from ATLAS)
  • Flux per central interaction: 1 h cm^-2
  • Fluence over life time: 1.25*10^14 h cm^-2
  • Total dose over life time: 6 Mrad
• COTS CMOS fails after ~100 krad
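The dose chain can be retraced as below. The inputs are the slide's numbers; the factor 0.25 between the central-interaction flux and the average per interaction is not stated on the slide and is inferred here only because it reproduces the quoted fluence of 1.25*10^14 h cm^-2.

```python
INT_RATE = 1e7                 # interactions per second (slide)
ON_TIME_SEC = 5e7              # on-time over life time (slide)
FLUX_PER_CENTRAL_INT = 1.0     # hadrons per cm^2 per central interaction (slide)
HADRONS_PER_CM2_PER_RAD = 2e7  # fluence equivalent to 1 rad (slide, from ATLAS)

# Assumption: inferred scaling from central to average interactions
AVERAGE_TO_CENTRAL = 0.25

interactions = INT_RATE * ON_TIME_SEC
fluence = interactions * FLUX_PER_CENTRAL_INT * AVERAGE_TO_CENTRAL  # h/cm^2
dose_rad = fluence / HADRONS_PER_CM2_PER_RAD

print(f"fluence = {fluence:.3g} h/cm^2, dose = {dose_rad / 1e6:.2f} Mrad")
```

Under these assumptions the result is about 6 Mrad, well above the ~100 krad at which COTS CMOS is said to fail.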
CBM Radiation Hardness Requirements II
• SEU’s per day and FPGA:
  • Assume 10^7 int/sec
  • Assume Single Event Upset cross section σ_SEU = 10^-10 cm^2 per device (measured for Virtex FPGA)
  • Flux per central interaction: 1 h cm^-2
  • Flux: 2.5*10^6 h cm^-2 s^-1
  • SEU rate: 22 SEU/day
• Multiply with # of FPGA to get system rate
• Mitigation: reconfigure after each spill
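A short check of the SEU rate, using the flux and cross section quoted on the slide. The `system_seu_per_day` helper and its FPGA count argument are hypothetical, added only to illustrate the "multiply with # of FPGA" step.

```python
FLUX = 2.5e6              # hadrons per cm^2 per second (slide)
SIGMA_SEU = 1e-10         # cm^2 per device (slide, measured for Virtex FPGA)
SECONDS_PER_DAY = 86_400

seu_per_day = FLUX * SIGMA_SEU * SECONDS_PER_DAY  # per FPGA


def system_seu_per_day(n_fpga):
    """Scale the per-device rate to a system with n_fpga exposed devices."""
    return seu_per_day * n_fpga


print(f"{seu_per_day:.1f} SEU/day per FPGA")
```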
Some Assumptions for Back End
• 1 TB/sec data over 1024 10 Gbps DDL’s
• 1000 FPGA’s with 100 kLC sufficient for local and regional processing
  • 100 kLC allows ~ 1000 parallel ops
  • At 1 GHz this gives 1 Tops/sec per FPGA
  • Or a total of 1 Pops/sec in the system: 1000 ops per byte, 500 kops/part
• 1000 DSP/FPGA pairs enough for global L1 processing (average L1 latency: 100 μsec): 200 kcyc/evt
• Put 4 FPGA’s (or DSP/FPGA) on a board
  • Power consumption should allow that
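The processing budget on this slide can be retraced as follows. The 1 TB/sec and 2*10^9 part/sec inputs come from the CBM Challenges slide; the DDL capacity is a raw line-rate figure that ignores any encoding overhead, which is an assumption of this sketch.

```python
N_FPGA = 1000
PARALLEL_OPS_PER_FPGA = 1000   # from ~100 kLC
CLOCK_HZ = 10**9               # 1 GHz
DATA_RATE = 10**12             # bytes/sec
PARTICLE_RATE = 2 * 10**9      # particles/sec
N_DDL = 1024
DDL_BITS_PER_SEC = 10 * 10**9  # 10 Gbps, raw line rate (assumption: no overhead)

ops_per_fpga = PARALLEL_OPS_PER_FPGA * CLOCK_HZ   # 1 Tops/sec
total_ops = N_FPGA * ops_per_fpga                 # 1 Pops/sec
ops_per_byte = total_ops // DATA_RATE
ops_per_particle = total_ops // PARTICLE_RATE
ddl_capacity = N_DDL * DDL_BITS_PER_SEC // 8      # bytes/sec over all links

print(ops_per_byte, "ops/byte;", ops_per_particle, "ops/part;",
      f"{ddl_capacity / 1e12:.2f} TB/s raw DDL capacity")
```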
Back End Processing
[Block diagram: DDL’s → Active Buffer → Sw → L1 Farm → Sw → L2 Farm → To Archive]
• Active Buffer: use FPGA’s
  • Event tagging
  • Data processing: local (clustering), regional (tracklet)
• L1 Farm: use FPGA’s and DSP’s
  • Global tracking
  • L1 trigger
• L2 Farm: use PC’s
  • L2 trigger
  • Raw formatting
Back End Data Flow
[Block diagram: DDL’s → Active Buffer → Sw → L1 Farm → Sw → L2 Farm → To Archive]
• DDL’s from FEE: 1 TB/sec
• Active Buffer: neighbor comm. for regional algorithm
• L1 switch: ~200 GB/sec
• L2 switch: ~10 GB/sec
• To archive: ~ 1 GB/sec
Nice scheme …
• … but how to implement the needed bandwidth
  • For near-neighbor communication
  • For event building at L1 level
Crates and Backplanes
• Trend: use serial point-to-point links (10 Gbps SERDES in CMOS)
  • Parallel ‘shared media’ busses obsolete
  • Look for serial backplane fabrics
• Backplanes: what’s available today/tomorrow ?
  • PICMG 2.16: C-PCI + dual 1G Ether star (available)
  • PICMG 2.17: C-PCI + 4*622 Mbps star (available)
  • PICMG 2.20: C-PCI + 2.5 Gbps mesh (2.16+2.20 announced)
  • VITA 41 (VXS): VME + 4*10 Gbps dual star (Infiniband over P0 conn.)
• What’s in the pipe ?
  • ATCA (Advanced Telecommunications Computing Architecture)
    • Base Interface: dual 1G Ethernet star
    • Fabric Interface: 8*10 Gbps star or mesh
Fabric Types
• Dual Star
  • Nodes communicate via switch
  • 2 * n links needed
  • PICMG 2.16 (cPSB): 2 fabric slots for 24 port switch; 18 node slots; 72 Gbps BW
• Full Mesh
  • Nodes communicate directly
  • n * (n-1) links needed
  • PICMG 2.20 (cSMB): 16 slots; full mesh; 2.5 Gbps link; 700 Gbps BW
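The two link-count formulas can be compared directly, as in the sketch below. The function names are illustrative. The last line shows one possible way to arrive at the quoted 72 Gbps for the dual star, assuming 1 Gbps links counted in both directions; that reading is an assumption, not something the slide states.

```python
def dual_star_links(n_nodes):
    """Each node has one link to each of the two switches."""
    return 2 * n_nodes


def full_mesh_links(n_nodes):
    """Each node has a directed link to every other node."""
    return n_nodes * (n_nodes - 1)


cpsb_links = dual_star_links(18)   # PICMG 2.16 with 18 node slots
csmb_links = full_mesh_links(16)   # PICMG 2.20 with 16 slots

# Assumption: 1 Gbps per link, both directions counted
cpsb_bandwidth_gbps = cpsb_links * 1 * 2

print(cpsb_links, csmb_links, cpsb_bandwidth_gbps)
```

The mesh link count grows quadratically with the number of slots, which is why a full mesh is confined to a single crate while stars connect crates.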
Active Buffer Board
[Board diagram: 4 DDL inputs, 5 FPGA’s, 4 Mem blocks, L1L and L2L links, 4 groups of 4 backplane links]
• L2L: to 1G Ether dual star backplane
• To serial mesh backplane
• Assume cSMB and cPSB available
Active Buffer Crate
• DDL: 64 Gb/sec input
• cSMB: 70 GB/sec internal bandwidth
• L1L: 32 Gb/sec duplex
• cPSB: ~8 GB/sec internal bandwidth
• L2L: 1-2 Gb/sec output
Event Building – Part I
[Diagram: DDL’s feeding several Active Buffers]
1. Stage: Collect global or partial timestamp histogram (runs over mesh and L1 net)
2. Stage: Peak find
3. Stage: Tag all hits, use detector specific time window
• Histogram dataflow is modest
• Tagging dataflow almost negligible
• Can be used to throttle L1 data flow
• L1 routing decided at tagging time
• One hit can be part of multiple events !! (prune after tracking)
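The three stages can be sketched in a few lines. This is a toy illustration under stated assumptions: hits are `(address, timestamp, value)` tuples, the peak finder is a simple threshold plus local-maximum test, and `bin_width`, `threshold`, and `window` are hypothetical tuning parameters; the slide does not specify any of these.

```python
from collections import Counter


def build_events(hits, bin_width, threshold, window):
    """Tag hits with event ids using a timestamp histogram.

    hits: list of (address, timestamp, value) tuples.
    Returns {event_id: [hits within `window` of the peak centre]}.
    """
    # Stage 1: collect the timestamp histogram
    hist = Counter(t // bin_width for _, t, _ in hits)

    # Stage 2: peak find (bins over threshold that are local maxima)
    peaks = [b for b, n in sorted(hist.items())
             if n >= threshold
             and n >= hist.get(b - 1, 0)
             and n > hist.get(b + 1, 0)]

    # Stage 3: tag all hits inside the time window around each peak;
    # a hit may land in more than one event
    events = {}
    for event_id, b in enumerate(peaks):
        centre = b * bin_width + bin_width / 2
        events[event_id] = [h for h in hits if abs(h[1] - centre) <= window]
    return events


timestamps = [60, 100, 101, 103, 104, 120, 130, 131, 132]
hits = [(ch, t, 1) for ch, t in enumerate(timestamps)]
events = build_events(hits, bin_width=10, threshold=3, window=15)
```

In this example the hit at timestamp 120 falls into both event windows, which is the case the slide flags as needing pruning after tracking; the isolated hit at 60 is tagged with no event.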
Event Building – Part II
[Diagram: Active Buffer boards on mesh backplanes → four L1 switches → farm sectors 1 to 4]
1. Stage: Collect locally via mesh into one buffer board
  • Reduces number of L1 transfers: it is # crates, not # boards
2. Stage: Collect globally via L1 links into one processor
• Route fixed when event is tagged
  • Allows to factorize L1 switch
  • Use 8 * 256 Gbps, avoid 1 * 2 Tbps
L1 Processor Board
[Board diagram: 4 DSP’s, 5 FPGA’s, 4 Mem blocks, L1L and L2L links, 4 groups of 4 backplane links]
• Emphasize FPGA or DSP as you wish
• L2L: to 1G Ether dual star backplane
• To serial mesh backplane
  • Not needed for event parallel algorithms …
  • Mesh helps to factorize L1 switch
L1 Processor Crate
• cSMB: 70 GB/sec internal bandwidth
• L1L: 32 Gb/sec duplex
• cPSB: ~8 GB/sec internal bandwidth
• L2L: 1-2 Gb/sec output
Back End Data Flow
• DDL’s: 1024 links, 10 Gbps
• Active Buffer: 256 boards in 16 crates
• To L1 switch: 256 links, 10 Gbps
• L1 switch: can be factorized ! 8 (16) switches with 64 (32) ports each, 10 Gbps per port
• To L1 Farm: 256 links, 10 Gbps
• L1 Farm: 256 boards in 16 crates
• To L2 switch: 16-32 links, 10G Ether
• L2 switch: 1 or few switches; 48 10G Ether in, ~20 10G Ether out
• To L2 Farm: 16 links, 10G Ether
Back End Essentials I
• Use as few networks as possible …
  • Detector Data Links
  • L1 Network
  • L2 Network
• Use as few protocol stacks as possible …
  • Light weight (on DDL and L1 net)
  • Ethernet / IP (on L2 net)
• Provide enough bandwidth …
  • … then a versatile back end can be built from a few building blocks
Back End Essentials II
• Split processing into
  • Local: hit level
  • Regional: cluster, tracklet
  • Global: track, vertex
• Gain density by using most efficient compute platform
  • FPGA ~ 20 mW per Gops/sec
  • DSP ~ 1 W per Gops/sec
  • PC’s ~ 20 W per Gops/sec
• High density automatically gives good connectivity in modern backplanes
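To see what these efficiency figures mean at system scale, the sketch below applies them to the ~10^14 ops/sec estimate from the CBM Challenges slide. The `power_watts` helper is illustrative, and it assumes the whole load runs on a single platform type, which the proposed mixed architecture does not do.

```python
# Power per Gops/sec in milliwatts, from the slide
MILLIWATT_PER_GOPS = {"FPGA": 20, "DSP": 1_000, "PC": 20_000}


def power_watts(platform, ops_per_sec):
    """Power needed if the entire load ran on one platform type."""
    gops = ops_per_sec / 1e9
    return MILLIWATT_PER_GOPS[platform] * gops / 1000


TARGET_OPS = 1e14  # ~100 Tops/sec
for name in MILLIWATT_PER_GOPS:
    print(f"{name}: {power_watts(name, TARGET_OPS) / 1000:.0f} kW")
```

The PC figure lands in the megawatt range, the same order as the "1 MW power ?" question raised earlier, while the FPGA figure is three orders of magnitude lower.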
Conclusions
• An SDPA approach seems feasible for CBM, despite the daunting data rates
• There is a large synergy potential between CBM, PANDA, and S-FRS
  • AGATA already uses an SDPA; the rest of the S-FRS community will probably follow sooner or later
  • PANDA is committed to use SDPA (under the title S-DAQ)
  • Experiment time scales differ somewhat, but that can also be an opportunity
Conclusions
• Central element of such an architecture is the clock and time distribution
• Many other details, like link and crate technologies, can and will evolve with time
Main R&D Fronts
• Low jitter clock distribution (ToF quality)
• Front end:
  • ASIC’s often needed (density, radhard), but avoid being too detector specific …
• Back end hardware:
  • Explore serial connection technologies (links, backplanes, switches, protocols)
  • Standardize, follow standards …
  • Define a small set of building blocks
  • Enough backbone BW keeps designs simple
Main R&D Fronts
• Back end config/firm/software:
  • Modern hardware is often ‘tool limited’
  • Investigate development tools
  • Develop parallelized algorithms (essential for using FPGA’s efficiently)
  • Learn how to efficiently use a mix of
    • FPGA (with embedded CPU’s)
    • DSP
    • PC’s
  • Handling of: fault tolerance, monitoring, setup and slow control, …