Level-3 trigger for ALICE Bergen Frankfurt Heidelberg Oslo
Assumptions • Need for a rudimentary online event reconstruction for monitoring • Detector readout rate (e.g. TPC) >> DAQ bandwidth >> mass storage bandwidth • Some physics observables require running detectors at maximum rate (e.g. quarkonium spectroscopy: TPC/TRD dielectrons; jets in p+p: TPC tracking) • Online combination of different detectors can increase the selectivity of triggers (e.g. jet quenching: PHOS/TPC high-pT γ-jet events)
Data volume and event rate
• TPC detector: data volume = 300 Mbyte/event, data rate = 200 Hz
• 60 Gbyte/sec → front-end electronics
• 15 Gbyte/sec → Level-3 system
• < 2 Gbyte/sec → DAQ – event building
• < 1.2 Gbyte/sec → permanent storage system
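A minimal sketch of the arithmetic behind this chain (numbers taken from the bullets above); it makes explicit the reduction factors that zero suppression and the Level-3 system have to deliver:

```cpp
// Bandwidth chain for the TPC readout (numbers from the slide above).
#include <cstdio>

int main() {
    const double event_size_mb = 300.0;   // Mbyte/event out of the TPC
    const double event_rate_hz = 200.0;   // central Pb+Pb trigger rate
    const double fee_out_gbs   = 15.0;    // after zero suppression in the front-end
    const double daq_in_gbs    = 2.0;     // what DAQ event building can accept

    const double tpc_out_gbs = event_size_mb * event_rate_hz / 1000.0; // ~60 Gbyte/sec
    std::printf("TPC output            : %.0f Gbyte/sec\n", tpc_out_gbs);
    std::printf("front-end reduction   : factor %.1f\n", tpc_out_gbs / fee_out_gbs);
    std::printf("Level-3 must deliver  : factor >= %.1f\n", fee_out_gbs / daq_in_gbs);
    return 0;
}
```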
Readout in ALICE for heavy-ion running
• Data rates produced by ALICE detectors: the data sizes are based on zero-suppressed raw data readout
• One ALICE HI year is 10^6 seconds of beam
• Min. bias event sizes are assumed to be about 25% of central
• The TPC dominates everything, followed by the TRD → need to reduce the data volume on tape
• ALICE trigger and readout scenarios for HI running: the Pb+Pb central trigger rate is 180 Hz, highly central 55 Hz
Dielectrons
• Dielectron measurement in TRD/TPC/ITS
• Quarkonium spectroscopy needs high rates → TPC must operate at >100 Hz → TPC data rate has to be significantly reduced
• TRD pre-trigger for TPC
• Level-3 trigger system for TPC: partial readout, e+e– verification (event rejection)
• Level-3 trigger system: TRD @ 2 kHz, TPC @ 200 Hz; online track reconstruction: 1) selection of e+e– pairs (ROI), 2) analysis of e+e– pairs (event rejection)
Event flow • Event sizes and number of links (TPC only)
Level-3 tasks • Online (sub)-event reconstruction: • optimization and monitoring of detector performance • monitoring of trigger selectivity • fast check of physics program • Data rate reduction • data volume reduction • regions-of-interest and partial readout • data compression • event rate reduction • (sub)-event reconstruction and event rejection • p+p program • pile-up removal • charged particle jet trigger
Online event reconstruction • Optimization and monitoring of detector performance • see STAR: online tracking • Monitoring of trigger selectivity • see STAR: event rejection by Level-3 vertex determination • Fast check of physics program • see STAR: peripheral physics program has to be up and running on day 1
Data rate reduction • Volume reduction • regions-of-interest and partial readout • data compression • entropy coder • vector quantization • TPC-data modeling • Rate reduction • (sub)-event reconstruction and event rejection before event building
Regions-of-interest and partial readout • Example: selection of TPC sector and η-slice based on a TRD track candidate
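A minimal sketch of how such an ROI selection could look, assuming a hypothetical TRD track candidate given as (η, φ); the 36-sector layout and the slice width are illustrative only, not the real ALICE geometry:

```cpp
// Hypothetical ROI selection: map a TRD track candidate (eta, phi) to the
// TPC sector and eta-slice that should be read out. Constants are illustrative.
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Roi { int sector; int etaSlice; };

// 36 TPC sectors in phi (18 per side), and an assumed slicing of |eta| < 0.9
Roi selectRoi(double eta, double phi, int nEtaSlices = 10) {
    const double kTwoPi = 2.0 * 3.14159265358979323846;
    phi = std::fmod(phi + kTwoPi, kTwoPi);           // wrap into [0, 2pi)
    int sector = static_cast<int>(phi / (kTwoPi / 18.0));
    if (eta < 0.0) sector += 18;                     // other side of the TPC
    double x = (eta + 0.9) / 1.8;                    // normalize eta to [0, 1]
    int slice = std::min(nEtaSlices - 1, std::max(0, static_cast<int>(x * nEtaSlices)));
    return {sector, slice};
}

int main() {
    Roi roi = selectRoi(0.25, 1.3);  // a fictitious TRD electron candidate
    std::printf("read out TPC sector %d, eta-slice %d\n", roi.sector, roi.etaSlice);
    return 0;
}
```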
Data compression: Entropy coder
• Probability distribution of 8-bit TPC data
• Variable-length coding: short codes for frequent values, long codes for infrequent values
• Results: NA49: compressed event size = 72%; ALICE: 65% (Arne Wiebalck, diploma thesis, Heidelberg)
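A minimal sketch of the idea behind such an entropy coder: build a Huffman code from the probability distribution of the 8-bit ADC values and estimate the resulting event size. The steeply falling distribution used here is a stand-in, not real TPC data:

```cpp
// Sketch of a Huffman entropy coder for 8-bit ADC values: derive code lengths
// from the value distribution and estimate the compressed event size.
#include <cstdint>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Node { uint64_t freq; int left, right; };  // leaf if left == -1

int main() {
    // Stand-in for the measured distribution of 8-bit TPC data: steeply falling,
    // i.e. small ADC values are by far the most frequent.
    std::vector<uint64_t> freq(256);
    for (int v = 0; v < 256; ++v) freq[v] = 1 + 1000000 / (1 + v * v);

    std::vector<Node> nodes;
    using Item = std::pair<uint64_t, int>;                       // (frequency, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    for (int v = 0; v < 256; ++v) {
        nodes.push_back({freq[v], -1, -1});
        pq.push({freq[v], v});
    }
    while (pq.size() > 1) {                                      // build the Huffman tree
        Item a = pq.top(); pq.pop();
        Item b = pq.top(); pq.pop();
        nodes.push_back({a.first + b.first, a.second, b.second});
        pq.push({a.first + b.first, (int)nodes.size() - 1});
    }

    // Average code length = sum(freq * depth) / sum(freq), found by walking the tree.
    uint64_t total = 0, bits = 0;
    std::vector<std::pair<int, int>> stack{{(int)nodes.size() - 1, 0}};  // (node, depth)
    while (!stack.empty()) {
        auto [idx, depth] = stack.back(); stack.pop_back();
        if (nodes[idx].left < 0) { total += nodes[idx].freq; bits += nodes[idx].freq * depth; }
        else { stack.push_back({nodes[idx].left, depth + 1}); stack.push_back({nodes[idx].right, depth + 1}); }
    }
    std::printf("average code length: %.2f bits -> compressed size ~ %.0f%% of 8-bit raw\n",
                (double)bits / total, 100.0 * bits / (8.0 * total));
    return 0;
}
```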
Data compression: Vector quantization
• Sequence of ADC values on a pad = vector
• Vector quantization = transformation of vectors into codebook entries (compare each vector against the code book)
• Quantization error = distance between the vector and its codebook entry
• Results: NA49: compressed event size = 29%; ALICE: 48%-64% (Arne Wiebalck, diploma thesis, Heidelberg)
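A minimal sketch of the vector quantization step, assuming a fixed, pre-trained code book (here just a handful of hand-made pulse shapes) and nearest-neighbour search in Euclidean distance:

```cpp
// Sketch of vector quantization of a pad's ADC time sequence: each fixed-length
// vector of ADC samples is replaced by the index of its nearest code-book entry.
#include <array>
#include <cstdio>
#include <vector>

constexpr int kLen = 8;                       // samples per vector (illustrative)
using Vec = std::array<float, kLen>;

double dist2(const Vec& a, const Vec& b) {
    double d = 0;
    for (int i = 0; i < kLen; ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// Returns the code-book index of the nearest entry and accumulates the quantization error.
int quantize(const Vec& v, const std::vector<Vec>& codebook, double& err2) {
    int best = 0;
    double bestD = dist2(v, codebook[0]);
    for (size_t k = 1; k < codebook.size(); ++k) {
        double d = dist2(v, codebook[k]);
        if (d < bestD) { bestD = d; best = (int)k; }
    }
    err2 += bestD;
    return best;
}

int main() {
    // Toy code book: empty pad, small pulse, large pulse (a real one would be trained).
    std::vector<Vec> codebook = {
        {0, 0, 0, 0, 0, 0, 0, 0},
        {0, 2, 8, 12, 8, 2, 0, 0},
        {0, 10, 40, 70, 40, 10, 0, 0}};
    Vec pad = {0, 3, 9, 13, 7, 1, 0, 0};      // ADC samples on one pad
    double err2 = 0;
    int code = quantize(pad, codebook, err2);
    std::printf("code-book entry %d, squared quantization error %.1f\n", code, err2);
    return 0;
}
```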
Data compression: TPC data modeling
• Fast local pattern recognition: simple local track model (e.g. helix) → track parameters
• Track and cluster modeling: local track parameters + analytical cluster model, comparison to raw data, quantization of the deviations from the track and cluster model
• Result: NA49: compressed event size = 7%
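A minimal sketch of the data-modeling idea: clusters are stored as small, coarsely quantized deviations from the positions predicted by the local track model. The helix prediction is stubbed out here and cluster-shape modeling is omitted:

```cpp
// Sketch of TPC data modeling for compression: keep track parameters plus
// coarsely quantized residuals of each cluster w.r.t. the model prediction.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Cluster { int padrow; float pad; float time; };

// Stand-in for the local track model: predicted (pad, time) at a given padrow.
// A real implementation would evaluate the fitted helix parameters.
void predict(int padrow, float& pad, float& time) {
    pad  = 40.0f + 0.12f * padrow;
    time = 250.0f - 0.30f * padrow;
}

// Quantize a residual to an 8-bit integer with the given step size.
int8_t quantizeResidual(float residual, float step) {
    int q = (int)std::lround(residual / step);
    if (q >  127) q =  127;
    if (q < -127) q = -127;
    return (int8_t)q;
}

int main() {
    std::vector<Cluster> clusters = {{10, 41.3f, 247.2f}, {11, 41.2f, 246.7f}};
    const float step = 0.05f;                 // quantization step in pad/time units
    for (const Cluster& c : clusters) {
        float pPad, pTime;
        predict(c.padrow, pPad, pTime);
        int8_t dPad  = quantizeResidual(c.pad  - pPad,  step);
        int8_t dTime = quantizeResidual(c.time - pTime, step);
        // Only (dPad, dTime) per cluster go to storage, instead of the raw ADC data.
        std::printf("padrow %d: store residuals (%d, %d)\n", c.padrow, dPad, dTime);
    }
    return 0;
}
```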
Fast pattern recognition: essential part of the Level-3 system
• crude complete event reconstruction → monitoring
• redundant local tracklet finder for cluster evaluation → efficient data compression
• selection of (η, φ, pT)-slices → ROI
• high-precision tracking for selected track candidates → jets, dielectrons, ...
Fast pattern recognition • Sequential approach • cluster finder, vertex finder and track follower • STAR code adapted to ALICE TPC • reconstruction efficiency • timing results • Iterative feature extraction • tracklet finder on raw data and cluster evaluation • Hough transform
Fast cluster finder (1) • timing: 5 ms per padrow
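A minimal sketch of a fast cluster finder on one pad: scan the ADC time sequence for above-threshold runs and compute charge-weighted centroids. Merging across neighbouring pads and the deconvolution of overlapping clusters, which the real finder flags, are left out:

```cpp
// Sketch of a fast 1-D cluster finder: find above-threshold runs in the ADC
// time sequence of a pad and compute their charge-weighted time centroids.
#include <cstdio>
#include <vector>

struct Cluster { float centroid; int charge; };

std::vector<Cluster> findClusters(const std::vector<int>& adc, int threshold) {
    std::vector<Cluster> out;
    int charge = 0;
    double weighted = 0.0;
    for (size_t t = 0; t <= adc.size(); ++t) {
        bool above = (t < adc.size()) && adc[t] > threshold;
        if (above) {
            charge   += adc[t];
            weighted += (double)adc[t] * t;
        } else if (charge > 0) {                        // run ended: close the cluster
            out.push_back({(float)(weighted / charge), charge});
            charge = 0;
            weighted = 0.0;
        }
    }
    return out;
}

int main() {
    // One pad's zero-suppressed time sequence (toy numbers).
    std::vector<int> adc = {0, 0, 3, 15, 42, 30, 8, 0, 0, 5, 22, 19, 4, 0};
    for (const Cluster& c : findClusters(adc, 2))
        std::printf("cluster at time bin %.2f, charge %d\n", c.centroid, c.charge);
    return 0;
}
```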
Fast cluster finder (3) • Efficiency • Offline efficiency
Fast vertex finder • Resolution • Timing result: 19 msec on ALPHA (667 MHz)
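A minimal sketch of a histogram-based vertex finder: fill the z estimates pointed at by the clusters into a histogram and take a weighted mean around the maximum bin. The input numbers here are toys; real input would come from the fast cluster finder:

```cpp
// Sketch of a fast vertex finder: histogram cluster-based z estimates and
// estimate the vertex from a weighted mean around the peak bin.
#include <cstdio>
#include <vector>

int main() {
    // Toy z estimates (cm) extrapolated from clusters.
    std::vector<double> zEstimates = {2.1, 1.9, 2.0, 2.2, 1.8, 2.05, -5.0, 7.3, 2.15, 1.95};

    const double zMin = -30.0, zMax = 30.0;
    const int nBins = 120;                                   // 0.5 cm bins
    const double width = (zMax - zMin) / nBins;
    std::vector<int> hist(nBins, 0);
    for (double z : zEstimates) {
        int bin = (int)((z - zMin) / width);
        if (bin >= 0 && bin < nBins) ++hist[bin];
    }

    int peak = 0;
    for (int b = 1; b < nBins; ++b) if (hist[b] > hist[peak]) peak = b;

    // Weighted mean over the peak bin and its neighbours.
    double sum = 0, wsum = 0;
    for (int b = peak - 1; b <= peak + 1; ++b) {
        if (b < 0 || b >= nBins) continue;
        double center = zMin + (b + 0.5) * width;
        sum  += hist[b] * center;
        wsum += hist[b];
    }
    std::printf("vertex z ~ %.2f cm\n", wsum > 0 ? sum / wsum : 0.0);
    return 0;
}
```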
Fast track finder • Tracking efficiency
Fast track finder • Timing results
Hough transform (1) • Data flow
Hough transform (2) • η-slices
Hough transform (3) • Transformation and maxima search
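A minimal sketch of the Hough transformation for TPC tracks, assuming (as the scheme above does) that within one η-slice a track from the vertex is a circle through the origin, so a space point at (r, φ) satisfies κ = 2 sin(φ − ψ)/r for emission angle ψ and curvature κ. Each point votes along the ψ axis, and histogram maxima are track candidates; bin ranges and the toy points are illustrative:

```cpp
// Sketch of a Hough transform for tracks through the vertex inside one
// eta-slice: each space point (r, phi) votes in (psi, kappa) space via
// kappa = 2*sin(phi - psi)/r; histogram maxima are track seeds.
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

int main() {
    const int nPsi = 180, nKappa = 100;
    const double psiMin = -0.5, psiMax = 0.5;          // rad, illustrative window
    const double kMin = -0.01, kMax = 0.01;            // 1/cm, illustrative window
    std::vector<int> hist(nPsi * nKappa, 0);

    // Toy space points of one stiff track: r in cm, phi in rad (nearly straight).
    std::vector<std::pair<double, double>> points;
    for (int i = 0; i < 20; ++i) points.push_back({90.0 + 7.0 * i, 0.10 + 0.0001 * i});

    for (auto [r, phi] : points) {
        for (int ip = 0; ip < nPsi; ++ip) {            // vote along the psi axis
            double psi = psiMin + (ip + 0.5) * (psiMax - psiMin) / nPsi;
            double kappa = 2.0 * std::sin(phi - psi) / r;
            int ik = (int)((kappa - kMin) / (kMax - kMin) * nKappa);
            if (ik >= 0 && ik < nKappa) ++hist[ip * nKappa + ik];
        }
    }

    int best = 0;                                      // maxima search (single peak here)
    for (int i = 1; i < (int)hist.size(); ++i) if (hist[i] > hist[best]) best = i;
    double psi = psiMin + (best / nKappa + 0.5) * (psiMax - psiMin) / nPsi;
    double kappa = kMin + (best % nKappa + 0.5) * (kMax - kMin) / nKappa;
    std::printf("track candidate: psi = %.3f rad, kappa = %.5f 1/cm (%d votes)\n",
                psi, kappa, hist[best]);
    return 0;
}
```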
TPC on-line tracking
• Assumptions: Bergen fast tracker; DEC Alpha 667 MHz; fast cluster finder excluding cluster deconvolution
• Note: this cluster finder is suboptimal for the inner sectors and additional work is required there. To get an estimate, the computation requirements were based on the outer padrows. The deconvolution that may be necessary in the inner padrows could require comparably more CPU cycles.
• TPC L3 tracking estimate:
• Cluster finder on one padrow of the outer sector: 5 ms
• Tracking of all (Monte Carlo) space points for one TPC sector: 600 ms. Note: this data may not include realistic noise; tracking is to first order linear in the number of tracks provided there are few overlaps; one ideal processor is assumed below.
• Cluster finder on one sector (145 padrows): 725 ms
• Process complete sector: 1.325 s
• Process complete TPC: 47.7 s
• Running at maximum TPC rate (200 Hz), January 2000: 9540 CPUs
• Assuming 20% overhead (parallel computation, network transfer, additional inner-sector overhead, sector merging, etc.): 11500 CPUs
• Moore's law (60%/a) to 2006, minus 1 year commissioning: factor ×10.5 → 1095 CPUs
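The arithmetic behind this estimate, reproduced as a small sketch; the Moore's-law factor of about 10.5 corresponds to five years of 60%/year growth between 2000 and the assumed 2006 start minus one year of commissioning:

```cpp
// Reproduces the CPU-count estimate from the slide (one 667 MHz Alpha as the unit).
#include <cmath>
#include <cstdio>

int main() {
    const double clusterPerPadrow_s  = 0.005;          // fast cluster finder, outer padrow
    const int    padrowsPerSector    = 145;
    const double trackingPerSector_s = 0.600;          // tracking of all space points
    const int    sectors             = 36;
    const double tpcRate_hz          = 200.0;
    const double overhead            = 1.20;           // network, merging, inner sectors
    const double mooresLaw           = std::pow(1.60, 5.0);  // 60%/a over 5 years, ~10.5

    double sector_s = clusterPerPadrow_s * padrowsPerSector + trackingPerSector_s;
    double tpc_s    = sector_s * sectors;
    double cpus2000 = tpc_s * tpcRate_hz * overhead;
    std::printf("per sector %.3f s, full TPC %.1f s\n", sector_s, tpc_s);
    std::printf("CPUs needed in 2000: %.0f, scaled to a 2006 start: %.0f\n",
                cpus2000, cpus2000 / mooresLaw);
    return 0;
}
```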
Level-3 system architecture
• Detectors: TPC sector #1 … TPC sector #36, TRD, ITS, XYZ
• Local processing (subsector/sector): ROI, data compression
• Global processing I (2 x 18 sectors): Level-3 trigger – jets, dielectron verification, event rejection
• Global processing II (detector merging)
• Global processing III (event reconstruction), monitoring
Level-3 implementation scenarios A and B
• A: Detectors → (sub)detector #1, 2, … n → Level-3 → DAQ-EVB
• B: Detectors → event #1, 2, … n → Level-3 → DAQ-EVB
• Simple architecture, trivial parallel processing; throughput always limited to 10-20 Hz due to the bandwidth limitation; cannot fulfill all Level-3 requirements
• Minimized data transfer; scalable; a distributed computing farm (500-1000 nodes + network) would do the job
Conclusion • Need for online (crude/partial/sub) event reconstruction and event rejection • Essential task: fast pattern recognition (TPC) • Distributed computing farm (500-1000 nodes) close to the detector readout would do the job
Preprocessing per sector
• Detector front-end electronics, RCU: raw data (10-bit dynamic range, zero suppressed); Huffman coding and vector quantization
• RORC: fast cluster finder (simple unfolding, flagging of overlapping clusters) → cluster list; fast vertex finder; fast track finder initialization (e.g. Hough transform)
• Receiver node: raw data, Hough histograms
• Global node: vertex position
TPC - RCU
• TPC front-end electronics system architecture and readout controller unit
• Pipelined Huffman encoding unit, implemented in a Xilinx Virtex 50 chip*
* T. Jahnke, S. Schoessel and K. Sulimma, EDA group, Department of Computer Science, University of Frankfurt
Processing per sector
• Input: raw data (10-bit dynamic range, zero suppressed), vertex position, cluster list
• Slicing of the padrow-pad-time space into sheets of pseudo-rapidity, subdividing each sheet into overlapping patches → sub-volumes in (r, φ, η)
• RORC: fast track finder, step 1: Hough transformation → seeds
• Receiver node: fast track finder, steps 2 and 3: Hough maxima finder, tracklet verification → track segments; cluster deconvolution and fitting
• Output: updated vertex position; updated cluster list, track segment list
TPC PCI-RORC
• Simple PCI-RORC: DIU card, glue logic / interface, PCI bridge, PCI bus
• TPC PCI-RORC: additionally an FPGA coprocessor and SRAM
TPC PCI-RORC: FPGA co-processor (FPGA, SRAM, PCI 66/64)
• Fast cluster finder (outer padrows): per pad an internal 512x10 RAM; 2 external and 2 internal read accesses per hit; timing (in clock cycles, e.g. 5 nsec)1: #(cluster pixels per pad) / 2 + #hits; centroid calculation: pipelined array multiplier
• Fast vertex finder: histograms of cluster centroids; maxima finding and centroid calculation
• Fast track finder: Hough transformations2: (row, pad, time)-to-(r, φ, η) transformation; (n-pixel)-to-(circle-parameter) transformation; 10-60 M transforms/sec (limited by memory access) → msec range for a central Pb+Pb event
1. Timing estimates by K. Sulimma, EDA group, Department of Computer Science, University of Frankfurt
2. E.g. see Pattern Recognition Algorithms on FPGAs and CPUs for the ATLAS LVL2 Trigger, C. Hinkelbein et al., IEEE Trans. Nucl. Sci. 47 (2000) 362.
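A small sketch of the quoted clock-cycle formula for the FPGA cluster finder, evaluated at the 5 ns cycle time stated above; the pixel and hit counts are made-up examples, not measured occupancies:

```cpp
// Evaluates the quoted FPGA cluster-finder timing: cycles = pixels/2 + hits,
// at 5 ns per clock cycle. Input counts are illustrative only.
#include <cstdio>

int main() {
    const double nsPerCycle = 5.0;       // e.g. 200 MHz FPGA clock, as on the slide
    long clusterPixels = 400000;         // assumed above-threshold pixels in a patch
    long hits = 20000;                   // assumed number of hits in that patch

    long cycles = clusterPixels / 2 + hits;
    std::printf("%ld cycles -> %.2f ms\n", cycles, cycles * nsPerCycle / 1e6);
    return 0;
}
```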
Postprocessing (all sectors)
• Input from sector 1 … sector 19 … sector 36: cluster list and track segment list per sector
• Global nodes: track segment merging, precise distortion corrections, track refitting, vertex fitting; efficient data compression by cluster and track modeling → updated vertex position, updated cluster list, updated track segment list, compressed data
• Detector information merging with other detectors → Level-3 trigger decision: accept/reject
Level-3 TPC task
• Efficient data formatting, Huffman coding and vector quantization: TPC Readout Controller Unit
• Fast cluster finder, fast vertex finder and Hough transformation: FPGA implementation on PCI Receiver Card
• Pattern recognition (Hough maxima and track segment finder, cluster evaluation): Level-3 farm, local level
• Cluster and tracklet modelling – data compression: Level-3 farm, local level
• (Sub)-event reconstruction, event rejection or sub-event selection: Level-3 farm, global level
Level-3 TPC pattern recognition scheme
• Preprocessing: fast cluster finder on the scope of a fibre patch; fast vertex finder using all/outer cluster information; fast tracker (seed finder) working on isolated clusters per sector
• Processing: defining (r, φ, η) sub-volumes per sector; dividing the sub-volumes into overlapping patches; perform track finding on raw ADC data; find and unfold clusters belonging to track segments; combine track segments on sector level; model clusters and compress track and cluster information
• Postprocessing: combine track segments from different sectors; reconstruct the event
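A compact skeleton of this three-stage scheme, with the per-sector stages as stubbed functions; all types and function names are invented for illustration, not the actual Level-3 code:

```cpp
// Skeleton of the Level-3 TPC pattern recognition flow described above.
// All types and functions are illustrative stubs, not the real framework.
#include <vector>

struct RawSector {};                     // zero-suppressed ADC data of one sector
struct ClusterList {};
struct TrackSegments {};
struct Event {};

ClusterList   fastClusterFinder(const RawSector&)                { return {}; }   // preprocessing
double        fastVertexFinder(const std::vector<ClusterList>&)  { return 0.0; }
TrackSegments houghTrackFinder(const RawSector&, const ClusterList&, double) { return {}; }  // processing
Event         mergeSectors(const std::vector<TrackSegments>&)    { return {}; }   // postprocessing
bool          level3Decision(const Event&)                       { return true; }

int main() {
    std::vector<RawSector> sectors(36);

    // Preprocessing: cluster finding per sector, then a global vertex estimate.
    std::vector<ClusterList> clusters;
    for (const RawSector& s : sectors) clusters.push_back(fastClusterFinder(s));
    double vertexZ = fastVertexFinder(clusters);

    // Processing: Hough-based track finding per sector in (r, phi, eta) sub-volumes.
    std::vector<TrackSegments> segments;
    for (size_t i = 0; i < sectors.size(); ++i)
        segments.push_back(houghTrackFinder(sectors[i], clusters[i], vertexZ));

    // Postprocessing: merge sectors, reconstruct the event, take the trigger decision.
    Event event = mergeSectors(segments);
    return level3Decision(event) ? 0 : 1;
}
```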
Requirements on the RORC design concerning Level-3 tasks • Level-3 TPC data reduction scheme • PCI-RORC design
Data volume and event rate
• TPC detector: data volume = 300 Mbyte/event, data rate = 200 Hz
• 60 Gbyte/sec → front-end electronics
• 15 Gbyte/sec → realtime data compression & pattern recognition: PC farm = 1000 clustered SMP, parallel processing
• < 2 Gbyte/sec → DAQ – event building
• < 1.2 Gbyte/sec → permanent storage system
Data flow
• Efficient data formatting, Huffman coding and vector quantization: TPC Readout Controller Unit
• Fast cluster finder, fast vertex finder and tracker initialization (e.g. Hough transform): FPGA implementation on PCI Receiver Card
• Pattern recognition ((Hough maxima and) track segment finder, cluster evaluation): Level-3 farm, local level
• Cluster and tracklet modelling – data compression: Level-3 farm, local level
• (Sub)-event reconstruction, event rejection or sub-event selection: Level-3 farm, global level