ALICE Data Challenges: on the way to recording @ 1 GB/s
What is ALICE
ALICE Data Acquisition architecture (diagram): detector data from the Inner Tracking System, Time Projection Chamber, Particle Identification, Photon Spectrometer, Muon and trigger detectors flows from the Front-End Electronics over the Detector Data Link to the Readout Receiver Cards and Local Data Concentrators; the Event Building switch (3 GB/s) feeds the Global Data Collectors, and the Storage switch (1.25 GB/s) feeds Permanent Data Storage; Trigger Levels 0, 1 and 2 provide trigger decisions and trigger data, with event routing by the Event Destination Manager.
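The LDC/GDC roles in this chain can be pictured with a few lines of code. The sketch below is only an illustration under assumed names and sizes, not DATE code:

```python
# Minimal sketch (assumptions, not the DATE implementation) of the DAQ chain:
# each LDC concentrates the fragments of one detector, a GDC builds the full
# event from all fragments before it goes to permanent data storage.

from dataclasses import dataclass

@dataclass
class Fragment:
    event_id: int
    detector: str        # e.g. "TPC", "ITS", "PHOS"
    size_bytes: int

class LDC:
    """Local Data Concentrator: collects fragments from one detector."""
    def __init__(self, detector: str):
        self.detector = detector

    def read_out(self, event_id: int, size_bytes: int) -> Fragment:
        return Fragment(event_id, self.detector, size_bytes)

class GDC:
    """Global Data Collector: assembles full events from all LDC fragments."""
    def build_event(self, fragments: list) -> dict:
        return {
            "event_id": fragments[0].event_id,
            "size_bytes": sum(f.size_bytes for f in fragments),
            "detectors": [f.detector for f in fragments],
        }

# One trigger: every LDC contributes a fragment, one GDC builds the event.
ldcs = [LDC("TPC"), LDC("ITS"), LDC("PHOS")]
gdc = GDC()
event = gdc.build_event([ldc.read_out(event_id=1, size_bytes=1_000_000) for ldc in ldcs])
print(event)
```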
ALICE running parameters • Two different running modes: • Heavy Ion (HI): 10^6 seconds/year • Proton: 10^7 seconds/year • One Data Acquisition system (DAQ): Data Acquisition and Test Environment (DATE) • Many trigger classes, each providing events at different rates, sizes and sources • HI data rates: 3 GB/s → 1.25 GB/s → ~1 PB/year to mass storage • Proton run: ~0.5 PB/year to mass storage • Staged DAQ installation plan (20% → 30% → 100%): • 85 → 300 LDCs, 10 → 40 Global Data Collectors (GDC) • Different recording options: • Local/remote disks • Permanent Data Storage (PDS): CERN Advanced Storage Manager (CASTOR)
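As a quick sanity check of these figures, the heavy-ion yearly volume follows directly from the rate to mass storage and the running time; the proton-rate line below is an inference from the slide's numbers, not a quoted figure:

```python
# Back-of-the-envelope check of the yearly data volumes (sketch only).

GB = 1e9
hi_rate_to_pds = 1.25 * GB        # bytes/s to permanent data storage (heavy ion)
hi_seconds_per_year = 1e6         # heavy-ion running time per year

hi_volume = hi_rate_to_pds * hi_seconds_per_year
print(f"Heavy ion: {hi_volume / 1e15:.2f} PB/year")   # ~1.25 PB, i.e. ~1 PB/year

# The proton figure of ~0.5 PB over 10^7 s implies an average of ~50 MB/s
# to mass storage (derived here, not stated on the slide).
print(f"Proton average: {0.5e15 / 1e7 / 1e6:.0f} MB/s")
```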
History of ALICE Data Challenges • Started in 1998 to put together a high-bandwidth DAQ/recording chain • Continued as a periodic activity to: • Validate the interoperability of all existing components • Assess and validate developments, trends and options • commercial products • in-house developments • Provide guidelines for ALICE & IT development and installation • Continuously expand up to the ALICE requirements at LHC startup
Performance goals (chart: target rates in MBytes/s)
Data volume goals (chart: TBytes to mass storage)
The ALICE Data Challenge IV
Components & modes (diagram): LDC emulator, ALICE DAQ, Objectifier, CASTOR front-end and CASTOR PDS, spread over the private network and the CERN backbone; data flows as raw events and raw data objects; monitoring by AFFAIR and the CASTOR monitor.
Targets • DAQ system scalability tests • Single peer-to-peer tests: • Evaluate the behavior of the DAQ system components with the available HW • Preliminary tuning • Multiple LDC/GDC tests: • Add the full Data Acquisition (DAQ) functionality • Verify the objectification process • Validate & benchmark the CASTOR interface • Evaluate the performance of new hardware components: • New generation of tapes • 10 Gb Ethernet • Achieve a stable production period: • Minimum 200 MB/s sustained • 7 days non-stop • 200 TB data to PDS
Software components • Configurable LDC Emulator (COLE) • Data Acquisition and Test Environment (DATE) 4.2 • A Fine Fabric and Application Information Recorder (AFFAIR) V1 • ALICE Mock Data Challenge objectifier (ALIMDC) • ROOT (Object-Oriented Data Analysis Framework) v3.03 • Permanent Data Storage (PDS): CASTOR V1.4.1.7 • Linux Red Hat 7.2, kernel 2.2 and 2.4 • Physical pinned memory driver (PHYSMEM) • Standard TCP/IP library
Hardware setup • ALICE DAQ: infrastructure & benchmarking • NFS & DAQ servers • SMP HP Netserver (4 CPUs): setup & benchmarking • LCG testbed (lxshare): setup & production • 78 CPU servers on GE • Dual ~1GHz Pentium III, 512 MB RAM • Linux kernel 2.2 and 2.4 • NFS (installation, distribution) and AFS (unused) • [ 8 .. 30 ] DISK servers (IDE-based) on GE • Mixed FE/GE/trunk GE, private & CERN backbone • 2 * Extreme Networks Summit 7i switches (32 GE ports) • 12 * 3COM 4900 switches (16 GE ports) • CERN backbone: Enterasys SSR8600 routers (28 GE ports) • PDS: 16 * 9940B tape drives in two different buildings • STK linear tapes, 30 MB/s, 200 GB/cartridge
Networking (diagram): LDC/GDC and disk-server switches linked to the backbone (4 Gbps) by trunks of 2-3 GE links each, 6 CPU servers on Fast Ethernet, and 16 tape servers distributed on the backbone.
Scalability test • Put together as many hosts as possible to verify the scalability of: • run control • state machines • control and data channels • DAQ services • system services • hardware infrastructure • Connect/control/disconnect plus simple data transfers • Data patterns, payloads and throughputs uninteresting • Keywords: usable, reliable, scalable, responsive
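A toy version of the run-control state machine stressed by this test might look like the sketch below; the state names and commands are assumptions for illustration, not DATE's actual state diagram:

```python
# Illustrative run-control state machine: every host must follow
# connect -> configure -> start -> stop -> disconnect, and one run control
# has to drive all hosts through it reliably.

from enum import Enum, auto

class State(Enum):
    DISCONNECTED = auto()
    CONNECTED = auto()
    CONFIGURED = auto()
    RUNNING = auto()

ALLOWED = {
    (State.DISCONNECTED, "connect"): State.CONNECTED,
    (State.CONNECTED, "configure"): State.CONFIGURED,
    (State.CONFIGURED, "start"): State.RUNNING,
    (State.RUNNING, "stop"): State.CONFIGURED,
    (State.CONFIGURED, "disconnect"): State.DISCONNECTED,
}

class Host:
    def __init__(self, name: str):
        self.name = name
        self.state = State.DISCONNECTED

    def command(self, cmd: str) -> None:
        key = (self.state, cmd)
        if key not in ALLOWED:
            raise RuntimeError(f"{self.name}: '{cmd}' not allowed in {self.state.name}")
        self.state = ALLOWED[key]

# Drive all hosts through a short cycle, as in the connect/control/disconnect test.
hosts = [Host(f"ldc{i:02d}") for i in range(40)] + [Host(f"gdc{i:02d}") for i in range(40)]
for cmd in ("connect", "configure", "start", "stop", "disconnect"):
    for h in hosts:
        h.command(cmd)
print("all", len(hosts), "hosts back to", hosts[0].state.name)
```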
Single peer-to-peer • Compare: • Architectures • Network configurations • System and DAQ parameters • Exercise: • DAQ system network modules • DAQ system clients and daemons • Linux system calls, system libraries and network drivers • Benchmark and tune: • Linux parameters • DAQ processes, libraries and network components • DAQ data flow
Single peer-to-peer (plot): transfer speed (MB/s) and LDC/GDC CPU usage (% CPU per MB) as a function of event size (0-2000 KB).
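The kind of measurement behind such a plot can be approximated with a plain TCP sender/receiver pair. The sketch below uses the standard socket library over localhost, so the absolute numbers have nothing to do with the GE testbed; it only shows throughput versus event size:

```python
# Minimal peer-to-peer throughput sketch over standard TCP/IP (illustration
# only, not the DATE benchmark): send fixed-size "events" from an LDC-like
# sender to a GDC-like receiver and report MB/s per event size.

import socket, threading, time

HOST, PORT = "127.0.0.1", 5555
N_EVENTS = 200

def receiver(ready: threading.Event) -> None:
    with socket.create_server((HOST, PORT)) as srv:
        ready.set()
        conn, _ = srv.accept()
        with conn:
            while conn.recv(1 << 20):   # drain until the sender closes
                pass

def measure(event_size: int) -> float:
    ready = threading.Event()
    t = threading.Thread(target=receiver, args=(ready,), daemon=True)
    t.start()
    ready.wait()
    payload = b"\0" * event_size
    start = time.perf_counter()
    with socket.create_connection((HOST, PORT)) as s:
        for _ in range(N_EVENTS):
            s.sendall(payload)
    t.join()                             # wait until the receiver has drained everything
    elapsed = time.perf_counter() - start
    return N_EVENTS * event_size / elapsed / 1e6   # MB/s

for size_kb in (64, 256, 1024):
    print(f"{size_kb:5d} KB events: {measure(size_kb * 1024):7.1f} MB/s")
```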
Full test runtime options • Different trigger classes for different traffic patterns • Several recording options • NULL • GDC disk • CASTOR disks • CASTOR tapes • Raw data vs. ROOT objects • We concentrated on two major traffic patterns: • Flat traffic: all LDCs send the same event • ALICE-like traffic: periodic sequence of different events distributed according to forecasted ALICE raw data
Performance Goals (chart, MBytes/s; 650 MB/s marked)
Flat data traffic • 40 LDCs * 38 GDCs • 1 MB/event/LDC, NULL recording • Occupancies: • LDCs: 75% • GDCs: 50% • Critical item: load balancing over the GE trunks (2/3 nominal), illustrated in the sketch after the plot below
Load distribution on trunks (plot): aggregate throughput (MB/s, 100-500) vs. number of LDCs (1-7), comparing LDCs distributed over switches with LDCs on the same switch.
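One way to picture the trunk limit: if each LDC-to-GDC flow is statically pinned to one of the trunk's GE links, uneven placement caps the aggregate well below the line rate. The random pinning below is an assumption for illustration, not the switches' actual distribution algorithm:

```python
# Hedged illustration of the trunk load-balancing issue: a trunk of three
# GE links carries the flows, and static per-flow assignment can leave the
# links unevenly loaded, so the aggregate saturates below 3 Gbit/s nominal.

import random

GE_LINK_MBPS = 125.0        # ~125 MB/s per Gigabit Ethernet link
TRUNK_LINKS = 3

def trunk_throughput(n_flows: int, per_flow_demand: float) -> float:
    """Aggregate MB/s when flows are pinned to trunk links at random."""
    load = [0.0] * TRUNK_LINKS
    for _ in range(n_flows):
        load[random.randrange(TRUNK_LINKS)] += per_flow_demand
    # Each link delivers at most its line rate; flows on a saturated link are throttled.
    return sum(min(l, GE_LINK_MBPS) for l in load)

random.seed(1)
nominal = TRUNK_LINKS * GE_LINK_MBPS
for n in (3, 6, 12):
    achieved = sum(trunk_throughput(n, 100.0) for _ in range(1000)) / 1000
    print(f"{n:2d} flows: {achieved:5.1f} MB/s of {nominal:.0f} MB/s nominal "
          f"({achieved / nominal:.0%})")
```

With three 100 MB/s flows this toy model already lands near two thirds of the nominal trunk bandwidth, which is the order of the effect quoted on the slide.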
ALICE-like traffic • LDCs: • rather realistic simulation • partitioned into detectors • no hardware trigger • simulated readout, no "real" input channels • GDCs acting as: • event builder • CASTOR front-end • Data traffic: • Realistic event sizes and trigger classes • Partial detector readout • Networking & nodes' distribution scaled down & adapted
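A minimal emulation of such traffic could look like the sketch below; the fragment sizes and trigger-class weights are placeholders, not the forecasted ALICE raw-data figures. The point is only that each trigger class reads out a different subset of detectors with different fragment sizes:

```python
# Sketch of an ALICE-like traffic generator with partial detector readout.

import random

# Hypothetical per-detector fragment sizes (MB) for a central trigger.
FRAGMENT_MB = {"TPC": 60.0, "ITS": 2.0, "PID": 1.0, "PHOS": 0.5, "MUON": 0.2}

# Hypothetical trigger classes: (relative weight, detectors read out, size scale).
TRIGGER_CLASSES = {
    "central":      (1, list(FRAGMENT_MB), 1.0),
    "minimum_bias": (10, list(FRAGMENT_MB), 0.3),
    "muon_only":    (20, ["MUON", "ITS"], 0.3),   # partial detector readout
}

def next_event() -> dict:
    name = random.choices(
        list(TRIGGER_CLASSES),
        weights=[w for w, _, _ in TRIGGER_CLASSES.values()],
    )[0]
    _, detectors, scale = TRIGGER_CLASSES[name]
    fragments = {d: FRAGMENT_MB[d] * scale for d in detectors}
    return {"trigger": name, "fragments_mb": fragments,
            "total_mb": sum(fragments.values())}

random.seed(0)
events = [next_event() for _ in range(10000)]
avg = sum(e["total_mb"] for e in events) / len(events)
print(f"average event size over the mix: {avg:.1f} MB")
```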
Challenge setup & outcomes • ~ 25 LDCs • TPC: 10 LDCs • other detectors: [ 1 .. 3 ] LDCs • ~ 50 GDCs • Each satellite switch: 12 LDCs/GDCs (distributed) • [ 8 .. 16 ] tape servers on the CERN backbone • [ 8 .. 16 ] tape drives attached to a tape server • No objectification • named pipes too slow and too heavy • upgraded to avoid named pipes: • ALIMDC/CASTOR not performing well
Impact of traffic pattern (chart): throughput compared for FLAT/CASTOR, ALICE/NULL and ALICE/CASTOR.
Performance Goals (chart, MBytes/s; 200 MB/s marked)
Production run • 8 LDCs * 16 GDCs, 1 MB/event/LDC (FLAT traffic) • [ 8 .. 16 ] tape servers and tape units • 7 days at ~300 MB/s sustained, > 350 MB/s peak, ~ 180 TB to tape • 9 Dec: too much input data • 10 Dec: HW failures on tape drives & reconfiguration • Despite the failures, always exceeded the performance goals
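A quick consistency check of these figures, using the rounded numbers from the slide:

```python
# 7 days at ~300 MB/s sustained lands near the ~180 TB quoted as written to tape.
sustained_mb_s = 300
seconds = 7 * 24 * 3600
total_tb = sustained_mb_s * seconds / 1e6
print(f"{total_tb:.0f} TB in 7 days at {sustained_mb_s} MB/s")   # ~181 TB
```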
System reliability • Hosts: • ~ 10% Dead On Installation • ~ 25% Failed On Installation • Long period of short runs (tuning): • occasional problems (recovered) with: • name server • network & O.S. • on average [ 1 .. 2 ] O.S. failures per week (on 77 hosts) • occasional unrecoverable failures on GE interfaces • Production run: • one tape unit failed and had to be excluded
Outcomes • DATE • 80 hosts/160+ roles with one run control • Excellent reliability and performance • Scalable and efficient architecture • Linux • Few hiccups here and there but rather stable and fast • Excellent network performance/CPU usage • Some components are too slow (e.g. named pipes) • More reliability needed from the GE interfaces
Outcomes • FARM installation and operation: not to be underestimated! • CASTOR • Reliable and effective • Improvements needed on: • Overloading • Parallelizing tape resources • Tapes • One DOA and one DOO • Network: silent but very effective partner • Layout made for a farm, not optimized for the ALICE DAQ • 10 Gb Ethernet tests: • failure at first • problem "fixed" too late for the Data Challenge • reconfiguration: transparent to DAQ and CASTOR
Future ALICE Data Challenges (chart, MBytes/s) • Continue the planned progression • ALICE-like pattern • Record ROOT objects • New technologies • CPUs • Servers • Network • NICs • Infrastructure • Beyond 1 GbE • Insert online algorithms • Provide some "real" input channels • Get ready to record at 1.25 GB/s