ALICE Online upgrade
Pierre VANDE VYVRE
October 03, 2012, Offline Meeting, CERN
Requirements: Event Size and Rate
• Expected peak Pb-Pb Minimum Bias rate after LS2: 50 kHz
  • The system must be able to scale with a safety factor of 2 (readout part from the start)
• Expected average Pb-Pb Minimum Bias (MB) rate over a fill after LS2: 20 kHz
• Global data compression strategy:
  • Optimize the overall cost by doing zero suppression on the detectors
  • Data throughput reduced in the online farm by data compression (no selection)
• Combined efficiency of ALICE and LHC: ~10^6 s/month of data taking
  • 1 month of HI: ~2.0 x 10^10 events
(1) Under study
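As a cross-check of the event count quoted above, here is a minimal sketch of the arithmetic, using only the slide's own numbers:

```python
# Event count per heavy-ion month, using the figures quoted on this slide.
avg_mb_rate_hz = 20e3            # average Pb-Pb minimum bias rate over a fill after LS2
live_time_s_per_month = 1e6      # combined ALICE + LHC efficiency: ~10^6 s of data taking per month

events_per_hi_month = avg_mb_rate_hz * live_time_s_per_month
print(f"~{events_per_hi_month:.1e} events per HI month")   # ~2.0e+10 events
```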
Online System Requirements
• Common DAQ and HLT farm for cost reasons. Should also be usable as an offline Tier 1.
• Detector readout
  • Detector throughput at 50 kHz: 9 Tbit/s + safety factor of 2 for 100 kHz
  • Capacity: 25 Tbit/s (~2500 detector links at 10 Gb/s)
• First-Level Processors (FLPs)
  • Input: 12 detector links at 10 Gb/s per FLP
  • ~250 FLPs needed
• Event building
  • Input to global reconstruction: 50 kHz * ~4.5 MB/event: ~225 GB/s
  • Output to data storage: ~82.5 GB/s
  • Total network throughput: ~310 GB/s or 2.5 Tb/s
  • 250 x 2 links at 10 Gb/s (or 1 link at 40 Gb/s) to event building and HLT
• Event-Building and Processing Nodes (EPNs)
  • Current HLT: ~200 nodes with GPUs, ~2500 cores
  • Computing power requirements increase by ~100 in 6 years
  • Technology evolution ("Moore's law" extension to computers): factor ~16
  • → ~1250 EPNs with GPUs needed
  • ~1250 links at 10 Gb/s or 40 Gb/s to the network
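The sizing above follows from a few multiplications; a minimal sketch, using only the numbers quoted on this slide:

```python
# Farm sizing arithmetic as quoted on this slide.
detector_links = 2500            # 10 Gb/s detector links (including the safety factor of 2)
links_per_flp = 12
flps = detector_links / links_per_flp          # ~208; the slide quotes ~250 FLPs

event_rate_hz = 50e3
event_size_mb = 4.5
ebuild_in_gb_s = event_rate_hz * event_size_mb / 1e3   # ~225 GB/s into global reconstruction
storage_out_gb_s = 82.5                                # ~82.5 GB/s to data storage
total_gb_s = ebuild_in_gb_s + storage_out_gb_s         # ~310 GB/s, i.e. ~2.5 Tb/s

hlt_nodes_2012 = 200             # current HLT farm
required_growth = 100            # computing power increase needed in 6 years
technology_gain = 16             # assumed "Moore's law" factor over the same period
epns = hlt_nodes_2012 * required_growth / technology_gain   # ~1250 EPNs
print(flps, total_gb_s, epns)
```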
Upgrade Online
[Architecture diagram: the detectors (ITS, TPC, TRD, EMCal, PHOS, TOF, Muon, FTP) are read out over ~2500 DDL3 links at 10 Gb/s into RORC3 cards hosted in ~250 FLPs; the FLPs feed the common DAQ and HLT farm network with 2 x 10 or 40 Gb/s links towards ~1250 EPNs, which write to data storage through a storage network; the trigger detectors deliver the L0 and L1 triggers.]
Dataflow
• Combination of continuous and triggered readout
• Fast Trigger Processor to complement/replace the present CTP
  • L0: Minimum Bias trigger for every interaction
  • L1: selective trigger
  • LHC clock and L0 trigger distributed for data tagging and test purposes
• Detector triggering and electronics
  • Continuous readout for TPC and ITS when the average inter-event arrival time < TPC drift time
  • At 50 kHz, ~5 events in the TPC during the TPC drift time of 92 µs
  • TRD: MB L0 trigger. Max: 50 kHz.
  • TOF: MB L0 trigger. Max: 400 kHz.
  • EMC, PHOS, Muon: L1 rare trigger
• Detector readout
  • RORC3: DDL3 interface and cluster finder in the same FPGA, shared by DAQ and HLT
• Event building and processing
  • FLPs: clusters of sub-events for triggered detectors and time windows for continuous-readout detectors
  • EPNs:
    • Tracking of time windows
    • Association of clusters to events only possible after the tracking
    • Final step of event building after the online reconstruction
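The "~5 events" figure follows directly from the interaction rate and the drift time; a quick check using the slide's numbers:

```python
# Average TPC pile-up during one drift time.
interaction_rate_hz = 50e3       # Pb-Pb minimum bias interaction rate
tpc_drift_time_s = 92e-6         # TPC drift time

mean_interarrival_us = 1e6 / interaction_rate_hz            # 20 µs between interactions on average
events_per_drift = interaction_rate_hz * tpc_drift_time_s   # ~4.6, i.e. ~5 overlapping events
print(f"{mean_interarrival_us:.0f} µs mean spacing, ~{events_per_drift:.1f} events per drift time")
```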
DDL and RORC evolution
[Slide consists of a table summarizing the evolution of the DDL and RORC generations.]
FLP I/O Throughput
[Diagram: dual-socket server, two CPUs connected by QPI, each with 40 PCIe Gen3 lanes and four DDR3 memory banks.]
• First-Level Processor (FLP): a powerful I/O machine
• Most recent architecture of Intel dual-socket servers (Sandy Bridge):
  • Dual socket
  • 40 PCIe Gen3 lanes directly connected to each processor (40 GB/s)
  • Memory bandwidth: 6-17 GB/s per memory bank
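The 40 GB/s figure is consistent with the raw PCIe Gen3 link parameters; a rough check using the standard Gen3 figures (not taken from the slide):

```python
# Raw PCIe Gen3 bandwidth for 40 lanes (standard PCIe parameters, not from the slide).
gen3_gt_per_s = 8.0              # transfer rate per lane
encoding = 128 / 130             # 128b/130b line coding
lanes = 40

per_lane_gb_s = gen3_gt_per_s * encoding / 8               # ~0.985 GB/s per lane per direction
print(f"~{per_lane_gb_s * lanes:.1f} GB/s per socket")      # ~39.4 GB/s, quoted as ~40 GB/s
```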
C-RORC Prototyping
[Plot by H. Engel: C-RORC readout throughput versus event size for two CPU generations.]
• C-RORC (aka RORC-2) prototype
• Performance test of the PCIe Gen2 interface on 2 generations of Intel CPU
• The new CPU (Sandy Bridge) provides significantly better performance than the previous one (Nehalem), in particular for small event sizes
• Fulfils the needs: 1 PCIe Gen2 x8 interface provides 2.3-3.2 GB/s
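For context, the measured 2.3-3.2 GB/s can be compared with the theoretical ceiling of a PCIe Gen2 x8 link; a sketch using the standard Gen2 parameters (not taken from the slide):

```python
# Theoretical payload ceiling of a PCIe Gen2 x8 link (standard PCIe parameters, not from the slide).
gen2_gt_per_s = 5.0              # transfer rate per lane
encoding = 8 / 10                # 8b/10b line coding
lanes = 8

ceiling_gb_s = gen2_gt_per_s * encoding / 8 * lanes         # 4.0 GB/s
for measured in (2.3, 3.2):
    print(f"{measured} GB/s is {measured / ceiling_gb_s:.0%} of the Gen2 x8 ceiling")
```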
Network Topology
• 2 different topologies are considered for the computing farm network
• Fat-tree network with one central director switch or router and several edge switches
  • + one single box to configure
  • - one central point of failure
• Spine-and-leaf network with several spine switches interconnecting leaf switches
  • + cheaper cost per port, lower power consumption, more ports per Rack Unit (RU)
  • + graceful degradation in case of failure
  • - more cabling
[Diagrams: a fat tree with one director switch uplinked to m edge switches of n node ports each, and a spine-and-leaf fabric with p spine switches uplinked to m leaf switches of n node ports each.]
Network Layout: Infiniband
• 2 network technologies are being considered, Infiniband and Ethernet; both can be used with both topologies
• Network requirements
  • Ports: ~250 FLPs and ~1250 EPNs, i.e. ~1500 ports at 40 Gb/s in total
  • Total throughput:
    • EPN input: 50 kHz * ~4.5 MB/event: ~225 GB/s
    • EPN output: ~82.5 GB/s
    • Total: 310 GB/s or 2.5 Tb/s
• Infiniband network using a fat-tree topology
  • Edge switches: SX6025 (36 ports, 4.03 Tb/s); 32 ports for data traffic to/from the nodes, 2 ports (2 x 40 Gb/s) for data traffic to/from the director switch
  • Director switch: IS5200 (216 ports, 17.3 Tb/s)
  • Total throughput: 2 x 48 x 40 Gb/s = 3.8 Tb/s
  • Total ports: 48 x 32 = 1536 ports at 40 Gb/s
[Diagram: 48 edge switches, each with 32 node ports and 2 x 40 Gb/s uplinks to the director switch.]
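A minimal sketch of the port and throughput bookkeeping for this Infiniband layout, using the numbers on the slide:

```python
# Infiniband fat-tree bookkeeping from this slide.
edge_switches = 48
node_ports_per_edge = 32         # 40 Gb/s ports towards FLPs/EPNs
uplinks_per_edge = 2             # 40 Gb/s links towards the director switch
link_gbps = 40

node_ports = edge_switches * node_ports_per_edge                   # 1536 ports (>= ~1500 needed)
core_tbps = edge_switches * uplinks_per_edge * link_gbps / 1000    # ~3.8 Tb/s (> 2.5 Tb/s required)
print(node_ports, f"{core_tbps:.2f} Tb/s")
```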
Network Layout: Ethernet
• Network requirements
  • Ports: ~2 x 250 FLP ports and ~1250 EPN ports, i.e. ~1750 ports at 10 Gb/s in total
  • Total throughput:
    • EPN input: 50 kHz * ~4.5 MB/event: ~225 GB/s
    • EPN output: ~82.5 GB/s
    • Total: 310 GB/s or 2.5 Tb/s
• Ethernet network using a fat-tree topology
  • Leaf switch: Z9000 (128 ports 10 GbE); 75 ports 10 GbE for data traffic to/from the nodes, 4 ports 40 GbE for data traffic to/from the spine switches
  • Spine switch: Z9000 (32 ports 40 GbE)
  • Total throughput: 4 x 24 x 40 Gb/s = 3.8 Tb/s
  • Total ports: 24 x 75 = 1800 ports at 10 Gb/s
[Diagram: 4 spine switches interconnecting 24 leaf switches; each leaf has 75 node ports at 10 GbE and 4 x 40 GbE uplinks, one per spine.]
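The same bookkeeping for the Ethernet option, again using only the slide's numbers:

```python
# Ethernet leaf/spine bookkeeping from this slide.
leaf_switches = 24
node_ports_per_leaf = 75         # 10 GbE ports towards FLPs/EPNs
uplinks_per_leaf = 4             # 40 GbE ports towards the spine switches
uplink_gbps = 40

node_ports = leaf_switches * node_ports_per_leaf                   # 1800 ports (>= ~1750 needed)
core_tbps = leaf_switches * uplinks_per_leaf * uplink_gbps / 1000  # ~3.8 Tb/s (> 2.5 Tb/s required)
print(node_ports, f"{core_tbps:.2f} Tb/s")
```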
Software and Firmware
• The ALICE online and offline software frameworks are to be redesigned and rewritten:
  • New requirements (common farm, higher rates and throughput, large fraction of "offline" processing performed online, etc.)
  • New computing platforms requiring much more parallelism
  • DAQ-HLT-Offline Framework Panel
• Common firmware framework to be defined:
  • Detector readout
  • Cluster finder
  • Accommodate HLT modes
Next steps (Project)
• 2012-2014: Strategy definition and R&D
  • Kick-off meeting during the October ALICE week: Wednesday 10th October, 14:00-18:00
  • Definition of the strategy to achieve a common DAQ, HLT and offline software framework
  • R&D on key hardware, firmware and software technologies
• 2013-2016: Simulation, demonstrators and prototypes. Choice of technologies
  • Development and exploitation of a program simulating the trigger and dataflow architecture
  • Development of demonstrators and prototypes for the key hardware and firmware technologies
  • Development of the new common software framework
  • Selection of the technologies used in production
  • Technical Design Report
• 2017-2020: Production and procurement. Staged deployment
  • Production of the hardware developed by the projects
  • Market surveys, tendering and procurement of commercial equipment
  • Staged deployment with a profile compatible with the detector installation for the readout part (DDL3 and FLPs) and with the accelerator luminosity for the processing part (EPNs, network and data storage)