A TCP/IP transport layer for the DAQ of the CMS Experiment
Miklos Kozlovszky for the CMS TriDAS collaboration
CERN, European Organization for Nuclear Research
ACAT03 - December 2003

CMS & Data Acquisition
• Collision rate: 40 MHz
• Level-1 maximum trigger rate: 100 kHz
• Average event size: 1 Mbyte
• No. of In-Out units: 1000
• Readout network bandwidth: 1 Terabit/s
• Event filter computing power: 5 × 10^6 MIPS
• Data production: Tbyte/day

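The readout network bandwidth follows from the trigger rate and the event size quoted on this slide. The short calculation below is only a sanity check; it assumes 1 Mbyte = 10^6 bytes, and the function name is illustrative.

```python
LEVEL1_RATE_HZ = 100e3     # Level-1 maximum trigger rate (from the slide)
EVENT_SIZE_BYTES = 1e6     # average event size, assuming 1 Mbyte = 10^6 bytes


def required_bandwidth_bits_per_s(rate_hz, event_size_bytes):
    """Aggregate bandwidth needed to move every accepted event once."""
    return rate_hz * event_size_bytes * 8


bandwidth = required_bandwidth_bits_per_s(LEVEL1_RATE_HZ, EVENT_SIZE_BYTES)
print(f"{bandwidth / 1e12:.1f} Tbit/s")  # prints "0.8 Tbit/s"
```

The result, 0.8 Tbit/s, is of the same order as the 1 Terabit/s readout network bandwidth listed above.
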
Building the events (NxM EVB)
• Event builder: physical system interconnecting data sources with data destinations. It has to move all the data fragments of each event to the same destination.
• Event fragments: event data fragments are stored in separate physical memory systems.
• Full events: full event data are stored in one physical memory system associated with a processing unit.
• 512 data sources for 1 MByte events
• ~1000s HLT processing nodes

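The event-building step can be sketched in a few lines. This is a toy model, not the CMS Event Builder: the rule that picks a destination (`event_id % n_destinations`) is an assumption made for illustration, and all names are hypothetical.

```python
from collections import defaultdict


def build_events(fragments, n_sources, n_destinations):
    """Assemble full events from (event_id, source_id, payload) fragments."""
    partial = defaultdict(dict)  # (destination, event_id) -> {source_id: payload}
    full = defaultdict(dict)     # destination -> {event_id: full event bytes}
    for event_id, source_id, payload in fragments:
        # Assumed assignment rule: all fragments of one event share a destination.
        destination = event_id % n_destinations
        key = (destination, event_id)
        partial[key][source_id] = payload
        if len(partial[key]) == n_sources:
            # Every source has delivered its fragment: the event is complete.
            parts = partial.pop(key)
            full[destination][event_id] = b"".join(
                parts[s] for s in range(n_sources)
            )
    return dict(full)


# Three sources, two destinations, four events; fragments arrive source by source.
fragments = [(e, s, f"e{e}s{s};".encode()) for s in range(3) for e in range(4)]
events = build_events(fragments, n_sources=3, n_destinations=2)
print(events[1][3])  # b'e3s0;e3s1;e3s2;'
```

Each destination ends up holding complete events in a single memory area, which is the property the slide asks of the event builder.
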
XDAQ Framework
• Distributed DAQ framework developed within CMS.
• Construct homogeneous applications for heterogeneous processing clusters.
• Multi-threaded (important to take advantage of SMP efficiently).
• Zero-copy message passing for the event data.
• Peer-to-peer communication between the applications.
• I2O for data transport, and SOAP for configuration and control.
• Hardware and transport independence.
[Figure: XDAQ layer diagram with the labels Processing, Util/DDM, Sensor readout, XDAQ, PCI, HTTP, TCP, Ethernet, Myrinet, OS and Device Drivers; one part is marked "Subject of presentation".]

TCP/IP Peer Transport Requirements
• Reuse old, “cheap” Ethernet for DAQ
• Transport layer requirements
  • Reliable communication
  • Hide the complexity of TCP
  • Efficient implementation
  • Simplex communication via sockets
  • Configurable
  • Support of blocking and non-blocking I/O

Implementation of the non-blocking mode
• Pending Queues (PQ)
• Thread-safe PQ management
• One PQ for each destination
• Independent sending through sockets
• Only one “Select” function call, used both to receive packets and to send the blocked data.
[Figure: an XDAQ application calls Framesend; frames 1 to n wait in per-destination Pending Queues, which are served by Select.]

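The pending-queue scheme listed above can be sketched with standard sockets. This is a minimal illustration and not the ptATCP code: the class and method names are hypothetical, the lock is one simple way to make the queue handling thread safe, and error handling is left out.

```python
import select
import socket
import threading
from collections import deque


class PendingQueueSender:
    """Keeps one pending queue (PQ) of unsent data per destination socket."""

    def __init__(self):
        self.queues = {}              # socket -> deque of memoryviews to send
        self.lock = threading.Lock()  # protects the queues

    def add_destination(self, sock):
        sock.setblocking(False)
        with self.lock:
            self.queues[sock] = deque()

    def frame_send(self, sock, frame):
        """Send at once if possible; queue whatever the kernel does not take."""
        view = memoryview(frame)
        with self.lock:
            queue = self.queues[sock]
            if queue:                 # older data is waiting: keep the order
                queue.append(view)
                return
            try:
                sent = sock.send(view)
            except BlockingIOError:
                sent = 0
            if sent < len(view):
                queue.append(view[sent:])

    def poll(self, receivers, timeout=0.0):
        """A single select() call covers receiving and flushing blocked data."""
        with self.lock:
            blocked = [s for s, q in self.queues.items() if q]
        readable, writable, _ = select.select(receivers, blocked, [], timeout)
        for sock in writable:
            self._flush(sock)
        return readable

    def _flush(self, sock):
        with self.lock:
            queue = self.queues[sock]
            while queue:
                view = queue[0]
                try:
                    sent = sock.send(view)
                except BlockingIOError:
                    return
                if sent < len(view):
                    queue[0] = view[sent:]  # partial send: keep the remainder
                    return
                queue.popleft()
```

A caller would register each destination with `add_destination`, hand frames to `frame_send`, and call `poll` in its main loop; sockets returned by `poll` have data ready to be read. Because each destination has its own queue, a slow destination does not hold up sending to the others.
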
Communication via the transport layer
[Figure: layer diagram showing Applications (XDAQ), XDAQ Executive, XDAQ Framework, Receiver Object(s), Sender Object(s), the ptATCP Peer Transport Layer with Input SAP(s), Output SAP(s) and ptATCPPort(s), OS Driver(s), and NICs (FE, GE, 10GE). Legend: creation of object, sending, receiving, other communication.]

Throughput optimisation
• Operating System tuning (kernel options + buffers)
• Jumbo Frames
• Transport protocol options
• Communication techniques
• Blocking vs. non-blocking I/O
• Single/Multi-rail
• Single/Multi-thread
• TCP options (e.g. Nagle algorithm)
• …
[Figure: App 1 and App 2 connected in single-rail and in multi-rail configuration.]

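Some of these options are set per socket. The sketch below shows how the Nagle algorithm, the socket buffers and the blocking mode can be changed through the standard socket API; the function name and the buffer size are illustrative assumptions, not the values used in the tests. Jumbo frames are not covered here because the MTU is a property of the network interface, not of the socket.

```python
import socket


def tune_socket(sock, nodelay=True, buffer_bytes=256 * 1024, blocking=False):
    """Apply per-socket tuning options (illustrative values)."""
    if nodelay:
        # TCP_NODELAY disables the Nagle algorithm, so small writes leave at once.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Request larger send and receive buffers; the kernel may adjust or cap them.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    sock.setblocking(blocking)
    return sock


sock = tune_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```

System-wide limits on buffer sizes are part of the operating system tuning mentioned in the first bullet and have to be raised separately.
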
Test network
• Cluster size: 8x8
• CPU: 2x Intel Xeon (2.4 GHz), 512 KB cache
• I/O system: PCI-X: 4 buses (max 6)
• Memory: two-way interleaved DDR: 3.2 GB/s (512 MB)
• NICs:
  • 1 Intel 82540EM GE
  • 1 Broadcom NetXtreme BCM 5703x GE
  • 1 Intel Pro 2546EB GE (2 ports)
• OS: Linux RedHat 2.4.18-27.7 (SMP)
• Switches:
  • 1 BATM T6 Multi Layer Gigabit Switch (medium range)
  • 2 Dell PowerConnect 5224 (medium range)

Event Building on the cluster
Conditions:
• XDAQ + Event Builder
• No Readout Unit inputs
• No Builder Unit outputs
• No Event Manager
• PC: dual P4 Xeon
• Linux 2.4.19
• NIC: e-1000
• Switch: PowerConnect 5224
• Standard MTU (1500 bytes)
• Each BU builds 128 events
• Fixed fragment sizes
Result: for fragment sizes > 4 kB:
• Throughput/node ~100 MB/s, i.e. 80% utilisation
[Figure: plot with the working point marked.]

Two Rail Event Builder measurements
• Test case:
  • Bare Event Builder (2x2)
  • No RU inputs
  • No BU outputs
  • No Event Manager
• Options:
  • Non-blocking TCP
  • Jumbo frames (MTU 8000)
  • Two rails
  • One thread
• RU working point (16 kB)
• Throughput/node = 240 MB/s, i.e. 95% of the bandwidth

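The utilisation figures quoted for the single-rail and two-rail measurements can be checked against the raw Gigabit Ethernet line rate. The calculation assumes 1 Gbit/s per rail with 1 MB = 10^6 bytes and ignores protocol overhead; the function name is illustrative.

```python
GIGABIT_ETHERNET_MB_PER_S = 1e9 / 8 / 1e6  # 125 MB/s raw line rate per rail


def link_utilisation(throughput_mb_per_s, rails=1):
    """Fraction of the raw line rate used by the measured throughput."""
    return throughput_mb_per_s / (rails * GIGABIT_ETHERNET_MB_PER_S)


print(f"{link_utilisation(100):.0%}")           # single rail: prints "80%"
print(f"{link_utilisation(240, rails=2):.0%}")  # two rails: prints "96%"
```

The single-rail value reproduces the 80% on the previous slide, and the two-rail value of 96% is close to the 95% quoted here.
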
Conclusions
• Achieved 100 MB/s per node in the 8x8 configuration (1 rail).
• Improvements seen with the use of two rails and non-blocking I/O with Jumbo frames. In the 2x2 configuration over 230 MB/s was obtained.
• High CPU load.
• We are also studying other networking and traffic shaping options.