Performance and Scalability of xrootd Andrew Hanushevsky (SLAC), Wilko Kroeger (SLAC), Bill Weeks (SLAC), Fabrizio Furano (INFN/Padova), Gerardo Ganis (CERN), Jean-Yves Nief (IN2P3), Peter Elmer (U Wisconsin), Les Cottrell (SLAC), Yee Ting Li (SLAC) • Computing in High Energy Physics • 13-17 February 2006 • http://xrootd.slac.stanford.edu • xrootd is largely funded by the US Department of Energy • Contract DE-AC02-76SF00515 with Stanford University
Outline • Architecture Overview • Performance & Scalability • Single Server Performance • Speed, latency, and bandwidth • Resource overhead • Scalability • Server and administrative • Conclusion 2: http://xrootd.slac.stanford.edu
xrootd Plugin Architecture • Layered diagram: Protocol Driver (XRD) → Protocol (1 of n) (xrootd) → File System (ofs, sfs, alice, etc) → Storage System (oss, drm/srm, etc) • Side plugins: authentication (gsi, krb5, etc), authorization (name based), lfn2pfn prefix encoding • Clustering (olbd) provides scaling; the protocol and file-system stack provides performance 3: http://xrootd.slac.stanford.edu
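The layering can be pictured as stacked interfaces in which each plugin talks only to the layer below it. The sketch below is purely illustrative: the class names, the lfn2pfn prefix, and the paths are hypothetical stand-ins, not the real xrootd XrdSfs/XrdOss C++ interfaces.

    // Purely illustrative sketch of the layered plugin idea; names and data
    // are hypothetical, not the real xrootd plugin interfaces.
    #include <cstdio>
    #include <memory>
    #include <string>
    #include <utility>

    // Storage System layer (oss, drm/srm, ...): knows how to fetch raw bytes.
    struct StorageSystem {
      virtual ~StorageSystem() = default;
      virtual std::string read(const std::string& pfn) = 0;
    };

    struct LocalDisk : StorageSystem {            // stand-in "oss" plugin
      std::string read(const std::string& pfn) override {
        return "bytes of " + pfn;
      }
    };

    // File System layer (ofs, sfs, alice, ...): name translation and
    // authorization, then delegation to whichever storage plugin was loaded.
    struct FileSystem {
      explicit FileSystem(std::unique_ptr<StorageSystem> oss) : oss_(std::move(oss)) {}
      std::string read(const std::string& lfn) {
        std::string pfn = "/data" + lfn;   // lfn2pfn prefix encoding
        return oss_->read(pfn);            // authorization checks would sit here
      }
      std::unique_ptr<StorageSystem> oss_;
    };

    int main() {
      // The protocol layer (xrootd, one of n) on top of the XRD driver would
      // call into the file system like this for each client request.
      FileSystem fs(std::make_unique<LocalDisk>());
      std::printf("%s\n", fs.read("/store/file.root").c_str());
      return 0;
    }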
Performance Aspects • Speed for large transfers • MB/Sec • Random vs Sequential • Synchronous vs asynchronous • Memory mapped (copy vs “no-copy”) • Latency for small transfers • µsec round trip time • Bandwidth for scalability • “your favorite unit”/Sec vs increasing load 4: http://xrootd.slac.stanford.edu
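To make the “copy vs no-copy” distinction concrete, here is a minimal POSIX sketch of the two read paths; the file path and block size are placeholders, and this is not xrootd code.

    // Sketch: copy-based synchronous read vs memory-mapped "no-copy" access.
    // The path below is only a placeholder.
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main() {
      int fd = open("/data/example.root", O_RDONLY);
      if (fd < 0) return 1;

      // Copy path: the kernel copies each block into the caller's buffer.
      char buf[8192];
      pread(fd, buf, sizeof(buf), 0);        // random read at offset 0

      // "No-copy" path: map the file and touch pages straight from the page cache.
      struct stat st;
      fstat(fd, &st);
      void* map = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
      if (map != MAP_FAILED) {
        volatile char first = static_cast<char*>(map)[0];  // page fault pulls data in
        (void)first;
        munmap(map, st.st_size);
      }
      close(fd);
      return 0;
    }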
Raw Speed I (sequential) • Sequential reads reach the disk limit; sendfile() anyone? • Test server: Sun V20z, 2x1.86GHz Opteron 244, 16GB RAM, Seagate ST373307LC 73GB 10K rpm SCSI 5: http://xrootd.slac.stanford.edu
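The “sendfile() anyone?” callout refers to zero-copy transfers, where the kernel moves file data directly from the page cache to the network socket. A minimal Linux sketch of that idea follows; the helper function, the socketpair stand-in, and the file path are all hypothetical, not part of xrootd.

    // Zero-copy transfer sketch using Linux sendfile(): the kernel feeds the
    // socket directly, skipping user-space buffers.
    #include <sys/sendfile.h>
    #include <sys/socket.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    static ssize_t send_whole_file(int sock, const char* path) {
      int fd = open(path, O_RDONLY);
      if (fd < 0) return -1;
      struct stat st;
      fstat(fd, &st);
      off_t offset = 0;
      ssize_t sent = sendfile(sock, fd, &offset, st.st_size);
      close(fd);
      return sent;   // bytes queued to the socket, or -1 on error
    }

    int main() {
      int sv[2];                               // stand-in for a client connection
      socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
      ssize_t n = send_whole_file(sv[0], "/etc/hostname");  // placeholder file
      std::printf("queued %zd bytes via sendfile()\n", n);
      close(sv[0]); close(sv[1]);
      return 0;
    }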
Raw Speed II (random I/O) (file not preloaded) 6: http://xrootd.slac.stanford.edu
Latency Per Request 7: http://xrootd.slac.stanford.edu
Event Rate Bandwidth • NetApp FAS270: dual 650 MHz cpu, 1Gb NIC, 1GB cache, RAID 5 FC 140 GB 10k rpm • Apple Xserve: 1Gb NIC, RAID 5 FC 180 GB 7.2k rpm • Sun 280r: dual 900MHz UltraSparc 3 cpu, Solaris 8, Seagate ST118167FC • Cost factor: 1.45 8: http://xrootd.slac.stanford.edu
Latency & Bandwidth • Latency & bandwidth are closely related • Inversely proportional if linear scaling present • The smaller the overhead the greater the bandwidth • Underlying infrastructure is critical • OS and devices 9: http://xrootd.slac.stanford.edu
Server Scaling (Capacity vs Load) 10: http://xrootd.slac.stanford.edu
I/O Bandwidth (wide area network) • SC2005 BW Challenge • Latency ⇔ Bandwidth • 8 xrootd servers: 4@SLAC & 4@Seattle • Sun V20z w/ 10Gb NIC • Dual 1.8/2.6GHz Opterons • Linux 2.6.12 • 1,024 parallel clients (128 per server) • 35Gb/sec peak (higher speeds killed the router) • 2 full duplex 10Gb/s links provided 26.7% of overall BW • Overall BW averaged 106Gb/sec across 17 monitored links • Monitored paths: SLAC to Seattle and Seattle to SLAC over BW Challenge, ESnet routed, and ESnet SDN layer 2 via USN • http://www-iepm.slac.stanford.edu/monitoring/bulk/sc2005/hiperf.html 11: http://xrootd.slac.stanford.edu
xrootd Server Scaling • Linear scaling relative to load • Allows deterministic sizing of server • Disk • NIC • CPU • Memory • Performance tied directly to hardware cost • Underlying hardware & software are critical 12: http://xrootd.slac.stanford.edu
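As a sketch of such deterministic sizing, the snippet below combines the roughly 2 KB-per-event figure quoted later in the deck with a purely hypothetical target event rate to derive the NIC and disk rates one server would need; the target rate is an assumption, not a number from these slides.

    // Rough server sizing under linear scaling.
    #include <cstdio>

    int main() {
      const double event_bytes      = 2.0 * 1024;   // ~2K per event (later slide)
      const double target_events_ps = 20000.0;      // hypothetical target load
      double bytes_ps = event_bytes * target_events_ps;
      std::printf("NIC: %.2f Gb/s, disk: %.1f MB/s sustained\n",
                  bytes_ps * 8 / 1e9, bytes_ps / 1e6);
      // Because throughput grows linearly with load until a device saturates,
      // each component (disk, NIC, CPU, memory) can be sized to the target rate.
      return 0;
    }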
Overhead Distribution 13: http://xrootd.slac.stanford.edu
OS Effects 14: http://xrootd.slac.stanford.edu
Device & File System Effects I/O limited CPU limited UFS good on small reads VXFS good on big reads 1 Event » 2K 15: http://xrootd.slac.stanford.edu
NIC Effects 16: http://xrootd.slac.stanford.edu
Super Scaling • xrootd Servers Can Be Clustered • Support for over 256,000 servers per cluster • Open overhead of 100 µs × log64(number of servers) • Uniform deployment • Same software and configuration file everywhere • No inherent 3rd party software requirements • Linear administrative scaling • Effective load distribution 17: http://xrootd.slac.stanford.edu
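Taking the quoted open overhead of 100 µs × log64(number of servers) at face value, and assuming the logarithm is rounded up to whole levels of the 64-way hierarchy (my interpretation), the cost stays small even at the 256,000-server limit.

    // Worked example of the quoted open overhead formula.
    #include <cmath>
    #include <cstdio>
    #include <initializer_list>

    int main() {
      for (double servers : {64.0, 4096.0, 256000.0}) {
        double hops = std::ceil(std::log(servers) / std::log(64.0));  // log64(N)
        std::printf("%8.0f servers -> ~%3.0f us open overhead\n", servers, hops * 100.0);
      }
      // 256,000 servers: log64(256000) ~ 3, so roughly 300 us added per file open.
      return 0;
    }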
Cluster Data Scattering (usage) 18: http://xrootd.slac.stanford.edu
Cluster Data Scattering (utilization) 19: http://xrootd.slac.stanford.edu
Low Latency Opportunities • New programming paradigm • Ultra-fast access to small random blocks • Accommodate object data • Memory I/O instead of CPU to optimize access • Allows superior ad hoc object selection • Structured clustering to scale access to memory • Multi-Terabyte memory systems at commodity prices • PetaCache Project • SCALLA: Structured Cluster Architecture for Low Latency Access • Increased data exploration opportunities 20: http://xrootd.slac.stanford.edu
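As an illustration only (this is not the actual SCALLA or PetaCache protocol), structured clustering for memory access can be thought of as deterministically mapping each object to the server whose RAM holds it; the hashing scheme, server names, and object id below are hypothetical.

    // Illustrative only: spreading small random object reads across a cluster
    // of memory-resident servers.
    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    struct MemServer { std::string host; };  // each holds a slice of the data in RAM

    // Pick the server whose memory holds a given object.
    const MemServer& locate(const std::vector<MemServer>& cluster, uint64_t object_id) {
      size_t idx = std::hash<uint64_t>{}(object_id) % cluster.size();
      return cluster[idx];
    }

    int main() {
      std::vector<MemServer> cluster = {{"mem01"}, {"mem02"}, {"mem03"}, {"mem04"}};
      const MemServer& s = locate(cluster, 123456789ULL);
      // A real deployment would redirect the client to s.host and serve the block
      // from memory, avoiding disk latency entirely.
      (void)s;
      return 0;
    }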
Memory Access Characteristics • Block size effect on average overall latency per I/O (1 job, 100k I/O’s) • Scaling effect on average overall latency vs number of clients (5 - 40 jobs) • Curves compare Disk I/O vs Mem I/O 21: http://xrootd.slac.stanford.edu
Conclusion • System performs far better than we anticipated • Why? • Excruciating attention to details • Protocols, algorithms, and implementation • Effective software collaboration • INFN/Padova: Fabrizio Furano, Alvise Dorigo • ROOT: Fons Rademakers, Gerri Ganis • ALICE: Derek Feichtinger, Guenter Kickinger • Cornell: Gregory Sharp • SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger, Bill Weeks • BaBar: Pete Elmer • Critical operational collaboration • BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC • Commitment to “the science needs drive the technology” 22: http://xrootd.slac.stanford.edu