
Some Flavours of Computing at DESY


Presentation Transcript


  1. Some Flavours of Computing at DESY (Rainer Mankel, DESY Hamburg)

  2. DESY in General
  • National center of basic research in physics
  • Member of the Helmholtz Association (HGF)
  • Sites: Hamburg and Zeuthen (near Berlin)
  • About 1600 employees, including 400 scientists
  • 1200 users in particle physics from 25 countries
  • 2200 users in HASYLAB from 33 countries
  ... and almost everybody needs computing

  3. DESY in a Nutshell
  • Four HERA experiments: H1 (ep), ZEUS (ep), HERMES (eN), HERA-B (pN): reconstruction, analysis, ...
  • Accelerators: machine controls
  • HASYLAB: synchrotron radiation
  • TTF (TESLA Test Facility)

  4. DESY: Future Projects
  • PETRA as a new high-brilliance synchrotron radiation source: DESY plans to convert the PETRA storage ring into a new high-brilliance, third-generation synchrotron radiation source
    • 1.4 MEUR from the Federal Ministry of Education and Research for the design phase
    • design report: end of 2003
    • construction start in 2007?
  • TESLA:
    • e+e- superconducting linear collider (0.5 ... 1 TeV)
    • integrated X-ray laser
    • this Monday 11:00: very positive recommendation from the German Science Council (Wissenschaftsrat)

  5. DESY Computing
  • Impossible to cover this large topic in a short talk
  • Restrict to some particular areas of interest

  6. Technologies: General Transitions (diagram): Mainframe, SMP → commodity hardware; IRIX, Solaris; DM, Lit, Pta, ...

  7. DESY Central Computing (IT Division)
  • O(70) people
  • Operating ~all imaginable services (mail, web, registry, databases, AFS, HSM, backup (Tivoli), Windows, networks, firewalls, dCache, ...)
  • Tape storage: 4 STK Powderhorn tape silos (interconnected)
    • media: 9840 cartridges (old, 20 GB), 9940B (new, 200 GB)

  8. Supported Operating Systems
  • IRIX 6.5 and HP-UX are on their way out
  • Alpha-OSF and AIX were never really supported but are still used by some groups
  • Long-term support for:
    • Linux
      • DESY Linux 3 (based on SuSE Linux 6.3) until the end of the year
      • DESY Linux 4 (based on SuSE Linux 7.2)
    • Solaris
  • Linux installation/support service
    • YaST for initial installation
    • SALAD / BOOM (home-made tools) for dynamic software updates

  9. HERA Experiments and Computing
  • HERA delivered a record ep luminosity of 50 pb^-1 in 2000
  • Luminosity upgrade during 2001
    • intended improvement factor of 5
    • 1 fb^-1 planned by 2006
  • Major detector upgrades in the experiments
  • HERA experiments have their own expertise in computing
  • Closer look here at ZEUS computing

  10. Computing of a HERA Experiment: ZEUS
  • General-purpose ep collider experiment
  • About 450 physicists
  • Expect 20-40 TB/year of RAW data after the luminosity upgrade
    • the whole of DESY approaches the PB regime during the HERA-II lifetime
  • O(100) modern processors in farms for reconstruction & batch analysis
  • MC production distributed world-wide ("funnel"), O(3-5 M) events/week routinely
    • the funnel is an early computing grid
  • New vertex detector within the calorimeter (shown in the new ZEUS event display)

  11. General Challenge (ZEUS) (diagram)
  • 50 M → 200 M events/year
  • Tape storage increment: 20-40 TB/year; disk storage: 3-5 TB/year
  • MC production, data processing/reprocessing, data mining, interactive data analysis
  • ~450 users

  12. HERA-II Challenges (cont'd)
  • Data processing should be closely linked to data-taking
    • sufficient capacity for reprocessing
  • Analysis facility
    • increased standards of reliability, transparency and turnaround
    • a high level of approval from users is essential
  • Interactive environment

  13. Hardware
  • ZEUS phased out its SGI Challenge SMPs in Feb 2002
    • after the first PC farm in 1997, computing has completely moved to PCs
  • Computing nodes: Intel Pentium 350 MHz - 1.2 GHz, mostly dual-processor
    • a new farm with dual Xeon 2.2 GHz has just been ordered
    • Fast Ethernet; central farm server with Gb Ethernet
  • Workgroup servers
    • SUN SPARCs phased out in April
    • new system: DELFI1 cluster (PC Intel/Linux)
  • File servers
    • SGI Origin with 8.5 TB of SCSI/FC (partially RAID5)
    • by now 7 commodity PC-based DELFI3 servers (12 TB)

  14. Network Structure (diagram): PC farm (2 x 48, 100 Mb/s links) and farm server connected through a 1 Gb/s switch to the file servers and the HSM systems (1 Gb/s links)

  15. ZEUS Hardware

  16. Performance of Reconstruction Farm (plot): old farm vs. new farm vs. new farm + tuning; up to 2 M events/day

  17. 19" DELFI1* (*DESY Linux File Server)
  • Variant 1: 2x 40 GB system (mirrored), 2x 80 GB workgroup space, 3Ware 7850 controller
  • Variant 2: 2x 40 GB system (mirrored), 6x 80 GB workgroup space (stripe or RAID5), 3Ware 7850 controller
  • For high-availability applications (workgroup servers)

  18. Commodity File Servers: DELFI3
  • Custom-built (invention: F. Collin / CERN)
  • 2x 40 GB system (EIDE)
  • 20x 120 GB data
  • 3 RAID controllers
  • Gb Ethernet
  • 2.4 TB of storage for 13000 EUR

  19. Commodity File Servers (cont'd): DELFI2
  • 12 EIDE disks
  • 2 RAID controllers
  • 19" rack mount
  • More economical in terms of floor space
  • Only a few units so far

  20. Batch System
  • ZEUS uses LSF 4.1 as the underlying batch system, with a custom front end & user interface [H1: PBS]
    • originally introduced to integrate different batch systems (NQS, LSF)
  • Each job is executed in its own spool directory (see the sketch below)
    • no conflicts between several parallel jobs of the same user
  • Users can specify the resources they require (e.g. "SuSE 7.2 operating system only")
  • Our LSF 4.1 scheduler uses the fair-share policy
    • ensures that occasional users also get their fair share of the system
    • no hard queue limits needed (such as #jobs per user and queue)
    • "power users" can take ~unlimited resources when the system has capacity to spare
  (plots: priority, history)
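  A minimal Python sketch of the per-job spool-directory idea from the slide above; the wrapper function, directory layout and payload command are hypothetical illustrations, not the actual ZEUS front end to LSF.

    # Sketch only: wrapper name, spool location and payload are made up.
    import os
    import shutil
    import subprocess
    import tempfile

    def run_in_private_spool(command, base=None):
        """Run a job payload inside its own freshly created spool directory,
        so parallel jobs of the same user cannot clash."""
        base = base or os.path.join(tempfile.gettempdir(), "zeus-spool")
        os.makedirs(base, exist_ok=True)
        spool_dir = tempfile.mkdtemp(prefix="job-", dir=base)
        try:
            # Every job sees only its own working directory.
            return subprocess.run(command, cwd=spool_dir, check=False).returncode
        finally:
            shutil.rmtree(spool_dir, ignore_errors=True)

    if __name__ == "__main__":
        run_in_private_spool(["sh", "-c", "echo running in $(pwd)"])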

  21. ZEUS Monitoring
  • Efficient monitoring is a key to reliable operation of a complex system
  • Three independent monitoring systems were introduced in ZEUS computing during the shutdown:
    • LSF-embedded monitoring
      • statistics on the time each job spends in the queued/running/system-suspended/user-suspended states
      • quantitative information for queue optimization etc.
    • SNMP
      • I/O traffic and CPU efficiency
      • web interface
      • history
    • NetSaint, now called Nagios
      • availability of various services on various hosts
      • notification
      • automated trouble-shooting

  22. Example of SNMP-based Monitoring (plots): ~90% CPU efficiency, 1-3 MB/s input rate

  23. NetSaint Monitoring System
  • Monitors hosts, network devices, services (e.g. web server), disk space, ...
    • thresholds configurable
  • Web interface
  • Notification (normally e-mail, if necessary SMS to a cellular phone)
  • History
  (a NetSaint-style availability check is sketched below)
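  As an illustration only (this is not NetSaint/Nagios code), a small Python sketch of the kind of check such a system performs: TCP reachability of a service plus a configurable disk-space threshold. The host/port list and the 90% threshold are made-up examples.

    # Illustrative check logic; hosts, ports and thresholds are examples only.
    import shutil
    import socket

    def service_up(host, port, timeout=5.0):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def disk_usage_percent(path="/"):
        usage = shutil.disk_usage(path)
        return 100.0 * usage.used / usage.total

    if __name__ == "__main__":
        # A real configuration would list farm nodes, file servers, web server, ...
        for host, port in [("www.desy.de", 80), ("localhost", 22)]:
            print(host, port, "OK" if service_up(host, port) else "DOWN -> notify")
        # Configurable threshold, as on the slide; 90% is an arbitrary example.
        if disk_usage_percent("/") > 90.0:
            print("WARNING: disk space above threshold -> notify")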

  24. Reliability Issues
  • Tight monitoring of the system is one key to reliability, but...
  • A typical analysis user needs to access huge amounts of data
  • In large systems, there will always be a certain fraction of
    • servers which are down or unreachable
    • disks which are broken
    • files which are corrupt
  • It is hopeless to operate a large system on the assumption that everything is always working
    • this is even more true for commodity hardware
  • Ideally, the user should not even notice that a certain disk has died, etc.
    • jobs should continue

  25. The ZEUS Event Store
  • Example query: "Gimme NC events, Q^2 > 1000, at least one D* candidate with pT > 5"
  • The query against the tag database (Objectivity/DB 7.0) returns a generic filename & event address, e.g. MDST3.D000731.T011837.cz
  • Filename de-referencing maps this to the actual location, e.g. /acs/zeus/mini/00/D000731/MDST3.D000731.T011837.cz, which is handed to the I/O subsystem (see the sketch below)
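  A hedged Python sketch of the de-referencing step, inferred only from the single example on the slide; the actual mapping rules and the layout under /acs/zeus/mini are assumptions.

    def dereference(generic_name, root="/acs/zeus/mini"):
        """Map a generic event-store filename to a concrete path
        (layout guessed from the one example on the slide)."""
        parts = generic_name.split(".")
        date_tag = next(p for p in parts if p.startswith("D"))  # e.g. "D000731"
        year = date_tag[1:3]                                     # e.g. "00"
        return f"{root}/{year}/{date_tag}/{generic_name}"

    if __name__ == "__main__":
        print(dereference("MDST3.D000731.T011837.cz"))
        # -> /acs/zeus/mini/00/D000731/MDST3.D000731.T011837.cz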

  26. Addressing Events on Datasets (flowchart)
  • Disk copy existing? If not: select a disk cache server and stage the file from the mass storage system
  • Disk copy still valid? Server up? If not: select another cache server and stage again
  • Otherwise: establish an RFIO connection to the file and analyze the event (the same logic is sketched below)
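  A self-contained Python sketch of this staging logic; the in-memory disk_cache dictionary, the server names and the stage/open helpers are stand-ins for the real mass storage system and RFIO, purely for illustration.

    # Stand-in data structures; not a real mass-storage or RFIO interface.
    disk_cache = {}                                   # generic filename -> cache server holding a copy
    servers_up = {"cache01": True, "cache02": True}   # hypothetical cache servers

    def stage_from_tape(filename):
        """Select an available disk cache server and stage the file onto it."""
        server = next(s for s, up in servers_up.items() if up)
        disk_cache[filename] = server
        print(f"staging {filename} from tape onto {server}")
        return server

    def open_event_file(filename):
        """Follow the flowchart: reuse a valid disk copy on a live server if
        possible, otherwise stage one, then open it (RFIO simulated)."""
        server = disk_cache.get(filename)
        if server is None or not servers_up.get(server, False):
            server = stage_from_tape(filename)
        print(f"opening {filename} via RFIO on {server}")
        return (server, filename)                     # stand-in for a file handle

    if __name__ == "__main__":
        open_event_file("MDST3.D000731.T011837.cz")   # no disk copy yet: staged, then opened
        servers_up["cache01"] = False                 # simulate a dead cache server
        open_event_file("MDST3.D000731.T011837.cz")   # re-staged onto a live server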

  27. "The mass storage system is fundamental to the success of the experiment" (Ian Bird at CHEP01 in Beijing)

  28. Cache Within a Cache Within a Cache: the Classical Picture (diagram)
  • Hierarchy with typical access times: CPU (10^-9 s) → primary cache → 2nd-level cache → memory cache → disk controller cache → disk (10^-3 s) → tape library (10^2 s)

  29. All-Cache Picture (diagram)
  • Disk files are only cached images of files in the tape library
  • Files are accessed via a unique path, regardless of server name etc.
  • Optimized I/O protocols
  • Hierarchy: CPU → main-board cache → 2nd-level cache → memory cache → disk controller cache → fabric disk cache → tape library

  30. dCache (cont'd)
  • A mass storage I/O subsystem should provide
    • transparent access to disk & tape data
    • smart caching of tape datasets
    • an efficient I/O transfer protocol
  • Idea of dCache: a distributed, centrally maintained system; a joint DESY/FNAL development
  • Evolution 1997-2002: ZEUS: tpfs → SSF → dCache; H1: (none) → dCache; HERA-B: (none) → dCache; FNAL [GRID]: dCache

  31. dCache Features
  • Optimised usage of the tape robot by coordinated read and write requests (read ahead, deferred writes)
  • Better usage of network bandwidth by exploring the best location for data
  • Ensures efficient usage of the available resources
    • robot, drives, tapes, server resources, CPU time
  • Minimizes service downtime due to hardware failure
  • Monitored by the DESY-IT operator
  • No NFS access to the disk pools required: access proceeds via the dcap API (dc_open, dc_read, dc_write, ...)
  • Particularly intriguing features:
    • retry feature during read access: the job does not crash even if a file or server becomes unavailable (as already in ZEUS-SSF); see the sketch below
    • the "write pool" could be used by the online chain (reduces the number of tape writes)
    • reconstruction could read RAW data directly from the disk pool (no staging)
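  To illustrate the retry behaviour only (in Python, not the actual C dcap API), a sketch of a read that survives a temporarily unavailable file or server by retrying instead of letting the job crash; the retry count and wait time are arbitrary choices.

    # Conceptual retry wrapper; not part of dCache or the dcap library.
    import time

    def read_with_retry(open_func, path, attempts=5, wait_seconds=30):
        """Keep trying to open and read `path`; give up only after
        `attempts` failures instead of crashing on the first one."""
        for attempt in range(1, attempts + 1):
            try:
                with open_func(path, "rb") as f:
                    return f.read()
            except OSError as err:
                print(f"attempt {attempt}/{attempts} failed: {err}; retrying")
                time.sleep(wait_seconds)
        raise RuntimeError(f"{path} still unavailable after {attempts} attempts")

    if __name__ == "__main__":
        # In the real system the open call would go through the dcap API
        # (dc_open/dc_read); here the built-in open() stands in for it.
        data = read_with_retry(open, __file__, attempts=2, wait_seconds=1)
        print(len(data), "bytes read")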

  32. dCache: Distributed Pools

  33. dCache (cont’d)

  34. dCache Perspectives
  • dCache has been jointly developed by DESY & FNAL
    • DESY uses OSM as the underlying HSM system, FNAL uses ENSTORE
  • Experiments using dCache: ZEUS, H1, HERA-B, CDF, MINOS, SDSS, CMS
  • GRID relevance
    • dCache is an integral part of a Java-based GridFTP server, completed & announced last week
    • successful inter-operation with the globus-url-copy client
  • http://www-dcache.desy.de

  35. Future: Will We Continue To Use Tapes?
  • Tape
    • 100 $ per cartridge (200 GB), 5000 cartridges per silo
    • 100 k$ per silo
    • 30 k$ per drive (typical number: 10)
    • → 0.9 $/GB
  • Disk
    • 8 k$ per DELFI2 server (1 TB)
    • → 8 $/GB
  • (figures from V. Gülzow; the arithmetic is reproduced below)
  Yes!
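  The per-GB figures follow directly from the quoted prices, assuming one fully loaded silo (5000 cartridges, 10 drives); a small Python check:

    # Reproduces the slide's cost-per-GB figures from the quoted prices.
    cartridges, cartridge_gb = 5000, 200
    tape_total_usd = 100_000 + cartridges * 100 + 10 * 30_000   # silo + media + drives
    tape_capacity_gb = cartridges * cartridge_gb                 # 1,000,000 GB (1 PB)
    print("tape:", tape_total_usd / tape_capacity_gb, "$/GB")    # -> 0.9

    disk_total_usd, disk_capacity_gb = 8_000, 1_000              # one DELFI2 server, 1 TB
    print("disk:", disk_total_usd / disk_capacity_gb, "$/GB")    # -> 8.0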

  36. General Lab Culture
  • A high degree of consensus between providers & users is essential
  • At DESY, communication proceeds through
    • Computer Users Committee (CUC): daily business
    • Computing Review Board (CRB): long-range planning, projects
    • Computer Security Council: security issues
    • Network Committee: networking issues
    • Topical meetings: Linux Users Meeting, Windows Users Meeting, ...
    • direct communication between the experiments' offline coordinators & IT
  • CUC & CRB are chaired by members of the physics community

  37. DESY Computing Committee Structure

  38. Summary
  • Only a glimpse of some facets of computing at DESY
  • Commodity equipment gives unprecedented power, but requires a dedicated fabric to work reliably
