Report on CHEP 2007
Raja Nandakumar
Synopsis
• Two classes of talks and posters
  • Computer hardware
    • Dominated by cooling / power consumption
    • Mostly in the plenary sessions
  • Software
    • Grid job workload management systems
      • Job submission by the experiments
      • Site job handling, monitoring
    • Grid operations (Monte Carlo production, glexec, interoperability, …)
    • Data integrity checking
    • …
  • Storage systems
    • Primarily concerning dCache and DPM
    • Distributed storage systems
• Parallel session: Grid middleware and tools
Computing hardware
• Power requirements of LHC computing
  • Important for running costs
  • ~330 W must be provisioned for every 100 W of electronics
• Some sites running with air-cooled or water-cooled racks
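A quick back-of-the-envelope sketch of what that provisioning ratio implies for a farm. Only the ~3.3x factor comes from the talk; the node count and per-node wattage below are invented for illustration.

```python
# Back-of-the-envelope: power to provision for a farm, assuming the quoted
# ratio of ~330 W provisioned (cooling, PSU losses, distribution) per
# 100 W of electronics.  Node count and per-node draw are illustrative.

PROVISION_FACTOR = 330.0 / 100.0    # ~3.3x, from the talk

nodes = 2000                        # hypothetical worker nodes
watts_per_node = 250                # hypothetical draw of the electronics

it_power_kw = nodes * watts_per_node / 1000.0
provisioned_kw = it_power_kw * PROVISION_FACTOR

print(f"IT load:           {it_power_kw:.0f} kW")
print(f"To be provisioned: {provisioned_kw:.0f} kW")
```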
High performance and multi-core computing
• Core frequencies ~ 2-4 GHz; will not change significantly
• Power
  • 1,000,000 cores at 25 W / core = 25 MW, just for the CPU
  • Core power has to come down by multiple orders of magnitude
    • Reduces chip frequency, complexity, capability
• Memory bandwidth
  • As we add cores to a chip, it is increasingly difficult to provide sufficient memory bandwidth
  • Application tuning to manage memory bandwidth becomes critical
• Network and I/O bandwidth, data integrity, reliability
  • A petascale computer will have petabytes of memory
  • Current single file servers achieve 2-4 GB/s
    • 70+ hours to checkpoint 1 petabyte
  • I/O management is a major challenge
• Memory cost
  • Can't expect to maintain current memory / core figures at petascale
  • 2 GB/core for ATLAS / CMS
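The figures quoted above follow from simple arithmetic; a short sketch that reproduces them (the 4 GB/s rate is the top of the 2-4 GB/s single-file-server range given in the talk):

```python
# Reproduce the petascale arithmetic quoted above.

cores = 1_000_000
watts_per_core = 25
total_mw = cores * watts_per_core / 1e6      # 25 MW, CPU only
print(f"CPU power: {total_mw:.0f} MW")

# Checkpointing 1 PB of memory through a single file server at ~4 GB/s
# already takes roughly 70 hours.
petabyte_bytes = 1e15
server_rate = 4e9                            # bytes/s
hours = petabyte_bytes / server_rate / 3600
print(f"Checkpoint time for 1 PB at 4 GB/s: {hours:.0f} hours")
```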
Grid job submission
• Most new developments were on pilot-agent based grid systems
  • Implement job scheduling based on the "pull" scheduling paradigm
• The only method for grid job submission in LHCb
  • DIRAC (> 3 years experience)
  • Ganga is the user analysis front end
• Also used in Alice (and Panda and Magic)
  • AliEn since 2001
  • Used for production, user analysis, data management in LHCb & Alice
• New developments for others
  • Panda: Atlas, Charmm
    • Central server based on Apache
  • GlideIn: Atlas, CMS, CDF
    • Based on Condor
    • Used for production and analysis
• Very successful implementations
  • Real-time view of the local environment
  • Pilot agents can have some intelligence built into the system
  • Useful for heterogeneous computing environments
• Recently, Panda to be used for all Atlas production
• One talk on distributed batch systems
Pilot agents
• Pilot agents are submitted on demand
  • Reserve the resource for immediate use
  • Allow checking of the environment before job scheduling
• All network traffic is initiated by the pilot
  • Unidirectional (outbound-only) connectivity suffices
• Terminates gracefully if no work is available
• Also called GlideIns
• LCG jobs are essentially pilot jobs for the experiment
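A minimal, purely illustrative sketch of the pull model these pilots implement: the agent checks out the worker node, asks a central task queue for work over a single outbound connection, and exits gracefully if nothing matches. The queue URL, JSON fields and disk-space check are hypothetical placeholders, not any experiment's actual protocol.

```python
"""Illustrative pilot-agent loop (pull scheduling).
The task-queue URL, payload fields and checks are invented placeholders."""
import json
import shutil
import subprocess
import urllib.request

TASK_QUEUE = "https://example.org/taskqueue/match"   # hypothetical endpoint

def environment_ok():
    # Check the local environment before asking for work, e.g. scratch space.
    _, _, free = shutil.disk_usage("/tmp")
    return free > 5 * 1024**3            # require 5 GB free (arbitrary threshold)

def request_payload():
    # One outbound HTTP call: works with unidirectional (outbound-only) connectivity.
    with urllib.request.urlopen(TASK_QUEUE, timeout=30) as resp:
        match = json.load(resp)
    return match or None                 # empty reply -> no matching work

def main():
    if not environment_ok():
        return                           # terminate gracefully, freeing the slot
    payload = request_payload()
    if payload is None:
        return                           # no work available: exit cleanly
    subprocess.run(payload["command"])   # hand the worker node to the real job

if __name__ == "__main__":
    main()
```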
Glexec
• A thin layer to change Unix domain credentials based on grid identity and attribute information
• Different modes of operation
  • With or without setuid
  • Ability to change the user id of the final job
• Enables the VO to
  • Internally manage job scheduling and prioritisation
  • Late-bind user jobs to pilots
• In production at Fermilab
• Code ready and tested, awaiting full audit
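Conceptually, glexec behaves like a grid-aware setuid wrapper: the pilot presents the user's grid credentials and the payload runs under a different local account. The sketch below only illustrates that identity-switching idea with plain POSIX calls and a made-up DN-to-account map; it is not glexec itself or its API.

```python
"""Conceptual illustration of identity switching (NOT glexec itself):
map a grid identity to a local account, drop privileges, run the payload.
Must be started as root for setgid/setuid to succeed."""
import os
import pwd
import subprocess

# Hypothetical mapping of grid DNs to local pool accounts.
DN_TO_ACCOUNT = {
    "/DC=org/DC=example/CN=Some User": "pool001",
}

def run_as(grid_dn, command):
    account = DN_TO_ACCOUNT[grid_dn]
    pw = pwd.getpwnam(account)

    def drop_privileges():
        # Runs in the child just before the payload starts.
        os.setgid(pw.pw_gid)
        os.setuid(pw.pw_uid)

    return subprocess.run(command, preexec_fn=drop_privileges).returncode

if __name__ == "__main__":
    rc = run_as("/DC=org/DC=example/CN=Some User", ["id"])
    print("payload exit code:", rc)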
[Diagram: LSF Universus architecture — jobs from a web portal / job scheduler go to a central MultiCluster LSF scheduler, which dispatches to remote clusters and desktops running LSF, PBS, SGE and CCE]
LSF Universus
• Commercial extension of LSF
• Interface to multiple clusters
  • Centralised scheduler, but sites retain local control
  • LSF daemons installed on head nodes of remote clusters
• Kerberos for user, host and service authentication
• scp for file transfer
• Currently deployed in
  • Sandia National Labs, to link OpenPBS, PBS Pro and LSF clusters
  • Singapore national grid, to link PBS Pro, LSF and N1GE clusters
  • Distributed European Infrastructure for Supercomputing Applications (DEISA)
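The workflow described above (central scheduler, Kerberos-authenticated access to remote head nodes, scp for file staging) can be pictured with a small hand-rolled sketch; the host name, paths and remote submission call are invented placeholders, and this is in no way the LSF Universus implementation.

```python
"""Toy illustration of forwarding a job to a remote cluster head node:
stage the input with scp, then submit via the remote batch command over ssh.
Host, paths and the bsub invocation are illustrative placeholders."""
import subprocess

HEAD_NODE = "headnode.remote-cluster.example"   # hypothetical
REMOTE_DIR = "/scratch/forwarded_jobs"          # hypothetical

def forward_job(local_script):
    remote_script = f"{REMOTE_DIR}/{local_script}"
    # Stage the job script to the remote cluster (scp over Kerberos/ssh auth).
    subprocess.run(["scp", local_script, f"{HEAD_NODE}:{remote_script}"], check=True)
    # Ask the remote scheduler to run it (here LSF's bsub as an example).
    subprocess.run(["ssh", HEAD_NODE, "bsub", remote_script], check=True)

if __name__ == "__main__":
    forward_job("my_job.sh")
```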
Grid interoperability

                            ARC          OSG          EGEE
Job submission              GridFTP      GRAM         GRAM
Service discovery           LDAP/GIIS    LDAP/GIIS    LDAP/BDII
Schema                      ARC          GLUE v1      GLUE v1.2
Storage transfer protocol   GridFTP      GridFTP      GridFTP
Storage control protocol    SRM          SRM          SRM
Security                    GSI/VOMS     GSI/VOMS     GSI/VOMS

• Many different grids
  • WLCG, NorduGrid, TeraGrid, …
• Experiments span the various grids
• Short-term solutions have to be ad hoc
  • Maintain parallel infrastructures, by the user, the site or both
• For the medium term, set up adaptors and translators (see the sketch below)
• In the long term, adopt common standards and interfaces
  • Important for security, information, CE, SE
  • Most grids use the X.509 standard
  • Multiple "common" standards …
• GIN (Grid Interoperability Now) group working on some of this
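One way to read the "adaptors and translators" item: a thin common interface with one backend per grid flavour, so experiment code talks to a single API while each adaptor speaks the middleware dialect shown in the table. A deliberately simplified sketch; all class and method names are invented for illustration.

```python
"""Sketch of the adaptor idea for grid interoperability.
Class and method names are invented; backends are stubs."""
from abc import ABC, abstractmethod

class GridAdaptor(ABC):
    """Common interface the experiment framework codes against."""
    @abstractmethod
    def submit(self, job_description: dict) -> str: ...
    @abstractmethod
    def discover_services(self) -> list: ...

class EGEEAdaptor(GridAdaptor):
    def submit(self, job_description: dict) -> str:
        # Would translate the description and submit via GRAM.
        return "egee-job-0001"
    def discover_services(self) -> list:
        # Would query the LDAP/BDII information system (GLUE 1.2 schema).
        return ["ce01.example.org"]

class ARCAdaptor(GridAdaptor):
    def submit(self, job_description: dict) -> str:
        # Would translate the description and submit via GridFTP.
        return "arc-job-0001"
    def discover_services(self) -> list:
        # Would query LDAP/GIIS with the ARC schema.
        return ["ce01.nordugrid.example"]

def submit_everywhere(job: dict, adaptors: list) -> list:
    """Same job description, one adaptor per grid."""
    return [a.submit(job) for a in adaptors]

if __name__ == "__main__":
    print(submit_everywhere({"executable": "sim.sh"}, [EGEEAdaptor(), ARCAdaptor()]))
```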
Distributed storage
• GridPP is organised into 4 regional Tier-2s in the UK
• Currently a job follows its data to a site
• Consider disk at one site as "close to" CPU at another site
  • e.g. disk at Edinburgh vs CPU at Glasgow
• Pool resources for efficiency and ease of use
• Jobs need to access storage directly from the worker node
• RTT between Glasgow and Edinburgh ~ 12 ms
• Custom rfio clients (see the toy calculation below)
  • Normal: one call per read
  • Readbuf: fills an internal buffer to service requests
  • Readahead: reads until EOF
  • Streaming: separate streams for control & data
• Tests using a single DPM server
  • Atlas expects ~ 10 MiB/s per job
• Better performance with a dedicated light path
• Ultimately a single DPM instance to span the Glasgow and Edinburgh sites
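The motivation for the buffered and read-ahead modes is round-trip cost: over a wide-area RTT, every individual read pays the full latency. A toy calculation; the RTT and the ~10 MiB/s target come from the slides, while the read count and size are illustrative.

```python
# Why per-read latency dominates naive remote access:
# each synchronous read over the WAN pays one round trip.

rtt_s = 0.012                 # ~12 ms Glasgow <-> Edinburgh (from the slide)
reads = 5000                  # hypothetical number of small reads by a job
read_size = 64 * 1024         # 64 KiB each (illustrative)

naive_latency = reads * rtt_s                 # one round trip per read
data_mib = reads * read_size / 2**20

print(f"Data actually read:        {data_mib:.0f} MiB")
print(f"Latency cost, 1 call/read: {naive_latency:.0f} s")

# With read-ahead / streaming the client pays the RTT only a few times,
# so the effective rate is set by bandwidth rather than latency.
target_rate_mib_s = 10        # ATLAS per-job expectation quoted above
print(f"Time at 10 MiB/s streaming: {data_mib / target_rate_mib_s:.0f} s")
```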
Data integrity
• Large number of components perform data management in an experiment
• Two approaches to checking data integrity
  • Automatic agents continuously performing checks
  • Checks in response to special events
• Different catalogues in LHCb: Bookkeeping, LFC, SE
• Issues seen:
  • Zero-size files
  • Missing replica information
  • Wrong SAPath
  • Wrong SE host
  • Wrong protocol (sfn, rfio, bbftp, …)
  • Mistakes in file registration
    • Blank spaces in the SURL path
    • Carriage returns
    • Port number present in the SURL path
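Most of the catalogue problems listed are mechanical and easy to test for automatically. A hypothetical sketch of the kind of checks an integrity agent could run over catalogue records; the record format and field names are invented for illustration.

```python
"""Sketch of automated integrity checks over catalogue entries.
The record format and field names are invented for illustration."""
from urllib.parse import urlparse

def check_entry(entry):
    """Return a list of problems found for one catalogue record."""
    problems = []
    if entry.get("size", 0) == 0:
        problems.append("zero-size file")
    if not entry.get("replicas"):
        problems.append("missing replica information")

    for surl in entry.get("replicas", []):
        if surl != surl.strip() or " " in surl or "\r" in surl or "\n" in surl:
            problems.append(f"whitespace / carriage return in SURL: {surl!r}")
        parsed = urlparse(surl)
        if parsed.scheme not in ("srm",):
            problems.append(f"unexpected protocol in SURL: {parsed.scheme}")
        if parsed.port is not None:
            problems.append(f"port number present in SURL: {surl}")
    return problems

if __name__ == "__main__":
    bad = {"size": 0, "replicas": ["srm://se.example.org:8443/lhcb/file "]}
    for p in check_entry(bad):
        print("PROBLEM:", p)
```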
Summary
• Many experiments have embraced the grid
• Many interesting challenges ahead
  • Hardware
    • Reduce the power consumed by CPUs
    • Applications need to manage with less RAM
  • Software
    • Grid interoperability
    • Security with generic pilots / glexec
    • Distributed grid network
• And many opportunities
  • To test solutions to the above issues
  • Stress-test the grid infrastructure
  • Get ready for data taking
  • Implement lessons in other fields
    • Biomed, …
• Note: 1 fully digitised film = 4 PB and needs 1.25 GB/s to play