Report on CHEP 2007


  1. Report on CHEP 2007 Raja Nandakumar

  2. Synopsis
  • Two classes of talks and posters
  • Computer hardware
    • Dominated by cooling / power consumption
    • Mostly in the plenary sessions
  • Software
    • Grid job workload management systems
      • Job submission by the experiments
      • Site job handling, monitoring
    • Grid operations (Monte Carlo production, glexec, interoperability, …)
    • Data integrity checking
    • …
  • Storage systems
    • Primarily concerning dCache and DPM
    • Distributed storage systems
  • Parallel session : Grid middleware and tools

  3. Computing hardware
  • Power requirements of LHC computing
    • Important for running costs
    • ~330 W must be provisioned for every 100 W of electronics (see the sketch below)
  • Some sites running with air- or water-cooled racks
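
The ~330 W per 100 W figure corresponds to an overhead factor of roughly 3.3 on the IT load (cooling plus power conversion and distribution losses). A minimal back-of-the-envelope sketch, assuming a purely illustrative 2.5 MW electronics load:

```python
# Back-of-the-envelope sketch of the provisioning figure quoted on the slide:
# ~330 W of site power per 100 W of electronics, i.e. an overhead factor of ~3.3.
# The 2.5 MW IT load below is an illustrative number, not from the talk.

PROVISIONED_W_PER_100W_IT = 330.0   # from the slide
OVERHEAD_FACTOR = PROVISIONED_W_PER_100W_IT / 100.0

def site_power_mw(it_load_mw: float) -> float:
    """Total power to provision (MW) for a given IT electronics load (MW)."""
    return it_load_mw * OVERHEAD_FACTOR

if __name__ == "__main__":
    it_load = 2.5  # MW of electronics, illustrative only
    print(f"{it_load} MW of electronics -> provision ~{site_power_mw(it_load):.1f} MW")
```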

  4. High performance and multi-core computing
  • Core frequencies ~ 2-4 GHz, will not change significantly
  • Power
    • 1,000,000 cores at 25 W / core = 25 MW, just for the CPUs
    • Have to bring core power down by multiple orders of magnitude
      • Reduces chip frequency, complexity, capability
  • Memory bandwidth
    • As we add cores to a chip, it is increasingly difficult to provide sufficient memory bandwidth
    • Application tuning to manage memory bandwidth becomes critical
  • Network and I/O bandwidth, data integrity, reliability
    • A petascale computer will have petabytes of memory
    • Current single file servers achieve 2-4 GB/s
      • 70+ hours to checkpoint 1 petabyte (arithmetic sketched below)
    • I/O management is a major challenge
  • Memory cost
    • Can't expect to maintain current memory / core numbers at petascale
    • 2 GB/core for ATLAS / CMS
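
The headline numbers here follow from simple arithmetic; the short sketch below reproduces them, taking the 4 GB/s upper end of the quoted single-server range and the 2 GB/core memory figure as inputs:

```python
# Reproduces the back-of-the-envelope numbers quoted on the slide.
CORES = 1_000_000
WATTS_PER_CORE = 25.0
print(f"CPU power alone: {CORES * WATTS_PER_CORE / 1e6:.0f} MW")          # 25 MW

PETABYTE = 1e15                  # bytes
server_rate = 4e9                # bytes/s, upper end of the quoted 2-4 GB/s
hours = PETABYTE / server_rate / 3600
print(f"Checkpointing 1 PB via one file server: ~{hours:.0f} hours")      # ~69 h

print(f"Memory at 2 GB/core: {CORES * 2 / 1e6:.0f} PB")                   # 2 PB
```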

  5. Grid job submission
  • Most new developments were on pilot-agent based grid systems
    • Implement job scheduling based on the "pull" scheduling paradigm (see the task-queue sketch below)
  • The only method for grid job submission in LHCb
    • DIRAC (> 3 years of experience)
    • Ganga is the user analysis front end
  • Also used in Alice (and Panda and Magic)
    • AliEn since 2001
    • Used for production, user analysis and data management in LHCb & Alice
  • New developments for others
    • Panda : Atlas, Charmm
      • Central server based on Apache
    • GlideIn : Atlas, CMS, CDF
      • Based on Condor
      • Used for production and analysis
  • Very successful implementations
    • Real-time view of the local environment
    • Pilot agents can have some intelligence built into the system
      • Useful for heterogeneous computing environments
    • Recently : Panda to be used for all Atlas production
  • One talk on distributed batch systems
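
A minimal sketch of the central task-queue side of the "pull" paradigm these systems share: jobs wait centrally and are handed out only when a pilot with matching capabilities asks for work. All class and method names are illustrative, not the actual DIRAC, Panda or GlideIn APIs.

```python
# Minimal sketch of the central task queue in a "pull"-style WMS.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Job:
    job_id: int
    vo: str                                            # e.g. "lhcb", "atlas"
    requirements: set = field(default_factory=set)     # e.g. {"slc4", "2GB-ram"}

class TaskQueue:
    """Jobs wait centrally; pilots report site capabilities and pull a match."""
    def __init__(self) -> None:
        self._waiting: list[Job] = []

    def submit(self, job: Job) -> None:
        self._waiting.append(job)

    def request_job(self, capabilities: set) -> Optional[Job]:
        """Called by a pilot: return the first waiting job the site can run."""
        for i, job in enumerate(self._waiting):
            if job.requirements <= capabilities:
                return self._waiting.pop(i)
        return None          # pilot terminates gracefully if nothing matches

# Usage: the queue hands work out only when a pilot asks for it.
tq = TaskQueue()
tq.submit(Job(1, "lhcb", {"slc4"}))
print(tq.request_job({"slc4", "2GB-ram"}))   # matched job
print(tq.request_job({"slc3"}))              # None -> no work for this pilot
```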

  6. Pilot agents
  • Pilot agents submitted on demand (see the pilot-loop sketch below)
    • Reserve the resource for immediate use
    • Allow checking of the environment before job scheduling
  • Need only outbound (unidirectional) network connectivity from the worker node
  • Terminate gracefully if no work is available
  • Also called GlideIns
  • LCG jobs are essentially pilot jobs for the experiment
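
A complementary sketch of the pilot side, assuming hypothetical fetch_job() and run_payload() callables standing in for an experiment's real client library:

```python
# Sketch of the pilot-agent lifecycle: land on a worker node, check the
# environment, then repeatedly pull work from the central service over an
# outbound connection only, and exit gracefully when nothing is available.
import time

def environment_ok() -> bool:
    """Placeholder for the sanity checks a pilot runs before accepting work
    (disk space, software area, outbound connectivity, ...)."""
    return True

def pilot_main(fetch_job, run_payload, idle_timeout_s: float = 600.0) -> None:
    if not environment_ok():
        return                      # never pull a job into a broken slot
    deadline = time.time() + idle_timeout_s
    while time.time() < deadline:
        job = fetch_job()           # outbound call to the central task queue
        if job is None:
            time.sleep(60)          # nothing matched; poll again for a while
            continue
        run_payload(job)            # execute the user/production payload
        deadline = time.time() + idle_timeout_s   # reset idle clock after work
    # no work available: terminate gracefully and release the batch slot
```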

  7. DIRAC WMS

  8. Panda WMS

  9. Alice (AliEn / MonALISA) : history plot of running jobs

  10. LHCb (DIRAC) : snapshot of maximum running jobs

  11. Glexec
  • A thin layer to change Unix domain credentials based on grid identity and attribute information (an invocation sketch follows below)
  • Different modes of operation
    • With or without setuid
    • Ability to change the user id of the final job
  • Enables the VO to
    • Internally manage job scheduling and prioritisation
    • Late-bind user jobs to pilots
  • In production at Fermilab
  • Code ready and tested, awaiting full audit
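
A hedged sketch of how a pilot might hand a payload to glexec so it runs under the Unix account mapped from the payload owner's grid identity. GLEXEC_CLIENT_CERT is the usual way the owner's proxy is passed, but the exact call convention and install path should be checked against the site's glexec documentation; all paths below are illustrative.

```python
# Illustrative only: invoking glexec from a pilot to switch credentials before
# running a user payload. Paths and the wrapper command are hypothetical.
import os
import subprocess

def run_via_glexec(payload_cmd: list[str], user_proxy_path: str) -> int:
    env = dict(os.environ)
    # The payload owner's delegated proxy tells glexec which grid identity
    # (and hence which mapped Unix account) the job should run as.
    env["GLEXEC_CLIENT_CERT"] = user_proxy_path
    # glexec switches credentials (with or without setuid, depending on the
    # site's configured mode) and then executes the wrapped command.
    result = subprocess.run(["/usr/sbin/glexec"] + payload_cmd, env=env)
    return result.returncode

# Illustrative call from inside a pilot:
# run_via_glexec(["/bin/sh", "user_job_wrapper.sh"], "/tmp/x509up_payload")
```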

  12. LSF universus [diagram: a job scheduler / web portal feeds the LSF MultiCluster scheduler, which dispatches to LSF schedulers on clusters and desktops running LSF, PBS, SGE and CCE]

  13. LSF universus
  • Commercial extension of LSF
    • Interface to multiple clusters
    • Centralised scheduler, but sites retain local control
  • LSF daemons installed on head nodes of remote clusters
  • Kerberos for user, host and service authentication
  • scp for file transfer (a simplified forwarding sketch follows)
  • Currently deployed in
    • Sandia National Labs to link OpenPBS, PBS Pro and LSF clusters
    • Singapore National Grid to link PBS Pro, LSF and N1GE clusters
    • Distributed European Infrastructure for Supercomputing Applications (DEISA)
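
A much-simplified illustration of the kind of forwarding the universus daemons perform: stage a job's files to a remote head node and submit them to the batch system there. The host name, queue and file paths are hypothetical, and real deployments use Kerberos-authenticated daemons rather than the interactive ssh/scp calls shown here.

```python
# Simplified, illustrative stand-in for cross-cluster job forwarding.
import subprocess

HEAD_NODE = "headnode.remote-site.example"   # hypothetical remote head node

def forward_job(script: str, queue: str = "grid") -> None:
    # Stage the job script to the remote head node (the slide notes that
    # scp is used for file transfer).
    subprocess.run(["scp", script, f"{HEAD_NODE}:/tmp/{script}"], check=True)
    # Submit on the remote side; 'bsub -q' is standard LSF, but the remote
    # cluster could equally run PBS Pro, SGE or N1GE behind the interface.
    subprocess.run(
        ["ssh", HEAD_NODE, "bsub", "-q", queue, f"/tmp/{script}"],
        check=True,
    )

# forward_job("analysis_job.sh")
```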

  14. Grid interoperability
                                  ARC          OSG          EGEE
      Job submission              GridFTP      GRAM         GRAM
      Service discovery           LDAP/GIIS    LDAP/GIIS    LDAP/BDII
      Schema                      ARC          GLUE v1      GLUE v1.2
      Storage transfer protocol   GridFTP      GridFTP      GridFTP
      Storage control protocol    SRM          SRM          SRM
      Security                    GSI/VOMS     GSI/VOMS     GSI/VOMS
  • Many different grids
    • WLCG, Nordugrid, Teragrid, …
    • Experiments span the various grids
  • Short-term solutions have to be ad hoc
    • Maintain parallel infrastructures by the user, site or both
  • For the medium term, set up adaptors and translators (see the adaptor sketch below)
  • In the long term, adopt common standards and interfaces
    • Important in security, information, CE, SE
    • Most grids use the X509 standard
    • Multiple "common" standards …
  • GIN (Grid Interoperability Now) group working on some of this
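
A sketch of the medium-term "adaptors and translators" idea: one thin adaptor per grid flavour behind a common submission interface, so an experiment framework can target ARC, OSG and EGEE through the same call. The class and method names are illustrative, not any existing middleware API.

```python
# Illustrative adaptor pattern for submitting the same job to several grids.
from abc import ABC, abstractmethod

class GridAdaptor(ABC):
    @abstractmethod
    def submit(self, job_description: dict) -> str:
        """Submit a job, return a grid-specific job identifier."""

class EGEEAdaptor(GridAdaptor):
    def submit(self, job_description: dict) -> str:
        # translate to JDL and hand off to the EGEE workload management system
        return "egee-job-id"

class OSGAdaptor(GridAdaptor):
    def submit(self, job_description: dict) -> str:
        # translate to a GRAM / Condor-G submission
        return "osg-job-id"

class ARCAdaptor(GridAdaptor):
    def submit(self, job_description: dict) -> str:
        # translate to xRSL and submit through the ARC GridFTP interface
        return "arc-job-id"

def submit_everywhere(job: dict, adaptors: list[GridAdaptor]) -> list[str]:
    """The experiment framework sees one call; the adaptors hide the grids."""
    return [a.submit(job) for a in adaptors]
```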

  15. Distributed storage
  • GridPP organised into 4 regional Tier-2s in the UK
  • Currently a job follows data to a site
  • Consider disk at one site as close to CPU at another site
    • E.g. disk at Edinburgh vs CPU at Glasgow
    • Pool resources for efficiency and ease of use
  • Jobs need to access storage directly from the worker node

  16. RTT between Glasgow and Edinburgh ~ 12 ms
  • Custom rfio client (a latency-cost sketch follows)
    • Normal : one call / read
    • Readbuf : fills an internal buffer to service requests
    • Readahead : reads till EOF
    • Streaming : separate streams for control & data
  • Tests using a single DPM server
    • Atlas expects ~ 10 MiB/s / job
  • Better performance with a dedicated light path
  • Ultimately a single DPM instance to span the Glasgow and Edinburgh sites
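
A toy model of why the buffered and read-ahead variants help over a wide-area link: every remote call costs at least one round trip, so many small reads are dominated by latency. The file size, read size and buffer size below are illustrative assumptions.

```python
# Toy latency model for remote file reads over a ~12 ms RTT link.
RTT_S = 0.012             # round trip time quoted on the slide
FILE_SIZE = 100 * 2**20   # 100 MiB file, illustrative
READ_SIZE = 64 * 2**10    # 64 KiB application reads, illustrative
BUFFER_SIZE = 8 * 2**20   # 8 MiB read-ahead buffer, illustrative

def latency_cost(request_size: int) -> float:
    """Seconds spent purely on round trips (ignores bandwidth)."""
    requests = FILE_SIZE / request_size
    return requests * RTT_S

print(f"normal  (one call per read): {latency_cost(READ_SIZE):6.1f} s of RTT")
print(f"readbuf (8 MiB buffer)     : {latency_cost(BUFFER_SIZE):6.1f} s of RTT")
```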

  17. Data Integrity
  • Large number of components performing data management in an experiment
  • Two approaches to checking data integrity
    • Automatic agents continuously performing checks (a consistency-check sketch follows)
    • Checks in response to special events
  • Different catalogs in LHCb : Bookkeeping, LFC, SE
  • Issues seen :
    • Zero-size files
    • Missing replica information
    • Wrong SAPath
    • Wrong SE host
    • Wrong protocol (sfn, rfio, bbftp, …)
    • Mistakes in file registration
      • Blank spaces in the SURL path
      • Carriage returns
      • Presence of port number in the SURL path
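
A sketch of the kind of automatic check such an agent might run over catalogue entries, looking for the zero-size-file and malformed-SURL problems listed above. The record layout and field names are illustrative, not the actual LHCb Bookkeeping or LFC schemas.

```python
# Illustrative consistency check over catalogue records.
import re

# e.g. srm://some-se.example.org/data/file ; a port number, blank space or
# carriage return in the path marks a badly registered replica.
SURL_RE = re.compile(r"^(srm|sfn)://[^\s:/]+(/[^\s]*)?$")

def check_replica(record: dict) -> list[str]:
    """Return a list of integrity problems for one catalogue record."""
    problems = []
    if record.get("size", 0) == 0:
        problems.append("zero size file")
    if not record.get("replicas"):
        problems.append("missing replica information")
    for surl in record.get("replicas", []):
        if "\r" in surl or " " in surl:
            problems.append(f"whitespace/carriage return in SURL: {surl!r}")
        elif not SURL_RE.match(surl):
            problems.append(f"malformed SURL (port number, wrong protocol?): {surl!r}")
    return problems

# Illustrative record with two of the problems listed on the slide:
print(check_replica({"size": 0,
                     "replicas": ["srm://se.example.org:8443/data/f1"]}))
```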

  18. Summary
  • Many experiments have embraced the grid
  • Many interesting challenges ahead
    • Hardware
      • Reduce the power consumed by CPUs
      • Applications need to manage with less RAM
    • Software
      • Grid interoperability
      • Security with generic pilots / glexec
      • Distributed grid network
  • And many opportunities
    • To test solutions to the above issues
    • Stress-test the grid infrastructure
    • Get ready for data taking
    • Implement lessons in other fields
      • Biomed …
  • Note : 1 fully digitised film = 4 PB and needs 1.25 GB/s to play
