
LHC Computing Grid Project



Presentation Transcript


  1. LHC Computing Grid Project – Creating a Global Virtual Computing Centre for Particle Physics. ACAT'2002, 27 June 2002. Les Robertson, IT Division, CERN, les.robertson@cern.ch

  2. Summary • LCG – The LHC Computing Grid Project • requirements, funding, creating a Grid • areas of work • grid technology • computing fabrics • deployment • operating a grid • Plan for the LCG Global Grid Service • A few remarks

  3. Funding dictates – • Worldwide distributed computing system • Small fraction of the analysis at CERN • Batch analysis – using 12-20 large regional centres • how to use the resources efficiently • establishing and maintaining a uniform physics environment • Data exchange and interactive analysis involving tens of smaller regional centres, universities, labs

  4. Summary - Project Goals Goal – Prepare and deploy the LHC computing environment • applications – tools, frameworks, environment, persistency • computing system → global grid service • cluster → automated fabric • collaborating computer centres → grid • CERN-centric analysis → global analysis environment This is not another grid technology project – it is a grid deployment project

  5. Two Phases The first phase of the project – 2002-2005 • preparing the prototype computing environment, including • support for applications – libraries, tools, frameworks, common developments, ….. • global grid computing service • funded by Regional Centres, CERN, special contributions to CERN by member and observer states, middleware developments by national and regional Grid projects • manpower OK • hardware at CERN - ~40% funded • Phase 2 – construction and operation of the initial LHC Computing Service – 2005-2007 • at CERN – missing funding of ~80M CHF

  6. Funding • Funding agencies have little enthusiasm for investing more in particle physics • HEP seen as a ground-breaker in computing • initiator of the Web • track record of exploiting leading edge computing • effective global collaborations • real need – for data as well as computation • one of the few application areas with real cross-border data needs • LHC in sync with -- emergence of Grid technology -- explosion of network bandwidth • We must deliver on Phase 1 for LHC - and show the relevance for other sciences

  7. Building a Grid – the Computing Centre cluster [diagram: application servers, data cache and mass storage connected to the WAN]

  8. Cluster  Fabric autonomic computing • automated management • installation, configuration,maintenance, monitoring,error recovery, … • reliability • cost containment

  9. The MONARC Multi-Tier Model (1999) [diagram]: Tier 0 – recording and reconstruction at CERN; Tier 1 – full-service centres such as IN2P3, RAL and FNAL; Tier 2 centres (universities and labs); departments; desktops. MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html
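To make the tier structure concrete, here is a minimal sketch that models the MONARC hierarchy as a tree; the site names are the examples from the slide, and the class is purely illustrative (MONARC defines an architecture, not a data format).

from dataclasses import dataclass, field

@dataclass
class Centre:
    name: str
    tier: int
    role: str
    children: list = field(default_factory=list)

# Tier 0 at CERN feeds a handful of full-service Tier 1 centres,
# which in turn serve Tier 2 centres, departments and desktops.
tier0 = Centre("CERN", 0, "recording, reconstruction")
for t1_name in ("IN2P3", "RAL", "FNAL"):
    t1 = Centre(t1_name, 1, "full service")
    t1.children.append(Centre(f"Uni/Lab near {t1_name}", 2, "regional analysis"))
    tier0.children.append(t1)

def show(centre, indent=0):
    print("  " * indent + f"Tier {centre.tier}: {centre.name} ({centre.role})")
    for child in centre.children:
        show(child, indent + 1)

show(tier0)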

  10. Building a Grid – Collaborating Computer Centres [diagram]

  11. Building a Grid The virtual LHC Computing Centre Grid Collaborating Computer Centres Alice VO CMS VO

  12. Virtual Computing Centre The user sees the image of a single cluster and does not need to know • where the data is • where the processing capacity is • how things are interconnected • the details of the different hardware – and is not concerned by the conflicting policies of the equipment owners and managers
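A hedged sketch of what that single-cluster illusion implies: the user describes only the work to be done, and a broker (hidden from the user) chooses a site that holds the data and has spare capacity. The JobRequest and site-table names below are invented for illustration and do not correspond to any real LCG or EDG interface.

from dataclasses import dataclass

@dataclass
class JobRequest:
    executable: str
    input_dataset: str      # logical dataset name only -- no site, no path
    cpu_hours: float

# What the broker knows about the grid; none of this is visible to the user.
SITES = {
    "cern": {"free_cpus": 120, "datasets": {"higgs-sim-v3"}},
    "fnal": {"free_cpus": 300, "datasets": {"higgs-sim-v3", "minbias-v1"}},
    "ral":  {"free_cpus": 40,  "datasets": {"minbias-v1"}},
}

def submit(job: JobRequest) -> str:
    """Pick a site that publishes the requested dataset and has the most free capacity."""
    candidates = [s for s, info in SITES.items() if job.input_dataset in info["datasets"]]
    if not candidates:
        raise RuntimeError("no site publishes that dataset")
    return max(candidates, key=lambda s: SITES[s]["free_cpus"])

print(submit(JobRequest("reco.exe", "higgs-sim-v3", cpu_hours=2.0)))   # -> fnal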

  13. Project Implementation – Organisation: four areas • Applications (see Matthias Kasemann's presentation) • Grid Technology • Fabrics • Grid deployment

  14. Grid Technology Area – Leveraging Grid R&D Projects [diagram: US projects – GriPhyN, PPDG, iVDGL; European projects; many national and regional Grid projects – GridPP (UK), INFN-Grid (I), NorduGrid, Dutch Grid, …] • significant R&D funding for Grid middleware • risk of divergence • and is that good or bad? • global grids need standards • useful grids need stability • hard to do this in the current state of maturity • will we recognise and be willing to migrate to the winning solutions?

  15. Grid Technology Area • Ensuring that the appropriate middleware is available • Supplied and maintained by the “Grid projects” • It is proving hard to get the first “production” data intensive grids going as user services • Can the grid projects provide long-term support and maintenance? • Trade-off between new functionality and stability

  16. The Trans-Atlantic Issue • Bridging the ATLANTIC is essential for the project • HICB – High Energy and Nuclear Physics Intergrid Collaboration Board; GLUE – Grid Laboratory Universal Environment: compatible middleware and infrastructure • Funded by DataTAG and iVDGL • Certificates – OK • Schemas – under way, working with the wider Globus world, getting complicated – probably OK • Middleware components – not yet clear – but close collaboration on • File replication • Job scheduling
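As a rough idea of what a "file replication" component has to do, the sketch below keeps a catalogue of logical file names and their physical copies at different sites; the class and the URLs are invented for the example and are not the DataTAG, iVDGL or GLUE interface.

from collections import defaultdict

class ReplicaCatalogue:
    """Map logical file names to the physical copies known at each site."""
    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn: str, physical_url: str):
        self._replicas[lfn].add(physical_url)

    def locate(self, lfn: str):
        """Return every known physical copy of a logical file."""
        return sorted(self._replicas.get(lfn, set()))

cat = ReplicaCatalogue()
cat.register("lfn:/lhc/cms/run42/evts.root", "gsiftp://cern.ch/data/evts.root")
cat.register("lfn:/lhc/cms/run42/evts.root", "gsiftp://fnal.gov/store/evts.root")
print(cat.locate("lfn:/lhc/cms/run42/evts.root"))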

  17. Collaboration with Grid Projects • LCG must deploy a GLOBAL GRID • essential to have compatible middleware & grid infrastructure • better – have identical middleware • We are banking on GLUE – but we have to make some choices towards the end of the year • Services are about stability, support, maintenance – can the R&D grid projects take commitments for long term maintenance of their middleware?

  18. Scope of Fabric Area • Tier 1,2 centre collaboration • Grid-Fabric integration middleware (DataGrid WP4) • Automated systems management package • Technology assessment (PASTA III) started • CERN Tier 0+1 centre

  19. Grid Deployment Area • The aim is to build • a general computing service • for a very large user population • of independently-minded scientists • using a large number of independently managed sites • This is NOT a collection of sites providing pre-defined services • it is the user's job that defines the service • it is current research interests that define the workload • it is the workload that defines the data distribution DEMAND – Unpredictable & Chaotic But the SERVICE had better be Available & Reliable
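One reading of "it is the workload that defines the data distribution" is sketched below: replicate the datasets that analysis jobs actually hit most often. The dataset names, thresholds and site list are invented for the illustration.

from collections import Counter

# Which datasets recent analysis jobs have read (invented example data).
access_log = ["aod-higgs", "aod-higgs", "esd-higgs", "aod-higgs", "aod-minbias"]
replicas = {"aod-higgs": {"cern"}, "esd-higgs": {"cern"}, "aod-minbias": {"cern"}}
SITES = ["cern", "fnal", "in2p3", "ral"]

def rebalance(log, replicas, max_copies=3, hot_threshold=2):
    """Give heavily used datasets extra copies at additional sites."""
    for dataset, hits in Counter(log).items():
        if hits >= hot_threshold:
            for site in SITES:
                if len(replicas[dataset]) >= max_copies:
                    break
                replicas[dataset].add(site)
    return replicas

print(rebalance(access_log, replicas))   # "aod-higgs" gains replicas; the others do not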

  20. Grid Deployment – current status • Experiments can do (and are doing) their event production using distributed resources with a variety of solutions • classic distributed production – send jobs to specific sites, simple bookkeeping • some use of Globus, and some of the HEP Grid tools • other integrated solutions (ALIEN) • The hard problem for distributed computing is data analysis – ESD and AOD • chaotic workload • unpredictable data access patterns – this is where new Grid technology is needed (resource broker, replica management, …) – this is the problem that the LCG has to solve
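For contrast with the harder analysis problem, the "classic distributed production" pattern mentioned above can be sketched in a few lines: jobs go to pre-chosen sites and a simple bookkeeping record tracks them. The record layout here is invented for illustration and is not any experiment's production system.

import itertools

class ProductionBook:
    """Minimal bookkeeping for jobs sent to specific sites."""
    def __init__(self):
        self._next_id = itertools.count(1)
        self.records = {}

    def submit(self, site: str, dataset: str, n_events: int) -> int:
        job_id = next(self._next_id)
        self.records[job_id] = {"site": site, "dataset": dataset,
                                "events": n_events, "status": "submitted"}
        return job_id

    def mark_done(self, job_id: int, output_file: str):
        self.records[job_id].update(status="done", output=output_file)

book = ProductionBook()
jid = book.submit("ral", "minbias-sim", n_events=50000)
book.mark_done(jid, "gsiftp://ral.ac.uk/prod/minbias_001.root")
print(book.records[jid])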

  21. Grid Operation [diagram]: the user and the local site (local user support, local operation) raise queries to a Call Centre; the Grid Operations Centre handles monitoring & alarms and corrective actions, drawing on the Grid information service, Grid operations, and Grid logging & bookkeeping; Virtual Organisations and a Network Operations Centre complete the picture.

  22. Grid Operation • We do not know how to do this • Probably nobody knows – looks like network operation, but there are many more variables to be watched and adjusted; looks like multi-national commercial systems, but we have no central ownership, control • A 24 hour service is needed – round the clock and round the world
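One way to read "round the clock and round the world" is a follow-the-sun rota across operations centres on different continents; the sketch below is purely illustrative (the centres and hand-over hours are invented, not an LCG plan).

from datetime import datetime, timezone

SHIFTS = [                      # (start hour UTC, end hour UTC, operations centre)
    (0, 8,  "Asia-Pacific operations centre"),
    (8, 16, "European operations centre"),
    (16, 24, "North American operations centre"),
]

def on_duty(when: datetime) -> str:
    """Return which operations centre owns alarms at the given time."""
    hour = when.astimezone(timezone.utc).hour
    for start, end, centre in SHIFTS:
        if start <= hour < end:
            return centre
    raise ValueError("hour not covered by any shift")

print(on_duty(datetime(2002, 6, 27, 14, 30, tzinfo=timezone.utc)))   # European centre on duty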

  23. Setting up the LHC Global Grid Service • First data is in 2007 • LCG must learn from current solutions, leverage the tools coming from the grid projects, show that grids are useful but set realistic targets → short term (this year): • use current solutions for physics data challenges (event productions) • consolidate (stabilise, maintain) middleware • learn what a “production grid” really means by working with DataGrid and VDT → medium term (next year): • Set up a reliable global grid service – initially only a few larger centres, but on three continents • Stabilise it • Several times the capacity of the CERN facility and as easy to use

  24. Having stabilised this base service – showing that we can run a solid service for the experiments then – progressive evolution – • integrate all of the Regional Centre resources provided for LHC • improve quality, reliability, predictability • integrate new middleware functionality – possibly once per year • migrate to de facto standards as soon as they emerge

  25. Final comments • It is not just about distributing computation, it is also about managing distributed data (lots of it!) and maintaining a single view of the environment • All these parallel developments, rapidly changing technology .. may be good in the long term, but we must deploy a global grid service next year • A dependable, reliable 24 X 7 service is essential and not so easy to do with all these sites and all that data • Service Quality is the Key to Acceptance of Grids • Reliable OPERATION will be the factor that limits the size of practical Grids • We are getting funding because of the relevance for other sciences, engineering, business -- keeping things general, main-line must remain a high priority
