LHC Computing Grid Project
Creating a Global Virtual Computing Centre for Particle Physics
ACAT'2002, 27 June 2002
Les Robertson, IT Division, CERN
les.robertson@cern.ch
Summary
• LCG – The LHC Computing Grid Project
  • requirements, funding, creating a Grid
  • areas of work
    • grid technology
    • computing fabrics
    • deployment
    • operating a grid
• Plan for the LCG Global Grid Service
• A few remarks
Funding dictates –
• a worldwide distributed computing system
• only a small fraction of the analysis done at CERN
• batch analysis using 12–20 large regional centres
  • how to use the resources efficiently
  • establishing and maintaining a uniform physics environment
• data exchange and interactive analysis involving tens of smaller regional centres, universities and labs
Summary – Project Goals
Goal – prepare and deploy the LHC computing environment
• applications – tools, frameworks, environment, persistency
• computing system → global grid service
• cluster → automated fabric
• collaborating computer centres → grid
• CERN-centric analysis → global analysis environment
This is not another grid technology project – it is a grid deployment project
Two Phases
Phase 1 – 2002–2005: prepare the prototype computing environment, including
• support for applications – libraries, tools, frameworks, common developments, …
• a global grid computing service
• funded by regional centres, CERN, special contributions to CERN by member and observer states, and middleware developments by national and regional Grid projects
• manpower OK; hardware at CERN only ~40% funded
Phase 2 – 2005–2007: construction and operation of the initial LHC Computing Service
• at CERN – missing funding of ~80M CHF
Funding
• funding agencies have little enthusiasm for investing more in particle physics
• HEP is seen as a ground-breaker in computing
  • initiator of the Web
  • track record of exploiting leading-edge computing
  • effective global collaborations
  • real need – for data as well as computation: one of the few application areas with real cross-border data needs
• LHC is in sync with the emergence of Grid technology and the explosion of network bandwidth
• we must deliver on Phase 1 for LHC – and show the relevance for other sciences
Building a Grid – Computing Centre
[Figure: a computing centre cluster – application servers and a data cache connected to mass storage and the WAN]
Cluster → Fabric
Autonomic computing:
• automated management – installation, configuration, maintenance, monitoring, error recovery, …
• reliability
• cost containment
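To make "automated management" concrete, here is a minimal sketch of the drift-and-repair loop such a fabric automates. Everything in it (Node, repair, the package names) is invented for illustration – it is not part of any real fabric-management package:

```python
# Sketch: compare each node's actual state against a desired configuration
# and repair drift automatically. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    installed: dict = field(default_factory=dict)   # package -> version actually on the node
    healthy: bool = True

def repair(node, desired):
    """Bring one node to the desired configuration; return the actions taken."""
    actions = []
    for pkg, version in desired.items():
        if node.installed.get(pkg) != version:
            node.installed[pkg] = version           # stand-in for a real install/upgrade step
            actions.append(f"{node.name}: install {pkg}-{version}")
    if not node.healthy:
        node.healthy = True                         # stand-in for an automated recovery action
        actions.append(f"{node.name}: recovered")
    return actions

desired_config = {"kernel": "2.4.18", "batch-client": "1.2"}
farm = [Node("lxb0001", {"kernel": "2.4.9"}), Node("lxb0002", healthy=False)]

for node in farm:
    for action in repair(node, desired_config):
        print(action)
```

The point of the design is that nothing is fixed by hand: a node that drifts or fails is detected and converged back to the declared configuration on the next pass, which is what makes thousand-node clusters manageable at contained cost.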
The MONARC Multi-Tier Model (1999)
[Figure: the MONARC hierarchy – Tier 0 at CERN (recording, reconstruction); Tier 1 full-service centres (CERN, IN2P3, RAL, FNAL); Tier 2 centres (Uni n, Lab a, Uni b, Lab c, …); department servers; desktops]
MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html
Building a Grid
[Figure: collaborating computer centres]
Building a Grid – the virtual LHC Computing Centre
[Figure: the collaborating computer centres joined into a grid, with virtual organisations (an ALICE VO, a CMS VO) spanning them]
Virtual Computing Centre
The user –
• sees the image of a single cluster
• does not need to know
  • where the data is
  • where the processing capacity is
  • how things are interconnected
  • the details of the different hardware
• and is not concerned by the conflicting policies of the equipment owners and managers
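One way to picture that single-cluster image: the user describes only what the job needs, and the middleware picks a site. The broker, site table and replica table below are entirely invented for this sketch – this is not real LCG middleware:

```python
# Sketch: the user's job description never names a site; a toy "broker"
# resolves where the data and free capacity are. All values are invented.
SITES = {"cern": 40, "fnal": 250, "ral": 120}          # site -> free CPUs
REPLICAS = {"run1234.esd": ["cern", "ral"]}            # logical file -> sites holding a copy

def submit(job):
    """Dispatch to the site holding the input data with the most free CPUs."""
    candidates = [s for s in REPLICAS[job["input"]] if SITES[s] > 0]
    return max(candidates, key=lambda s: SITES[s])

job = {"executable": "reco", "input": "run1234.esd"}   # no site mentioned anywhere
print("dispatched to:", submit(job))                   # -> dispatched to: ral
```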
Project Implementation Organisation
Four areas –
• Applications (see Matthias Kasemann's presentation)
• Grid Technology
• Fabrics
• Grid Deployment
Grid Technology Area – Leveraging Grid R&D Projects
• significant R&D funding for Grid middleware – US projects (GriPhyN, PPDG, iVDGL), European projects, and many national and regional Grid projects: GridPP (UK), INFN-grid (I), NorduGrid, Dutch Grid, …
• risk of divergence – and is that good or bad?
• global grids need standards
• useful grids need stability
• hard to do this in the current state of maturity
• will we recognise – and be willing to migrate to – the winning solutions?
Grid Technology Area
• ensuring that the appropriate middleware is available – supplied and maintained by the "Grid projects"
• it is proving hard to get the first "production" data-intensive grids going as user services
• can the grid projects provide long-term support and maintenance?
• trade-off between new functionality and stability
The Trans-Atlantic Issue
• bridging the Atlantic is essential for the project
• HICB – High Energy and Nuclear Physics Intergrid Collaboration Board
• GLUE – Grid Laboratory Uniform Environment: compatible middleware and infrastructure, funded by DataTAG and iVDGL
  • certificates – OK
  • schemas – under way, working with the wider Globus world, getting complicated – probably OK
  • middleware components – not yet clear, but close collaboration on file replication and job scheduling
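Why a common schema matters can be shown in a few lines. The sketch below is a toy – the ComputeElement fields and software tags are invented, and the real GLUE schema is far richer – but it captures the point: once every site publishes its resources in one agreed form, a single query spans all the participating grids:

```python
# Toy illustration of a shared resource schema. Fields are invented;
# the actual GLUE schema has many more attributes.
from dataclasses import dataclass

@dataclass
class ComputeElement:
    site: str
    total_cpus: int
    free_cpus: int
    software: tuple             # installed experiment software tags (invented examples)

# Records published by sites on different continents, all in the same form:
published = [
    ComputeElement("cern-lxbatch", 600, 40, ("cms-reco-6", "atlas-reco-3")),
    ComputeElement("fnal-farm", 400, 250, ("cms-reco-6",)),
    ComputeElement("ral-csf", 300, 120, ("atlas-reco-3",)),
]

# One schema means one query works across every participating grid:
matches = [ce.site for ce in published
           if "cms-reco-6" in ce.software and ce.free_cpus >= 100]
print(matches)                  # -> ['fnal-farm']
```

Without schema agreement, each grid would need its own query and translation layer – exactly the divergence GLUE is meant to head off.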
Collaboration with Grid Projects
• LCG must deploy a GLOBAL GRID
  • essential to have compatible middleware and grid infrastructure
  • better – have identical middleware
• we are banking on GLUE, but we have to make some choices towards the end of the year
• services are about stability, support, maintenance – can the R&D grid projects take commitments for long-term maintenance of their middleware?
Scope of Fabric Area
• Tier 1, 2 centre collaboration
• Grid–fabric integration middleware (DataGrid WP4)
• automated systems management package
• technology assessment (PASTA III) – started
• CERN Tier 0+1 centre
Grid Deployment Area
The aim is to build a general computing service, for a very large population of independently-minded scientists, using a large number of independently managed sites.
This is NOT a collection of sites providing pre-defined services –
• it is the user's job that defines the service
• it is current research interests that define the workload
• it is the workload that defines the data distribution
DEMAND – unpredictable and chaotic
But the SERVICE had better be available and reliable
Grid Deployment – current status
• experiments can do (and are doing) their event production using distributed resources, with a variety of solutions
  • classic distributed production – send jobs to specific sites, simple bookkeeping
  • some use of Globus, and some of the HEP Grid tools
  • other integrated solutions (AliEn)
• the hard problem for distributed computing is data analysis – ESD and AOD
  • chaotic workload
  • unpredictable data access patterns
• this is where new Grid technology is needed – resource broker, replica management, …
• this is the problem that the LCG has to solve
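One way to picture what a resource broker plus replica management must do for analysis jobs: for each job, weigh starting where a replica already exists against copying the data to a less loaded site. The sketch below is a deliberately crude cost model with invented numbers – not any project's actual algorithm:

```python
# Toy broker decision: run on an existing replica, or replicate the data
# and run elsewhere? All sites, datasets and timings are invented.
REPLICAS = {"aod-slice9": {"cern", "ral"}}                # dataset -> sites with a copy
QUEUE_WAIT = {"cern": 3600, "ral": 900, "fnal": 10}       # seconds before a job would start
COPY_TIME = 1800                                          # seconds to replicate the dataset

def plan(dataset):
    """Return (site, action) minimising the estimated time-to-start."""
    best = None
    for site, wait in QUEUE_WAIT.items():
        has_copy = site in REPLICAS[dataset]
        cost = wait if has_copy else wait + COPY_TIME
        if best is None or cost < best[0]:
            best = (cost, site,
                    "run on existing replica" if has_copy else "replicate, then run")
    return best[1], best[2]

print(plan("aod-slice9"))    # -> ('ral', 'run on existing replica')
```

With chaotic workloads the inputs to this calculation (queue lengths, replica locations, popular datasets) shift constantly, which is why static job placement – good enough for event production – breaks down for analysis.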
Grid Operation
[Figure: operation flows – the user raises queries to local user support or a call centre; local site operation and the Grid Operations Centre (grid information service, grid operations, grid logging & bookkeeping) exchange monitoring & alarms and corrective actions with the local sites, the Virtual Organisation and the Network Operations Centre]
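The monitoring-and-alarm flow in the diagram can be sketched in a few lines. Everything below is an invented stand-in for whichever operations tooling is eventually used – no real information-service API is implied:

```python
# Sketch: a Grid Operations Centre polls site status, feeds the log,
# and escalates anomalies to local operation. All names are hypothetical.
def poll_site(site):
    """Stand-in for a real status query to a site's information service."""
    return {"site": site, "failed_jobs_last_hour": 42 if site == "lab-a" else 0}

def operations_pass(sites, threshold=10):
    for site in sites:
        status = poll_site(site)
        print("log:", status)                     # feeds grid logging & bookkeeping
        if status["failed_jobs_last_hour"] > threshold:
            print(f"alarm -> local operation at {site}: investigate job failures")

operations_pass(["cern", "lab-a", "uni-b"])
```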
Grid Operation
• we do not know how to do this – probably nobody knows
  • it looks like network operation, but there are many more variables to be watched and adjusted
  • it looks like multi-national commercial systems, but we have no central ownership or control
• a 24-hour service is needed – round the clock and round the world
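One reading of "round the clock and round the world" is a follow-the-sun rota between operations centres. The sketch below is purely illustrative – the three centres and their hour boundaries are invented, not a proposed LCG arrangement:

```python
# Toy follow-the-sun rota: each centre covers eight hours of UTC.
ROTA = [(0, 8, "Asia-Pacific centre"),     # invented centres and boundaries
        (8, 16, "European centre"),
        (16, 24, "Americas centre")]

def on_duty(utc_hour):
    """Which operations centre covers a given UTC hour."""
    for start, end, centre in ROTA:
        if start <= utc_hour < end:
            return centre
    raise ValueError("hour must be in 0..23")

print(on_duty(14))    # -> European centre
```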
Setting up the LHC Global Grid Service
• first data is in 2007
• LCG must learn from current solutions, leverage the tools coming from the grid projects, and show that grids are useful – but set realistic targets
Short term (this year):
• use current solutions for physics data challenges (event productions)
• consolidate (stabilise, maintain) middleware
• learn what a "production grid" really means by working with DataGrid and VDT
Medium term (next year):
• set up a reliable global grid service – initially only a few larger centres, but on three continents
• stabilise it
• several times the capacity of the CERN facility – and as easy to use
Having stabilised this base service – showing that we can run a solid service for the experiments – then progressive evolution:
• integrate all of the Regional Centre resources provided for LHC
• improve quality, reliability, predictability
• integrate new middleware functionality – possibly once per year
• migrate to de facto standards as soon as they emerge
Final comments
• it is not just about distributing computation – it is also about managing distributed data (lots of it!) and maintaining a single view of the environment
• all these parallel developments and rapidly changing technology may be good in the long term, but we must deploy a global grid service next year
• a dependable, reliable 24 × 7 service is essential – and not so easy to do with all these sites and all that data
• service QUALITY is the key to acceptance of grids
• reliable OPERATION will be the factor that limits the size of practical grids
• we are getting funding because of the relevance for other sciences, engineering and business – keeping things general and main-line must remain a high priority