Learn about the LCG project and its goals to prepare and deploy the global grid service for the LHC computing environment. Collaboration between regional centers and common solutions are crucial for the success of this grid deployment project.
LCG and HEPiX
Ian Bird
LCG Project - CERN
HEPiX - FNAL
25-Oct-2002
What is LCG? Why is it relevant to HEPiX? Ian.Bird@cern.ch
LCG Project Goals
Goal: prepare and deploy the LHC computing environment
• applications: tools, frameworks, environment, persistency
• computing system → global grid service
  • cluster → automated fabric
  • collaborating computer centres → grid
  • CERN-centric analysis → global analysis environment
• central role of data challenges
This is not another grid technology project; it is a grid deployment project
LCG Level 1 Milestones proposed to LHCC
LCG and its interactions
[Diagram; labels only: GTA, Common Applications, Deployment, Fabric, Experiments, Grid Projects, HEPCAL, PPDG, iVDGL (VDT), GriPhyN, Globus, GLUE, EDG, NorduGrid, GDB, AliEn, Regional Centres, CERN]
Multi-dimensional problem
• Regional Centres:
  • Host one or more experiments
  • Different RCs deploy different grid middleware in existing testbeds
  • Have different operational and security policies
• Experiments:
  • Use middleware from various grid projects
  • Run at many regional centres
  • Provide applications that rely on specific middleware
• Grid projects:
  • Provide middleware that often does not (yet) interoperate
  • Starting to collaborate on common solutions and interoperability
The Deployment area of LCG ties these all together
Grid Deployment: goals of LCG-1
• Production service for Data Challenges in 2H03 & 2004
  • Focused on batch production work
• Experience in close collaboration between the Regional Centres
  • Should have wide enough participation to understand the issues, but not too many initially
• Learn how to maintain and operate a global grid
• Focus on a production-quality service and all that implies
  • Robustness, fault-tolerance, predictability, and supportability take precedence over functionality
  • But: minimum functionality to be of value
  • This requires:
    • A middleware support group with integration, certification, testing, packaging etc. responsibilities
    • A support structure
• LCG should be integrated into the sites’ physics computing services; it should not be something apart
  • This requires coordination between participating sites in:
    • Policies and collaborative agreements
    • Resource planning and scheduling
    • Operations
    • Support
What might LCG-1 look like?
• User’s perspective requires:
  • Functionality adequate to provide an advantage over not using the distributed model
  • Straightforward to use
    • Well-defined services
    • Advice on how to use the system
    • Help with problems
  • Failures should be understandable
  • Ability to determine status of jobs and data
• Sites’ perspective:
  • Integrated into computer centre/IT (inc. security) infrastructures
  • Able to support the service
  • Able to allocate and manage resources, with local autonomy where needed
• Overall service perspective:
  • Performance and problem monitoring
  • Accounting
  • Etc.
LCG has to build the “virtual computer centre” (= LHC computing environment)
• With all that is expected from a production service
  • User support
  • Operations group
  • “Account” management
  • Security
  • Fabric management
  • Etc.
• Except this is now distributed across many countries and continents
  • Requires agreements, collaboration, and coordination
  • At all levels: management, system managers, user support, etc.
Grid Operation
[Diagram; labels only. Flows: queries, monitoring & alarms, corrective actions. Components: User, Local user support, Local operation, Local site, Call Centre, Grid Operations Centre, Grid information service, Grid operations, Grid logging & bookkeeping, Virtual Organisation, Network Operations Centre]
Deployment Summary
• Deploy middleware to support essential functionality, but the goal is to evolve and incrementally add functionality
• Added value is to robustify, support and make into a 24x7 production service
• How?
  • Certification & test procedure, with tight feedback to developers
    • Must develop support agreements with grid projects to ensure this
  • Define missing functionality and require it from providers
  • Provide documentation and training
  • Provide missing operational services
  • Provide a 24x7 Operations and Call Centre
    • Guarantee to respond
    • Single point of contact for a user
  • Make software easy to install, to facilitate new centres joining
LCG Strategy
• Develop as little as possible
  • Use existing middleware, tools and software
  • Pressure developers to provide missing functionality
  • Negotiate support agreements
• Leverage existing experience
  • Various data grid projects and testbeds
  • Teragrid, interoperability demonstrations, GGF production grids area
• Actively encourage collaboration and coordination
Grid Deployment Teams: the plan
[Diagram; labels only: HEPiX interests; suppliers’ integration teams provide tested releases (common applications s/w, Trillium - US grid middleware, DataGrid middleware); certification, build & distribution; LCG infrastructure coordination & operation (user support, grid operation, call centre); LCG …; fabric operation at regional centres A, B, X, Y]
Coordination & Collaboration
• There are many opportunities for common solutions, which should be actively pursued
• HICB - JTB, existing & proposed new collaborative activities
  • GLUE
    • Schema definitions & interoperability work
  • Validation and Test Suites
  • Distribution and Meta-Packaging
    • Interoperable distribution and configuration utilities identified as a definite need by all the recent trans-Atlantic demonstration and validation work
    • Support for this group comes from: LCG, EDG, EDT, Trillium, DataTAG
• Security czars
  • Already talking to address grid issues
• GGF
  • Production grids
  • AAA
  • Etc.
• LCG: grid deployment board, etc.
Summary of Issues that might be addressed by HEPiX/LCCWS
• I know many of these are discussed by a plethora of grid projects and offshoots, but remember, more than ever before we all have to work together coherently to make a grid work:
  • Grid operations centre: Teragrid, iVDGL
  • User support
    • Distributed helpdesk/call centre: iVDGL, Teragrid, Nordic grid collabs, GGF production grids area
    • Helpdesk tools
  • Certification process for operating environments
    • Upgrade procedures
    • Configuration management
    • Joint OS version certification
  • Packaging, installation (inc. applications)
  • User management
  • Security etc.
  • Fabric management (see LCCWS)
  • Etc.
Proposal
• HEPiX is already (a lot of) the right people
  • Already, or soon to be, deploying LCG and other grids in their computer centres
• Keep LCCWS associated with HEPiX
• Add a Grid Coordination/LCG interest group, like HEPNT or Storage
  • To address themes and issues of common interest
  • Encourage new people to attend
  • Line up specific talks by selected people to address issues and to propose activities to follow on
• We need to solve the problems, not just talk about them
  • Needs a coordinator & agenda to make sure this happens
  • Volunteers?