LCG Project Status & Plans (with an emphasis on applications software) Torre Wenaus, BNL/CERN, LCG Applications Area Manager http://cern.ch/lcg/peb/applications US ATLAS PCAP Review, November 14, 2002
The LHC Computing Grid (LCG) Project • Approved (3 years) by CERN Council, September 2001 • meanwhile extended by 1 year due to LHC delay • Injecting substantial new facilities and personnel resources • Scope: • Common software for physics applications • Tools, frameworks, analysis environment • Computing for the LHC • Computing facilities (fabrics) • Grid middleware, deployment • Goal – prepare and deploy the LHC computing environment and deliver a global analysis environment
Goal of the LHC Computing Grid Project – LCG • Phase 1 – 2002-05: development of common applications, libraries, frameworks; prototyping of the environment; operation of a pilot computing service • Phase 2 – 2006-08: acquire, build and operate the LHC computing service
Non-CERN Hardware Need • CERN will provide the data reconstruction & recording service (Tier 0) -- but only a small part of the analysis capacity • current planning for capacity at CERN + principal Regional Centres • 2002: 650 KSI2000 – < 1% of capacity required in 2008 • 2005: 6,600 KSI2000 – < 10% of 2008 capacity
LHC Manpower Needs for Core Software (table from the LHC Computing Review, in FTEs; only computing professionals counted)
The LHC Computing Grid Project Structure (organization chart): Project Overview Board; Software and Computing Committee (SC2) with its RTAGs – requirements, work plan, monitoring; Project Leader and Project Execution Board (PEB) with the project work packages (WPs); and the Grid projects.
Funding LCG • National funding of Computing Services at Regional Centres • Funding from the CERN base budget • Special contributions of people and materials at CERN during Phase I of the project • Germany, United Kingdom, Italy, …… • Grid projects • Institutes taking part in applications common projects • Industrial collaboration • Institutes providing grid infrastructure services • operations centre, user support, training, ……
Project Execution Board Decision taking – as close as possible to the work, by those who will be responsible for the consequences. Two bodies set up to coordinate & take decisions: • Architects Forum • software architect from each experiment and the application area manager • makes common design decisions and agreements between experiments in the applications area • supported by a weekly applications area meeting open to all participants • Grid Deployment Board • representatives from the experiments and from each country with an active Regional Centre taking part in the LCG Grid Service • forges the agreements, takes the decisions, defines the standards and policies that are needed to set up and manage the LCG Global Grid Services • coordinates the planning of resources for physics and computing data challenges
LCG Areas of Work Physics Applications Software • Application Software Infrastructure – libraries, tools • Object persistency, data management tools • Common Frameworks – Simulation, Analysis, .. • Adaptation of Physics Applications to Grid environment • Grid tools, Portals Grid Deployment • Data Challenges • Grid Operations • Network Planning • Regional Centre Coordination • Security & access policy Fabric (Computing System) • Physics Data Management • Fabric Management • Physics Data Storage • LAN Management • Wide-area Networking • Security • Internet Services Grid Technology • Grid middleware • Standard application services layer • Inter-project coherence/compatibility
Fabric Area • CERN Tier 0+1 centre • Automated systems management package • Evolution & operation of CERN prototype – integration into the LCG grid • Tier 1,2 centre collaboration • develop/share experience on installing and operating a Grid • exchange information on planning and experience of large fabric management • look for areas for collaboration and cooperation • Technology tracking & costing • new technology assessment nearing completion • re-costing of Phase II will be done next year
Grid Technology (GTA) in LCG Quote from a recent Les Robertson slide: LCG expects to obtain Grid Technology from projects funded by national and regional e-science initiatives -- and from industry -- concentrating ourselves on deploying a global grid service. All true, but there is a real role for GTA, not just deployment, in LCG: ensuring that the needed middleware is/will be there, tested, selected and of production grade. Message seems to be getting through; (re)organization in progress to create an active GTA. Has been dormant and subject to EDG conflict of interest up to now. New leader David Foster, CERN
A few of the Grid projects with strong HEP collaboration (map of project logos): US projects – GriPhyN, PPDG, iVDGL; European projects; and many national, regional Grid projects -- GridPP (UK), INFN-grid (I), NorduGrid, Dutch Grid, …
Grid Technology Area This area of the project is concerned with • ensuring that the LCG requirements are known to current and potential Grid projects • active lobbying for suitable solutions – influencing plans and priorities • evaluating potential solutions • negotiating support for tools developed by Grid projects • developing a plan to supply solutions that do not emerge from other sources • BUT this must be done with caution – important to avoid HEP-SPECIAL solutions; important to migrate to standards as they emerge (avoid emotional attachment to prototypes)
Grid Technology Status • A base set of requirements has been defined (HEPCAL) • 43 use cases • ~2/3 of which should be satisfied by ~2003 by currently funded projects • Good experience of working with Grid projects in Europe and the United States • Practical results from testbeds used for physics simulation campaigns • Built on the Globus toolkit • GLUE initiative – working on integration of the two main HEP Grid project groupings – • around the (European) DataGrid project • subscribing to the (US) Virtual Data Toolkit - VDT
Grid Deployment Area • New leader Ian Bird, CERN, formerly Jefferson Lab • Job is to set up and operate a Global Grid Service • stable, reliable, manageable Grid for Data Challenges and regular production work • integrating computing fabrics at Regional Centres • learn how to provide support, maintenance, operation • Short term (this year): • consolidate (stabilize, maintain) middleware – and see it used for some physics • learn what a “production grid” really means by working with the Grid R&D projects
Grid Deployment Board • Computing management of regional centers and experiments • Decision forum for planning, deploying and operating the LCG grid • Service and resource scheduling and planning • Registration, authentication, authorization, security • LCG Grid operations • LCG Grid user support • One person from each country with an active regional center • Typically senior manager of a regional center • This role is currently (inappropriately) mixed with a broader responsibility: middleware selection • For political/expediency reasons (dormancy of GTA) • Not harmful because it is being led well (David Foster) • Should be corrected during/after LCG-1
Medium term (next year): Priority: move from testbeds to a reliable service. Target June 03 – deploy a Global Grid Service (LCG-1) • sustained 24 x 7 service • including sites from three continents • identical or compatible Grid middleware and infrastructure • several times the capacity of the CERN facility • and as easy to use. Having stabilised this base service – progressive evolution – • number of nodes, performance, capacity and quality of service • integrate new middleware functionality • migrate to de facto standards as soon as they emerge
Centers taking part in LCG-1 Tier 1 Centres • FZK Karlsruhe, CNAF Bologna, Rutherford Appleton Lab (UK), IN2P3 Lyon, University of Tokyo, Fermilab, Brookhaven National Lab Other Centres • GSI, Moscow State University, NIKHEF Amsterdam, Academia Sinica (Taipei), NorduGrid, Caltech, University of Florida, Ohio Supercomputing Centre, Torino, Milano, Legnaro, ……
LCG-1 as a service for LHC experiments • Mid-2003 • 5-10 of the larger regional centres • available as one of the services used for simulation campaigns • 2H03 • add more capacity at operational regional centres • add more regional centres • activate operations centre, user support infrastructure • Early 2004 • principal service for physics data challenges
Applications Area Projects • Software Process and Infrastructure (operating) • Librarian, QA, testing, developer tools, documentation, training, … • Persistency Framework (operating) • POOL hybrid ROOT/relational data store • Mathematical libraries (operating) • Math and statistics libraries; GSL etc. as NAG C replacement • Core Tools and Services (just launched) • Foundation and utility libraries, basic framework services, system services, object dictionary and whiteboard, grid enabled services • Physics Interfaces (being initiated) • Interfaces and tools by which physicists directly use the software. Interactive (distributed) analysis, visualization, grid portals • Simulation (coming soon) • Geant4, FLUKA, virtual simulation, geometry description & model, … • Generator Services (coming soon) • Generator librarian, support, tool development
Applications Area Organization (organization chart): the Apps Area Leader and the Architects Forum provide overall management, coordination and architecture; below them, the project leaders and their work package (WP) leaders; direct technical collaboration between experiment participants, IT, EP, ROOT and LCG personnel.
Candidate RTAG timeline from March (chart). Blue: RTAG/activity launched; light blue: imminent.
LCG Applications Area Timeline Highlights (quarterly chart, 2002–2005). Milestones shown include: POOL V0.1 internal release; architectural blueprint complete; applications hybrid event store available for general users; distributed production using grid services; distributed end-user interactive analysis; full Persistency Framework; LCG TDR; LCG “50% prototype” (LCG-3); LCG-1 reliability and performance targets; First Global Grid Service (LCG-1) available; LCG launch week.
Personnel status • 15 new LCG hires in place and working; a few more soon • Manpower ramp is on schedule • Contributions from UK, Spain, Switzerland, Germany, Sweden, Israel, Portugal, US • ~10 FTEs from IT (DB and API groups) also participating • ~7 FTEs from experiments (CERN EP and outside CERN) also participating, primarily in persistency project at present • Important experiment contributions also in the RTAG process
Software Architecture Blueprint RTAG • Established early June 2002 • Goals • Integration of LCG and non-LCG software to build coherent applications • Provide the specifications of an architectural model that allows this, i.e. a ‘blueprint’ • Mandate • Define the main domains and identify the principal components • Define the architectural relationships between these ‘frameworks’ and components, identify the main requirements for their inter-communication, and suggest possible first implementations. • Identify the high level deliverables and their order of priority. • Derive a set of requirements for the LCG
Architecture Blueprint Report • Executive summary • Response of the RTAG to the mandate • Blueprint scope • Requirements • Use of ROOT • Blueprint architecture design precepts • High level architectural issues, approaches • Blueprint architectural elements • Specific architectural elements, suggested patterns, examples • Domain decomposition • Schedule and resources • Recommendations After 14 RTAG meetings and much email, the 36-page final report was accepted by SC2 on October 11. http://lcgapp.cern.ch/project/blueprint/BlueprintReport-final.doc http://lcgapp.cern.ch/project/blueprint/BlueprintPlan.xls
Architecture requirements • Long lifetime: support technology evolution • Languages: LCG core sw in C++ today; support language evolution • Seamless distributed operation • TGV and airplane work: usability off-network • Modularity of components • Component communication via public interfaces • Interchangeability of implementations
Architecture Requirements (2) • Integration into coherent framework and experiment software • Design for end-user’s convenience more than the developer’s • Re-use existing implementations • Software quality at least as good as any LHC experiment • Meet performance, quality requirements of trigger/DAQ software • Platforms: Linux/gcc, Linux/icc, Solaris, Windows
Software Structure (layered diagram): Applications sit on specialized frameworks – Reconstruction Framework, Visualization Framework, Simulation Framework, other frameworks – built on a Basic Framework providing implementation-neutral services; underneath are the Foundation Libraries (STL, ROOT libs, CLHEP, Boost, …) and the Optional Libraries (ROOT, Qt, Grid middleware, …).
Component Model • Granularity driven by component replacement criteria; development team organization; dependency minimization • Communication via public interfaces • Plug-ins • Logical module encapsulating a service that can be loaded, activated and unloaded at run time • APIs targeted not only to end-users but to embedding frameworks and internal plug-ins
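A minimal sketch of the component model described above, using invented names (IMessageService, ConsoleMessageService, createMessageService) rather than any actual LCG interface: clients see only an abstract public interface, and a C-linkage factory lets a framework load the implementation as a plug-in at run time and swap it without touching client code.

    #include <cstdio>
    #include <string>

    // Public interface: clients depend only on this abstraction.
    class IMessageService {
    public:
      virtual ~IMessageService() = default;
      virtual void report(const std::string& source, const std::string& msg) = 0;
    };

    // One concrete implementation, normally built into a shared library.
    class ConsoleMessageService : public IMessageService {
    public:
      void report(const std::string& source, const std::string& msg) override {
        std::printf("[%s] %s\n", source.c_str(), msg.c_str());
      }
    };

    // Factory symbol a framework could resolve (e.g. via dlopen/dlsym) when the
    // plug-in is activated; returning the interface keeps the implementation
    // hidden and interchangeable.
    extern "C" IMessageService* createMessageService() {
      return new ConsoleMessageService;
    }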
Distributed Operation • Architecture should enable but not require the use of distributed resources via the Grid • Configuration and control of Grid-based operation via dedicated services • Making use of optional grid middleware services at the foundation level of the software structure • Insulating higher level software from the middleware • Supporting replaceability • Apart from these services, Grid-based operation should be largely transparent • Services should gracefully adapt to ‘unplugged’ environments • Transition to ‘local operation’ modes, or fail informatively
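The “graceful adaptation” point can be illustrated with the following sketch; the names (IFileLookup, GridFileLookup, LocalFileLookup) and paths are purely illustrative, not real LCG or middleware APIs. Higher-level code depends on an abstract lookup service; the grid-backed implementation fails informatively when its replica catalog is unreachable, and the framework transparently falls back to a local-operation mode.

    #include <stdexcept>
    #include <string>

    class IFileLookup {
    public:
      virtual ~IFileLookup() = default;
      virtual std::string locate(const std::string& logicalName) = 0;
    };

    // Local-operation mode: resolve against a local disk layout (path is illustrative).
    class LocalFileLookup : public IFileLookup {
    public:
      std::string locate(const std::string& logicalName) override {
        return "/local/data/" + logicalName;
      }
    };

    // Grid mode: would normally query a replica catalog through the middleware.
    class GridFileLookup : public IFileLookup {
    public:
      explicit GridFileLookup(bool middlewareAvailable) : available_(middlewareAvailable) {}
      std::string locate(const std::string& logicalName) override {
        if (!available_)
          throw std::runtime_error("replica catalog unreachable for " + logicalName);
        return "gridftp://replica.example.org/" + logicalName;  // placeholder replica
      }
    private:
      bool available_;
    };

    // Framework-level policy: prefer the grid service, adapt when 'unplugged'.
    std::string resolve(IFileLookup& grid, IFileLookup& local, const std::string& lfn) {
      try { return grid.locate(lfn); }
      catch (const std::exception&) { return local.locate(lfn); }
    }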
Managing Objects • Object Dictionary • To query a class about its internal structure (Introspection) • Essential for persistency, data browsing, interactive rapid prototyping, etc. • The ROOT team and LCG plan to develop and converge on a common dictionary (common interface and implementation) with an interface anticipating a C++ standard (XTI) • To be used by LCG, ROOT and CINT • Timescale ~1 year • Object Whiteboard • Uniform access to application-defined transient objects
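To make the introspection idea concrete, here is a toy sketch only, with invented names (Dictionary, ClassInfo, MemberInfo, TrackHit) that do not correspond to the common LCG/ROOT dictionary interface referred to above: a registry records a class’s data members so that generic code such as persistency or a data browser can query its layout at run time.

    #include <cstddef>
    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    struct MemberInfo { std::string name; std::string type; std::size_t offset; };
    struct ClassInfo  { std::string name; std::vector<MemberInfo> members; };

    // Minimal dictionary; real entries would be generated automatically from headers.
    class Dictionary {
    public:
      static Dictionary& instance() { static Dictionary d; return d; }
      void add(const ClassInfo& c) { classes_[c.name] = c; }
      const ClassInfo* find(const std::string& name) const {
        auto it = classes_.find(name);
        return it == classes_.end() ? nullptr : &it->second;
      }
    private:
      std::map<std::string, ClassInfo> classes_;
    };

    // An application class to be described (purely illustrative).
    struct TrackHit { double x, y, z; int detectorId; };

    int main() {
      Dictionary::instance().add({"TrackHit",
          {{"x", "double", offsetof(TrackHit, x)},
           {"y", "double", offsetof(TrackHit, y)},
           {"z", "double", offsetof(TrackHit, z)},
           {"detectorId", "int", offsetof(TrackHit, detectorId)}}});

      // Introspection: generic code queries the class about its internal structure.
      if (const ClassInfo* ci = Dictionary::instance().find("TrackHit"))
        for (const MemberInfo& m : ci->members)
          std::printf("%s::%s : %s (offset %zu)\n",
                      ci->name.c_str(), m.name.c_str(), m.type.c_str(), m.offset);
      return 0;
    }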
Other Architectural Elements • Python-based Component Bus • Plug-in integration of components providing a wide variety of functionality • Component interfaces to bus derived from their C++ interfaces • Scripting Languages • Python and CINT (ROOT) to both be available • Access to objects via object whiteboard in these environments • Interface to the Grid • Must support convenient, efficient configuration of computing elements with all needed components
(LHCb) Example of LCG–Experiment SW Mapping (diagram): experiment framework components mapped onto LCG products – LCG CLS, LCG POOL, LCG DDD, HepPDT and other LCG services.
Domain Decomposition (diagram). Products mentioned are examples, not a comprehensive list. Grey: not in common project scope (also event processing framework, TDAQ).
Use of ROOT in LCG Software • Among the LHC experiments • ALICE has based its applications directly on ROOT • The 3 others base their applications on components with implementation-independent interfaces • Look for software that can be encapsulated into these components • All experiments agree that ROOT is an important element of LHC software • Leverage existing software effectively and do not unnecessarily reinvent wheels • Therefore the blueprint establishes a user/provider relationship between the LCG applications area and ROOT • Will draw on a great ROOT strength: users are listened to very carefully! • The ROOT team has been very responsive to needs for new and extended functionality coming from the persistency effort
Personnel Resources – Required and Available (chart: estimate of required effort in FTEs, 0–60 scale, by quarter ending Sep-02 through Mar-05, broken down by project – SPI, Math libraries, Physics interfaces, Generator services, Simulation, Core Tools & Services, POOL; blue = available effort). We are OK. FTEs today: 15 LCG, 10 CERN IT, 7 CERN EP + experiments. Future estimate: 20 LCG, 13 IT, 28 EP + experiments.
RTAG Conclusions and Recommendations • Use of ROOT as described • Start common project on core tools and services • Start common project on physics interfaces • Start RTAG on analysis, including distributed aspects • Tool/technology recommendations • CLHEP, CINT, Python, Qt, AIDA, … • Develop a clear process for adopting third party software
Core Libraries and Services Project • Pere Mato (CERN/LHCb) is leading the new Core Libraries and Services (CLS) Project • Project being launched now, developing immediate plans over the next week or so and a full work plan over the next couple of months • Scope: • Foundation, utility libraries • Basic framework services • Object dictionary • Object whiteboard • System services • Grid enabled services • Many areas of immediate relevance to POOL • Clear process for adopting third party libraries will be addressed early in this project
Physics Interfaces Project • Launching it now, led by Vincenzo Innocente (CMS) • Covers the interfaces and tools by which physicists will directly use the software • Should be treated coherently, hence coverage by a single project • Expected scope once analysis RTAG concludes: • Interactive environment • Analysis tools • Visualization • Distributed analysis • Grid portals
POOL • Pool of persistent objects for LHC, currently in prototype • Targeted at event data but not excluding other data • Hybrid technology approach • Object level data storage using file-based object store (ROOT) • RDBMS for meta data: file catalogs, object collections, etc. (MySQL) • Leverages existing ROOT I/O technology and adds value • Transparent cross-file and cross-technology object navigation • RDBMS integration • Integration with Grid technology (e.g. EDG/Globus replica catalog) • network and grid decoupled working modes • Follows and exemplifies the LCG blueprint approach • Components with well defined responsibilities • Communicating via public component interfaces • Implementation technology neutral
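The usage pattern implied by this hybrid design can be sketched as follows; the names here (Token, IPersistencySvc, IFileCatalog) are hypothetical and are not the actual POOL interfaces. User code writes and reads objects through a technology-neutral service, an opaque token records where an object lives so navigation across files stays transparent, and a catalog resolves logical file names to physical replicas (local or grid).

    #include <memory>
    #include <string>

    // Opaque address of a stored object: enough to navigate back to it later,
    // even from another file or another job.
    struct Token {
      std::string logicalFile;   // resolved through the file catalog
      std::string container;     // e.g. a location in the file-based object store
      long        entry = -1;
    };

    // Technology-neutral persistency interface; a ROOT I/O back end would
    // implement it. Type information would come from the object dictionary,
    // keeping the interface itself implementation neutral.
    class IPersistencySvc {
    public:
      virtual ~IPersistencySvc() = default;
      virtual Token write(const std::string& logicalFile, const std::string& container,
                          const void* object, const std::string& className) = 0;
      virtual std::shared_ptr<void> read(const Token& t) = 0;  // cross-file navigation
    };

    // File catalog interface; could be backed by an RDBMS or a grid replica
    // catalog, allowing grid-coupled or decoupled running.
    class IFileCatalog {
    public:
      virtual ~IFileCatalog() = default;
      virtual std::string physicalName(const std::string& logicalFile) = 0;
    };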
Pool Release Schedule • End September - V0.1 (Released on schedule) • All core components for navigation exist and interoperate • Assumes ROOT object (TObject) on read and write • End October - V0.2 • First collection implementation • End November - V0.3 (First public release) • EDG/Globus FileCatalog integrated • Persistency for general C++ classes (not instrumented by ROOT) • Event meta data annotation and query • June 2003 – Production release
Four Experiments, Four Viewpoints, Two Paths • Four viewpoints (of course) among the experiments; two basic positions • ATLAS, CMS, LHCb similar views; ALICE differing • The blueprint establishes the basis for a good working relationship among all • LCG applications software is developed according to the blueprint • To be developed and used by ATLAS, CMS and LHCb, with ALICE contributing mainly via ROOT • ALICE continues to develop their line making direct use of ROOT as their software framework
What next in LCG? Much to be done in middleware functionality - Job Scheduling, Workflow Management, Database access, Replica Management, Monitoring, Global Resource Optimisation, Evolution to Web Services, …. But we also need an evolution from Research & Development to Engineering: • Effective use of high-bandwidth Wide Area Networks – work on protocols, file transfer • High quality computer centre services – high quality Grid services • sociology as well as technology
Concluding Remarks • Good engagement and support from the experiments and CERN • Determination to make the LCG work • LCG organizational structure seems to be working, albeit slowly • Facilities money problematic; manpower on track • Grid Technology and Grid Deployment areas still in flux • Important for US to make sure interests are represented and experience injected • Applications area going well • Architecture laid out • Working relationship among the four experiments • Many common projects • First software deliverable (POOL) on schedule