The LHC Computing Project: Common Solutions for the LHC. ACAT 2002. Presented by Matthias Kasemann, FNAL and CERN
Outline • The LCG Project: goal and organization • Common solutions: • Why common solutions • How to … • The Run2 common projects • The LCG Project: status of planning • Results of the LCG workshop in March 02 • Planning in the Applications Area • For the LCG Grid see: Les Robertson (Thursday), “The LHC Computing Grid Project - Creating a Global Virtual Computing Center for Particle Physics” Matthias Kasemann, FNAL and CERN, June 25, 2002
From Raw Data to Physics: what happens during analysis
[Figure: basic physics (e+e- → Z0 → f f̄), fragmentation and decay, interaction with detector material and detector response lead to raw data (Simulation, Monte-Carlo); applying calibration and alignment, converting to physics quantities, pattern recognition and particle identification (Reconstruction); physics analysis and results (Analysis). Data volumes shown along the chain: 250 Kb – 1 Mb, 100 Kb, 25 Kb, 5 Kb, 500 b.]
HEP analysis chain: common to LHC experiments
[Diagram, with the ATLAS detector as illustration: detector description, alignment and calibration are used to build the simulation geometry and the reconstruction geometry; Generate Events → Simulate Events (simulation geometry) → Raw Data → Reconstruct Events (reconstruction geometry, reconstruction parameters) → ESD → AOD → Analyze Events → Physics.]
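The analysis chain in the diagram can be read as a pipeline of stages that share one detector description. The Python sketch below makes that data flow explicit under toy assumptions: every function name, field and number is hypothetical, the geometry arguments are passed but not used by the toy physics, and ESD and AOD are reduced to small dictionaries rather than real event data models.

```python
from dataclasses import dataclass, field


@dataclass
class DetectorDescription:
    """Hypothetical single source for geometry, alignment and calibration."""
    name: str
    alignment: dict = field(default_factory=dict)
    calibration: dict = field(default_factory=dict)


def build_simulation_geometry(dd: DetectorDescription) -> dict:
    return {"detector": dd.name}


def build_reconstruction_geometry(dd: DetectorDescription) -> dict:
    # Reconstruction additionally needs the measured alignment.
    return {"detector": dd.name, "alignment": dd.alignment}


def generate_events(n: int) -> list:
    # Toy event generator: one "true energy" (GeV) per event.
    return [{"id": i, "true_energy": 10.0 * (i + 1)} for i in range(n)]


def simulate_events(events: list, geometry: dict) -> list:
    # Toy detector response: raw data as integer "ADC counts".
    # (geometry is not used by this toy model)
    return [{"id": e["id"], "adc": int(e["true_energy"] * 100)} for e in events]


def reconstruct_events(raw: list, geometry: dict, calibration: dict) -> list:
    # Raw data -> ESD: apply the calibration to convert counts back to energy.
    gain = calibration.get("gain", 0.01)
    return [{"id": r["id"], "adc": r["adc"], "energy": r["adc"] * gain} for r in raw]


def make_aod(esd: list) -> list:
    # ESD -> AOD: keep only what physics analysis needs.
    return [{"id": e["id"], "energy": e["energy"]} for e in esd]


def analyze_events(aod: list) -> float:
    # Toy analysis: mean reconstructed energy.
    return sum(e["energy"] for e in aod) / len(aod)


# Run the whole (hypothetical) chain on three toy events.
dd = DetectorDescription("ToyDetector", calibration={"gain": 0.01})
raw_data = simulate_events(generate_events(3), build_simulation_geometry(dd))
esd = reconstruct_events(raw_data, build_reconstruction_geometry(dd), dd.calibration)
aod = make_aod(esd)
mean_energy = analyze_events(aod)
```

Keeping each stage a separate function mirrors the diagram: a stage can be replaced without touching the others, as long as its inputs and outputs stay the same.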
Developing Software for LHC experiments • Challenges in big collaborations • Long and careful planning process • More formal procedure required to commit resources • Long lifetime: need flexible solutions which allow for change • Every phase of the experiment lasts longer than a typical Ph.D. or postdoc appointment • Need for professional IT participation and support • New development, maintenance and support model required • Challenges in smaller collaborations • Limited resources • Adapt and implement available solutions (“b-b-s”)
CMS - CCS schedule (V33): the bottom line [Chart: CCS milestone schedule, with the LHC start marked] • Milestones of ~next year: delays of ~9 months • Milestones a few years away: delays of ~15 months
CMS - CCS Software Baseline: L2 milestones • DDD ready for OSCAR, ORCA, IGUANA • Data model defined • Persistent and transient representations • Demonstrably as correct as existing CMS description • Switch from Geant3 to Geant4: • Date not decided (just my estimate) • E.g. it needs the new persistency
CCS Baseline Software for TDR’s • Software infrastructure deployed and working • User analysis components • Framework with coherent user interface • Event display / interactive visualisation • Tools for browsing / manipulating data sets • Data presentation, histograms, numerical, … • Framework for processing CMS data • Working for simulation, reconstruction, analysis • Supporting persistency and data management • Strongly dependent on LCG success
The LHC Computing Grid Project (LCG) • LCG was approved in fall 2001 • Resources contributed from some member states • 1st workshop in March 02
[Organization chart: Project Overview Board; Software and Computing Committee (SC2), which defines the work plan through RTAGs; Project Execution Board (PEB), which runs the work packages (WP); work areas: Applications, Support & Coordination, Computing Systems, Grid Technology, Grid Deployment. Common Solutions: experiments and Regional Centres agree on requirements for common projects.]
LCG - Fundamental goal: • The experiments have to get the best, most reliable and accurate physics results from the data provided by their detectors • Their computing projects are fundamental to the achievement of this goal • The LCG project at CERN was set up to help them all in this task • Corollary: success of LCG is fundamental to the success of LHC computing
Fulfilling LCG Project Goals • Prepare and deploy the LHC Computing Environment • Applications - provide the common components, tools and infrastructure for the physics application software • Computing system – fabric, grid, global analysis system • Deployment – foster collaboration and coherence • Not just another grid technology project • Validate the software by participating in Data Challenges using the progressively more complex Grid Prototype • Phase 1 - 50% model production grid in 2004 • Produce a TDR for full system to be built in Phase 2 • Software performance impacts on size and cost of production facility • Analysis models impact on exploitation of production grid • Maintain opportunities for reuse of deliverables outside LHC experimental programme
Applications Activity Areas • Application software infrastructure • physics software development environment, standard libraries, development tools • Common frameworks for simulation and analysis • Development and integration of toolkits & components • Support for physics applications • Development, support of common software tools & frameworks • Adaptation of Physics Applications to Grid environment • Object persistency and data management tools • Event data, metadata, conditions data, analysis objects
Goals for Applications Area • Many Software Production Teams • LHC experiments • CERN IT groups, ROOT team, … • HEP software collaborations – CLHEP, Geant4, … • External Software – python, Qt, XML, … • Strive to work together to develop and use software in common • Will involve identifying and packaging existing HEP software for reuse as well as developing new components • Each unit has its own approach to design and to supporting the development • Sharing in the development and deployment of software will be greatly facilitated if units follow a common approach • Recognise that there will be start-up costs associated with adapting to use new common products and development tools
Why common and when? • Why not: • Experiments have independent detectors and analysis tools to verify physics results • Competition for best physics results • Coordination of common software development is significant overhead • Why common solutions: • Need mature engineered software • Resources are scarce, in particular manpower • Effort: Common projects are a good way to become more efficient • Lessons need to be learnt from past experience • For LHC experiments: Everything non-experiment-specific is a potential candidate for a common project
FNAL: CDF/D0/CD - Run 2 Joint Project Organization
[Organization chart: Directorate; D0 Collaboration and CDF Collaboration; External Review Committee; Run II Committee; R2JOP Steering Committee; Run II Computing Project Office; task coordinators for Mass Storage & Data Access, Reconstruction Systems, Physics Analysis Support and Basic Infrastructure. Joint projects: Reconstruction farm hardware, Networking hardware, Serial Media Working Group, Storage Management, Physics analysis hardware, Fermilab Class Library, Simulation, Reconstruction input pipeline, Production Management, Visualization, Configuration Management, MSS Hardware Support, Databases, Data Access, Physics Analysis Software.]
• 15 joint projects defined, 4 years before start of data taking
Perceptions of Common Projects • Experiments • Whilst they may be very enthusiastic about long-term advantages … • … they have to deliver on short-term milestones • Devoting resources to both will be difficult • Already experience an outflow of effort into common projects • Hosting projects in experiments is an excellent way of integrating effort • For initial phase and prototyping • Technology groups • Great motivation to use expertise to produce useful solutions • Need the involvement of the experiments
Common solutions - How to do? • Requirements are set by experiments in the SC2 + Requirements Technical Assessment Groups (RTAGs) • Planning and implementation is done by LCG together with experiments • Monitoring of progress and adherence by the SC2 • Frequent releases and testing • Guaranteed life-time maintenance and support • Issues: • ‘How will applications area cooperate with other areas?’ • ‘Not feasible to have a single LCG architect to cover all areas.’ • Need mechanisms to bring coherence to the project
Workflow around the organisation chart
[Diagram, along a time axis: SC2 sets the requirements and gives RTAGm its mandate → RTAGm returns prioritised requirements (~2 months) → PEB develops the workplan (project plan for WPn) → SC2 approves the workplan, with workplan feedback → PEB manages LCG resources and tracks progress → Release 1 (~4 months) → status report → SC2 reviews the status and gives review feedback; PEB updates the workplan → Release 2]
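Read left to right, the workflow is a repeating sequence of hand-offs between SC2, RTAGs and the PEB. A minimal Python sketch, assuming the steps run strictly one after another and that steps without a figure on the chart take negligible time, turns the two quoted durations (~2 months, ~4 months) into a rough timeline up to the first release. The `Step` type and `timeline` function are hypothetical names.

```python
from typing import NamedTuple


class Step(NamedTuple):
    actor: str
    action: str
    months: float  # slide's approximate figure; 0 is an assumption where none is given


# Hypothetical encoding of the workflow chart, in order.
WORKFLOW = [
    Step("SC2", "set requirements, give RTAG its mandate", 0),
    Step("RTAG", "deliver prioritised requirements", 2),
    Step("PEB", "develop workplan", 0),
    Step("SC2", "approve workplan", 0),
    Step("PEB", "manage resources, track progress, deliver Release 1", 4),
    Step("PEB", "send status report", 0),
    Step("SC2", "review status, feed back into updated workplan", 0),
]


def timeline(steps):
    """Return (cumulative months, actor, action) for each step, assuming no overlap."""
    elapsed = 0.0
    result = []
    for step in steps:
        elapsed += step.months
        result.append((elapsed, step.actor, step.action))
    return result


for month, actor, action in timeline(WORKFLOW):
    print(f"~{month:g} months  {actor:5} {action}")
```

Under these assumptions Release 1 lands about six months after an RTAG is mandated; in practice the steps overlap, so this is only indicative.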
Issues related to partitioning the work • ‘How do you go from present to future without dismantling existing projects?’ • ‘Have to be careful that we don’t partition into too small chunks and lose coherence of overall software’ • We are not starting afresh, we have a good knowledge of what the broad categories are going to be • Experiment architectures help to ensure coherency.
Coherent Architecture • Applications common projects must follow a coherent overall architecture • The software needs to be broken down into manageable pieces i.e. down to the component level • Component-based, but not a bag of disjoint components • components designed for interoperability through clean interfaces • Does not preclude a common implementation foundation, such as ROOT, for different components • The ‘contract’ in the architecture is to respect the interfaces • No hidden communication among components • Starting point is existing products, not a clean slate
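The rule that the architectural ‘contract’ is the interface can be illustrated with abstract base classes: a component depends only on an interface handed to it explicitly, so any implementation honouring that interface can be swapped in, and nothing passes between components except through it. This is a Python sketch with hypothetical class names, not an LCG API.

```python
from abc import ABC, abstractmethod


class DetectorGeometry(ABC):
    """Hypothetical interface: the 'contract' other components may rely on."""

    @abstractmethod
    def volume_names(self) -> list:
        ...


class InMemoryGeometry(DetectorGeometry):
    def __init__(self, names):
        self._names = list(names)

    def volume_names(self):
        return list(self._names)


class TextFileGeometry(DetectorGeometry):
    """Alternative implementation: parses one volume name per line of text."""

    def __init__(self, text: str):
        self._names = [line.strip() for line in text.splitlines() if line.strip()]

    def volume_names(self):
        return list(self._names)


class SimulationComponent:
    """Uses only the interface, received explicitly: no globals, no hidden channels."""

    def __init__(self, geometry: DetectorGeometry):
        self._geometry = geometry

    def describe(self) -> str:
        return f"simulating {len(self._geometry.volume_names())} volumes"


# Two different implementations, interchangeable behind the same interface.
a = SimulationComponent(InMemoryGeometry(["tracker", "calorimeter"]))
b = SimulationComponent(TextFileGeometry("tracker\ncalorimeter\n"))
```

A common implementation foundation would simply be another class behind the same interface; the consumer does not need to change.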
Approach to making workplan • “Develop a global workplan from which the RTAGs can be derived” • Considerations for the workplan: • Experiment need and priority • Is it suitable for a common project? • Is it a key component of the architecture, e.g. object dictionary? • Timing: when will the conditions be right to initiate a common project? • Do established solutions exist in the experiments? • Are they open to review or are they entrenched? • Availability of resources and allocation of effort • Is there existing effort which would be better spent doing something else? • Availability, maturity of associated third party software • E.g. grid software • Pragmatism and seizing opportunity. A workplan derived from a grand design does not fit the reality of this project
RTAG: ‘blueprint’ of LCG application architecture • Mandate: define the architectural ‘blueprint’ for LCG applications: • Define the main architectural domains (‘collaborating frameworks’) of LHC experiments and identify their principal components. (For example: Simulation is such an architectural domain; Detector Description is a component which figures in several domains.) • Define the architectural relationships between these ‘frameworks’ and components, including Grid aspects, identify the main requirements for their inter-communication, and suggest possible first implementations. (The focus here is on the architecture of how major ‘domains’ fit together, and not detailed architecture within a domain.) • Identify the high-level milestones for each domain and provide a first estimate of the effort needed. (Here the architecture within a domain could be considered.) • Derive a set of requirements for the LCG • Time-scale: started in June 02, draft report in July, final report in August 02
RTAG status • Identified and started nine Requirements Technical Assessment Groups (RTAGs) • in application software area • Data persistency finished • Software support process and tools finished • Mathematical libraries finished • Detector Geometry & Materials descriptions started • ‘blueprint’ architecture of applications started • Monte Carlo event generators started • in compute fabric area • mass storage requirements finished • in Grid technology and deployment area • Grid technology use cases finished • regional center category and services definition finished
Software Process RTAG • Mandate: • Define a process for managing LCG software. Specific tasks to include: Establish a structure for organizing software, for managing versions and coherent subsets for distribution • Identify external software packages to be supported • Identify recommended tools for use within the project – to include configuration and release management • Estimate resources (person power) needed to run an LCG support activity • Guidance: • Procedures and tools will be specified • Will be used within project • Can be packaged and supported for general use • Will evolve with time • The RTAG does not make any recommendations on how experiment internal software should be developed and managed. However, if an experiment specific program becomes an LCG product it should adhere to the development practices proposed by this RTAG
Process RTAG – Recommendations(1) • All LCG projects must adopt the same set of tools, standards and procedures. The tools must be centrally installed, maintained and supported. • Adopt commonly used open-source or commercial software where available. Try to avoid “do it yourself” solutions in this area where we don’t have core competency. • Concerning commercial software, avoid commercial software that has to be installed on individual’s machines as this will cause well known problems of license agreements and management in our widely distributed environment. Commercial solutions for web-portals or other centrally managed solutions would be fine.
Process RTAG – Recommendations(2) • ‘Release early, release often’ implies • major release 2-3 times per year • Development release every 2-3 weeks • Automated nightly builds, regression tests, benchmarks • Test and quality assurance • Support of external software • installation and build up of local expertise • Effort needed for filling support roles • Librarian • Release manager • Toolsmith • Quality assurance • Technical writer
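Automated nightly regression tests and benchmarks typically compare tonight's outputs with stored reference values and with a timing baseline. The following Python sketch shows one such check; the function names, the relative tolerance of 1e-6, the 10% slowdown threshold and the sample numbers are assumptions chosen for illustration, not values from the RTAG.

```python
import math


def regression_failures(results, reference, rel_tol=1e-6):
    """Compare tonight's results with stored reference values.

    Returns a list of (quantity, reason); an empty list means the test passes.
    rel_tol is a hypothetical default tolerance.
    """
    failures = []
    for name, expected in reference.items():
        if name not in results:
            failures.append((name, "missing from results"))
        elif not math.isclose(results[name], expected, rel_tol=rel_tol):
            failures.append((name, f"got {results[name]!r}, expected {expected!r}"))
    return failures


def benchmark_ok(seconds, baseline_seconds, max_slowdown=1.10):
    """Hypothetical rule: fail a benchmark that is more than 10% slower than baseline."""
    return seconds <= baseline_seconds * max_slowdown


# Hypothetical reference values and tonight's output.
reference = {"mean_energy": 20.0, "n_tracks": 42.0}
tonight = {"mean_energy": 20.0000001, "n_tracks": 41.0}
print(regression_failures(tonight, reference))
```

Tolerances are needed because floating-point results can change in the last digits between compilers or platforms without indicating a real regression.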
Persistency RTAG • Mandate: • Write the product specification for the Persistency Framework for Physics Applications at LHC • Construct a component breakdown for the management of all types of LHC data • Identify the responsibilities of Experiment Frameworks, existing products (such as ROOT) and as yet to be developed products • Develop requirements/use cases to specify (at least) the metadata /navigation component(s) • Estimate resources (manpower) needed to prototype missing components • Guidance: • The RTAG may decide to address all types of data, or may decide to postpone some topics for other RTAGs, once the components have been identified. • The RTAG should develop a detailed description at least for the event data management. • Issues of schema evolution, dictionary construction and storage, object and data models should be addressed.
Persistency – Near term recommendations • to develop a common object streaming layer and associated persistence infrastructure. • a common object streaming layer based on ROOT-IO and several related components to support it, • including a (currently lightweight) relational database layer. • Dictionary services are included in the near-term project specification. • dictionary services may have additional clients • This is first step towards a complete data management environment, one with enormous potential for commonality among the experiments.
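The reason for pairing a streaming layer with dictionary services is that one generic writer and reader can handle any class whose layout is described in the dictionary. The Python sketch below shows that idea with the standard `struct` module; it is not ROOT-IO and does not use ROOT's API, and `DICTIONARY`, `register`, `stream_out`, `stream_in` and `Track` are hypothetical names.

```python
import struct
from dataclasses import dataclass

# Hypothetical object dictionary: class name -> ordered (member, struct code).
DICTIONARY = {}


def register(cls, members):
    DICTIONARY[cls.__name__] = members
    return cls


def _layout(cls_name):
    members = DICTIONARY[cls_name]
    # "<" = little-endian with no padding, so the bytes are platform-independent.
    return members, "<" + "".join(code for _, code in members)


def stream_out(obj) -> bytes:
    """Generic writer: driven entirely by the dictionary entry for obj's class."""
    members, fmt = _layout(type(obj).__name__)
    return struct.pack(fmt, *(getattr(obj, name) for name, _ in members))


def stream_in(cls, data: bytes):
    """Generic reader: rebuilds an instance from bytes written by stream_out."""
    members, fmt = _layout(cls.__name__)
    values = struct.unpack(fmt, data)
    return cls(**{name: value for (name, _), value in zip(members, values)})


@dataclass
class Track:  # hypothetical event-data class
    px: float
    py: float
    pz: float
    charge: int


register(Track, [("px", "d"), ("py", "d"), ("pz", "d"), ("charge", "i")])

blob = stream_out(Track(1.0, 2.0, 3.5, -1))
restored = stream_in(Track, blob)
```

Neither `stream_out` nor `stream_in` mentions `Track`; adding a new class only requires a new dictionary entry, which is why dictionary services are listed as part of the near-term project.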
RTAG: math library review • Mandate: Review the current situation with math libraries and make recommendations • Review the current situation of the usage of the various math libraries in the experiments (including but not limited to NagC++, GSL, CLHEP, ROOT) • Identify and recommend which ones should be adopted, which ones could be discontinued • Suggest possible improvements to the existing ones • Estimate resources needed for this activity • Guidance – The result of the RTAG should make it possible to establish a clear program of work to streamline the status of math libraries and find the maximum commonality between experiments, taking into account cost, maintenance and projected evolution of the experiment needs
Math Library: Recommendations • To design a support group • to provide advice and information about the use of existing libraries, • to assure their continued availability, • to identify where new functionality is needed, and • to develop that functionality themselves or by coordinating with other HEP-specific library developers. • The goal would be to have close contact with the experiments and provide expertise on mathematical methods, aiming at common solutions. • The experiments should maintain a database of mathematical libraries used in their software, and within each library, the individual modules used. • A detailed study should be undertaken to determine whether there is any functionality needed by the experiments and available in the NAG library which is not covered as well by a free library such as GSL.
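A database of libraries and modules per experiment is essentially a three-column table, and the proposed NAG study starts from a simple query over it. A minimal Python sketch, assuming rows of (experiment, library, module); the experiment and module names are placeholders, not survey data, and the function names are hypothetical.

```python
from collections import defaultdict

# Placeholder rows (experiment, library, module) for illustration only;
# these are NOT survey results from any experiment.
USAGE = [
    ("ExpA", "GSL", "random_numbers"),
    ("ExpA", "NAG", "minimisation"),
    ("ExpB", "CLHEP", "random_numbers"),
    ("ExpB", "NAG", "minimisation"),
    ("ExpB", "NAG", "special_functions"),
]


def modules_by_library(usage):
    """library -> module -> set of experiments using that module."""
    table = defaultdict(lambda: defaultdict(set))
    for experiment, library, module in usage:
        table[library][module].add(experiment)
    return table


def modules_in_use(usage, library):
    """Sorted list of a library's modules used anywhere, e.g. as the starting
    list for checking NAG functionality against a free library."""
    return sorted(modules_by_library(usage).get(library, {}))


table = modules_by_library(USAGE)
```

Recording usage per module rather than per library is what makes the NAG-versus-free-library comparison tractable: only the modules actually in use need to be checked.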
RTAG: Detector Geometry & Materials Description • Write the product specification for detector geometry and materials description services. • Specify scope: e.g. Services to define, provide transient access to, and store the geometry and materials descriptions required by simulation, reconstruction, analysis, online and event display applications, with the various descriptions using the same information source • Identify requirements including end-user needs such as ease and naturalness of use of the description tools, readability and robustness against errors e.g. provision for named constants and derived quantities • Explore commonality of persistence requirements with conditions data management • Interaction of the DD with a conditions DB. In that context versioning and ‘configuration management’ of the detector description, coherence issues… • Identify where experiments have differing requirements and examine how to address them within common tools • Address migration from current tools
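“Named constants and derived quantities” with “robustness against errors” can be illustrated by a small resolver: each constant is defined once, may refer only to constants defined before it, and an undefined name or duplicate definition is reported instead of being silently accepted. This Python sketch uses the standard `ast` module to allow only basic arithmetic; `resolve` and the constant names and values are hypothetical.

```python
import ast
import operator

# Arithmetic operators allowed in a (hypothetical) geometry definition file.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, env):
    """Evaluate a restricted arithmetic expression against already-defined constants."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, env)
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise NameError(f"undefined constant {node.id!r}")
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _evaluate(node.left, env)
        right = _evaluate(node.right, env)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand, env)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def resolve(definitions):
    """Resolve (name, expression) pairs in order; each may use earlier names only."""
    env = {}
    for name, expression in definitions:
        if name in env:
            raise ValueError(f"constant {name!r} defined twice")
        env[name] = _evaluate(ast.parse(expression, mode="eval"), env)
    return env


# Hypothetical values in mm, for illustration only.
geometry = resolve([
    ("barrel_inner_radius", "1000.0"),
    ("crystal_depth", "250.0"),
    ("barrel_outer_radius", "barrel_inner_radius + crystal_depth"),
])
```

Deriving `barrel_outer_radius` from its inputs, rather than typing the number, keeps the description consistent when an input changes, which is the point of a single information source.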
RTAG: Monte Carlo Event Generators • Mandate: To best explore the common solutions needed and how to engage the HEP community external to the LCG it is proposed to study: • How to maintain a common code repository for the generator code and related tools such as PDFLIB. • The development or adaptation of generator-related tools (e.g. HepMC) for LHC needs. • How to provide support for the tuning, evaluation and maintenance of the generators. • The integration of the Monte Carlo generators into the experimental software frameworks. • The structure of possible forums to facilitate interaction with the distributed external groups who provide the Monte Carlo generators.
Possible Organisation of activities
[Diagram: an Architect, with overall management, coordination, architecture, integration and support, sits above several activity areas; each activity area contains projects, each with a project leader; each project is split into work packages (WP).]
• Example: Activity area: Physics data management • Possible projects: Hybrid event store, Conditions DB, … • Work Packages: Component breakdown and work plan lead to Work Package definitions • ~1-3 FTEs per WP
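The hierarchy of activity areas, projects and work packages lends itself to a rough effort estimate from the quoted ~1-3 FTEs per WP. A Python sketch using the slide's example activity area; the number of work packages per project and their labels are invented for illustration, as are the class names.

```python
from dataclasses import dataclass, field

FTE_PER_WP = (1, 3)  # rough range quoted on the slide: ~1-3 FTEs per work package


@dataclass
class Project:
    name: str
    work_packages: list


@dataclass
class ActivityArea:
    name: str
    projects: list = field(default_factory=list)

    def n_work_packages(self) -> int:
        return sum(len(p.work_packages) for p in self.projects)

    def fte_range(self) -> tuple:
        """(low, high) FTE estimate from the per-WP range."""
        n = self.n_work_packages()
        return n * FTE_PER_WP[0], n * FTE_PER_WP[1]


data_mgmt = ActivityArea(
    "Physics data management",
    [
        Project("Hybrid event store", ["WP-a", "WP-b", "WP-c"]),  # hypothetical WPs
        Project("Conditions DB", ["WP-d", "WP-e"]),  # hypothetical WPs
    ],
)
```

With five hypothetical work packages this gives an estimate of 5 to 15 FTEs, which shows how quickly the wide per-WP range compounds at the activity-area level.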
Global Workplan – 1st priority level • Establish process and infrastructure • Nicely covered by software process RTAG • Address core areas essential to building a coherent architecture • Object dictionary – essential piece • Persistency - strategic • Interactive frameworks - also driven by assigning personnel optimally • Address priority common project opportunities • Driven by a combination of experiment need, appropriateness to common project, and ‘the right moment’ (existing but not entrenched solutions in some experiments) • Detector description and geometry model • Driven by need and available manpower • Simulation tools
Global Workplan – 2nd priority level • Build outward from the core top-priority components • Conditions database • Statistical analysis • Framework services, class libraries • Address common project areas of less immediate priority • Math libraries • Physics packages (scope?) • Extend and elaborate the support infrastructure • Software testing and distribution
Global Workplan – 3rd priority level • The core components have been addressed, architecture and component breakdown laid out, work begun. Grid products have had another year to develop and mature. Now explicitly address physics applications integration into the grid applications layer. • Distributed production systems. End-to-end grid application/framework for production. • Distributed analysis interfaces. Grid-aware analysis environment and grid-enabled tools. • Some common software components are now available. Build on them. • Lightweight persistency, based on persistency framework • Release LCG benchmarking suite
Global Workplan – 4th priority level • Longer term items waiting for their moment • ‘Hard’ ones, perhaps made easier by a growing common software architecture • Event processing framework • Address evolution of how we write software • OO language usage • Longer term needs; capabilities emerging from R&D (more speculative) • Advanced grid tools, online notebooks, …
Candidate RTAGs (1)
Candidate RTAGs (2)
Common Solutions: Conclusions • Common Solutions for LHC software are required for success • Common solutions are agreed upon by experiments • The requirements are set by the experiments • The development is done jointly by the LCG project and the LHC experiments • All LCG software is centrally supported and maintained. • What makes us believe that we will succeed? What is key to success? • The process in the LCG organization • The collaboration between players • Common technology • Central resources, jointly steerable by experiments and management • Participants have prototyping experience !!
Post-RTAG Participation of Architects – Draft Proposal (1) • Monthly open meeting (expanded weekly meeting) • Accumulated issues to be taken up with architects • Architects in attendance; coordinators invited • Information has gone out beforehand, so architects are ‘primed’ • Meeting is informational, and decision-making (for the easier decisions) • An issue is either • Resolved (the easy ones) • Flagged for addressing in the ‘architects committee’
Post-RTAG Participation of Architects – Draft Proposal (2) • Architects committee: • Members: experiment architects + applications manager (chair) • Invited: computing coordinators, LCG project manager and CTO • Others invited at discretion of members • e.g. project leader of project at issue • Meets shortly after the open meeting (also bi-weekly?) • Decides the difficult issues • Most of the time, committee will converge on a decision • If not, try harder • If still not, applications manager takes decision • Such decisions can be accepted or challenged • Challenged decisions go to full PEB, then if necessary to SC2 • PEB role of raising issues to be taken up by SC2 • We all abide happily by an SC2 decision • Committee meetings also cover general current issues and exchange of views • Committee decisions, actions documented in public minutes
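Taken together, the two draft proposals describe an escalation chain that stops at the first body able to settle an issue. A simplified Python sketch, assuming each level is a function that returns a decision or `None` (undecided, or, for the applications manager, challenged); the committee's “try harder” retry is not modelled, and `escalate` and the toy rules in `LEVELS` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A level is (body name, decision function); the function returns a decision,
# or None if the body could not decide (or its decision was challenged).
Level = Tuple[str, Callable[[str], Optional[str]]]


def escalate(issue: str, levels: List[Level]) -> Tuple[str, str]:
    """Walk the levels in order and return (deciding body, decision)."""
    for body, decide in levels:
        decision = decide(issue)
        if decision is not None:
            return body, decision
    raise RuntimeError(f"no level decided {issue!r}")


# Hypothetical toy rules standing in for real deliberation.
LEVELS = [
    ("open meeting", lambda issue: "resolved" if issue == "easy" else None),
    ("architects committee", lambda issue: "converged" if issue == "hard" else None),
    ("applications manager", lambda issue: None),  # toy: decision challenged
    ("PEB", lambda issue: None),
    ("SC2", lambda issue: "SC2 decision"),  # final: "we all abide by an SC2 decision"
]
```

Because SC2 always returns a decision in this model, every issue terminates; the `RuntimeError` only guards against a misconfigured chain.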
Distributed Character of Components (1) • Persistency framework • Naming based on logical filenames • Replica catalog and management • Cost estimators; policy modules • Conditions database • Inherently distributed (but configurable for local use) • Interactive frameworks • Grid-aware environment; ‘transparent’ access to grid-enabled tools and services • Statistical analysis, visualization • Integral parts of distributed analysis environment • Framework services • Grid-aware message and error reporting, error handling, grid-related framework services
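Naming by logical file names, a replica catalog, cost estimators and policy modules fit together as a lookup followed by a selection. The Python sketch below assumes a catalog that maps each LFN to a list of (site, physical file name) replicas, a policy that filters replicas and a cost function that ranks them; all class and function names, paths and costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str
    pfn: str  # physical file name at that site


class ReplicaCatalog:
    """Hypothetical catalog: maps a logical file name (LFN) to its replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn: str, replica: Replica) -> None:
        self._replicas.setdefault(lfn, []).append(replica)

    def lookup(self, lfn: str) -> list:
        if lfn not in self._replicas:
            raise KeyError(f"no replicas registered for {lfn!r}")
        return list(self._replicas[lfn])


def choose_replica(catalog, lfn, cost, allowed=lambda replica: True):
    """Apply the policy filter, then pick the replica with the lowest estimated cost."""
    candidates = [r for r in catalog.lookup(lfn) if allowed(r)]
    if not candidates:
        raise LookupError(f"policy excludes every replica of {lfn!r}")
    return min(candidates, key=cost)


catalog = ReplicaCatalog()
catalog.register("lfn:/toy/run1/events", Replica("CERN", "/storage/cern/toy/run1/events"))
catalog.register("lfn:/toy/run1/events", Replica("FNAL", "/storage/fnal/toy/run1/events"))

# Hypothetical cost estimator: a fixed per-site cost, e.g. from network distance.
site_cost = {"CERN": 5.0, "FNAL": 1.0}
best = choose_replica(catalog, "lfn:/toy/run1/events", cost=lambda r: site_cost[r.site])
```

Keeping the cost estimator and the policy as plain function arguments reflects the slide's separation of concerns: either can be replaced without changing the catalog.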
Distributed Character of Components (2) • Event processing framework • Cf. framework services, persistency framework, interactive frameworks • Distributed analysis • Distributed production • Software distribution • Should use the grid • OO language usage • Distributed computing considerations • Online notebook • Grid-aware tool
RTAG?: Simulation tools • Geant4 is establishing a HEP physics requirements body within the collaboration, accepted by SC2 as a mechanism for addressing G4 physics performance issues • However, there are important simulation needs to which LCG resources could be applied in the near term. • By the design of LCG, this requires SC2 delivering requirements to PEB • John Apostolakis has recently assembled G4 requests and requirements from the LHC collaborations • Proposal: Use these requirements as the groundwork for a quick 1-month RTAG to guide near term simulation activity in the project, leaving the addressing of physics performance requirements to the separate process within Geant4
RTAG?: Simulation tools (2) • Some possible activity areas in simulation, from the Geant4 requests/requirements received from the experiments, which would be input to the RTAG: • Error propagation tool for reconstruction (‘GEANE’) • Assembly and documentation of standard physics lists • Python interface • Documentation, tutorials, communication • Geant4 CVS server access issues • The RTAG could also address FLUKA support • Requested by ALICE as an immediate priority • Strong interest expressed by other experiments as well
RTAG?: Detector geometry & materials description and modeling services • Write the product specification for detector geometry and materials description and modeling services • Specify scope: e.g. Services to define, provide transient access to, and store the geometry and materials descriptions required by simulation, reconstruction, analysis, online and event display applications, with the various descriptions using the same information source • Identify requirements including end-user needs such as ease and naturalness of use of the description tools, readability and robustness against errors e.g. provision for named constants and derived quantities • Explore commonality of persistence requirements with conditions data management • Identify where experiments have differing requirements and examine how to address them within common tools • Address migration from current tools
RTAG?: Conditions database • Will depend on persistency RTAG outcome • Refine the requirements and product specification of a conditions database serving the needs of the LHC experiments, using the existing requirements and products as a reference point. Give due consideration to effective distributed/remote usage. • Identify the extent to which the persistency framework (hybrid store) can be directly used at the lower levels of a conditions database implementation. • Identify the component(s) and interfaces atop a common persistency foundation that complete the conditions database
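The slide does not say how conditions data are indexed. A common design, assumed here rather than taken from the RTAG, stores each payload with the run from which it becomes valid and answers a query for run r with the most recent payload starting at or before r. A minimal Python sketch using the standard `bisect` module; the class, method names and values are hypothetical, and versioning is left out.

```python
import bisect


class ConditionsFolder:
    """Hypothetical conditions store: each payload is valid from its start run
    until the next stored start run; the last one is open-ended."""

    def __init__(self):
        self._starts = []    # sorted start runs
        self._payloads = []  # payloads, parallel to _starts

    def store(self, since_run, payload):
        i = bisect.bisect_left(self._starts, since_run)
        if i < len(self._starts) and self._starts[i] == since_run:
            # Same start run: overwrite (this sketch keeps no version history).
            self._payloads[i] = payload
        else:
            self._starts.insert(i, since_run)
            self._payloads.insert(i, payload)

    def lookup(self, run):
        # Index of the last interval whose start is <= run.
        i = bisect.bisect_right(self._starts, run) - 1
        if i < 0:
            raise KeyError(f"no conditions valid for run {run}")
        return self._payloads[i]


# Hypothetical calibration constants.
calibration = ConditionsFolder()
calibration.store(100, {"gain": 1.00})
calibration.store(200, {"gain": 1.05})
```

In a real system the sorted index would live in the persistency layer; the slide's question of how far the hybrid store can serve these lower levels is exactly where such a lookup would sit.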
RTAG?: Data dictionary service • Can the experiments converge on common data definition and dictionary tools in the near term? • Even if the answer is no, it should be possible to establish a standard dictionary service (generic API) by which common tools can interact, while leaving the experiments free to choose how their class models are defined and implemented • Develop a product specification for a generic high-level data dictionary service able to accommodate distinct data definition and dictionary tools and present a common, generic interface to the dictionary • Review the current data definition and dictionary approaches and seek to expand commonality among the experiments. Write the product specifications for common (even if N<4) components.
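A generic dictionary API that tolerates different native dictionary tools is an adapter problem: each tool is wrapped behind the same interface, and common tools program only against that interface. A Python sketch with two hypothetical backends, one storing member lists and one parsing C++-like declarations; none of these names correspond to an existing LCG or experiment API.

```python
from abc import ABC, abstractmethod


class DictionaryService(ABC):
    """Hypothetical generic dictionary API seen by common tools."""

    @abstractmethod
    def class_names(self) -> list:
        ...

    @abstractmethod
    def members(self, class_name: str) -> list:
        """Ordered (member name, type name) pairs."""


class DescriptorListBackend(DictionaryService):
    """Adapter for a tool that stores {class: [(member, type), ...]}."""

    def __init__(self, table):
        self._table = table

    def class_names(self):
        return sorted(self._table)

    def members(self, class_name):
        return list(self._table[class_name])


class DeclarationStringBackend(DictionaryService):
    """Adapter for a tool that stores C++-like declarations such as 'double px;'."""

    def __init__(self, declarations):
        self._declarations = declarations

    def class_names(self):
        return sorted(self._declarations)

    def members(self, class_name):
        result = []
        for statement in self._declarations[class_name].split(";"):
            statement = statement.strip()
            if statement:
                type_name, member = statement.rsplit(maxsplit=1)
                result.append((member, type_name))
        return result


def describe(service: DictionaryService, class_name: str) -> str:
    """A 'common tool' that works with any backend through the generic API."""
    return ", ".join(f"{t} {m}" for m, t in service.members(class_name))


svc_a = DescriptorListBackend({"Track": [("px", "double"), ("charge", "int")]})
svc_b = DeclarationStringBackend({"Track": "double px; int charge;"})
```

The experiments keep their own class-definition formats; only the thin adapter has to be written once per tool, which is the compromise the slide proposes if full convergence is not possible.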
RTAG?: Interactive frameworks • Frameworks providing interactivity for various environments including physics analysis and event processing control (simulation and reconstruction) are critical. They serve end users directly and must match end user requirements extremely well. They can be a powerful and flexible ‘glue’ in a modular environment, providing interconnectivity between widely distinct components and making the ‘whole’ offered by such an environment much greater than the sum of its parts. • Develop the requirements for an interactive framework common across the various application environments • Relate the requirements to existing tools and approaches (e.g. ROOT/CINT, Python-based tools) • Write a product specification, with specific recommendations on tools and technologies to employ • Address both command line and GUI interactivity