LHC Computing & Grid(s)
Mirco Mazzucato, INFN-Padova
HEPCCC, Bologna
Main conclusions of the "LHC Computing Review"
• The Panel recommends the multi-tier hierarchical model proposed by Monarc as one key element of the LHC computing model, with the majority of the resources not based at CERN: 1/3 in, 2/3 out
• About equal shares between the Tier0 at CERN, the Tier1's and the lower-level Tiers down to the desktops: Tier0 / Σ(Tier1) / Σ(all Tier2 + …) = 1 / 1 / 1
• All experiments should perform Data Challenges of increasing size and complexity until LHC start-up, also involving the Tier2's
• EU testbed: 30-50% of one LHC experiment by 2004 (this matches well with the INFN Grid assumption of 10% of the final size for each experiment)
• Limit heterogeneity: OS = Linux, persistency = 2 tools max
• General consensus that the Grid technologies developed by DataGrid can provide the way to realize this infrastructure efficiently
HEP Monarc Regional Centre Hierarchy (schematic)
• Tier 0: CERN
• Tier 1: national/regional centres (UK, France, INFN, Fermilab, …), linked to the Tier 0 at 2.5 Gbps
• Tier 2: regional centres, linked to the Tier 1's at >= 622 Mbps
• Tier 3: institute sites, linked at 622 Mbps
• Tier 4: desktops, linked at 100 Mbps - 1 Gbps
(INFN-GRID)
NICE PICTURE… BUT WHAT DOES IT MEAN?
SOME HISTORY… (S. Cittolin)
The Present and the Future…
• Linux farms are now available in many universities and research centres, and there is a lot of experience with them.
• The basic hardware consists of commodity components, which are easy to find and to manage and which can provide very large computing and storage capacity in a very limited space.
• Technology tracking tells us that this trend will continue:
• Now: 40 CPUs or 200 disks in a standard 60x80 cm2 rack; power will continue to grow according to Moore's law (x2 every 14-18 months)
• Mass storage is not a commodity yet… but will we need it?
• New Fluorescent Multi-layer Disk (FMD) by Constellation 3D: http://www.c-3d.net/ ; in the future: 1 Terabyte/disk, 1 GByte/s I/O
• WAN is now a commodity thanks to the opening of the markets:
• Moore's law: x2-4 increase/year at equal cost
• 10 Gbps available now in the WAN
• Hardware and connectivity will not be a problem!
The real challenge: the software
• How to put together all these WAN-distributed resources in a "transparent" way for the users?
• "Transparent" means that the user should not notice the presence of the network and of the many WAN-distributed sources of resources
• Like the WEB with good network connectivity
• How to group them dynamically to satisfy the tasks of virtual organizations?
• Here comes the Grid paradigm (end of '99, for the EU and for LHC computing)
• GRIDS: enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, trust relationships (Ian Foster and Carl Kesselman, CERN, January 2001)
• Just in time to answer the question opened by the Monarc model.
The GRID provides the glue
"When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special-purpose appliances" (Ian Foster)
The Globus Team: Layered Grid Architecture (shown alongside the Internet protocol architecture)
• Application
• Collective: "coordinating multiple resources"; ubiquitous infrastructure services, application-specific distributed services
• Resource: "sharing single resources"; negotiating access, controlling use
• Connectivity: "talking to things"; communication (Internet protocols) & security (corresponds to the Transport and Internet layers)
• Fabric: "controlling things locally"; access to, and control of, resources (corresponds to the Link layer)
The Anatomy of the Grid: Enabling Scalable Virtual Organizations, I. Foster, C. Kesselman, S. Tuecke, Intl. J. Supercomputer Applns., 2001. www.globus.org/research/papers/anatomy.pdf
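Purely as an illustration of this layering (not Globus code; all class and method names below are invented), a minimal Python sketch: each layer talks only to the one beneath it, so a collective service such as a broker can coordinate many sites without knowing how each fabric element is controlled locally.

```python
# Illustrative sketch of the layered Grid architecture (hypothetical names,
# not the Globus API): Fabric -> Connectivity -> Resource -> Collective.

class FabricElement:
    """Fabric layer: local control of a concrete resource (e.g. a farm)."""
    def __init__(self, name, free_cpus):
        self.name = name
        self.free_cpus = free_cpus

    def run_locally(self, job):
        return f"{self.name} runs {job}"

class Connectivity:
    """Connectivity layer: communication and security (here just a token check)."""
    def __init__(self, trusted_tokens):
        self.trusted_tokens = trusted_tokens

    def authenticated(self, token):
        return token in self.trusted_tokens

class ResourceService:
    """Resource layer: negotiated access to a single fabric element."""
    def __init__(self, fabric, connectivity):
        self.fabric = fabric
        self.connectivity = connectivity

    def submit(self, job, token):
        if not self.connectivity.authenticated(token):
            raise PermissionError("authentication failed")
        return self.fabric.run_locally(job)

    def status(self):
        return {"site": self.fabric.name, "free_cpus": self.fabric.free_cpus}

class CollectiveBroker:
    """Collective layer: coordinates many resource services for one VO."""
    def __init__(self, resources):
        self.resources = resources

    def submit_anywhere(self, job, token, min_cpus=1):
        for r in self.resources:
            if r.status()["free_cpus"] >= min_cpus:
                return r.submit(job, token)
        raise RuntimeError("no suitable resource found")

if __name__ == "__main__":
    sites = [ResourceService(FabricElement("cern-farm", 0), Connectivity({"vo-token"})),
             ResourceService(FabricElement("infn-farm", 32), Connectivity({"vo-token"}))]
    broker = CollectiveBroker(sites)
    print(broker.submit_anywhere("reco-job-42", "vo-token", min_cpus=8))
```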
Common development directions for all projects
• Adopt the Globus concept of a layered architecture
• Adopt the Globus basic services
• Core Data Grid services: transport (GridFTP), replica management, high-performance protocols
• Resource management (GRAM), information services (MDS)
• Security and policy for collaborative groups (PKI)
• DataGrid: middleware development
• A new layer on top of the Globus basic services
• Motivated and steered by the needs of the LHC experiments
• GriPhyN: virtual data R&D
• Motivated by the requirements of the experiments
• Virtualization with respect to location and materialization
• Research orientation; will feed into toolkits
• PPDG: tools for HEP applications
Where are we? Software and middleware
• The evaluation phase is concluded. Basic Grid services (Globus and Condor) are installed in several testbeds: INFN, France, UK, US…
• In general, more robustness, reliability and scalability are needed (HEP has hundreds of users, hundreds of jobs, enormous data sets…), but the DataGrid and US Testbed 0's are up and running
• The problems of multiple CAs have been solved…
• Real experiment applications are starting to use the available Grid software (ALICE, ATLAS, CMS, LHCb, but also BaBar, D0, Virgo/LIGO…)
• Close coordination between the EU and the US has started: DataGrid, Globus, GriPhyN, PPDG… INFN-Grid, GridPP
• EU-US coordination framework, the Global Grid Forum: >300 people, >90 organizations, 11 countries
• First delivery of DataGrid middleware due in September 2001
An example DataGrid deliverable for PM9 (September 2001): the workload management system (1st prototype)
• Jobs are submitted using the JDL (Class-Ads)
• The Broker performs resource discovery using the GIS and the Replica Catalog (plus other info)
• The Broker chooses the Globus resources to which the jobs must be submitted
• The job submission service is based on Condor-G, which is able to provide a reliable, crash-proof job submission service
• Globus GRAM is the uniform interface to the different local resource management systems (Condor, LSF, PBS) on the farms at the various sites (Site1, Site2, Site3)
• Information on the characteristics and status of the local resources is published in the GIS
A minimal sketch of this flow is given below.
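A minimal sketch of the PM9 flow, assuming a toy job description written as a Python dict in the spirit of the JDL/Class-Ads and an invented resource table standing in for the GIS and Replica Catalog; this is not the WP1 code, and the Condor-G / Globus GRAM chain is reduced to a single stub call.

```python
# Toy sketch of the PM9 workload-management flow (hypothetical, not WP1 code):
# a Class-Ad-like job description is matched against GIS-style resource info,
# the broker ranks the candidates, and the job is handed to a submission stub.

# Job description in the spirit of the JDL / Class-Ads (invented attributes).
job = {
    "executable": "reco.sh",
    "requirements": {"os": "linux", "min_free_cpus": 4},
    "input_data": "lfn:run1234.raw",          # logical file name
    "rank": "free_cpus",                      # prefer the emptiest site
}

# What the information service (GIS) might report about each computing element.
resources = [
    {"ce": "ce01.cern.ch",    "os": "linux", "free_cpus": 2,  "close_files": ["lfn:run1234.raw"]},
    {"ce": "ce01.pd.infn.it", "os": "linux", "free_cpus": 16, "close_files": ["lfn:run1234.raw"]},
    {"ce": "ce02.ral.ac.uk",  "os": "linux", "free_cpus": 40, "close_files": []},
]

def matches(job, res):
    """Check the job requirements and that a replica of the input is close by."""
    req = job["requirements"]
    return (res["os"] == req["os"]
            and res["free_cpus"] >= req["min_free_cpus"]
            and job["input_data"] in res["close_files"])

def broker(job, resources):
    """Return the best matching computing element, or None."""
    candidates = [r for r in resources if matches(job, r)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[job["rank"]])

def submit(job, ce):
    """Stand-in for the Condor-G / Globus GRAM submission chain."""
    print(f"submitting {job['executable']} to {ce['ce']}")

if __name__ == "__main__":
    chosen = broker(job, resources)
    if chosen is None:
        raise SystemExit("no matching resource")
    submit(job, chosen)
```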
WP1 current activities
• Focus on the implementation of the first prototype of the workload management system (1st release)
• Fast prototyping instead of a classic top-down approach
• Development of new software and integration of existing components (Globus, Condor, …)
• Grid accounting
• An economy-based model?
• A long-term activity (not for the PM9 release)
Functionalities foreseen for the WP1 1st prototype
• First version of the job description language (JDL)
• Used when the job is submitted, to specify the job characteristics and the required and preferred resources
• First version of the resource broker
• Responsible for choosing the "best" computing resources to which jobs are submitted
• The required and preferred resources are "matched" against the characteristics and status of the available Grid resources
• Job submission service
• Responsible for submitting jobs to computing resources
Functionalities foreseen for the WP1 1st prototype (continued)
• First version of the bookkeeping service
• Short-term (volatile) data about currently active jobs
• First version of the logging service
• Long-term (persistent) information about jobs and the workload management system
• First user interface
• Command-line, for job management operations
A toy sketch of the bookkeeping/logging split follows.
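To picture the difference between the volatile bookkeeping service and the persistent logging service, a toy sketch with invented interfaces (not the WP1 design): bookkeeping holds only the current state of active jobs in memory, while logging appends every event to durable storage.

```python
# Toy sketch of the bookkeeping/logging split (hypothetical interfaces):
# bookkeeping = volatile, current state of active jobs;
# logging     = persistent, append-only history of job and WMS events.
import json, time

class Bookkeeping:
    """Short-term, in-memory view of currently active jobs."""
    def __init__(self):
        self.active = {}

    def update(self, job_id, status):
        self.active[job_id] = status
        if status in ("done", "aborted"):      # finished jobs drop out of the volatile view
            del self.active[job_id]

class Logging:
    """Long-term record: every event is appended to a file and never rewritten."""
    def __init__(self, path="wms_events.log"):
        self.path = path

    def record(self, job_id, event):
        entry = {"t": time.time(), "job": job_id, "event": event}
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")

if __name__ == "__main__":
    bk, log = Bookkeeping(), Logging()
    for status in ("submitted", "running", "done"):
        bk.update("job-001", status)
        log.record("job-001", status)
    print("active jobs:", bk.active)           # empty: job-001 has finished
```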
WP2 status
• Excellent and productive core team; 2 of the 3 external partners are staffed
• Good progress on the WP2 design and architecture: the June deliverable was on time
• Good exchanges with the ATF and the other middleware WPs
• Good exchanges with Globus and PPDG in the USA
Month 9 deliverable components
• Replication package GDMP (with the Globus Replica Catalog) to support Objectivity, ROOT and plain files
• GSI security-enhanced Castor service
• SQL-based information services (alpha/beta prototypes only)
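The replication idea behind GDMP and the Globus Replica Catalog can be pictured with the toy lookup below; the data, site names and functions are invented for illustration and are not the GDMP API. A logical file name maps to several physical copies, and a client picks the replica at the preferred site.

```python
# Toy replica-catalog lookup (hypothetical, not the GDMP / Globus API):
# one logical file name (LFN) maps to several physical file names (PFNs).

replica_catalog = {
    "lfn:higgs_candidates.root": [
        "gsiftp://se01.cern.ch/data/higgs_candidates.root",
        "gsiftp://se01.pd.infn.it/data/higgs_candidates.root",
        "gsiftp://se02.fnal.gov/data/higgs_candidates.root",
    ],
}

def list_replicas(lfn):
    """Return all known physical copies of a logical file."""
    return replica_catalog.get(lfn, [])

def best_replica(lfn, preferred_site):
    """Pick the replica at the preferred site, if any, else the first one."""
    replicas = list_replicas(lfn)
    for pfn in replicas:
        if preferred_site in pfn:
            return pfn
    return replicas[0] if replicas else None

def register_replica(lfn, pfn):
    """Record a new physical copy of a logical file."""
    replica_catalog.setdefault(lfn, []).append(pfn)

if __name__ == "__main__":
    register_replica("lfn:higgs_candidates.root",
                     "gsiftp://se01.ral.ac.uk/data/higgs_candidates.root")
    print(best_replica("lfn:higgs_candidates.root", "pd.infn.it"))
```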
WP4: development of the release 1 deliverables
• Prototype of the fabric configuration management system. The configuration management system consists of two parts:
• A high-level definition of the configuration information, providing the means to express inheritance and other types of dependencies
• A low-level definition of the configuration information, which the node uses to configure itself. The release 1 prototype will provide the low-level definition part and the necessary caching mechanism to ensure that the node data is up to date.
• An Interim Installation System (IIS), which will provide the means to distribute and install software packages on compute cluster nodes.
• The IIS is based on a tool called LCFG from the University of Edinburgh.
• It will use the configuration management prototype for storing all software configuration information.
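A toy sketch of the two configuration levels; the template format, class names and version tags are invented and are not the WP4 or LCFG formats. High-level templates with inheritance are flattened into a low-level per-node description, and the node keeps a cached copy that is refreshed only when the central version changes.

```python
# Toy sketch of the two-level configuration idea (hypothetical, not LCFG/WP4):
# high-level templates with inheritance are compiled into a flat, low-level
# per-node configuration; the node keeps a cached copy keyed by a version tag.

templates = {
    "base":       {"inherits": None,     "cfg": {"os": "linux", "ntp": "ntp.cern.ch"}},
    "worker":     {"inherits": "base",   "cfg": {"batch": "pbs", "scratch_gb": 20}},
    "cms-worker": {"inherits": "worker", "cfg": {"software": ["cmssw"], "scratch_gb": 40}},
}

def compile_node_config(template_name):
    """Flatten the inheritance chain into the low-level node description."""
    chain = []
    t = template_name
    while t is not None:
        chain.append(templates[t])
        t = chain[-1]["inherits"]
    cfg = {}
    for tmpl in reversed(chain):          # most-derived template wins
        cfg.update(tmpl["cfg"])
    return cfg

class NodeCache:
    """Per-node cache so the node can configure itself from local data."""
    def __init__(self):
        self.version = None
        self.cfg = None

    def refresh(self, server_version, fetch):
        if server_version != self.version:   # only re-fetch when something changed
            self.cfg = fetch()
            self.version = server_version
        return self.cfg

if __name__ == "__main__":
    cache = NodeCache()
    cfg = cache.refresh("v1", lambda: compile_node_config("cms-worker"))
    print(cfg)   # scratch_gb is 40: the cms-worker template overrides the worker default
```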
WP4 other activities
• The WP4 tools survey was finished and delivered on its due date (30/4/01). It has been reviewed and accepted by the Technical Board for delivery to the EU.
• Development of the WP4 architecture. WP4 aims to deliver a complete enterprise-like solution for the automatic management of entire fabrics. The basic components are:
• Central configuration management
• Automatic software installation and maintenance
• System monitoring and problem management
• Resource management (local to the fabric)
• Grid integration
The functionalities of each of these components have now been defined, together with the top-down view coupling the components together.
WP5
• Progress has been delayed by difficulties in recruiting staff, so the first deliverable, a report on current technology, is late. Partners have also had difficulty in obtaining the right unfunded effort, but this is improving.
• Following long ATF discussions, the StorageElement has been defined as the DataGrid interface for physical file access and management.
• WP5 has a plan for its implementation using:
• GridFTP for data access, and
• a disk-pool manager for allocation, pinning, garbage collection, etc.
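The disk-pool manager role can be illustrated with the toy sketch below; the interface is invented and is not the WP5 design. Files are staged into a pool of fixed size, can be pinned while a job needs them, and unpinned files are garbage-collected in least-recently-used order when space runs out.

```python
# Toy disk-pool manager (hypothetical, not the WP5 StorageElement code):
# allocation, pinning and LRU garbage collection over a fixed-size pool.
import time

class DiskPool:
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.files = {}          # name -> {"size": gb, "pinned": bool, "last_used": t}

    def used(self):
        return sum(f["size"] for f in self.files.values())

    def allocate(self, name, size_gb):
        """Stage a file into the pool, evicting unpinned files if needed."""
        while self.used() + size_gb > self.capacity:
            if not self._evict_one():
                raise RuntimeError("pool full of pinned files")
        self.files[name] = {"size": size_gb, "pinned": False, "last_used": time.time()}

    def pin(self, name):
        self.files[name]["pinned"] = True      # protect from garbage collection

    def unpin(self, name):
        self.files[name]["pinned"] = False
        self.files[name]["last_used"] = time.time()

    def _evict_one(self):
        """Garbage-collect the least recently used unpinned file."""
        candidates = [(f["last_used"], n) for n, f in self.files.items() if not f["pinned"]]
        if not candidates:
            return False
        _, victim = min(candidates)
        del self.files[victim]
        return True

if __name__ == "__main__":
    pool = DiskPool(capacity_gb=100)
    pool.allocate("run1234.raw", 60)
    pool.pin("run1234.raw")
    pool.allocate("run1235.raw", 30)
    pool.allocate("run1236.raw", 30)   # triggers eviction of the unpinned run1235.raw
    print(sorted(pool.files))
```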
WP8: "Gridifying the LHC"
• The experiments already provide installation kits for the testbeds
• Ongoing tests on Testbed 0
• Trying to match the Grid to their data models (and vice versa)
• They plan to use Grid tools for production soon
WP8 deliverables
• Definition of short-term use cases
• Fast deployment for TESTBED0
• ALICE, ATLAS, CMS, LHCb
• Each at two or more sites
• Suitable, with minor changes, for TESTBED1
WP8 deliverables II
• Long-term use cases
• Description of the LHC computing for the experiments
• User requirements are derived from these
• Work is ongoing
• ONE unified set of requirements
Data Model (simplified) 1
• The data model is expressed in terms of virtual data, i.e. on-demand reconstruction
• There is a data processing chain, but it is under continuous development
• Some data products are uploaded
Data Model (simplified) 2
• Algorithms A and B are registered; these define new virtual data products
Data Model (simplified) 3
• Job X can now be run to analyse some of the virtual data products
Data Model (simplified) 4
• On the basis of the output of job X, a new algorithm G is defined and registered; this creates new virtual data products
A toy sketch of this virtual-data idea follows.
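A toy sketch of the virtual-data idea in steps 1-4; the catalog class and its methods are invented for illustration and are not the GriPhyN tools. Registering an algorithm defines data products that do not yet exist; asking for a product either returns a stored copy or reconstructs it on demand and records the result.

```python
# Toy virtual-data catalog (hypothetical, not the GriPhyN implementation):
# registering an algorithm defines virtual products; requesting a product
# either returns a materialized copy or runs the algorithm on demand.

class VirtualDataCatalog:
    def __init__(self):
        self.recipes = {}        # product name -> (algorithm, list of input products)
        self.materialized = {}   # product name -> concrete value (stand-in for a file)

    def upload(self, name, value):
        """Step 1: some data products (e.g. raw data) are simply uploaded."""
        self.materialized[name] = value

    def register(self, name, algorithm, inputs):
        """Steps 2 and 4: registering an algorithm defines a new virtual product."""
        self.recipes[name] = (algorithm, inputs)

    def get(self, name):
        """Step 3: return the product, reconstructing it on demand if necessary."""
        if name in self.materialized:
            return self.materialized[name]
        algorithm, inputs = self.recipes[name]
        value = algorithm(*[self.get(i) for i in inputs])   # recursive on-demand reconstruction
        self.materialized[name] = value                     # cache the materialized product
        return value

if __name__ == "__main__":
    vdc = VirtualDataCatalog()
    vdc.upload("raw", [1, 2, 3, 4])
    vdc.register("reco", algorithm=lambda raw: [x * 10 for x in raw], inputs=["raw"])   # algorithm A
    vdc.register("aod",  algorithm=lambda reco: sum(reco), inputs=["reco"])             # algorithm B
    print(vdc.get("aod"))   # reconstructs "reco" and then "aod" on demand
```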
Conclusions
• Substantial progress has been made so far by the present EU-US Grid projects
• The experiments are more and more involved in defining their requirements and in contributing to the definition of the final architecture
Main challenges to be closely followed:
• Guarantee the interoperability of the Grid middleware
• Compatible architectures
• The EU-US Grid coordination framework
• Transatlantic testbeds
• Guarantee an environment for open-source development
• Introduce the Grid paradigm into the basic core software architecture of the experiments
• This needs ever closer collaboration between the Grid middleware developers and the core software teams of the experiments
• The new LHC Computing Grid Project needs to incorporate the enormous progress made so far by DataGrid, GriPhyN, PPDG…