Caltech and CMS Grid Work Overview
Koen Holtman, Caltech/CMS
May 22, 2002
CMS distributed computing
• CMS wanted to build a distributed computing system all along!
• CMS CTP (Dec 1996):
  • One integrated computing system with a single global view of the data
  • Used by the 1000s of CMS collaborators around the world
• We now call this the 'CMS Data Grid System'
PPDG: Mission-Oriented Pragmatic Methodology
• End-to-end integration and deployment of experiment applications using existing and emerging Grid services
• Deployment of Grid technologies and services in production (24x7) environments
  • With stressful performance needs
• Collaborative development of Grid middleware and extensions between application and middleware groups
  • Leading to pragmatic and acceptable-risk solutions
• HENP experiments extend their adoption of common infrastructures to higher layers of their data analysis and processing applications
• Much attention to integration, coordination, interoperability and interworking
  • With emphasis on incremental deployment of increasingly functional working systems
CMS Grid Requirements
• Major Grid requirements effort completed: a 28-page document describing the 2003 CMS data grid system vision
  • Document written by the Caltech group
  • Catania CMS week Grid workshop (June 2001, about 12 hours over various sessions)
• CMS consensus on many strategic issues
  • Division of labor between Grid projects and CMS Computing group
    • Needed for planning, manpower estimates
  • Grid job execution model
  • Grid data model, replication model
  • Object handling and the Grid
• Main Grid Requirements Document: CMS Data Grid System Overview and Requirements, CMS Note 2001/037, http://kholtman.home.cern.ch/kholtman/cmsreqs.pdf
• Additional documents on object views, hardware sizes, workload model, data model (K. Holtman), CMS Note 2001/047
Objects and Files in the Grid
• CMS computing is object-oriented and database-oriented
  • Fundamentally we have a persistent data model with 1 object = 1 piece of physics data (KB-MB size)
• Much of the thinking in the Grid projects and Grid community is file-oriented
  • 'Computer center' view of large applications
  • Do not look inside application code
  • Think about application needs in terms of CPU batch queues, disk space for files, file staging and migration
• How to reconcile this? CMS requirements 2001-2003:
  • Grid project components do not need to deal with objects directly
  • Specify file handling requirements in such a way that a CMS layer for object handling can be built on top (see the sketch below)
  • LCG Project (SC2, PEB) has started to develop a new object handling layer
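To make the division concrete, here is a minimal Python sketch (not CMS code; all class and method names are hypothetical) of the kind of CMS-side persistency layer described above: objects are mapped to the Grid-managed files that contain them, so the Grid middleware only ever deals with whole files.

```python
# Minimal sketch (not CMS code): a CMS-side persistency layer that maps
# logical objects to the Grid-managed files that contain them, so that
# Grid middleware only has to deal with whole files.
# All class and method names here are hypothetical illustrations.

class ObjectFileCatalog:
    """Maps (dataset, object id) -> logical file name and byte range."""

    def __init__(self):
        self._index = {}  # (dataset, obj_id) -> (lfn, offset, length)

    def register(self, dataset, obj_id, lfn, offset, length):
        self._index[(dataset, obj_id)] = (lfn, offset, length)

    def files_for_objects(self, dataset, obj_ids):
        """Return the set of logical file names needed for these objects."""
        return {self._index[(dataset, o)][0] for o in obj_ids}


def stage_objects(catalog, replica_service, dataset, obj_ids):
    """Ask the Grid (file-level) replica service to stage whole files;
    object extraction then happens locally, inside the CMS layer."""
    for lfn in catalog.files_for_objects(dataset, obj_ids):
        replica_service.stage_to_local_disk(lfn)   # hypothetical Grid call
```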
Grid Services for CMS: Division of Labor (CMS Week, June 2001)
Provided by CMS:
• Mapping between objects and files (persistency layer)
• Local and remote extraction and packaging of objects to/from files
• Consistency of software configuration for each site
• Configuration meta-data for each sample
• Aggregation of sub-jobs
• Policy for what we want to do (e.g. priorities for what to run first, the production manager)
• Some error recovery too
Provided by the Grid:
• Distributed job scheduler: if a file is remote the Grid will run appropriate CMS software (often remotely; split over systems)
• Resource management, monitoring, and accounting tools and services
• Query estimation tools (to WHAT DEPTH?)
• Resource optimisation with some user hints / control (coherent management of local copies, replication, caching)
• Transfer of collections of data
• Error recovery tools (from e.g. job/disk crashes)
• Location information of Grid-managed files
• File management such as creation, deletion, purging, etc.
• Remote virtual login and authentication / authorisation
Needed from somebody:
• Tool to implement common CMS configuration on remote sites?
Not needed by 2003:
• Auto-discovery of arbitrary identical/similar samples
GriPhyN/PPDG Architecture (Ian Foster, Carl Kesselman, Mike Wilde, others)
[Architecture diagram: an Application sits on top of a Planner and an Executor, which use Catalog Services (MCAT; GriPhyN catalogs), Info Services (MDS), Monitoring, Replica Management, Policy/Security (GSI, CAS) and a Reliable Transfer Service, over Compute and Storage Resources. Shaded components are those for which an initial solution is operational.]
CMS Production: Worldwide Production at 21 Sites
[Site map: simulation and digitization (with and without pile-up) run with common production tools (IMPALA) and GDMP at CERN, FNAL, Moscow, INFN (10 sites), Caltech, UCSD, UFL, Imperial College, Bristol, Wisconsin, IN2P3 and Helsinki; sites are marked as either fully operational or in progress.]
Data Produced in 2001
Typical event sizes:
• Simulated: 1 CMSIM event = 1 OOHit event = 1.4 MB
• Reconstructed: 1 “1033” event = 1.2 MB; 1 “2x1033” event = 1.6 MB; 1 “1034” event = 5.6 MB
Objectivity data (total 29 TB): CERN 14 TB, FNAL 12 TB, Caltech 0.60 TB, Moscow 0.45 TB, INFN 0.40 TB, Bristol/RAL 0.22 TB, UCSD 0.20 TB, IN2P3 0.10 TB, UFL 0.08 TB, Wisconsin 0.05 TB, Helsinki -
Simulated events (total 8.4 M): Caltech 2.50 M, FNAL 1.65 M, Bristol/RAL 1.27 M, CERN 1.10 M, INFN 0.76 M, Moscow 0.43 M, IN2P3 0.31 M, Helsinki 0.13 M, Wisconsin 0.07 M, UCSD 0.06 M, UFL 0.05 M
GDMP
• Tool to transfer and manage files in production
  • Easy to handle this manually with a few centers; impossible with lots of data at many centers
• Based on Globus middleware (the Globus Replica Catalogue) and a flexible architecture (conceptual sketch below)
• Integration with ENSTORE, HPSS and Castor tape systems
• Provided an early model of collaboration between HEP and Grid middleware providers
• Successfully used to replicate > 1 TB of CMS data
• Now a PPDG/EU DataGrid joint project
• Authors: Caltech, CERN/CMS, FNAL, CERN/IT; PPDG, GriPhyN, EU DataGrid WP2
• Grid Data Management Pilot (GDMP): A Tool for Wide Area Replication, Applied Informatics Conference (AI2001), Innsbruck, Austria, 2/2001
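GDMP's real command-line tools and API are not reproduced here; the following is a conceptual Python sketch, with made-up class names, of the publish/subscribe replication pattern that GDMP implements on top of a replica catalogue.

```python
# Conceptual sketch of the publish/subscribe replication pattern GDMP
# implements; these classes are illustrative, not GDMP's real API.

class ReplicaCatalog:
    """Toy stand-in for a Globus-style replica catalogue:
    logical file name -> set of site locations."""
    def __init__(self):
        self.locations = {}

    def add_replica(self, lfn, site):
        self.locations.setdefault(lfn, set()).add(site)


class Site:
    def __init__(self, name, catalog):
        self.name = name
        self.catalog = catalog
        self.subscribers = []

    def subscribe(self, other_site):
        """Register another site that wants copies of our new files."""
        self.subscribers.append(other_site)

    def publish(self, lfn):
        """A newly produced file is registered and offered to subscribers."""
        self.catalog.add_replica(lfn, self.name)
        for site in self.subscribers:
            site.receive(lfn, source=self.name)

    def receive(self, lfn, source):
        # A real transfer would use GSI-FTP; here we only record the replica.
        self.catalog.add_replica(lfn, self.name)
```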
PPDG MOP system
• PPDG developed the MOP system
  • Allows submission of CMS production jobs from a central location, runs them at remote locations, and returns the results
• Relies on GDMP for replication
• Globus GRAM
• Condor-G and local queuing systems for job scheduling (submit-file sketch below)
• IMPALA for job specification
• Shown in the SC2001 demo
• Now being deployed in the USCMS testbed
• Proposed as basis for the next CMS-wide production infrastructure
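As a rough illustration of the submission chain, the snippet below generates a Condor-G style submit description that hands one production sub-job to a remote Globus GRAM gatekeeper. The gatekeeper address and wrapper script are placeholders, the exact submit-file attributes may differ between Condor-G versions, and this is not MOP's actual code.

```python
# Sketch: generating a Condor-G submit description for one production
# sub-job, to be run at a remote site through a Globus GRAM gatekeeper.
# Gatekeeper address and script names are placeholders; attribute details
# may differ from the Condor-G version actually used by MOP.

SUBMIT_TEMPLATE = """\
universe        = globus
globusscheduler = {gatekeeper}
executable      = {wrapper}
arguments       = {run_number}
output          = job_{run_number}.out
error           = job_{run_number}.err
log             = job_{run_number}.log
queue
"""

def write_submit_file(gatekeeper, wrapper, run_number):
    """Write a submit file for one sub-job and return its path."""
    path = "job_%d.submit" % run_number
    with open(path, "w") as f:
        f.write(SUBMIT_TEMPLATE.format(gatekeeper=gatekeeper,
                                       wrapper=wrapper,
                                       run_number=run_number))
    return path

# Example (placeholder names):
# write_submit_file("gatekeeper.tier2.example.edu/jobmanager-condor",
#                   "run_cmsim.sh", 1042)
```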
US CMS Prototypes and Test-beds
• All U.S. CMS S&C institutions are involved in DOE and NSF Grid projects
• Integrating Grid software into CMS systems
• Bringing CMS production onto the Grid
• Understanding the operational issues
• MOP used as first pilot application
  • MOP system got an official CMS production assignment of 200K CMSIM events
  • 50K have been produced and registered already
Installing middleware
• Virtual Data Toolkit:
  • Globus 2.0 beta
  • Essential Grid Tools
  • Essential Grid Services I & II
  • Grid API
  • Condor-G 6.3.1
  • Condor 6.3.1
  • ClassAds 0.9
  • GDMP 3.0 alpha 3
• We found the VDT to be very easy to install, but a little bit more challenging to configure
Prototype VDG System (production)
[Architecture diagram: the User submits work to an Abstract Planner; a Concrete Planner (WP1) feeds the Executor (MOP / WP1), which uses BOSS, a Local Tracking DB and wrapper scripts to run CMKIN, CMSIM and ORCA/COBRA on the Compute Resource; the Storage Resource comprises Local Grid Storage and Objectivity. Catalog Services consist of the Materialized Data Catalog, Metadata Catalog (RefDB), Virtual Data Catalog and Replica Catalog, with Replica Management provided by GDMP. Colour coding in the diagram distinguishes components that are existing, not yet coded, or implemented using MOP. A sketch of the planning step follows below.]
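As a hedged illustration of the planning step in such a virtual data system, the sketch below shows abstract-to-concrete planning: if a requested logical file already has a replica, plan a transfer; otherwise look up its derivation recipe and plan to (re)materialize it. The catalog contents and names are invented for illustration and are not the WP1/MOP implementations.

```python
# Illustrative sketch of abstract -> concrete planning in a virtual data
# system: if a requested logical file already has a replica, plan a copy;
# otherwise look up its derivation recipe and plan to (re)materialize it.
# Catalog contents and names are made up for illustration.

replica_catalog = {          # logical file name -> list of physical copies
    "cmsim_run42.fz": ["gsiftp://tier1.example.org/store/cmsim_run42.fz"],
}

virtual_data_catalog = {     # logical file name -> recipe that produces it
    "ntuple_run42.root": {"program": "ORCA", "inputs": ["cmsim_run42.fz"]},
}

def concrete_plan(lfn):
    if lfn in replica_catalog:
        return {"action": "transfer", "source": replica_catalog[lfn][0]}
    if lfn in virtual_data_catalog:
        recipe = virtual_data_catalog[lfn]
        # Plan the inputs first, then the producing job itself.
        steps = [concrete_plan(i) for i in recipe["inputs"]]
        steps.append({"action": "run", "program": recipe["program"]})
        return {"action": "derive", "steps": steps}
    raise KeyError("no replica and no derivation known for %s" % lfn)

# concrete_plan("ntuple_run42.root") plans a transfer of cmsim_run42.fz
# followed by an ORCA job that materializes the requested file.
```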
Analysis part
• Physics data analysis will be done by 100s of users
• Caltech is taking responsibility for developing the analysis part of the vertically integrated system
• Analysis part is connected to the same catalogs
  • Maintain a global view of all data
• Big analysis jobs can use production job handling mechanisms
• Analysis services based on tags
Optimization of “Tag” Databases
• Tags are small (~0.2 - 1 kbyte) summary objects for each event
• Crucial for fast selection of interesting event subsets; this will be an intensive activity
• Past work concentrated in three main areas:
  • Integration of CERN’s “HepODBMS” generic Tag system with the CMS “COBRA” framework
  • Investigations of Tag bitmap indexing to speed queries
  • Comparisons of OO and traditional databases (SQL Server, soon Oracle 9i) as efficient stores for Tags
• New work concentrates on tag-based analysis services (selection sketch below)
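A small Python sketch of the tag-based selection idea (the field names and cut values are hypothetical, not CMS's actual tag schema): the selection scans only the compact per-event tags and returns event identifiers, so the full MB-sized event data is touched only for the survivors.

```python
# Sketch of tag-based preselection: each tag is a small per-event summary
# record; a selection scans only the tags and returns event identifiers.
# Field names and cut values are hypothetical.

from collections import namedtuple

Tag = namedtuple("Tag", ["event_id", "n_jets", "max_jet_et", "missing_et"])

def select_events(tags, min_jets=2, min_jet_et=50.0, min_missing_et=30.0):
    """Return the event ids passing a simple jet + missing-ET selection."""
    return [t.event_id for t in tags
            if t.n_jets >= min_jets
            and t.max_jet_et >= min_jet_et
            and t.missing_et >= min_missing_et]

# The selected ids would then drive retrieval of the corresponding
# full event objects through the persistency layer.
```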
CLARENS: a Portal to the Grid
• Grid-enabling the working environment for non-specialist physicists' data analysis
• Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol. This ensures implementation independence. (Client sketch below.)
• The server is implemented in C++ to give access to the CMS OO analysis toolkit.
• The server will provide a remote API to Grid tools:
  • Security services provided by the Grid (GSI)
  • The Virtual Data Toolkit: object collection access
  • Data movement between Tier centers using GSI-FTP
  • CMS analysis software (ORCA/COBRA)
• Current prototype is running on the Caltech proto-Tier2
• More information at http://clarens.sourceforge.net, along with a web-based demo
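Because the protocol is commodity XML-RPC, a thin client needs nothing beyond a standard XML-RPC library. The sketch below uses Python's standard library against a placeholder server URL; the remote method name is hypothetical, so consult the Clarens documentation for the actual remote API.

```python
# Sketch of a thin Clarens client: because the server speaks commodity
# XML-RPC, Python's standard library client is enough to call it.
# The server URL and remote method name shown here are hypothetical.

import xmlrpc.client

def list_object_collections(server_url):
    proxy = xmlrpc.client.ServerProxy(server_url)
    # Hypothetical remote method exposing the object-collection catalog.
    return proxy.catalog.list_collections()

# Example (placeholder host):
# collections = list_object_collections("http://tier2.example.edu:8080/clarens")
```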
Globally Scalable Monitoring Service (CMS: Caltech and Pakistan)
[Diagram: Farm Monitors register with Lookup Services, reached directly or via a Proxy; clients and other services find them by discovery and lookup. An RC Monitor Service provides a Component Factory, GUI marshaling, Code Transport and RMI data access. Farm data is gathered by push and pull over rsh & ssh, existing scripts and snmp.]
Current events
• GDMP and MOP just had very favorable internal reviews in PPDG
• Testbed: currently MOP deployment under way
  • Stresses the Grid middleware in new ways: new issues and bugs being discovered in Globus, Condor
• Testbed MOP production request:
  • 200K CMSIM events requested; 50K (~10 GB) now finished and validated
• New fully integrated system: first versions expected by summer
  • System will be the basis for demos at SC2002
• Upcoming: CMS workshop on Grid-based production (CERN)
• Upcoming: PPDG analysis workshop (Berkeley)
2000 - 2001
Main 'Grid task' activities in 2000 - 2001:
• Ramp-up of Grid projects, establishing a new mode of working
• Grid project requirements documents, architecture
• GDMP
  • Started as a griddified package for data transport in CMS production; is now a more generic project
  • Used widely in 2001 production
  • Also a demonstration of the new mode of working
• MOP
  • Vertical integration of CMS production software, GDMP, Condor
• Both GDMP and MOP just had very successful internal reviews in PPDG
2002
• Grid task main activities (in US) in 2002:
  • Build USCMS test grid
    • Deploy Globus 2.0, EU DataGrid components
  • Use MOP as a basis for developing a larger vertically integrated system with
    • Virtual data features
    • Central catalogs and a global view of data
  • Production facilities
    • Participate in real CMS production with non-trivial jobs
  • Analysis facilities
    • Caltech team's main role is towards analysis facilities
Summary: 2000 - 2002
• Main 'Grid task' activities in 2000 - 2001:
  • Grid project requirements documents, architecture
  • GDMP
  • MOP
• Main 'Grid task' activities (in US) in 2002:
  • Build USCMS test grid
    • Deploy Globus 2.0, EU DataGrid components
  • Use MOP as a basis for developing a larger vertically integrated system with
    • Virtual data features
    • Central catalogs and a global view of data
  • Production facilities
    • Participate in real CMS production
  • Analysis facilities