Caltech and CMS Grid Work Overview
Koen Holtman, Caltech/CMS
May 22, 2002
CMS distributed computing
• CMS wanted to build a distributed computing system all along!
• CMS CTP (Dec 1996):
  • One integrated computing system with a single global view of the data
  • Used by the 1000s of CMS collaborators around the world
• We now call this the 'CMS Data Grid System'
PPDG: Mission-Oriented Pragmatic Methodology
• End-to-end integration and deployment of experiment applications using existing and emerging Grid services
• Deployment of Grid technologies and services in production (24x7) environments
  • With stressful performance needs
• Collaborative development of Grid middleware and extensions between application and middleware groups
  • Leading to pragmatic and acceptable-risk solutions
• HENP experiments extend their adoption of common infrastructures to higher layers of their data analysis and processing applications
• Much attention to integration, coordination, interoperability and interworking
  • With emphasis on incremental deployment of increasingly functional working systems
CMS Grid Requirements
• Major Grid requirements effort completed: a 28-page document describing the 2003 CMS data grid system vision
  • Document written by the Caltech group
  • Catania CMS week Grid workshop (June 2001, about 12 hours over various sessions)
• CMS consensus on many strategic issues
  • Division of labor between Grid projects and CMS Computing group
    • Needed for planning, manpower estimates
  • Grid job execution model
  • Grid data model, replication model
  • Object handling and the Grid
• Main Grid Requirements Document: CMS Data Grid System Overview and Requirements, CMS Note 2001/037, http://kholtman.home.cern.ch/kholtman/cmsreqs.pdf
• Additional documents on object views, hardware sizes, workload model, data model (K. Holtman), CMS Note 2001/047
Objects and Files in the Grid
• CMS computing is object-oriented and database-oriented
  • Fundamentally we have a persistent data model with 1 object = 1 piece of physics data (KB-MB size)
• Much of the thinking in the Grid projects and Grid community is file-oriented
  • 'Computer center' view of large applications
  • Do not look inside application code
  • Think about application needs in terms of CPU batch queues, disk space for files, file staging and migration
• How to reconcile this? CMS requirements 2001-2003:
  • Grid project components do not need to deal with objects directly
  • Specify file handling requirements in such a way that a CMS layer for object handling can be built on top (see the sketch below)
  • LCG Project (SC2, PEB) has started to develop a new object handling layer
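To make the division concrete, here is a minimal Python sketch (not CMS code; all class and method names are hypothetical) of the kind of CMS-side persistency layer described above: objects are mapped to the Grid-managed files that contain them, so the Grid middleware only ever deals with whole files.

```python
# Minimal sketch (not CMS code): a CMS-side persistency layer that maps
# logical objects to the Grid-managed files that contain them, so that
# Grid middleware only has to deal with whole files.
# All class and method names here are hypothetical illustrations.

class ObjectFileCatalog:
    """Maps (dataset, object id) -> logical file name and byte range."""

    def __init__(self):
        self._index = {}  # (dataset, obj_id) -> (lfn, offset, length)

    def register(self, dataset, obj_id, lfn, offset, length):
        self._index[(dataset, obj_id)] = (lfn, offset, length)

    def files_for_objects(self, dataset, obj_ids):
        """Return the set of logical file names needed for these objects."""
        return {self._index[(dataset, o)][0] for o in obj_ids}


def stage_objects(catalog, replica_service, dataset, obj_ids):
    """Ask the Grid (file-level) replica service to stage whole files;
    object extraction then happens locally, inside the CMS layer."""
    for lfn in catalog.files_for_objects(dataset, obj_ids):
        replica_service.stage_to_local_disk(lfn)   # hypothetical Grid call
```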
Grid Services for CMS: Division of Labor (CMS Week, June 2001)
Provided by CMS:
• Mapping between objects and files (persistency layer)
• Local and remote extraction and packaging of objects to/from files
• Consistency of software configuration for each site
• Configuration meta-data for each sample
• Aggregation of sub-jobs
• Policy for what we want to do (e.g. priorities for what to run first, the production manager)
• Some error recovery too
Provided by the Grid:
• Distributed job scheduler: if a file is remote the Grid will run appropriate CMS software (often remotely; split over systems)
• Resource management, monitoring, and accounting tools and services
• Query estimation tools (to WHAT DEPTH?)
• Resource optimisation with some user hints / control (coherent management of local copies, replication, caching)
• Transfer of collections of data
• Error recovery tools (from e.g. job/disk crashes)
• Location information of Grid-managed files
• File management such as creation, deletion, purging, etc.
• Remote virtual login and authentication / authorisation
Needed from somebody:
• Tool to implement common CMS configuration on remote sites?
Not needed by 2003:
• Auto-discovery of arbitrary identical/similar samples
GriPhyN/PPDG Architecture (Ian Foster, Carl Kesselman, Mike Wilde, others)
[Architecture diagram: an Application sits on top of a Planner and an Executor, which use Catalog Services (MCAT; GriPhyN catalogs), Info Services (MDS), Monitoring, Replica Management, Policy/Security (GSI, CAS) and a Reliable Transfer Service, over Compute and Storage Resources. Shaded components are those for which an initial solution is operational.]
CMS Production: Worldwide Production at 21 Sites
[Site map: simulation and digitization (with and without pile-up) run with common production tools (IMPALA) and GDMP at CERN, FNAL, Moscow, INFN (10 sites), Caltech, UCSD, UFL, Imperial College, Bristol, Wisconsin, IN2P3 and Helsinki; sites are marked as either fully operational or in progress.]
Data Produced in 2001
Typical event sizes:
• Simulated: 1 CMSIM event = 1 OOHit event = 1.4 MB
• Reconstructed: 1 “1033” event = 1.2 MB; 1 “2x1033” event = 1.6 MB; 1 “1034” event = 5.6 MB
Objectivity data (total 29 TB): CERN 14 TB, FNAL 12 TB, Caltech 0.60 TB, Moscow 0.45 TB, INFN 0.40 TB, Bristol/RAL 0.22 TB, UCSD 0.20 TB, IN2P3 0.10 TB, UFL 0.08 TB, Wisconsin 0.05 TB, Helsinki -
Simulated events (total 8.4 M): Caltech 2.50 M, FNAL 1.65 M, Bristol/RAL 1.27 M, CERN 1.10 M, INFN 0.76 M, Moscow 0.43 M, IN2P3 0.31 M, Helsinki 0.13 M, Wisconsin 0.07 M, UCSD 0.06 M, UFL 0.05 M
GDMP
• Tool to transfer and manage files in production
  • Easy to handle this manually with a few centers; impossible with lots of data at many centers
• Based on Globus middleware (the Globus Replica Catalogue) and a flexible architecture (conceptual sketch below)
• Integration with ENSTORE, HPSS and Castor tape systems
• Provided an early model of collaboration between HEP and Grid middleware providers
• Successfully used to replicate > 1 TB of CMS data
• Now a PPDG/EU DataGrid joint project
• Authors: Caltech, CERN/CMS, FNAL, CERN/IT; PPDG, GriPhyN, EU DataGrid WP2
• Grid Data Management Pilot (GDMP): A Tool for Wide Area Replication, Applied Informatics Conference (AI2001), Innsbruck, Austria, 2/2001
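GDMP's real command-line tools and API are not reproduced here; the following is a conceptual Python sketch, with made-up class names, of the publish/subscribe replication pattern that GDMP implements on top of a replica catalogue.

```python
# Conceptual sketch of the publish/subscribe replication pattern GDMP
# implements; these classes are illustrative, not GDMP's real API.

class ReplicaCatalog:
    """Toy stand-in for a Globus-style replica catalogue:
    logical file name -> set of site locations."""
    def __init__(self):
        self.locations = {}

    def add_replica(self, lfn, site):
        self.locations.setdefault(lfn, set()).add(site)


class Site:
    def __init__(self, name, catalog):
        self.name = name
        self.catalog = catalog
        self.subscribers = []

    def subscribe(self, other_site):
        """Register another site that wants copies of our new files."""
        self.subscribers.append(other_site)

    def publish(self, lfn):
        """A newly produced file is registered and offered to subscribers."""
        self.catalog.add_replica(lfn, self.name)
        for site in self.subscribers:
            site.receive(lfn, source=self.name)

    def receive(self, lfn, source):
        # A real transfer would use GSI-FTP; here we only record the replica.
        self.catalog.add_replica(lfn, self.name)
```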
PPDG MOP system
• PPDG developed the MOP system
  • Allows submission of CMS production jobs from a central location, runs them at remote locations, and returns the results
• Relies on GDMP for replication
• Globus GRAM
• Condor-G and local queuing systems for job scheduling (submit-file sketch below)
• IMPALA for job specification
• Shown in the SC2001 demo
• Now being deployed in the USCMS testbed
• Proposed as basis for the next CMS-wide production infrastructure
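As a rough illustration of the submission chain, the snippet below generates a Condor-G style submit description that hands one production sub-job to a remote Globus GRAM gatekeeper. The gatekeeper address and wrapper script are placeholders, the exact submit-file attributes may differ between Condor-G versions, and this is not MOP's actual code.

```python
# Sketch: generating a Condor-G submit description for one production
# sub-job, to be run at a remote site through a Globus GRAM gatekeeper.
# Gatekeeper address and script names are placeholders; attribute details
# may differ from the Condor-G version actually used by MOP.

SUBMIT_TEMPLATE = """\
universe        = globus
globusscheduler = {gatekeeper}
executable      = {wrapper}
arguments       = {run_number}
output          = job_{run_number}.out
error           = job_{run_number}.err
log             = job_{run_number}.log
queue
"""

def write_submit_file(gatekeeper, wrapper, run_number):
    """Write a submit file for one sub-job and return its path."""
    path = "job_%d.submit" % run_number
    with open(path, "w") as f:
        f.write(SUBMIT_TEMPLATE.format(gatekeeper=gatekeeper,
                                       wrapper=wrapper,
                                       run_number=run_number))
    return path

# Example (placeholder names):
# write_submit_file("gatekeeper.tier2.example.edu/jobmanager-condor",
#                   "run_cmsim.sh", 1042)
```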
US CMS Prototypes and Test-beds
• All U.S. CMS S&C institutions are involved in DOE and NSF Grid projects
• Integrating Grid software into CMS systems
• Bringing CMS production onto the Grid
• Understanding the operational issues
• MOP used as first pilot application
  • MOP system got an official CMS production assignment of 200K CMSIM events
  • 50K have been produced and registered already
Installing middleware
• Virtual Data Toolkit:
  • Globus 2.0 beta
  • Essential Grid Tools
  • Essential Grid Services I & II
  • Grid API
  • Condor-G 6.3.1
  • Condor 6.3.1
  • ClassAds 0.9
  • GDMP 3.0 alpha 3
• We found the VDT to be very easy to install, but a little bit more challenging to configure
Prototype VDG System (production)
[Architecture diagram: the User submits work to an Abstract Planner; a Concrete Planner (WP1) feeds the Executor (MOP / WP1), which uses BOSS, a Local Tracking DB and wrapper scripts to run CMKIN, CMSIM and ORCA/COBRA on the Compute Resource; the Storage Resource comprises Local Grid Storage and Objectivity. Catalog Services consist of the Materialized Data Catalog, Metadata Catalog (RefDB), Virtual Data Catalog and Replica Catalog, with Replica Management provided by GDMP. Colour coding in the diagram distinguishes components that are existing, not yet coded, or implemented using MOP. A sketch of the planning step follows below.]
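As a hedged illustration of the planning step in such a virtual data system, the sketch below shows abstract-to-concrete planning: if a requested logical file already has a replica, plan a transfer; otherwise look up its derivation recipe and plan to (re)materialize it. The catalog contents and names are invented for illustration and are not the WP1/MOP implementations.

```python
# Illustrative sketch of abstract -> concrete planning in a virtual data
# system: if a requested logical file already has a replica, plan a copy;
# otherwise look up its derivation recipe and plan to (re)materialize it.
# Catalog contents and names are made up for illustration.

replica_catalog = {          # logical file name -> list of physical copies
    "cmsim_run42.fz": ["gsiftp://tier1.example.org/store/cmsim_run42.fz"],
}

virtual_data_catalog = {     # logical file name -> recipe that produces it
    "ntuple_run42.root": {"program": "ORCA", "inputs": ["cmsim_run42.fz"]},
}

def concrete_plan(lfn):
    if lfn in replica_catalog:
        return {"action": "transfer", "source": replica_catalog[lfn][0]}
    if lfn in virtual_data_catalog:
        recipe = virtual_data_catalog[lfn]
        # Plan the inputs first, then the producing job itself.
        steps = [concrete_plan(i) for i in recipe["inputs"]]
        steps.append({"action": "run", "program": recipe["program"]})
        return {"action": "derive", "steps": steps}
    raise KeyError("no replica and no derivation known for %s" % lfn)

# concrete_plan("ntuple_run42.root") plans a transfer of cmsim_run42.fz
# followed by an ORCA job that materializes the requested file.
```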
Analysis part
• Physics data analysis will be done by 100s of users
• Caltech is taking responsibility for developing the analysis part of the vertically integrated system
• Analysis part is connected to the same catalogs
  • Maintain a global view of all data
• Big analysis jobs can use production job handling mechanisms
• Analysis services based on tags
Optimization of “Tag” Databases
• Tags are small (~0.2 - 1 kbyte) summary objects for each event
• Crucial for fast selection of interesting event subsets; this will be an intensive activity
• Past work concentrated in three main areas:
  • Integration of CERN’s “HepODBMS” generic Tag system with the CMS “COBRA” framework
  • Investigations of Tag bitmap indexing to speed queries
  • Comparisons of OO and traditional databases (SQL Server, soon Oracle 9i) as efficient stores for Tags
• New work concentrates on tag-based analysis services (selection sketch below)
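A small Python sketch of the tag-based selection idea (the field names and cut values are hypothetical, not CMS's actual tag schema): the selection scans only the compact per-event tags and returns event identifiers, so the full MB-sized event data is touched only for the survivors.

```python
# Sketch of tag-based preselection: each tag is a small per-event summary
# record; a selection scans only the tags and returns event identifiers.
# Field names and cut values are hypothetical.

from collections import namedtuple

Tag = namedtuple("Tag", ["event_id", "n_jets", "max_jet_et", "missing_et"])

def select_events(tags, min_jets=2, min_jet_et=50.0, min_missing_et=30.0):
    """Return the event ids passing a simple jet + missing-ET selection."""
    return [t.event_id for t in tags
            if t.n_jets >= min_jets
            and t.max_jet_et >= min_jet_et
            and t.missing_et >= min_missing_et]

# The selected ids would then drive retrieval of the corresponding
# full event objects through the persistency layer.
```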
CLARENS: a Portal to the Grid
• Grid-enabling the working environment for non-specialist physicists' data analysis
• Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol. This ensures implementation independence. (Client sketch below.)
• The server is implemented in C++ to give access to the CMS OO analysis toolkit.
• The server will provide a remote API to Grid tools:
  • Security services provided by the Grid (GSI)
  • The Virtual Data Toolkit: object collection access
  • Data movement between Tier centers using GSI-FTP
  • CMS analysis software (ORCA/COBRA)
• Current prototype is running on the Caltech proto-Tier2
• More information at http://clarens.sourceforge.net, along with a web-based demo
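Because the protocol is commodity XML-RPC, a thin client needs nothing beyond a standard XML-RPC library. The sketch below uses Python's standard library against a placeholder server URL; the remote method name is hypothetical, so consult the Clarens documentation for the actual remote API.

```python
# Sketch of a thin Clarens client: because the server speaks commodity
# XML-RPC, Python's standard library client is enough to call it.
# The server URL and remote method name shown here are hypothetical.

import xmlrpc.client

def list_object_collections(server_url):
    proxy = xmlrpc.client.ServerProxy(server_url)
    # Hypothetical remote method exposing the object-collection catalog.
    return proxy.catalog.list_collections()

# Example (placeholder host):
# collections = list_object_collections("http://tier2.example.edu:8080/clarens")
```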
Globally Scalable Monitoring Service (CMS: Caltech and Pakistan)
[Diagram: Farm Monitors register with Lookup Services, reached directly or via a Proxy; clients and other services find them by discovery and lookup. An RC Monitor Service provides a Component Factory, GUI marshaling, Code Transport and RMI data access. Farm data is gathered by push and pull over rsh & ssh, existing scripts and snmp.]
Current events
• GDMP and MOP just had very favorable internal reviews in PPDG
• Testbed: currently MOP deployment under way
  • Stresses the Grid middleware in new ways: new issues and bugs being discovered in Globus, Condor
• Testbed MOP production request:
  • 200K CMSIM events requested; 50K (~10 GB) now finished and validated
• New fully integrated system: first versions expected by summer
  • System will be the basis for demos at SC2002
• Upcoming: CMS workshop on Grid-based production (CERN)
• Upcoming: PPDG analysis workshop (Berkeley)
2000 - 2001
Main 'Grid task' activities in 2000 - 2001:
• Ramp-up of Grid projects, establishing a new mode of working
• Grid project requirements documents, architecture
• GDMP
  • Started as a griddified package for data transport in CMS production; is now a more generic project
  • Used widely in 2001 production
  • Also a demonstration of the new mode of working
• MOP
  • Vertical integration of CMS production software, GDMP, Condor
• Both GDMP and MOP just had very successful internal reviews in PPDG
2002
• Grid task main activities (in US) in 2002:
  • Build USCMS test grid
    • Deploy Globus 2.0, EU DataGrid components
  • Use MOP as a basis for developing a larger vertically integrated system with
    • Virtual data features
    • Central catalogs and a global view of data
  • Production facilities
    • Participate in real CMS production with non-trivial jobs
  • Analysis facilities
    • Caltech team's main role is towards analysis facilities
Summary: 2000 - 2002
• Main 'Grid task' activities in 2000 - 2001:
  • Grid project requirements documents, architecture
  • GDMP
  • MOP
• Main 'Grid task' activities (in US) in 2002:
  • Build USCMS test grid
    • Deploy Globus 2.0, EU DataGrid components
  • Use MOP as a basis for developing a larger vertically integrated system with
    • Virtual data features
    • Central catalogs and a global view of data
  • Production facilities
    • Participate in real CMS production
  • Analysis facilities