U.S. Physics Data Grid Projects
Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/  |  avery@phys.ufl.edu
International Workshop on HEP Data Grids
Kyungpook National University, Daegu, Korea, Nov. 8-9, 2002
“Trillium”: US Physics Data Grid Projects
• Particle Physics Data Grid (PPDG)
  • Data Grid for HENP experiments
  • ATLAS, CMS, D0, BaBar, STAR, JLAB
• GriPhyN
  • Petascale Virtual-Data Grids
  • ATLAS, CMS, LIGO, SDSS
• iVDGL
  • Global Grid lab
  • ATLAS, CMS, LIGO, SDSS, NVO
• Data-intensive experiments
• Collaborations of physicists & computer scientists
• Infrastructure development & deployment
• Globus + VDT based
Why Trillium?
• Many common aspects
  • Large overlap in project leadership
  • Large overlap in participants
  • Large overlap in experiments, particularly LHC
  • Common projects (monitoring, etc.)
  • Common packaging
  • Common use of VDT, other GriPhyN software
• Funding agencies like collaboration
  • Good working relationship on grids between NSF and DOE
  • Good complementarity: DOE (labs), NSF (universities)
  • Collaboration of computer science/physics/astronomy encouraged
• Organization from the “bottom up”
  • With encouragement from funding agencies
Driven by LHC Computing Challenges
• Complexity: millions of detector channels, complex events
• Scale: PetaOps (CPU), Petabytes (data)
• Distribution: global distribution of people & resources
  • 1800 physicists, 150 institutes, 32 countries
Global LHC Data Grid Experiment (e.g., CMS)
(Diagram: tiered distribution model; Tier0 : Tier1 : Tier2 resources ~ 1:1:1)
• Online System feeds the CERN Computer Center (Tier 0, > 20 TIPS) at 100-200 MBytes/s
• Tier 0 to Tier 1 national centers (e.g., Korea, Russia, UK, USA): 2.5 Gbits/s
• Tier 1 to Tier 2 centers: 2.5 Gbits/s
• Tier 2 to Tier 3 institutes: ~0.6 Gbits/s
• Tier 3 to Tier 4 (physics caches, PCs, other portals): 0.1-1 Gbits/s
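To put the link speeds in the diagram in perspective, here is a back-of-the-envelope sketch (not from the slides): how long a hypothetical 1 TB dataset would take to cross each tier boundary. The dataset size and the 50% effective-throughput factor are assumptions made purely for illustration.

```python
# Illustrative arithmetic only: transfer time of an assumed 1 TB dataset
# over each nominal link speed quoted in the tier diagram.

LINK_RATES_GBPS = {
    "Tier0 -> Tier1": 2.5,
    "Tier1 -> Tier2": 2.5,
    "Tier2 -> Tier3": 0.6,
    "Tier3 -> Tier4": 0.5,   # midpoint of the quoted 0.1-1 Gbits/s range
}

DATASET_TB = 1.0             # hypothetical dataset size
EFFICIENCY = 0.5             # assume half the nominal rate is achieved end to end

for link, gbps in LINK_RATES_GBPS.items():
    bits = DATASET_TB * 8e12                       # terabytes -> bits
    seconds = bits / (gbps * 1e9 * EFFICIENCY)
    print(f"{link}: {seconds / 3600:.1f} hours for {DATASET_TB} TB")
```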
LHC Tier2 Center (2001)
“Flat” switching topology
(Diagram: router to WAN; FEth/GEth switch; 20-60 nodes, dual 0.8-1 GHz P3; data server with >1 RAID; 1 TByte RAID)
LHC Tier2 Center (2002-2003)
“Hierarchical” switching topology
(Diagram: router to WAN; GEth switch feeding GEth/FEth switches; 40-100 nodes, dual 2.5 GHz P4; data server with >1 RAID; 2-4 TBytes RAID)
LHC Hardware Cost Estimates
• Buy late, but not too late: phased implementation
  • R&D phase: 2001-2004
  • Implementation phase: 2004-2007
• R&D to develop capabilities and the computing model itself
• Prototyping at increasing scales of capability & complexity
(Chart annotations: 1.1 years, 1.4 years, 2.1 years, 1.2 years)
Particle Physics Data Grid
“In coordination with complementary projects in the US and Europe, PPDG aims to meet the urgent needs for advanced Grid-enabled technology and to strengthen the collaborative foundations of experimental particle and nuclear physics.”
PPDG Goals
• Serve high energy & nuclear physics (HENP) experiments
  • Funded 2001-2004 @ US$9.5M (DOE)
• Develop advanced Grid technologies
  • Use Globus to develop higher-level tools
  • Focus on end-to-end integration
• Maintain practical orientation
  • Networks, instrumentation, monitoring
  • DB file/object replication, caching, catalogs, end-to-end movement
• Serve urgent needs of experiments
  • Unique challenges, diverse test environments
  • But make tools general enough for the wider community!
• Collaboration with GriPhyN, iVDGL, EDG, LCG
  • Recent work on ESnet Certificate Authority
PPDG Participants and Work Program
• Physicist + CS involvement
  • D0, BaBar, STAR, CMS, ATLAS
  • SLAC, LBNL, JLab, FNAL, BNL, Caltech, Wisconsin, Chicago, USC
• Computer science program of work
  • CS1: Job description language
  • CS2: Scheduling and management of data processing and data placement activities
  • CS3: Monitoring and status reporting (with GriPhyN)
  • CS4: Storage resource management
  • CS5: Reliable replication services
  • CS6: File transfer services (a minimal sketch follows below)
  • CS7: Collect/document experiment practices and generalize them
  • …
  • CS11: Grid-enabled data analysis
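As a hedged illustration of the replication and file-transfer items above (CS5/CS6), the sketch below wraps globus-url-copy, the GridFTP client shipped with the Globus Toolkit, in a simple retry loop. The endpoint URLs, hostnames, and retry policy are hypothetical; this is not PPDG code, just a minimal sketch of "reliable" transfer on top of GridFTP.

```python
# Minimal sketch, assuming globus-url-copy is on the PATH and both endpoints
# run GridFTP servers.  Retry policy and URLs are invented for illustration.
import subprocess
import time

def replicate(src_url: str, dst_url: str, retries: int = 3) -> bool:
    """Copy one file between GridFTP servers, retrying on failure."""
    for attempt in range(1, retries + 1):
        result = subprocess.run(["globus-url-copy", src_url, dst_url])
        if result.returncode == 0:
            return True
        print(f"transfer failed (attempt {attempt}), retrying...")
        time.sleep(30 * attempt)          # simple back-off between attempts
    return False

# Hypothetical endpoints on two testbed sites:
replicate("gsiftp://ufl.example.edu/data/run42/events.root",
          "gsiftp://fnal.example.edu/store/run42/events.root")
```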
GriPhyN = App. Science + CS + Grids
• Participants
  • US-CMS (high energy physics)
  • US-ATLAS (high energy physics)
  • LIGO/LSC (gravity wave research)
  • SDSS (Sloan Digital Sky Survey)
  • Strong partnership with computer scientists
• Design and implement production-scale grids
  • Develop common infrastructure, tools and services (Globus based)
  • Integration into the 4 experiments
  • Broad application to other sciences via the “Virtual Data Toolkit”
  • Strong outreach program
• Funded by NSF for 2000-2005
  • R&D for grid architecture (funded at $11.9M + $1.6M)
  • Integrate Grid infrastructure into experiments through the VDT
GriPhyN: PetaScale Virtual-Data Grids
(Architecture figure: production teams, individual investigators, and workgroups access the grid through interactive user tools; beneath these sit request planning & scheduling tools, request execution & management tools, and virtual data tools, supported by resource management, security and policy, and other Grid services; transforms run on distributed resources (code, storage, CPUs, networks) and raw data sources. Target scale: ~1 Petaflop, ~100 Petabytes.)
GriPhyN Research Agenda
• Based on Virtual Data technologies (see next figure)
  • Derived data, calculable via algorithm
  • Instantiated 0, 1, or many times (e.g., caches)
  • “Fetch value” vs. “execute algorithm”
  • Very complex (versions, consistency, cost calculation, etc.)
• LIGO example
  • “Get gravitational strain for 2 minutes around each of 200 gamma-ray bursts over the last year”
• For each requested data value, the system must (see the sketch below)
  • Locate the data item and/or the algorithm that produces it
  • Determine the cost of fetching vs. calculating
  • Plan the data movements & computations required to obtain results
  • Schedule the plan
  • Execute the plan
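The planning loop above can be made concrete with a small sketch. This is a conceptual illustration, not GriPhyN software: the catalog contents, cost numbers, and function names are all invented to show the fetch-versus-execute decision.

```python
# Conceptual sketch of the virtual-data planning loop: for each requested item,
# decide whether to fetch an existing instance or re-run the producing
# transformation, based on cost.  All data below is invented.

REPLICAS = {"strain-grb17": ["gsiftp://t2.example.edu/cache/strain-grb17"]}
RECIPES  = {"strain-grb17": ("compute_strain", {"burst": "grb17", "window_min": 2})}

def fetch_cost(item):
    # Placeholder constant; a real planner would use size / bandwidth to the
    # nearest replica.
    return 120.0 if item in REPLICAS else float("inf")

def compute_cost(item):
    # Placeholder constant; a real planner would estimate CPU time and queue wait.
    return 900.0 if item in RECIPES else float("inf")

def plan(item):
    if fetch_cost(item) <= compute_cost(item):
        return ("fetch", REPLICAS[item][0])
    transform, params = RECIPES[item]
    return ("execute", transform, params)

print(plan("strain-grb17"))   # -> ('fetch', 'gsiftp://t2.example.edu/cache/strain-grb17')
```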
Virtual Data Concept
(Figure: “fetch item” requests flow across major facilities/archives, regional facilities/caches, and local facilities/caches.)
• A data request may
  • Compute locally
  • Compute remotely
  • Access local data
  • Access remote data
• Scheduling based on
  • Local policies
  • Global policies
  • Cost
iVDGL: A Global Grid Laboratory
“We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science.” (from NSF proposal, 2001)
• International Virtual-Data Grid Laboratory
  • A global Grid laboratory (US, EU, Asia, South America, …)
  • A place to conduct Data Grid tests “at scale”
  • A mechanism to create common Grid infrastructure
  • A laboratory for other disciplines to perform Data Grid tests
  • A focus of outreach efforts to small institutions
• U.S. part funded by NSF (2001-2006)
  • $14.1M (NSF) + $2M (matching)
  • International partners bring their own funds
iVDGL Participants
• Initial experiments (funded by NSF proposal)
  • CMS, ATLAS, LIGO, SDSS, NVO
• Possible other experiments and disciplines
  • HENP: BTeV, D0, CMS HI, ALICE, …
  • Non-HEP: biology, …
• Complementary EU project: DataTAG
  • DataTAG and US pay for a 2.5 Gb/s transatlantic network
• Additional support from the UK e-Science programme
  • Up to 6 Fellows per year
  • None hired yet
iVDGL Components
• Computing resources
  • Tier1 laboratory sites (funded elsewhere)
  • Tier2 university sites: software integration
  • Tier3 university sites: outreach effort
• Networks
  • USA (Internet2, ESnet), Europe (Géant, …)
  • Transatlantic (DataTAG), transpacific, AMPATH, …
• Grid Operations Center (GOC)
  • Indiana (2 people)
  • Joint work with TeraGrid on GOC development
• Computer science support teams
  • Support, test, upgrade the GriPhyN Virtual Data Toolkit
• Coordination, management
iVDGL Management and Coordination
(Organization chart showing, for the U.S. piece: US Project Directors, US External Advisory Committee, US Project Steering Group, and Project Coordination Group; work teams: Facilities, Core Software, Operations, Applications, GLUE Interoperability, Outreach; and, for the international piece, collaborating Grid projects: DataTAG, TeraGrid, EDG, LCG?, Asia, BTEV, ALICE, Bio, Geo, D0, PDC, CMS HI.)
iVDGL Work Teams
• Facilities Team
  • Hardware (Tier1, Tier2, Tier3)
• Core Software Team
  • Grid middleware, toolkits
• Laboratory Operations Team
  • Coordination, software support, performance monitoring
• Applications Team
  • High energy physics, gravity waves, virtual astronomy
  • Nuclear physics, bioinformatics, …
• Education and Outreach Team
  • Web tools, curriculum development, involvement of students
  • Integrated with GriPhyN, connections to other projects
  • Want to develop further international connections
US-iVDGL Data Grid (Sep. 2001)
(Map of Tier1/Tier2/Tier3 sites: SKC, Boston U, Wisconsin, PSU, BNL, Argonne, Fermilab, J. Hopkins, Indiana, Hampton, Caltech, UCSD/SDSC, UF, Brownsville)
US-iVDGL Data Grid (Dec. 2002)
(Map of Tier1/Tier2/Tier3 sites: SKC, Boston U, Wisconsin, Michigan, PSU, BNL, Fermilab, LBL, Argonne, J. Hopkins, NCSA, Indiana, Hampton, Caltech, Oklahoma, Vanderbilt, UCSD/SDSC, FSU, Arlington, UF, FIU, Brownsville)
Possible iVDGL Participant: TeraGrid
(Diagram: 13 TeraFlops across four sites: NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne; each site has local resources, HPSS or UniTree archival storage, and external network connections, linked by a 40 Gb/s backbone.)
International Participation
• Existing partners
  • European Data Grid (EDG)
  • DataTAG
• Potential partners
  • Korea (T1)
  • China (T1?)
  • Japan (T1?)
  • Brazil (T1)
  • Russia (T1)
  • Chile (T2)
  • Pakistan (T2)
  • Romania (?)
Current Trillium Work
• Packaging technologies: PACMAN
  • Used for VDT releases; very successful & powerful
  • Being evaluated for Globus, EDG
• GriPhyN Virtual Data Toolkit 1.1.3 released
  • Vastly simplifies installation of grid tools
  • New changes will further reduce configuration complexity
• Monitoring (joint efforts; see the query sketch below)
  • Globus MDS 2.2 (GLUE schema)
  • Caltech MonALISA
  • Condor HawkEye
  • Florida Gossip (low-level component)
• Chimera Virtual Data System (more later)
• Testbeds, demo projects (more later)
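As one concrete, hedged example of the monitoring layer, the sketch below queries a Globus MDS 2.2 information server over LDAP using the python-ldap package. The hostname is hypothetical; port 2135 and the "mds-vo-name=local, o=grid" base DN are the usual MDS defaults rather than values taken from the testbed, so check the local deployment.

```python
# Minimal sketch, assuming a GRIS/GIIS is listening on the standard MDS port
# and allows anonymous searches.  Requires the python-ldap package.
import ldap

def query_mds(host: str, base: str = "mds-vo-name=local, o=grid"):
    conn = ldap.initialize(f"ldap://{host}:2135")
    # Anonymous subtree search for everything the information providers publish.
    return conn.search_s(base, ldap.SCOPE_SUBTREE, "(objectClass=*)")

# Hypothetical gatekeeper host:
for dn, attrs in query_mds("gatekeeper.example.edu"):
    print(dn)
```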
Virtual Data: Derivation and Provenance
• Most scientific data are not simple “measurements”
  • They are computationally corrected/reconstructed
  • They can be produced by numerical simulation
• Science & engineering projects are increasingly CPU and data intensive
• Programs are significant community resources (transformations)
  • So are the executions of those programs (derivations)
• Management of dataset transformations is important! (see the sketch below)
  • Derivation: instantiation of a potential data product
  • Provenance: exact history of any existing data product
Programs are valuable, like data. They should be community resources.
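To make the derivation/provenance distinction concrete, here is a minimal sketch (not the GriPhyN or Chimera schema) of what a catalog has to record so that a data product can be re-derived and its history audited. The class and field names are invented for illustration.

```python
# Illustrative model only: the minimum a catalog must record per the slide's
# definitions of transformation, derivation, and provenance.
from dataclasses import dataclass, field

@dataclass
class Transformation:            # the program: a community resource
    name: str
    version: str
    parameters: dict = field(default_factory=dict)

@dataclass
class Derivation:                # one execution of a transformation
    transformation: Transformation
    inputs: list                 # logical names of input data products
    outputs: list                # logical names of data products produced
    environment: dict = field(default_factory=dict)

calib = Transformation("apply_mirror_calibration", "v3.1", {"model": "2002-10"})
run   = Derivation(calib, inputs=["raw/img-0001"], outputs=["calibrated/img-0001"])

# Provenance of "calibrated/img-0001" is the chain of derivations that produced
# it; if the calibration model turns out to be wrong, every product downstream
# of derivations using that transformation version can be located and recomputed.
print(run)
```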
Motivations (1)
• “I’ve found some interesting data, but I need to know exactly what corrections were applied before I can trust it.”
• “I’ve detected a mirror calibration error and want to know which derived data products need to be recomputed.”
• “I want to search a database for dwarf galaxies. If a program that performs this analysis exists, I won’t have to write one from scratch.”
• “I want to apply a shape analysis to 10M galaxies. If the results already exist, I’ll save weeks of computation.”
(Figure: data is consumed-by/generated-by a derivation; a derivation is an execution-of a transformation; data products are the product-of derivations.)
Motivations (2)
• Data track-ability and result audit-ability
  • Universally sought by GriPhyN applications
• Facilitates tool and data sharing and collaboration
  • Data can be sent along with its recipe
• Repair and correction of data
  • Rebuild data products (cf. “make”)
• Workflow management
  • A new, structured paradigm for organizing, locating, specifying, and requesting data products
• Performance optimizations
  • Ability to re-create data rather than move it
“Chimera” Virtual Data System
• Virtual Data API
  • A Java class hierarchy to represent transformations & derivations
• Virtual Data Language (VDL)
  • Textual form for people & illustrative examples
  • XML form for machine-to-machine interfaces
• Virtual Data Database
  • Makes the objects of a virtual data definition persistent
• Virtual Data Service (future)
  • Provides a service interface (e.g., OGSA) to persistent objects
Virtual Data Catalog Object Model
(Figure: object model diagram)
Chimera as a Virtual Data System
• Virtual Data Language (VDL)
  • Describes virtual data products
• Virtual Data Catalog (VDC)
  • Used to store VDL
• Abstract Job Flow Planner
  • Creates a logical DAG (dependency graph)
• Concrete Job Flow Planner
  • Interfaces with a Replica Catalog
  • Provides a physical DAG submission file to Condor-G (see the sketch below)
• Generic and flexible
  • As a toolkit and/or a framework
  • In a Grid environment or locally
  • Currently in beta
(Pipeline diagram, logical on one side and physical on the other: VDL (XML) -> VDC -> Abstract Planner -> DAX (XML) -> Concrete Planner, which consults the Replica Catalog -> DAG -> DAGMan.)
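The abstract-to-concrete planning step can be illustrated with a toy sketch: the abstract DAG refers only to logical file names, and the concrete planner resolves them through a replica catalog before handing jobs to DAGMan/Condor-G. This is not Chimera code; the catalog entries, site names, and job format are invented.

```python
# Toy abstract -> concrete planning step.  All names and paths are invented.

REPLICA_CATALOG = {
    "lfn:galaxies.raw":   "gsiftp://ufl.example.edu/data/galaxies.raw",
    "lfn:clusters.model": "gsiftp://fnal.example.edu/models/clusters.model",
}

ABSTRACT_DAG = [
    # (job id, transformation, logical inputs, logical outputs, depends on)
    ("find", "find_clusters",  ["lfn:galaxies.raw"],  ["lfn:clusters.list"], []),
    ("hist", "size_histogram", ["lfn:clusters.list"], ["lfn:sizes.hist"],    ["find"]),
]

def concretize(abstract_dag, catalog, site="ufl.example.edu"):
    """Resolve logical files to physical URLs and emit schedulable job records."""
    concrete = []
    for job_id, transform, ins, outs, deps in abstract_dag:
        physical_ins = [
            # No registered replica: assume the file is produced at the
            # execution site by an upstream job.
            catalog.get(lfn, f"gsiftp://{site}/scratch/{lfn[4:]}")
            for lfn in ins
        ]
        concrete.append({"id": job_id, "exe": transform,
                         "inputs": physical_ins, "outputs": outs,
                         "site": site, "depends_on": deps})
    return concrete

for job in concretize(ABSTRACT_DAG, REPLICA_CATALOG):
    print(job)
```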
Chimera Application: SDSS Analysis
• Question: what is the size distribution of galaxy clusters?
• Stack: Chimera Virtual Data System + GriPhyN Virtual Data Toolkit + iVDGL Data Grid (many CPUs)
(Figure: galaxy cluster size distribution)
US-CMS Testbed
(Map: Wisconsin, Fermilab, Caltech, UCSD, Florida)
Other CMS Institutes Encouraged to Join
(Map: current testbed sites: Wisconsin, Fermilab, Caltech, UCSD, Florida)
• Expressions of interest
  • Princeton
  • Brazil
  • South Korea
  • Minnesota
  • Iowa
  • Possibly others
Grid Middleware Used in Testbed
• Virtual Data Toolkit 1.1.3
  • VDT Client: Globus Toolkit 2.0, Condor-G 6.4.3 (a submission sketch follows below)
  • VDT Server: Globus Toolkit 2.0, mkgridmap, Condor 6.4.3, ftsh, GDMP 3.0.7
• Virtual Organization (VO) management
  • LDAP server deployed at Fermilab
  • GroupMAN (adapted from EDG) used to manage the VO
  • Use DOE Science Grid certificates
  • Accept EDG and Globus certificates
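For illustration only, here is a hedged sketch of submitting one job through Condor-G to a Globus 2.0 gatekeeper, which is the path the VDT client stack above provides. The gatekeeper host, jobmanager name, and executable are hypothetical; the submit-file keywords (universe = globus, globusscheduler, and so on) are standard Condor-G usage of that era, but consult the local configuration.

```python
# Minimal sketch: write a Condor-G submit description and hand it to
# condor_submit.  Host, jobmanager, and executable names are invented.
import subprocess

submit_description = """\
universe            = globus
globusscheduler     = gatekeeper.example.edu/jobmanager-condor
executable          = run_cmsim.sh
transfer_executable = true
output              = cmsim.$(Cluster).out
error               = cmsim.$(Cluster).err
log                 = cmsim.log
queue
"""

with open("cmsim.sub", "w") as f:
    f.write(submit_description)

# Condor-G forwards the job to the remote gatekeeper via GRAM.
subprocess.run(["condor_submit", "cmsim.sub"], check=True)
```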
Commissioning the CMS Grid Testbed
• A complete prototype (see next figure)
  • CMS production scripts
  • Globus
  • Condor-G
  • GridFTP
• Commissioning: require production-quality results!
  • Run until the testbed “breaks”
  • Fix the testbed with middleware patches
  • Repeat until the entire production run finishes!
• Discovered/fixed many Globus and Condor-G problems
  • A huge success from this point of view alone
  • … but very painful
CMS Grid Testbed Production
(Diagram: a master site runs IMPALA, mop_submitter, DAGMan, Condor-G, and GridFTP; remote sites 1 … N each run a batch queue and GridFTP.)
Production Success on CMS Testbed
(Diagram: MCRunJob components (Configurator, ScriptGenerator, Linker, MasterScript), a “DAGMaker”/VDL step, MOP, Chimera, self-description and requirements)
• Results
  • 150k events generated, ~200 GB produced
  • 1.5 weeks of continuous running across all 5 testbed sites
  • 1M-event run just started on a larger testbed (~30% complete!)
Grid Coordination Efforts
• Global Grid Forum (www.gridforum.org)
  • International forum for general Grid efforts
  • Many working groups, standards definitions
  • Next meeting in Japan, early 2003
• HICB (high energy physics)
  • Joint development & deployment of Data Grid middleware
  • GriPhyN, PPDG, iVDGL, EU DataGrid, LCG, DataTAG, CrossGrid
  • GLUE effort (joint iVDGL-DataTAG working group)
• LCG (LHC Computing Grid Project)
  • Strong “forcing function”
• Large demo projects
  • IST2002 (Copenhagen)
  • Supercomputing 2002 (Baltimore)
• New proposal (joint NSF + Framework 6)?
WorldGrid Demo
• Joint Trillium-EDG-DataTAG demo
  • Resources from both sides in an intercontinental Grid testbed
  • Uses several visualization tools (Nagios, MapCenter, Ganglia)
  • Uses several monitoring tools (Ganglia, MDS, NetSaint, …)
• Applications
  • CMS: CMKIN, CMSIM
  • ATLAS: ATLSIM
• Submit jobs from US or EU
  • Jobs can run on any cluster
• Shown at IST2002 (Copenhagen)
• To be shown at SC2002 (Baltimore)
• Brochures now available describing Trillium and the demos
  • I have 10 with me now (2,000 just printed)
WorldGrid
(Figure)
Summary
• Very good progress on many fronts
  • Packaging
  • Testbeds
  • Major demonstration projects
• Current Data Grid projects are providing good experience
• Looking to collaborate with more international partners
  • Testbeds
  • Monitoring
  • Deploying the VDT more widely
• Working towards a new proposal
  • Emphasis on Grid-enabled analysis
  • Extending the Chimera virtual data system to analysis
Grid References
• Grid Book: www.mkp.com/grids
• Globus: www.globus.org
• Global Grid Forum: www.gridforum.org
• TeraGrid: www.teragrid.org
• EU DataGrid: www.eu-datagrid.org
• PPDG: www.ppdg.net
• GriPhyN: www.griphyn.org
• iVDGL: www.ivdgl.org