Trillium and Open Science Grid 3rd International Workshop on HEP Data Grids Kyungpook National University Daegu, Korea August 27, 2004 Paul Avery University of Florida avery@phys.ufl.edu Paul Avery
U.S. “Trillium” Grid Consortium • Trillium = PPDG + GriPhyN + iVDGL • Particle Physics Data Grid: $12M (DOE) (1999 – 2004+) • GriPhyN: $12M (NSF) (2000 – 2005) • iVDGL: $14M (NSF) (2001 – 2006) • Basic composition (~150 people) • PPDG: 4 universities, 6 labs • GriPhyN: 12 universities, SDSC, 3 labs • iVDGL: 18 universities, SDSC, 4 labs, foreign partners • Expts: BaBar, D0, STAR, Jlab, CMS, ATLAS, LIGO, SDSS/NVO • Complementarity of projects • GriPhyN: CS research, Virtual Data Toolkit (VDT) development • PPDG: “End to end” Grid services, monitoring, analysis • iVDGL: Grid laboratory deployment using VDT • Experiments provide frontier challenges • Unified entity when collaborating internationally Paul Avery
Trillium Science Drivers • Experiments at Large Hadron Collider: 100s of Petabytes, 2007 – ? • High Energy & Nuclear Physics expts: ~1 Petabyte (1000 TB), 1997 – present • LIGO (gravity wave search): 100s of Terabytes, 2002 – present • Sloan Digital Sky Survey: 10s of Terabytes, 2001 – present • Future Grid resources: massive CPU (PetaOps), large distributed datasets (>100PB), global communities (1000s) • [Timeline: community growth and data growth, 2001 – 2009] Paul Avery
Trillium: Collaborative Program of Work • Common experiments, leadership, participants • CS research • Workflow, scheduling, virtual data • Common Grid toolkits and packaging • Virtual Data Toolkit (VDT) + Pacman packaging • Common Grid infrastructure: Grid3 • National Grid for testing, development and production • Advanced networking • Ultranet, UltraLight, etc. • Integrated education and outreach effort • + collaboration with outside projects • Unified entity in working with international projects • LCG, EGEE, Asia, South America Paul Avery
Large Hadron Collider (LHC) @ CERN • 27 km tunnel in Switzerland & France • Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM • Search for origin of mass & supersymmetry (2007 – ?) Paul Avery
The LIGO Scientific Collaboration (LSC) and the LIGO Grid • LIGO Grid: 6 US sites + 3 EU sites (Birmingham and Cardiff/UK, AEI Golm/Germany) • LHO, LLO: observatory sites • LSC sites: LIGO Scientific Collaboration, iVDGL supported • iVDGL has enabled LSC to establish a persistent production grid Paul Avery
Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN • [Figure: Sloan data; galaxy cluster size distribution] Paul Avery
Goal: Peta-scale Virtual-Data Grids for Global Science • Users: single researchers, workgroups, production teams • GriPhyN tools: interactive user tools; virtual data tools; request planning & scheduling tools; request execution & management tools • Grid services: resource management services, security and policy services, other Grid services • Transforms act on raw data sources and distributed resources (code, storage, CPUs, networks) • Scale targets: PetaOps, Petabytes, performance Paul Avery
LHC: Petascale Global Science • Complexity: millions of individual detector channels • Scale: PetaOps (CPU), 100s of Petabytes (data) • Distribution: global distribution of people & resources • BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries • CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries Paul Avery
Global LHC Data Grid Hierarchy (CMS Experiment) • ~10s of Petabytes/yr by 2007-08; ~1000 Petabytes in < 10 yrs? • Online System → Tier 0 (CERN Computer Center): 0.1 – 1.5 GBytes/s • Tier 0 → Tier 1 (Korea, Russia, UK, USA): 10-40 Gb/s • Tier 1 → Tier 2 (U Florida, Caltech, UCSD, FIU, Maryland, Iowa): >10 Gb/s • Tier 2 → Tier 3 (physics caches): 2.5-10 Gb/s • Tier 3 → Tier 4 (PCs) • A rough feel for these rates is sketched below Paul Avery
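As a back-of-the-envelope sketch, the short Python snippet below converts the quoted link speeds into transfer times for a single petabyte. The 1 PB volume and the Gb/s rates come from the hierarchy above; the link labels and the code itself are only illustrative assumptions.

# Illustrative arithmetic only: how long does 1 PB take over the quoted links?
PETABYTE_BITS = 8 * 10**15            # 1 PB (decimal) expressed in bits

links_gbps = {                        # rates quoted on the hierarchy slide
    "Tier 0 -> Tier 1, low end": 10,
    "Tier 0 -> Tier 1, high end": 40,
    "Tier 2 -> Tier 3, low end": 2.5,
}

for name, gbps in links_gbps.items():
    seconds = PETABYTE_BITS / (gbps * 10**9)
    print(f"{name}: ~{seconds / 86400:.1f} days per petabyte")

Even at 40 Gb/s a petabyte takes a couple of days to move, which is one reason the hierarchy stages data out to regional Tier 1 and Tier 2 centers rather than shipping everything from CERN on demand.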
Tier2 Centers • Tier2 facility • 20 – 40% of Tier1? • “1 FTE support”: commodity CPU & disk, no hierarchical storage • Essential university role in extended computing infrastructure • Validated by 3 years of experience with proto-Tier2 sites • Functions • Perform physics analysis, simulations • Support experiment software • Support smaller institutions • Official role in Grid hierarchy (U.S.) • Sanctioned by MOU with parent organization (ATLAS, CMS, LIGO) • Local P.I. with reporting responsibilities • Selection by collaboration via careful process Paul Avery
Analysis by Globally Distributed Teams • Non-hierarchical: Chaotic analyses + productions • Superimpose significant random data flows Paul Avery
Trillium Grid Tools: Virtual Data Toolkit • Build & test flow: sources (CVS) and contributors (VDS, etc.) → build binaries → build & test on a Condor pool (37 computers) • Packaging: Pacman cache, RPMs, GPT src bundles; patching • NMI: use NMI processes later Paul Avery
Virtual Data Toolkit: Tools in VDT 1.1.12 • Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location (RLS) • Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds • EDG & LCG: Make Gridmap, Cert. Revocation List Updater, Glue Schema/Info provider • ISI & UC: Chimera & related tools, Pegasus • NCSA: MyProxy, GSI OpenSSH • LBL: PyGlobus, Netlogger • Caltech: MonaLisa • VDT: VDT System Profiler, Configuration software • Others: KX509 (U. Mich.) • A minimal usage sketch of a few of these tools follows below Paul Avery
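To make a few of the listed components concrete, here is a minimal, hypothetical sketch (Python driving the command-line tools) of a common pattern: create a GSI proxy credential, stage data with GridFTP, then submit a Condor-G job description. Only the tool names come from the VDT list above; the host name, file paths and submit-file name are invented placeholders.

# Hypothetical driver: chains a few VDT command-line tools via subprocess.
# Host, paths and analysis.sub are placeholders, not real sites or files.
import subprocess

def run(cmd):
    # Echo a command, run it, and stop if it fails.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Create a short-lived GSI proxy (prompts for the user's key passphrase).
run(["grid-proxy-init"])

# 2. Stage an input file to a remote storage element with GridFTP.
run(["globus-url-copy",
     "file:///home/user/events.dat",
     "gsiftp://gatekeeper.example.edu/scratch/events.dat"])

# 3. Hand a pre-written Condor-G submit description to the local scheduler.
run(["condor_submit", "analysis.sub"])

In practice, tools such as DAGMan or the Chimera/Pegasus planners would generate and order these steps rather than a hand-written script.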
VDT Growth (currently 1.1.14) • VDT 1.0: Globus 2.0b, Condor 6.3.1 • VDT 1.1.3, 1.1.4 & 1.1.5: pre-SC 2002 • VDT 1.1.7: switch to Globus 2.2 • VDT 1.1.8: first real use by LCG • VDT 1.1.11: Grid3 • VDT 1.1.14: May 10
Packaging of Grid Software: Pacman • Language: define software environments • Interpreter: create, install, configure, update, verify environments • Combines and manages software from arbitrary sources: LCG/Scram, ATLAS/CMT, CMS DPE/tar/make, LIGO/tar/make, OpenSource/tar/make, Globus/GPT, NPACI/TeraGrid/tar/make, D0/UPS-UPD, Commercial/tar/make • “1 button install” reduces burden on administrators: % pacman -get iVDGL:Grid3 • Remote experts define installation/config/updating for everyone at once • Pacman caches include LIGO, UCHEP, VDT, D-Zero, ATLAS, iVDGL, CMS/DPE, NPACI Paul Avery
Trillium Collaborative Relationships: Internal and External • Flow: Computer Science Research → Virtual Data Toolkit (techniques & software) → Larger Science Community (tech transfer) • Cycle: requirements → prototyping & experiments → production deployment • Partner physics & outreach projects: Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC Experiments, QuarkNet, CHEPREO, U.S. Grids, Int’l Outreach • Other linkages: work force, CS researchers, industry Paul Avery
Grid3: An Operational National Grid • 28 sites: Universities + 4 national labs • 2800 CPUs, 400–1300 simultaneous jobs (800 now) • Running since October 2003 • Applications in HEP, LIGO, SDSS, Genomics http://www.ivdgl.org/grid3 Paul Avery
Grid2003 Applications • High energy physics • US-ATLAS analysis (DIAL), • US-ATLAS GEANT3 simulation (GCE) • US-CMS GEANT4 simulation (MOP) • BTeV simulation • Gravity waves • LIGO: blind search for continuous sources • Digital astronomy • SDSS: cluster finding (maxBcg) • Bioinformatics • Bio-molecular analysis (SnB) • Genome analysis (GADU/Gnare) • CS Demonstrators • Job Exerciser, GridFTP, NetLogger-grid2003 Paul Avery
Grid3: Three Months Usage Paul Avery
CMS Production Simulations on Grid3 • US-CMS Monte Carlo simulation • Total = 1.5 × US-CMS resources • [Chart: USCMS and non-USCMS contributions] Paul Avery
ATLAS DC2 Production System • 150K jobs total so far (12-24 hrs/job); 650 jobs running now • Windmill supervisors talk to executors over soap/jabber messaging • Executors: Capone (Grid3), Dulcinea (NorduGrid), Lexor (LCG), plus LCG and LSF executors • Back ends: LCG, NorduGrid and Grid3 (each with an RLS replica catalog) and LSF • Common services: prodDB, AMI, Don Quijote data management system (dms) Paul Avery
Grid2003 Broad Lessons • Careful planning and coordination essential to build Grids • Community investment of time/resources • Operations team needed to operate Grid as a facility • Tools, services, procedures, documentation, organization • Security, account management, multiple organizations • Strategies needed to cope with increasingly large scale • “Interesting” failure modes as scale increases • Delegation of responsibilities to conserve human resources • Project, Virtual Org., Grid service, site, application • Better services, documentation, packaging • Grid2003 experience critical for building “useful” Grids • Frank discussion in “Grid2003 Project Lessons” doc Paul Avery
Grid2003 Lessons: Packaging • VDT + Pacman: Installation and configuration • Simple installation, configuration of Grid tools + applications • Major advances over 14 VDT releases • Why this is critical for us • Uniformity and validation across sites • Lowered barriers to participation → more sites! • Reduced FTE overhead & communication traffic • Further work: remote installation Paul Avery
Grid3 → Open Science Grid • Build on Grid3 experience • Persistent, production-quality Grid, national + international scope • Continue U.S. leading role in international science • Grid infrastructure for large-scale collaborative scientific research • Create large computing infrastructure • Combine resources at DOE labs and universities to effectively become a single national computing infrastructure for science • Maintain interoperability with LCG (Gagliardi talk) • Provide opportunities for educators and students • Participate in building and exploiting this grid infrastructure • Develop and train scientific and technical workforce • Transform the integration of education and research at all levels http://www.opensciencegrid.org Paul Avery
Roadmap towards Open Science Grid • Iteratively build & extend Grid3 into a national infrastructure • Shared resources, benefiting broad set of disciplines • Strong focus on operations • Grid3 → Open Science Grid • Contribute US-LHC resources to provide initial federation • US-LHC Tier-1 and Tier-2 centers provide significant resources • Build OSG from laboratories, universities, campus grids, etc. • Argonne, Fermilab, SLAC, Brookhaven, Berkeley Lab, Jeff. Lab • UW Madison, U Florida, Purdue, Chicago, Caltech, Harvard, etc. • Corporations • Further develop OSG • Partnerships and contributions from other sciences, universities • Incorporation of advanced networking • Focus on general services, operations, end-to-end performance Paul Avery
Possible OSG Collaborative Framework • Consortium Board (1) coordinates Technical Groups (0…n, small), joint committees (0…N, small) and activities (0…N, large) • Participants: campuses, labs, sites, researchers, service providers, VO organizations, research Grid projects, enterprise • Participants provide: resources, management, project steering groups Paul Avery
Open Science Grid Consortium • Join U.S. forces into the Open Science Grid consortium • Application scientists, technology providers, resource owners, ... • Provide a lightweight framework for joint activities... • Coordinating activities that cannot be done within one project • Achieve technical coherence and strategic vision (roadmap) • Enable communication and provide liaising • Reach common decisions when necessary • Create Technical Groups • Propose, organize, oversee activities, peer, liaise • Now: Security and storage services • Later: Support centers, policies • Define activities • Contributing tasks provided by participants joining the activity Paul Avery
Applications, Infrastructure, Facilities Paul Avery
Open Science Grid Partnerships Paul Avery
Partnerships for Open Science Grid • Complex matrix of partnerships: challenges & opportunities • NSF and DOE, Universities and Labs, physics and computing depts. • U.S. LHC and Trillium Grid Projects • Application sciences, computer sciences, information technologies • Science and education • U.S., Europe, Asia, North and South America • Grid middleware and infrastructure projects • CERN and U.S. HEP labs, LCG and Experiments, EGEE and OSG • ~85 participants on OSG stakeholder mail list • ~50 authors on OSG abstract to CHEP conference (Sep. 2004) Paul Avery
Open Science Grid Meetings • Sep. 17, 2003 @ NSF • Early discussions with other HEP people • Strong interest of NSF education people • Jan. 12, 2004 @ Fermilab • Initial stakeholders meeting, 1st discussion of governance • May 20-21, 2004 @ Univ. of Chicago • Joint Trillium Steering meeting to define OSG program • July 2004 @ Wisconsin • First attempt to define OSG Blueprint (document) • Sep. 7-8, 2004 @ MIT • 2nd Blueprint meeting • Sep. 9-10, 2004 @ Harvard • Major OSG workshop • Three sessions: (1) Technical, (2) Governance, (3) Sciences Paul Avery
Education and Outreach Paul Avery
Grids and the Digital Divide, Rio de Janeiro, Feb. 16-20, 2004 • Background: World Summit on Information Society; HEP Standing Committee on Inter-regional Connectivity (SCIC) • Themes: global collaborations, Grids and addressing the Digital Divide • Next meeting: May 2005 (Korea) • http://www.uerj.br/lishep2004 Paul Avery
iVDGL, GriPhyN Education / Outreach Basics • $200K/yr • Led by UT Brownsville • Workshops, portals • Partnerships with CHEPREO, QuarkNet, … Paul Avery
June 21-25 Grid Summer School • First of its kind in the U.S. (South Padre Island, Texas) • 36 students, diverse origins and types (M, F, MSIs, etc) • Marks new direction for Trillium • First attempt to systematically train people in Grid technologies • First attempt to gather relevant materials in one place • Today: Students in CS and Physics • Later: Students, postdocs, junior & senior scientists • Reaching a wider audience • Put lectures, exercises, video, on the web • More tutorials, perhaps 3-4/year • Dedicated resources for remote tutorials • Create “Grid book”, e.g. Georgia Tech • New funding opportunities • NSF: new training & education programs Paul Avery
CHEPREO: Center for High Energy Physics Research and Educational OutreachFlorida International University • Physics Learning Center • CMS Research • iVDGL Grid Activities • AMPATH network (S. America) • Funded September 2003 • $4M initially (3 years) • 4 NSF Directorates!
UUEO: A New Initiative • Meeting April 8 in Washington DC • Brought together ~40 outreach leaders (including NSF) • Proposed Grid-based framework for common E/O effort Paul Avery
Grid Project References • Grid3: www.ivdgl.org/grid3 • PPDG: www.ppdg.net • GriPhyN: www.griphyn.org • iVDGL: www.ivdgl.org • Globus: www.globus.org • LCG: www.cern.ch/lcg • EU DataGrid: www.eu-datagrid.org • EGEE: egee-ei.web.cern.ch • Grid book, 2nd Edition: www.mkp.com/grid2 Paul Avery
Extra Slides Paul Avery
Outreach: QuarkNet-Trillium Virtual Data Portal • More than a web site • Organize datasets • Perform simple computations • Create new computations & analyses • View & share results • Annotate & enquire (metadata) • Communicate and collaborate • Easy to use, ubiquitous • No tools to install • Open to the community • Grow & extend • Initial prototype implemented by graduate student Yong Zhao and M. Wilde (U. of Chicago) Paul Avery
GriPhyN Achievements • Virtual Data paradigm to express science processes • Unified language (VDL) to express general data transformation • Advanced planners, executors, monitors, predictors, fault recovery to make the Grid “like a workstation” • Virtual Data Toolkit (VDT) • Tremendously simplified installation & configuration of Grids • Close partnership with and adoption by multiple sciences: ATLAS, CMS, LIGO, SDSS, Bioinformatics, EU Projects • Broad education & outreach program (UT Brownsville) • 25 graduate, 2 undergraduate; 3 CS PhDs by end of 2004 • Virtual Data for QuarkNet Cosmic Ray project • Grid Summer School 2004, 3 MSIs participating Paul Avery
Grid2003 Lessons (1) • Need for financial investment • Grid projects: PPDG + GriPhyN + iVDGL: $35M • National labs: Fermilab, Brookhaven, Argonne, LBL • Other expts: LIGO, SDSS • Critical role of collaboration • 5 HEP + LIGO + SDSS + CS + biology • Makes possible large common effort • Collaboration sociology ~50% of issues • Building “stuff”: prototypes → testbeds → production Grids • Build expertise, debug software, develop tools & services • Need for “operations team” • Point of contact for problems, questions, outside inquiries • Resolution of problems by direct action, consultation w/ experts Paul Avery
Grid2003 Lessons (2): Packaging • VDT + Pacman: Installation and configuration • Simple installation, configuration of Grid tools + applications • Major advances over 14 VDT releases • Why this is critical for us • Uniformity and validation across sites • Lowered barriers to participation → more sites! • Reduced FTE overhead & communication traffic • The next frontier: remote installation Paul Avery
Grid2003 and Beyond • Further evolution of Grid3: (Grid3+, etc.) • Contribute resources to persistent Grid • Maintain development Grid, test new software releases • Integrate software into the persistent Grid • Participate in LHC data challenges • Involvement of new sites • New institutions and experiments • New international partners (Brazil, Taiwan, Pakistan?, …) • Improvements in Grid middleware and services • Storage services • Integrating multiple VOs • Monitoring • Troubleshooting • Accounting Paul Avery
International Virtual Data Grid Laboratory • US sites (Tier1, Tier2 and other): SKC, Boston U, Buffalo, UW Milwaukee, Michigan, UW Madison, BNL, Fermilab, LBL, Argonne, PSU, Iowa, Chicago, J. Hopkins, Indiana, Hampton, Caltech, ISI, Vanderbilt, UCSD, UF, Austin, FIU, Brownsville • Partners: EU, Brazil, Korea Paul Avery
Roles of iVDGL Institutions • U Florida CMS (Tier2), Management • Caltech CMS (Tier2), LIGO (Management) • UC San Diego CMS (Tier2), CS • Indiana U ATLAS (Tier2), iGOC (operations) • Boston U ATLAS (Tier2) • Harvard ATLAS (Management) • Wisconsin, Milwaukee LIGO (Tier2) • Penn State LIGO (Tier2) • Johns Hopkins SDSS (Tier2), NVO • Chicago CS, Coord./Management, ATLAS (Tier2) • Vanderbilt* BTEV (Tier2, unfunded) • Southern California CS • Wisconsin, Madison CS • Texas, Austin CS • Salish Kootenai LIGO (Outreach, Tier3) • Hampton U ATLAS (Outreach, Tier3) • Texas, Brownsville LIGO (Outreach, Tier3) • Fermilab CMS (Tier1), SDSS, NVO • Brookhaven ATLAS (Tier1) • Argonne ATLAS (Management), Coordination Paul Avery
“Virtual Data”: Derivation & Provenance • Most scientific data are not simple “measurements” • They are computationally corrected/reconstructed • They can be produced by numerical simulation • Science & eng. projects are more CPU and data intensive • Programs are significant community resources (transformations) • So are the executions of those programs (derivations) • Management of dataset dependencies critical! • Derivation: Instantiation of a potential data product • Provenance: Complete history of any existing data product • Previously: Manual methods • GriPhyN: Automated, robust tools Paul Avery
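As a purely conceptual illustration of derivation and provenance (a toy sketch, not GriPhyN's actual VDL or Chimera interfaces), the following Python registry records which transformation and inputs produce each dataset, so any product can be traced back to raw data or re-derived on demand; all names in it are invented.

# Toy "virtual data" catalog: datasets are described by the transformation
# and inputs that produce them, so provenance can be traced and missing
# products re-derived.  Invented names only; not the GriPhyN VDL.
from dataclasses import dataclass

@dataclass
class Derivation:
    transformation: str   # registered program (a "transformation")
    inputs: list          # names of input datasets
    output: str           # name of the produced dataset

class VirtualDataCatalog:
    def __init__(self):
        self.derivations = {}      # output name -> Derivation
        self.materialized = set()  # datasets that physically exist

    def provenance(self, name, depth=0):
        # Print the complete derivation history of a data product.
        d = self.derivations.get(name)
        if d is None:
            print("  " * depth + f"{name}: raw data")
            return
        print("  " * depth + f"{name} <- {d.transformation}({', '.join(d.inputs)})")
        for parent in d.inputs:
            self.provenance(parent, depth + 1)

    def plan(self, name):
        # Ordered list of derivations needed to (re)create a product.
        d = self.derivations.get(name)
        if d is None or name in self.materialized:
            return []
        steps = []
        for parent in d.inputs:
            steps += self.plan(parent)
        return steps + [d]

cat = VirtualDataCatalog()
cat.materialized.add("raw_events")
for d in (Derivation("reconstruct", ["raw_events"], "reco_events"),
          Derivation("select_candidates", ["reco_events"], "candidates")):
    cat.derivations[d.output] = d

cat.provenance("candidates")
print([s.transformation for s in cat.plan("candidates")])

The design point this sketch is meant to echo: the derivation record, not the physical file, is the primary object, so files become caches that can always be regenerated.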