  1. Trillium and Open Science Grid 3rd International Workshop on HEP Data Grids Kyungpook National University Daegu, Korea August 27, 2004 Paul Avery University of Florida avery@phys.ufl.edu Paul Avery

  2. U.S. “Trillium” Grid Consortium • Trillium = PPDG + GriPhyN + iVDGL • Particle Physics Data Grid: $12M (DOE) (1999 – 2004+) • GriPhyN: $12M (NSF) (2000 – 2005) • iVDGL: $14M (NSF) (2001 – 2006) • Basic composition (~150 people) • PPDG: 4 universities, 6 labs • GriPhyN: 12 universities, SDSC, 3 labs • iVDGL: 18 universities, SDSC, 4 labs, foreign partners • Expts: BaBar, D0, STAR, Jlab, CMS, ATLAS, LIGO, SDSS/NVO • Complementarity of projects • GriPhyN: CS research, Virtual Data Toolkit (VDT) development • PPDG: “End to end” Grid services, monitoring, analysis • iVDGL: Grid laboratory deployment using VDT • Experiments provide frontier challenges • Unified entity when collaborating internationally Paul Avery

  3. Trillium Science Drivers • Experiments at Large Hadron Collider: 100s of Petabytes, 2007 – ? • High Energy & Nuclear Physics expts: ~1 Petabyte (1000 TB), 1997 – present • LIGO (gravity wave search): 100s of Terabytes, 2002 – present • Sloan Digital Sky Survey: 10s of Terabytes, 2001 – present • Future Grid resources: massive CPU (PetaOps), large distributed datasets (>100PB), global communities (1000s) • [Chart: community growth and data growth, 2001 – 2009] Paul Avery

  4. Trillium: Collaborative Program of Work • Common experiments, leadership, participants • CS research • Workflow, scheduling, virtual data • Common Grid toolkits and packaging • Virtual Data Toolkit (VDT) + Pacman packaging • Common Grid infrastructure: Grid3 • National Grid for testing, development and production • Advanced networking • Ultranet, UltraLight, etc. • Integrated education and outreach effort • + collaboration with outside projects • Unified entity in working with international projects • LCG, EGEE, Asia, South America Paul Avery

  5. Large Hadron Collider (LHC) @ CERN • 27 km tunnel spanning Switzerland & France • Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM • Search for origin of mass & supersymmetry (2007 – ?) Paul Avery

  6. The LIGO Scientific Collaboration (LSC) and the LIGO Grid • LIGO Grid: 6 US sites (including the LHO and LLO observatory sites) + 3 EU sites (Birmingham and Cardiff in the UK, AEI/Golm in Germany) • LSC = LIGO Scientific Collaboration, iVDGL supported • iVDGL has enabled LSC to establish a persistent production grid Paul Avery

  7. Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN • [Figure: Sloan data; galaxy cluster size distribution] Paul Avery

  8. Goal: Peta-scale Virtual-Data Grids for Global Science • Users range from production teams and workgroups to single researchers, served by interactive user tools • GriPhyN layers: virtual data tools; request planning & scheduling tools; request execution & management tools • Supporting services: resource management, security and policy, other Grid services • Foundation: distributed resources (code, storage, CPUs, networks) and raw data sources • Targets: PetaOps, Petabytes, performance Paul Avery

  9. LHC: Petascale Global Science • Complexity: millions of individual detector channels • Scale: PetaOps (CPU), 100s of Petabytes (data) • Distribution: global distribution of people & resources • BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries • CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries Paul Avery

  10. Global LHC Data Grid Hierarchy (CMS experiment) • ~10s of Petabytes/yr by 2007-08; ~1000 Petabytes in < 10 yrs? • Online system → CERN Computer Center (Tier 0) at 0.1 – 1.5 GBytes/s • Tier 0 → Tier 1 centers (Korea, Russia, UK, USA) at 10-40 Gb/s • Tier 1 → Tier 2 centers (U Florida, Caltech, UCSD, FIU, Maryland, Iowa) at >10 Gb/s • Tier 2 → Tier 3 (physics caches) at 2.5-10 Gb/s • Tier 3 → Tier 4 (PCs) Paul Avery
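
A rough sanity check on those rates (an added estimate, not from the slide): assuming a commonly used ~10^7 seconds of effective data-taking per year, an online stream of 0.1 – 1.5 GBytes/s corresponds to roughly 1 – 15 Petabytes/yr at Tier 0 alone, consistent with the "~10s of Petabytes/yr" figure once replicas at Tier 1/Tier 2 centers and simulated data are added.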

  11. Tier2 Centers • Tier2 facility • 20 – 40% of Tier1? • “1 FTE support”: commodity CPU & disk, no hierarchical storage • Essential university role in extended computing infrastructure • Validated by 3 years of experience with proto-Tier2 sites • Functions • Perform physics analysis, simulations • Support experiment software • Support smaller institutions • Official role in Grid hierarchy (U.S.) • Sanctioned by MOU with parent organization (ATLAS, CMS, LIGO) • Local P.I. with reporting responsibilities • Selection by collaboration via careful process Paul Avery

  12. Analysis by Globally Distributed Teams • Non-hierarchical: Chaotic analyses + productions • Superimpose significant random data flows Paul Avery

  13. Trillium Grid Tools: Virtual Data Toolkit • Sources (CVS) and contributors (VDS, etc.) feed the NMI build & test infrastructure, which builds and tests binaries on a Condor pool (37 computers) • Tested builds are packaged (with patching …) into a Pacman cache, RPMs, and GPT source bundles • Plan: use NMI processes more fully later Paul Avery

  14. Virtual Data Toolkit: Tools in VDT 1.1.12 • Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location (RLS) • Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds • EDG & LCG: Make Gridmap, Cert. Revocation List Updater, Glue Schema/Info provider • ISI & UC: Chimera & related tools, Pegasus • NCSA: MyProxy, GSI OpenSSH • LBL: PyGlobus, NetLogger • Caltech: MonALISA • VDT: VDT System Profiler, configuration software • Others: KX509 (U. Mich.) Paul Avery
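
As an illustration of how these VDT components fit together in day-to-day use, here is a minimal command-line sketch (not from the slides): the host name, paths, and submit file are hypothetical, and only standard Globus/Condor commands are used, with generic arguments.

  % grid-proxy-init
      # GSI: create a short-lived proxy credential from the user certificate
  % globus-job-run gatekeeper.example.edu/jobmanager-fork /bin/hostname
      # GRAM: run a trivial job through a (hypothetical) site gatekeeper
  % globus-url-copy gsiftp://gatekeeper.example.edu/data/run42.dat file:///tmp/run42.dat
      # GridFTP: stage a file from the site's storage to local disk
  % condor_submit grid_job.sub
      # Condor-G: hand a batch of such jobs to Condor, which manages the
      # GRAM submissions, retries, and logging on the user's behalf

Higher-level pieces such as Chimera, Pegasus, and DAGMan sit on top of these commands, turning declarative workflow descriptions into chains of GRAM and GridFTP operations.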

  15. VDT Growth (1.1.14 currently) • VDT 1.0: Globus 2.0b, Condor 6.3.1 • VDT 1.1.3, 1.1.4 & 1.1.5: pre-SC 2002 • VDT 1.1.7: switch to Globus 2.2 • VDT 1.1.8: first real use by LCG • VDT 1.1.11: Grid3 • VDT 1.1.14: May 10 Paul Avery

  16. Packaging of Grid Software: Pacman • Language: define software environments • Interpreter: create, install, configure, update, verify environments • Combines and manages software from arbitrary sources and packaging schemes: LCG/Scram, ATLAS/CMT, CMS DPE/tar/make, LIGO/tar/make, OpenSource/tar/make, Globus/GPT, NPACI/TeraGrid/tar/make, D0/UPS-UPD, Commercial/tar/make • Caches published by many projects: LIGO, UCHEP, VDT, D-Zero, ATLAS, iVDGL, CMS/DPE, NPACI • "1 button install" reduces the burden on administrators: % pacman -get iVDGL:Grid3 • Remote experts define installation/config/updating for everyone at once Paul Avery
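
To make the "1 button install" concrete, a short sketch of a Pacman session follows. Only the first command appears on the slide; the second cache:package pair is a hypothetical example of the same mechanism, and the real package names may differ.

  % pacman -get iVDGL:Grid3
      # one command fetches, installs, and configures the entire Grid3 stack,
      # following instructions published in the remote iVDGL cache
  % pacman -get VDT:Condor
      # (hypothetical name) the same syntax pulls any package from any published cache

Because the cache authors, not the local administrators, encode the install/configure/update logic, a new site can be brought up by one person running a single command.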

  17. Trillium Collaborative Relationships: Internal and External • Computer science research supplies techniques & software to the Virtual Data Toolkit, which is in turn transferred to the larger science community • Development cycle: requirements → prototyping & experiments → production deployment • Partner physics and outreach projects: Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC Experiments, QuarkNet, CHEPREO, U.S. Grids, international outreach • Other linkages: work force, CS researchers, industry Paul Avery

  18. Grid3: An Operational National Grid • 28 sites: Universities + 4 national labs • 2800 CPUs, 400–1300 simultaneous jobs (800 now) • Running since October 2003 • Applications in HEP, LIGO, SDSS, Genomics http://www.ivdgl.org/grid3 Paul Avery

  19. Grid2003 Applications • High energy physics • US-ATLAS analysis (DIAL) • US-ATLAS GEANT3 simulation (GCE) • US-CMS GEANT4 simulation (MOP) • BTeV simulation • Gravity waves • LIGO: blind search for continuous sources • Digital astronomy • SDSS: cluster finding (maxBcg) • Bioinformatics • Bio-molecular analysis (SnB) • Genome analysis (GADU/Gnare) • CS Demonstrators • Job Exerciser, GridFTP, NetLogger-grid2003 Paul Avery

  20. Grid3: Three Months Usage Paul Avery

  21. CMS Production Simulations on Grid3 • US-CMS Monte Carlo simulation • Total = 1.5 × US-CMS resources • [Chart: USCMS vs. non-USCMS contributions] Paul Avery

  22. ATLAS DC2 Production System • 150K jobs total so far (12-24 hrs/job); 650 jobs running now • Windmill supervisors communicate via Jabber and SOAP with per-grid executors: Lexor (LCG), Dulcinea (NorduGrid), Capone (Grid3), plus an LSF executor for local batch • Shared services: production database (prodDB), AMI metadata, Don Quijote data management system (dms), and RLS replica catalogs on each grid Paul Avery

  23. Grid2003 Broad Lessons • Careful planning and coordination essential to build Grids • Community investment of time/resources • Operations team needed to operate Grid as a facility • Tools, services, procedures, documentation, organization • Security, account management, multiple organizations • Strategies needed to cope with increasingly large scale • “Interesting” failure modes as scale increases • Delegation of responsibilities to conserve human resources • Project, Virtual Org., Grid service, site, application • Better services, documentation, packaging • Grid2003 experience critical for building “useful” Grids • Frank discussion in “Grid2003 Project Lessons” doc Paul Avery

  24. Grid2003 Lessons: Packaging • VDT + Pacman: Installation and configuration • Simple installation, configuration of Grid tools + applications • Major advances over 14 VDT releases • Why this is critical for us • Uniformity and validation across sites • Lowered barriers to participation → more sites! • Reduced FTE overhead & communication traffic • Further work: remote installation Paul Avery

  25. Grid3 → Open Science Grid • Build on Grid3 experience • Persistent, production-quality Grid, national + international scope • Continue U.S. leading role in international science • Grid infrastructure for large-scale collaborative scientific research • Create large computing infrastructure • Combine resources at DOE labs and universities to effectively become a single national computing infrastructure for science • Maintain interoperability with LCG (Gagliardi talk) • Provide opportunities for educators and students • Participate in building and exploiting this grid infrastructure • Develop and train scientific and technical workforce • Transform the integration of education and research at all levels http://www.opensciencegrid.org Paul Avery

  26. Roadmap towards Open Science Grid • Iteratively build & extend Grid3, to national infrastructure • Shared resources, benefiting broad set of disciplines • Strong focus on operations • Grid3 → Open Science Grid • Contribute US-LHC resources to provide initial federation • US-LHC Tier-1 and Tier-2 centers provide significant resources • Build OSG from laboratories, universities, campus grids, etc. • Argonne, Fermilab, SLAC, Brookhaven, Berkeley Lab, Jeff. Lab • UW Madison, U Florida, Purdue, Chicago, Caltech, Harvard, etc. • Corporations • Further develop OSG • Partnerships and contributions from other sciences, universities • Incorporation of advanced networking • Focus on general services, operations, end-to-end performance Paul Avery

  27. Possible OSG Collaborative Framework • Consortium Board (1) oversees Technical Groups (0…n, small), joint committees (0…N, small), and activities (0…N, large) • Activities are carried out by research grid projects and enterprise participants • Participants include campuses, labs, sites, service providers, researchers, and VO organizations • Participants provide: resources, management, project steering groups Paul Avery

  28. Open Science Grid Consortium • Join U.S. forces into the Open Science Grid consortium • Application scientists, technology providers, resource owners, ... • Provide a lightweight framework for joint activities... • Coordinating activities that cannot be done within one project • Achieve technical coherence and strategic vision (roadmap) • Enable communication and provide liaising • Reach common decisions when necessary • Create Technical Groups • Propose, organize, oversee activities, peer, liaise • Now: Security and storage services • Later: Support centers, policies • Define activities • Contributing tasks provided by participants joining the activity Paul Avery

  29. Applications, Infrastructure, Facilities Paul Avery

  30. Open Science Grid Partnerships Paul Avery

  31. Partnerships for Open Science Grid • Complex matrix of partnerships: challenges & opportunities • NSF and DOE, Universities and Labs, physics and computing depts. • U.S. LHC and Trillium Grid Projects • Application sciences, computer sciences, information technologies • Science and education • U.S., Europe, Asia, North and South America • Grid middleware and infrastructure projects • CERN and U.S. HEP labs, LCG and Experiments, EGEE and OSG • ~85 participants on OSG stakeholder mail list • ~50 authors on OSG abstract to CHEP conference (Sep. 2004) Paul Avery

  32. Open Science Grid Meetings • Sep. 17, 2003 @ NSF • Early discussions with other HEP people • Strong interest of NSF education people • Jan. 12, 2004 @ Fermilab • Initial stakeholders meeting, 1st discussion of governance • May 20-21, 2004 @ Univ. of Chicago • Joint Trillium Steering meeting to define OSG program • July 2004 @ Wisconsin • First attempt to define OSG Blueprint (document) • Sep. 7-8, 2004 @ MIT • 2nd Blueprint meeting • Sep. 9-10, 2004 @ Harvard • Major OSG workshop • Three sessions: (1) Technical, (2) Governance, (3) Sciences Paul Avery

  33. Paul Avery

  34. Education and Outreach Paul Avery

  35. Grids and the Digital Divide, Rio de Janeiro, Feb. 16-20, 2004 • Background: World Summit on Information Society; HEP Standing Committee on Inter-regional Connectivity (SCIC) • Themes: global collaborations, Grids, and addressing the Digital Divide • Next meeting: May 2005 (Korea) • http://www.uerj.br/lishep2004 Paul Avery

  36. iVDGL, GriPhyN Education / Outreach Basics • $200K/yr • Led by UT Brownsville • Workshops, portals • Partnerships with CHEPREO, QuarkNet, … Paul Avery

  37. June 21-25 Grid Summer School • First of its kind in the U.S. (South Padre Island, Texas) • 36 students, diverse origins and types (M, F, MSIs, etc) • Marks new direction for Trillium • First attempt to systematically train people in Grid technologies • First attempt to gather relevant materials in one place • Today: Students in CS and Physics • Later: Students, postdocs, junior & senior scientists • Reaching a wider audience • Put lectures, exercises, video, on the web • More tutorials, perhaps 3-4/year • Dedicated resources for remote tutorials • Create “Grid book”, e.g. Georgia Tech • New funding opportunities • NSF: new training & education programs Paul Avery

  38. Paul Avery

  39. CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University • Physics Learning Center • CMS Research • iVDGL Grid Activities • AMPATH network (S. America) • Funded September 2003 • $4M initially (3 years) • 4 NSF Directorates!

  40. UUEO: A New Initiative • Meeting April 8 in Washington DC • Brought together ~40 outreach leaders (including NSF) • Proposed Grid-based framework for common E/O effort Paul Avery

  41. Grid Project References • Grid3: www.ivdgl.org/grid3 • PPDG: www.ppdg.net • GriPhyN: www.griphyn.org • iVDGL: www.ivdgl.org • Globus: www.globus.org • LCG: www.cern.ch/lcg • EU DataGrid: www.eu-datagrid.org • EGEE: egee-ei.web.cern.ch • The Grid book, 2nd edition: www.mkp.com/grid2 Paul Avery

  42. Extra Slides Paul Avery

  43. Outreach: QuarkNet-Trillium Virtual Data Portal • More than a web site • Organize datasets • Perform simple computations • Create new computations & analyses • View & share results • Annotate & enquire (metadata) • Communicate and collaborate • Easy to use, ubiquitous • No tools to install • Open to the community • Grow & extend • Initial prototype implemented by graduate student Yong Zhao and M. Wilde (U. of Chicago) Paul Avery

  44. GriPhyN Achievements • Virtual Data paradigm to express science processes • Unified language (VDL) to express general data transformation • Advanced planners, executors, monitors, predictors, fault recovery to make the Grid “like a workstation” • Virtual Data Toolkit (VDT) • Tremendously simplified installation & configuration of Grids • Close partnership with and adoption by multiple sciences: ATLAS, CMS, LIGO, SDSS, Bioinformatics, EU Projects • Broad education & outreach program (UT Brownsville) • 25 graduate, 2 undergraduate students; 3 CS PhDs by end of 2004 • Virtual Data for QuarkNet Cosmic Ray project • Grid Summer School 2004, 3 MSIs participating Paul Avery

  45. Grid2003 Lessons (1) • Need for financial investment • Grid projects: PPDG + GriPhyN + iVDGL: $35M • National labs: Fermilab, Brookhaven, Argonne, LBL • Other expts: LIGO, SDSS • Critical role of collaboration • CS + 5 HEP + LIGO + SDSS + biology • Makes possible large common effort • Collaboration sociology ~50% of issues • Building "stuff": Prototypes → testbeds → production Grids • Build expertise, debug software, develop tools & services • Need for "operations team" • Point of contact for problems, questions, outside inquiries • Resolution of problems by direct action, consultation w/ experts Paul Avery

  46. Grid2003 Lessons (2): Packaging • VDT + Pacman: Installation and configuration • Simple installation, configuration of Grid tools + applications • Major advances over 14 VDT releases • Why this is critical for us • Uniformity and validation across sites • Lowered barriers to participation → more sites! • Reduced FTE overhead & communication traffic • The next frontier: remote installation Paul Avery

  47. Grid2003 and Beyond • Further evolution of Grid3: (Grid3+, etc.) • Contribute resources to persistent Grid • Maintain development Grid, test new software releases • Integrate software into the persistent Grid • Participate in LHC data challenges • Involvement of new sites • New institutions and experiments • New international partners (Brazil, Taiwan, Pakistan?, …) • Improvements in Grid middleware and services • Storage services • Integrating multiple VOs • Monitoring • Troubleshooting • Accounting Paul Avery

  48. International Virtual Data Grid Laboratory • Sites (Tier1, Tier2, and other): SKC, Boston U, Buffalo, UW Milwaukee, Michigan, UW Madison, BNL, Fermilab, LBL, Argonne, PSU, Iowa, Chicago, J. Hopkins, Indiana, Hampton, Caltech, ISI, Vanderbilt, UCSD, UF, Austin, FIU, Brownsville • Partners: EU, Brazil, Korea Paul Avery

  49. Roles of iVDGL Institutions • U Florida CMS (Tier2), Management • Caltech CMS (Tier2), LIGO (Management) • UC San Diego CMS (Tier2), CS • Indiana U ATLAS (Tier2), iGOC (operations) • Boston U ATLAS (Tier2) • Harvard ATLAS (Management) • Wisconsin, Milwaukee LIGO (Tier2) • Penn State LIGO (Tier2) • Johns Hopkins SDSS (Tier2), NVO • Chicago CS, Coord./Management, ATLAS (Tier2) • Vanderbilt* BTEV (Tier2, unfunded) • Southern California CS • Wisconsin, Madison CS • Texas, Austin CS • Salish Kootenai LIGO (Outreach, Tier3) • Hampton U ATLAS (Outreach, Tier3) • Texas, Brownsville LIGO (Outreach, Tier3) • Fermilab CMS (Tier1), SDSS, NVO • Brookhaven ATLAS (Tier1) • Argonne ATLAS (Management), Coordination Paul Avery

  50. “Virtual Data”: Derivation & Provenance • Most scientific data are not simple “measurements” • They are computationally corrected/reconstructed • They can be produced by numerical simulation • Science & eng. projects are more CPU and data intensive • Programs are significant community resources (transformations) • So are the executions of those programs (derivations) • Management of dataset dependencies critical! • Derivation: Instantiation of a potential data product • Provenance: Complete history of any existing data product • Previously: Manual methods • GriPhyN: Automated, robust tools Paul Avery
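
To make the transformation/derivation/provenance distinction concrete, here is a small illustrative sketch written in the spirit of the Chimera Virtual Data Language (VDL) named on the GriPhyN Achievements slide; the transformation name, file names, and exact syntax are approximate, and the mapping of the logical transformation to a physical executable would live in a separate catalog rather than in the VDL itself.

  # Transformation (TR): a declared, reusable recipe -- a community resource
  TR reconstruct( out reco, in raw ) {
    argument = "-i "${in:raw};
    argument = "-o "${out:reco};
  }

  # Derivation (DV): one concrete invocation of that transformation. Recording it
  # captures the provenance of run42.reco and lets the Grid re-materialize
  # (instantiate) the file on demand instead of storing it forever.
  DV run42->reconstruct( reco=@{out:"run42.reco"}, raw=@{in:"run42.raw"} );

Given a catalog of such definitions, the GriPhyN planners can answer both "how was this file produced?" (provenance) and "what would it take to produce it again here?" (derivation), which is the automation the slide contrasts with earlier manual methods.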
