HENP DATA GRIDS and STARTAP Worldwide Analysis at Regional Centers Harvey B. Newman (Caltech) HPIIS Review San Diego, October 25, 2000 http://l3www.cern.ch/~newman/hpiis2000.ppt. Next Generation Experiments: Physics and Technical Goals.
HENP DATA GRIDS and STARTAP • Worldwide Analysis at Regional Centers • Harvey B. Newman (Caltech) • HPIIS Review • San Diego, October 25, 2000 • http://l3www.cern.ch/~newman/hpiis2000.ppt
Next Generation Experiments: Physics and Technical Goals • The extraction of small or subtle new “discovery” signals from large and potentially overwhelming backgrounds; or “precision” analysis of large samples • Providing rapid access to event samples and subsets from massive data stores, from ~300 Terabytes in 2001, to Petabytes by ~2003, ~10 Petabytes by 2006, and ~100 Petabytes by ~2010 • Providing analyzed results with rapid turnaround, by coordinating and managing the LIMITED computing, data handling and network resources effectively • Enabling rapid access to the data and the collaboration, across an ensemble of networks of varying capability, using heterogeneous resources
The Large Hadron Collider (2005-) • A next-generation particle collider • the largest superconductor installation in the world • A bunch-bunch collision will take place every 25 nanoseconds: each generating ~20 interactions • But only one in a trillion may lead to a major physics discovery • Real-time data filtering: Petabytes per second to Gigabytes per second • Accumulated data of many Petabytes/Year • Large data samples explored and analyzed by thousands of geographically dispersed scientists, in hundreds of teams
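The collision figures on this slide can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted above (25 ns between bunch crossings, ~20 interactions per crossing); the function names are illustrative and are not part of any LHC software.

```python
def crossing_rate_hz(bunch_spacing_ns: float) -> float:
    """Bunch-crossing rate in Hz for a given spacing in nanoseconds."""
    return 1e9 / bunch_spacing_ns


def interaction_rate_hz(bunch_spacing_ns: float,
                        interactions_per_crossing: float) -> float:
    """Interactions per second, given the average number per crossing."""
    return crossing_rate_hz(bunch_spacing_ns) * interactions_per_crossing


if __name__ == "__main__":
    # Values quoted on the slide: 25 ns spacing, ~20 interactions per crossing.
    print(f"crossings/sec:    {crossing_rate_hz(25.0):.3g}")
    print(f"interactions/sec: {interaction_rate_hz(25.0, 20.0):.3g}")
```

A 25 ns spacing corresponds to a 40 MHz crossing rate, and ~20 interactions per crossing gives roughly 10^9 interactions per second, which is the order of magnitude quoted elsewhere in the talk.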
Computing Challenges: LHC Example • Geographical dispersion: of people and resources • Complexity: the detector and the LHC environment • Scale: Tens of Petabytes per year of data; 1800 Physicists, 150 Institutes, 34 Countries • Major challenges associated with: • Communication and collaboration at a distance • Network-distributed computing and data resources • Remote software development and physics analysis • R&D: New Forms of Distributed Systems: Data Grids
Four LHC Experiments: The Petabyte to Exabyte Challenge • ATLAS, CMS, ALICE, LHCb • Higgs + New particles; Quark-Gluon Plasma; CP Violation • Data written to tape ~25 Petabytes/Year and UP; 0.25 Petaflops and UP • 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (~2010) (~2015 ?) Total for the LHC Experiments
Higgs Search LEPC September 2000
LHC: Higgs Decay into 4 muons (tracker only); 1000X LEP Data Rate • 10^9 events/sec, selectivity: 1 in 10^13 (1 person in a thousand world populations)
On-line Filter System • Large variety of triggers and thresholds: select physics à la carte • Multi-level trigger • Filter out less interesting events • Online reduction 10^7 • Keep highly selected events • Result: Petabytes of Binary Compact Data Per Year • Trigger chain: 40 MHz (1000 TB/sec equivalent) → Level 1 - Special Hardware → 75 kHz (75 GB/sec, fully digitised) → Level 2 - Processors → 5 kHz (5 GB/sec) → Level 3 - Farm of Commodity CPUs → 100 Hz (100 MB/sec) → Data Recording & Offline Analysis
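The trigger chain on this slide can be worked through numerically. The Python sketch below is an illustration only: it reads each quoted rate as the output of the preceding trigger level (an interpretation of the flattened diagram), and it assumes 10^7 seconds of data taking per year, a figure that does not appear on the slide. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    rate_hz: float        # event rate leaving this stage
    bytes_per_sec: float  # data rate leaving this stage


# Rates and bandwidths as quoted on the slide.
STAGES = [
    Stage("Detector (bunch crossings)", 40e6, 1000e12),
    Stage("Level 1 - special hardware", 75e3, 75e9),
    Stage("Level 2 - processors", 5e3, 5e9),
    Stage("Level 3 - commodity CPU farm", 100.0, 100e6),
]


def rejection_factors(stages: List[Stage]) -> List[float]:
    """Event-rate reduction achieved by each trigger level."""
    return [prev.rate_hz / cur.rate_hz for prev, cur in zip(stages, stages[1:])]


def data_reduction(stages: List[Stage]) -> float:
    """Overall reduction in data rate from the first to the last stage."""
    return stages[0].bytes_per_sec / stages[-1].bytes_per_sec


def event_size_bytes(stage: Stage) -> float:
    """Average bytes per event implied by a stage's rate and bandwidth."""
    return stage.bytes_per_sec / stage.rate_hz


def yearly_volume_bytes(stage: Stage, live_seconds_per_year: float = 1e7) -> float:
    """Recorded volume per year; the 1e7 s default is an assumption."""
    return stage.bytes_per_sec * live_seconds_per_year


if __name__ == "__main__":
    for stage, factor in zip(STAGES[1:], rejection_factors(STAGES)):
        print(f"{stage.name}: rate reduced by {factor:.0f}x")
    print(f"overall data reduction: {data_reduction(STAGES):.0e}")
    print(f"event size after Level 3: {event_size_bytes(STAGES[-1]) / 1e6:.1f} MB")
    print(f"recorded per year: {yearly_volume_bytes(STAGES[-1]) / 1e15:.1f} PB")
```

Under these assumptions the data-rate ratio from 1000 TB/sec to 100 MB/sec is 10^7, matching the quoted online reduction, and 100 MB/sec sustained for 10^7 seconds gives about a Petabyte per year.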
LHC Vision: Data Grid Hierarchy (diagram, top to bottom) • Experiment: ~PByte/sec to the Online System • Online System: ~100 MBytes/sec to the Offline Farm, CERN Computer Ctr (Tier 0+1; > 20 TIPS; HPSS) • ~0.6-2.5 Gbits/sec to Tier 1 (FNAL Center, Italy Center, UK Center, France Centre; HPSS) • ~2.5 Gbits/sec to Tier 2 (Tier2 Centers) • ~622 Mbits/sec to Tier 3 (Institutes, ~0.25 TIPS each) • 100 - 1000 Mbits/sec to Tier 4 (Workstations), with a physics data cache • Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels
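To get a feel for the link speeds quoted in this hierarchy, the Python sketch below estimates how long it takes to move a dataset over a link of a given nominal speed. The 1 TB dataset size and the `efficiency` parameter (the fraction of nominal bandwidth actually achieved) are assumptions for illustration and do not come from the slide.

```python
def transfer_time_seconds(size_bytes: float,
                          link_bits_per_sec: float,
                          efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link at a given fraction of its nominal speed."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8.0 / (link_bits_per_sec * efficiency)


# Nominal link speeds quoted on the slide, in bits per second.
QUOTED_LINKS = {
    "100 Mbits/sec": 100e6,
    "622 Mbits/sec": 622e6,
    "1000 Mbits/sec": 1000e6,
    "2.5 Gbits/sec": 2.5e9,
}

if __name__ == "__main__":
    one_terabyte = 1e12  # assumed dataset size, for illustration only
    for label, speed in QUOTED_LINKS.items():
        hours = transfer_time_seconds(one_terabyte, speed) / 3600.0
        print(f"1 TB over {label}: {hours:.2f} h at full nominal speed")
```

Even at full nominal speed, a Terabyte takes close to an hour at 2.5 Gbits/sec and several hours at 622 Mbits/sec, which illustrates why the hierarchy tries to keep datasets close to the users who need them.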
Why Worldwide Computing? Regional Center Concept Advantages • Managed, fair-shared access for Physicists everywhere • Maximize total funding resources while meeting the total computing and data handling needs • Balance between proximity of datasets to appropriate resources, and to the users • Tier-N Model • Efficient use of network: higher throughput • Per Flow: Local > regional > national > international • Utilizing all intellectual resources, in several time zones • CERN, national labs, universities, remote sites • Involving physicists and students at their home institutions • Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region • And/or by Common Interests (physics topics, subdetectors,…) • Manage the System’s Complexity • Partitioning facility tasks, to manage and focus resources
Grid Services Architecture [*] (layers, top to bottom) • Applications (Applns): A Rich Set of HEP Data-Analysis Related Applications • Application Toolkits (Appln Toolkits): Remote data toolkit, Remote comp. toolkit, Remote viz toolkit, Remote collab. toolkit, Remote sensors toolkit, ... • Grid Services: Protocols, authentication, policy, resource management, instrumentation, discovery, etc. • Grid Fabric: Data stores, networks, computers, display devices, …; associated local services • [*] Adapted from Ian Foster
SDSS Data Grid (In GriPhyN): A Shared Vision • Three main functions: • Raw data processing on a Grid (FNAL) • Rapid turnaround with TBs of data • Accessible storage of all image data • Fast science analysis environment (JHU) • Combined data access + analysis of calibrated data • Distributed I/O layer and processing layer; shared by whole collaboration • Public data access • SDSS data browsing for astronomers, and students • Complex query engine for the public
Roles of Projects for HENP Distributed Analysis • RD45, GIOD: Networked Object Databases • Clipper/GC: High speed access to Objects or File data; FNAL/SAM for processing and analysis • SLAC/OOFS: Distributed File System + Objectivity Interface • NILE, Condor: Fault Tolerant Distributed Computing • MONARC: LHC Computing Models: Architecture, Simulation, Strategy, Politics • ALDAP: OO Database Structures & Access Methods for Astrophysics and HENP Data • PPDG: First Distributed Data Services and Data Grid System Prototype • GriPhyN: Production-Scale Data Grids • EU Data Grid
GIOD: Globally Interconnected Object Databases • Multi-TB OO Database Federation; used across LANs and WANs • 170 MByte/sec CMS Milestone • Developed Java 3D OO Reconstruction, Analysis and Visualization Prototypes that Work Seamlessly Over Worldwide Networks • Deployed facilities and database federations as testbeds for Computing Model studies • (Figure labels: Hit, Track, Detector)
The Particle Physics Data Grid (PPDG) • ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS • Site to Site Data Replication Service (100 Mbytes/sec): PRIMARY SITE (Data Acquisition, CPU, Disk, Tape Robot) and SECONDARY SITE (CPU, Disk, Tape Robot) • Multi-Site Cached File Access Service: PRIMARY SITE (DAQ, Tape, CPU, Disk, Robot); Satellite Sites (Tape, CPU, Disk, Robot); Universities (CPU, Disk, Users) • First Round Goal: Optimized cached read access to 10-100 Gbytes drawn from a total data set of 0.1 to ~1 Petabyte • Matchmaking, Co-Scheduling: SRB, Condor, Globus services; HRM, NWS
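PPDG WG1: Request Manager (diagram) • CLIENTs submit a Logical Request to the REQUEST MANAGER • REQUEST MANAGER components: Request Interpreter, Request Planner (Matchmaking), Request Executor • Supporting services: Event-file Index, Replica catalog, Network Weather Service • Intermediate result: Logical Set of Files • Output: Physical file transfer requests, sent over the GRID • Storage endpoints: DRM (Disk Cache), DRM (Disk Cache), HRM (Disk Cache, tape system)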
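The planning step in this diagram, which turns a logical set of files into physical transfer requests using a replica catalog and Network Weather Service forecasts, can be illustrated with a toy matchmaking routine. The Python sketch below is not PPDG code: the data structures, the function name `plan_transfers`, the site names, and the rule "pick the replica with the highest predicted bandwidth" are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def plan_transfers(logical_files: List[str],
                   replica_catalog: Dict[str, List[str]],
                   predicted_mbps: Dict[str, float]) -> List[Tuple[str, str]]:
    """Pick one source site per logical file.

    replica_catalog maps a logical file name to the sites holding a copy.
    predicted_mbps maps a site to a forecast bandwidth (a stand-in for what
    a network forecasting service might report). Sites with no forecast are
    treated as 0 Mbps; ties go to the site listed first in the catalog.
    """
    plan = []
    for lfn in logical_files:
        sites = replica_catalog.get(lfn, [])
        if not sites:
            raise LookupError(f"no replica registered for {lfn!r}")
        best = max(sites, key=lambda site: predicted_mbps.get(site, 0.0))
        plan.append((lfn, best))
    return plan


if __name__ == "__main__":
    # Hypothetical catalog and forecasts.
    catalog = {"run1.db": ["siteA", "siteB"], "run2.db": ["siteB"]}
    forecast = {"siteA": 40.0, "siteB": 90.0}
    for lfn, site in plan_transfers(["run1.db", "run2.db"], catalog, forecast):
        print(f"fetch {lfn} from {site}")
```

A real planner would also weigh cache contents, tape staging latency and co-scheduling constraints; the sketch keeps only the replica-selection idea.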
Earth Grid System Prototype: Inter-communication Diagram • Diagram labels: LLNL; ANL; Client; Request Manager; Replica Catalog; LDAP Script; LDAP C API or Script; GIS with NWS; CORBA; GSI-ncftp (x6); ANL GSI-wuftpd; ISI GSI-wuftpd; LBNL GSI-wuftpd; LBNL; NCAR GSI-wuftpd; SDSC GSI-pftpd; HRM; HPSS (x2); Disk on Clipper; Disk (x5)
Grid Data Management Prototype (GDMP) • Distributed Job Execution and Data Handling: • Transparency • Performance • Security • Fault Tolerance • Automation • Jobs are executed locally or remotely • Data is always written locally • Data is replicated to remote sites • Diagram (Site A, Site B, Site C): Submit job; Job writes data locally; Replicate data • GDMP V1.1: Caltech + EU DataGrid WP2 • Tests by CALTECH, CERN, FNAL, Pisa for CMS “HLT” Production 10/2000; Integration with ENSTORE, HPSS, Castor
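The data-handling rules on this slide (a job may run at any site, its output is always written at the site where it ran, and the output is then replicated to remote sites) can be modelled in a few lines. The Python sketch below is a toy model, not GDMP itself: the `Site` class, the subscription list used to decide where replicas go, and the file name are hypothetical and chosen only to make the three rules concrete.

```python
from typing import List, Set


class Site:
    """Toy model of a grid site that stores files and pushes replicas."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Set[str] = set()
        self.subscribers: List["Site"] = []  # sites that receive replicas (assumed mechanism)

    def subscribe(self, other: "Site") -> None:
        """Register another site to receive copies of files produced here."""
        self.subscribers.append(other)

    def run_job(self, output_file: str) -> None:
        """Run a job here: write output locally first, then replicate it."""
        self.files.add(output_file)   # data is always written locally
        self._replicate(output_file)  # then copied to remote sites

    def _replicate(self, file_name: str) -> None:
        for site in self.subscribers:
            site.files.add(file_name)


if __name__ == "__main__":
    site_a, site_b, site_c = Site("A"), Site("B"), Site("C")
    site_a.subscribe(site_b)
    site_a.subscribe(site_c)
    site_a.run_job("hits_001.db")  # hypothetical output file name
    for site in (site_a, site_b, site_c):
        print(site.name, sorted(site.files))
```

Writing locally before replicating means a job never blocks on a wide-area link; replication to the other sites can then proceed asynchronously in a real system, although the sketch performs it immediately for simplicity.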
GriPhyN: Grid Physics Network • A New Form of Integrated Distributed System • Meeting the Scientific Goals of LIGO, SDSS and the LHC Experiments • Focus on Tier2 Centers at Universities • In a Unified Hierarchical Grid of Five Levels • 18 Centers; with Four Sub-Implementations • 5 Each in US for LIGO, CMS, ATLAS; 3 for SDSS • Near Term Focus on LIGO, SDSS handling of real data; LHC “Data Challenges” with simulated data • Cooperation with PPDG, MONARC and EU DataGrid http://www.phys.ufl.edu/~avery/GriPhyN/ Data Intensive Science
GriPhyN: PetaScale Virtual Data Grids (diagram, top to bottom) • Users: Production Team, Individual Investigator, Workgroups • Interactive User Tools • Request Planning & Scheduling Tools; Request Execution & Management Tools; Virtual Data Tools • Resource Management Services; Security and Policy Services; Other Grid Services • Transforms • Distributed resources (code, storage, computers, and network) • Raw data source
EU DataGrid • http://www.cern.ch/grid • Organized by CERN • HEP Participants: Czech Republic, France, Germany, Hungary, Italy, Netherlands, Portugal, UK; (US) • Industrial participation • Grid forum context • 12 Work Packages (One coordinator each) • Middleware: Work scheduling; data management; application monitoring; fabric management; storage management • Infrastructure: Testbeds and demonstrators; advanced network services • Applications: HEP, Earth Observation; Biology • [*] Basic Middleware Framework: Globus
EU DataGrid Project Work Packages
Emerging Data Grid User Communities • NSF Network for Earthquake Engineering Simulation (NEES) • Integrated instrumentation, collaboration, simulation • Grid Physics Network (GriPhyN) • ATLAS, CMS, LIGO, SDSS • World-wide distributed analysis of Petascale data • Access Grid; VRVS: supporting group-based collaboration • And • Genomics, Proteomics, ... • The Earth System Grid and EOSDIS • Federating Brain Data • Computed MicroTomography … • NVO, GVO
GRIDs In 2000: Summary • Grids are changing the way we do science and engineering • From Computation to Data • Key services and concepts have been identified, and development has started • Major IT challenges remain • An Opportunity & Obligation for HEP/CS Collaboration • Transition of services and applications to production use is starting to occur • In future more sophisticated integrated services and toolsets (Inter- and IntraGrids+) could drive advances in many fields of science & engineering • HENP, facing the need for Petascale Virtual Data, is both an early adopter, and a leading developer of Data Grid technology
US-CERN BW Requirements Projection (PRELIMINARY) • [#] Includes ~1.5 Gbps Each for ATLAS and CMS, Plus BaBar, Run2 and Other • [*] D0 and CDF at Run2: Needs Presumed to Be Comparable to BaBar
Daily, Weekly, Monthly and Yearly Statistics on the 45 Mbps US-CERN Link
HEP Network Requirements and STARTAP • Beyond the requirement of adequate bandwidth, physicists in HENP’s major experiments depend on: • Network and user software that will work together to provide high throughput and to manage the bandwidth effectively • A suite of videoconference and high-level tools for remote collaboration that make data analysis from the US (and from other world regions) effective • An integrated set of local, regional, national and international networks that interoperate seamlessly, without bottlenecks
HEP Network Requirements and STARTAP • The STARTAP, a professionally managed international peering point with an open HP policy, has been and will continue to be vital for US involvement in the LHC, and thus for the progress of the LHC physics program. • Our development of worldwide Data Grid systems, in collaboration with the European Union and other world regions, will depend on the STARTAP for joint prototyping, tests and developments using next-generation network, software and database technology. • A scalable and cost-effective growth path for the STARTAP will be needed, as a central component of international networks for HENP, and other fields. • An optical STARTAP handling OC-48 and OC-192 links, with favorable peering and transit arrangements across the US, would be well-matched to our future plans.
US-CERN line connection to ESnet: to HENP Labs Through STARTAP
TCP throughput performance: Caltech/CERN Via STARTAP • (Plot panels: From Caltech to CERN; From CERN to Caltech)
CA*net 4 Possible Architecture • Diagram labels: Optional Layer 3 aggregation service; Dedicated Wavelength or SONET channel; Large channel WDM system; OBGP switches • Nodes shown: St. John’s, Regina, Winnipeg, Charlottetown, Calgary, Montreal, Fredericton, Halifax, Seattle, Ottawa, Vancouver, Chicago, New York, Toronto, Pasadena, Los Angeles, Miami, Europe
OBGP Traffic Engineering - Physical • Router redirects networks with heavy traffic load to optical switch, but routing policy still maintained by ISP • Optical switch looks like BGP router and AS 1 is directly connected to Tier 1 ISP but still transits AS 5 • Bulk of AS 1 traffic is to Tier 1 ISP • For simplicity only data forwarding paths in one direction shown • Diagram labels: Tier 1 ISP; Tier 2 ISP; Intermediate ISP; AS 1, AS 2, AS 3, AS 4, AS 5; Red Default Wavelength; Dual Connected Router to AS 5
Worldwide Computing Issues • Beyond Grid Prototype Components: Integration of Grid Prototypes for End-to-end Data Transport • Particle Physics Data Grid (PPDG) ReqM • PPDG/EU DataGrid GDMP for CMS HLT Productions • Start Building the Grid System(s): Integration with Experiment-specific software frameworks • Derivation of Strategies (MONARC Simulation System) • Data caching, query estimation, co-scheduling • Load balancing and workload management amongst Tier0/Tier1/Tier2 sites (SONN by Legrand) • Transaction robustness: simulate and verify • Transparent Interfaces for Replica Management • Deep versus shallow copies: Thresholds; tracking, monitoring and control
VRVS Remote Collaboration System: Statistics • 30 Reflectors • 52 Countries • Mbone, H.323, MPEG2 Streaming, VNC
VRVS: Mbone/H.323/QT Snapshot • VRVS Future evolution/integration (R&D) • Wider Deployment and Support of VRVS. • High Quality video and audio (MPEG1, MPEG2,..). • Shared virtual workspaces, applications, and environment • Integration of H.323 ITU Standard • Quality of Service (QoS) over the network • Improved security, authentication and confidentiality • Remote control of video cameras via a Java applet
Demonstrations (HN, J. Bunn, P. Galvez): CMSOO and VRVS • CMSOO: Java 3D Event Display IGrid2000 Yokohama, July 2000
STARTAP: Selected HENP Success Stories (1) • Onset of large scale optimized Production file transfers, involving both HENP Labs & Universities • Babar, CMS, ATLAS • Upcoming D0, CDF at FNAL/Run2; RHIC • Seamless remote access to Object databases • CMSOO demos: IGrid2000 (Yokohama) • Now starting on distributed CMS ORCA OO (TB to PB) DB Access • CMS User Analysis Environment (UAE) • Worldwide Grid-enabled view of the data, along with visualizations, data presentation and analysis • A User-view across the Data Grid
STARTAP: Selected HENP Success Stories (2) • A Principal testbed to develop production Grid systems, of worldwide scope • Grid Data Management Prototype (GDMP; US/EU) • GriPhyN: 18-20 University facilities serving CMS, ATLAS, LIGO and SDSS • Built on a strong foundation of grid security and information infrastructure • Deploying a Grid Virtual Data Toolkit (VDT) • VRVS: Worldwide-extensible videoconferencing and shared virtual spaces • Future: Forward-looking view of Mobile Agent Coordination Architectures • Survivable Loosely Coupled Systems with Unprecedented Scalability