
Networks & Grids for High Energy Physics: Enabling Rapid Data Access and Collaboration

This article discusses the goals and needs of networks and grids in high energy physics, including providing rapid access to physics results, coordinating limited resources effectively, and enabling collaboration across networks. It also highlights the increasing bandwidth requirements and the progress made in achieving high sustained TCP throughput on transatlantic and US links. The article concludes by emphasizing the importance of scalable and robust grid solutions for handling the large-scale production of data in data-intensive fields.


Presentation Transcript


  1. High Energy Physics: Networks & Grids Systems for Global Science • Harvey B. Newman, California Institute of Technology • AMPATH Workshop, FIU, January 31, 2003

  2. Next Generation Networks for Experiments: Goals and Needs • Providing rapid access to event samples, subsets and analyzed physics results from massive data stores • From Petabytes by 2002, ~100 Petabytes by 2007, to ~1 Exabyte by ~2012 • Providing analyzed results with rapid turnaround, by coordinating and managing the large but LIMITED computing, data handling and NETWORK resources effectively • Enabling rapid access to the data and the collaboration • Across an ensemble of networks of varying capability • Advanced integrated applications, such as Data Grids, rely on seamless operation of our LANs and WANs • With reliable, monitored, quantifiable high performance • Large data samples explored and analyzed by thousands of globally dispersed scientists, in hundreds of teams

  3. LHC: Higgs Decay into 4 muons (Tracker only); 1000X LEP Data Rate • 10^9 events/sec, selectivity: 1 in 10^13 (1 person in 1000 world populations)
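  To make that selectivity concrete, here is a short back-of-the-envelope check (a minimal Python sketch using only the two numbers quoted on the slide, not a detector or trigger model):

      # Rough yield implied by the slide's figures: 10^9 events/s, 1-in-10^13 selectivity.
      event_rate = 1e9                          # collisions per second
      selectivity = 1e-13                       # signal candidates per event
      signal_rate = event_rate * selectivity    # candidates per second
      per_day = signal_rate * 86_400
      print(f"{signal_rate:.1e} candidates/s -> about {per_day:.0f} per day")
      # prints: 1.0e-04 candidates/s -> about 9 per day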

  4. Transatlantic Net WG (HN, L. Price): Bandwidth Requirements [*] • [*] BW Requirements Increasing Faster Than Moore’s Law • See http://gate.hep.anl.gov/lprice/TAN

  5. HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps • Continuing the Trend: ~1000 Times Bandwidth Growth Per Decade; We are Rapidly Learning to Use and Share Multi-Gbps Networks
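  As a quick sanity check on the roadmap's trend line, ~1000x growth per decade works out to roughly a doubling every year. A minimal sketch of that compounding (my own arithmetic on the slide's growth figure, not values from the roadmap table):

      # 1000x per decade expressed as an annual growth factor
      decade_factor = 1000.0
      annual_factor = decade_factor ** (1 / 10)     # ~1.995, i.e. about 2x per year
      print(f"annual growth factor: {annual_factor:.2f}")

      # Example extrapolation: a 10 Gbps link today scales to ~10 Tbps a decade later
      print(f"10 Gbps grows to {10 * decade_factor / 1000:.0f} Tbps after one decade")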

  6. DataTAG Project [Topology diagram: a 2.5-to-10 Gbps wave triangle linking Geneva, StarLight and New York, interconnecting SuperJANET4 (UK), SURFnet and Atrium (NL), VTHD and INRIA (Fr), GARR-B (It), GEANT, Abilene, ESnet, CALREN2 and STAR TAP.] • EU-Solicited Project. CERN, PPARC (UK), Amsterdam (NL), and INFN (IT); and US (DOE/NSF: UIC, NWU and Caltech) partners • Main Aims: • Ensure maximum interoperability between US and EU Grid Projects • Transatlantic Testbed for advanced network research • 2.5 Gbps Wavelength Triangle from 7/02; to 10 Gbps Triangle by Early 2003

  7. Progress: Max. Sustained TCP Throughput on Transatlantic and US Links • 8-9/01 105 Mbps 30 Streams: SLAC-IN2P3; 102 Mbps 1 Stream CIT-CERN • 11/5/01 125 Mbps in One Stream (modified kernel): CIT-CERN • 1/09/02 190 Mbps for One Stream shared on two 155 Mbps links • 3/11/02 120 Mbps Disk-to-Disk with One Stream on 155 Mbps link (Chicago-CERN) • 5/20/02 450-600 Mbps SLAC-Manchester on OC12 with ~100 Streams • 6/1/02 290 Mbps Chicago-CERN One Stream on OC12 (mod. kernel) • 9/02 850, 1350, 1900 Mbps Chicago-CERN 1, 2, 3 GbE Streams, OC48 Link • 11-12/02 FAST: 940 Mbps in 1 Stream SNV-CERN; 9.4 Gbps in 10 Flows SNV-Chicago • Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/; and the Internet2 E2E Initiative: http://www.internet2.edu/e2e
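  What makes these single-stream numbers hard is the bandwidth-delay product: at transatlantic round-trip times, standard TCP needs very large windows and recovers slowly from a single loss. A minimal sketch of the sizing argument (the RTT, rate and MTU below are illustrative assumptions, not measurements from the slide):

      # TCP must keep roughly (bandwidth x RTT) of data in flight to fill a long fat pipe.
      rtt_s = 0.150                               # ~150 ms transatlantic round trip (assumed)
      target_bps = 1e9                            # 1 Gbps target rate
      window_bytes = target_bps * rtt_s / 8
      print(f"required window: {window_bytes / 1e6:.1f} MB")      # ~18.8 MB

      # After one loss, standard (Reno-style) TCP rebuilds ~half the window
      # at one segment per round trip, so a single drop is very costly:
      segments = window_bytes / 1460              # 1460-byte payload per standard-MTU segment
      recovery_s = (segments / 2) * rtt_s
      print(f"~{segments:.0f} segments in flight; ~{recovery_s / 60:.0f} min to recover from one loss")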

  8. FAST (Caltech): A Scalable, “Fair” Protocol for Next-Generation Networks: from 0.1 to 100 Gbps (SC2002, 11/02) [Figures: the Internet modeled as a distributed TCP/AQM feedback system (theory), and the SC2002 experiment map with Geneva, Sunnyvale, Baltimore and Chicago on paths of ~1000, 3000 and 7000 km; the throughput chart compares SC2002 runs of 1, 2 and 10 flows on Sunnyvale-Geneva, Baltimore-Geneva and Baltimore-Sunnyvale with the Internet2 Land Speed Records of 29.3.00 (multiple streams), 9.4.02 (1 flow) and 22.8.02 (IPv6).] Highlights of FAST TCP • Standard Packet Size • 940 Mbps single flow/GE card; 9.4 petabit-m/sec; 1.9 times LSR • 9.4 Gbps with 10 flows; 37.0 petabit-m/sec; 6.9 times LSR • 22 TB in 6 hours, in 10 flows • Implementation: Sender-side (only) mods; Delay (RTT) based; Stabilized Vegas • URL: netlab.caltech.edu/FAST • Next: 10GbE; 1 GB/sec disk to disk • C. Jin, D. Wei, S. Low, FAST Team & Partners
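  FAST is delay-based: the congestion window is driven by the gap between the measured RTT and the estimated propagation delay, rather than by packet loss. The sketch below gives a simplified periodic update in the spirit of the published FAST/stabilized-Vegas rule; the constants and the example loop are illustrative, not the SC2002 kernel implementation:

      def fast_window_update(w, base_rtt, rtt, alpha=200, gamma=0.5):
          """One periodic update of a FAST/Vegas-style congestion window.
          w: current window (packets); base_rtt: smallest observed RTT (propagation estimate);
          rtt: current RTT; alpha: target packets queued in the network; gamma: smoothing in (0, 1]."""
          target = (base_rtt / rtt) * w + alpha             # equilibrium keeps ~alpha packets queued
          return min(2 * w, (1 - gamma) * w + gamma * target)

      # While queueing delay is small (rtt close to base_rtt), the window ramps up quickly:
      w = 100.0
      for _ in range(5):
          w = fast_window_update(w, base_rtt=0.150, rtt=0.151)
      print(round(w))                                       # well above the starting 100 packets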

  9. FAST TCP: Aggregate Throughput • Standard MTU • Utilization averaged over > 1 hr • [Chart: average utilization between 88% and 95% for runs of 1, 2, 7, 9 and 10 flows.] • netlab.caltech.edu

  10. HENP Lambda Grids: Fibers for Physics • Problem: Extract “Small” Data Subsets of 1 to 100 Terabytes from 1 to 1000 Petabyte Data Stores • Survivability of the HENP Global Grid System, with hundreds of such transactions per day (circa 2007), requires that each transaction be completed in a relatively short time • Example: Take 800 secs to complete the transaction. Then:

      Transaction Size (TB)    Net Throughput (Gbps)
      1                        10
      10                       100
      100                      1000 (Capacity of Fiber Today)

  • Summary: Providing Switching of 10 Gbps wavelengths within ~3-5 years, and Terabit Switching within 5-8 years, would enable “Petascale Grids with Terabyte transactions”, as required to fully realize the discovery potential of major HENP programs, as well as other data-intensive fields.
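  The table is just throughput = size / time; a minimal check of the 800-second example:

      # Required network throughput to move a transaction in a fixed 800-second window
      transaction_s = 800
      for size_tb in (1, 10, 100):
          bits = size_tb * 1e12 * 8                 # terabytes -> bits
          gbps = bits / transaction_s / 1e9
          print(f"{size_tb:>4} TB -> {gbps:,.0f} Gbps")
      # prints: 1 TB -> 10 Gbps, 10 TB -> 100 Gbps, 100 TB -> 1,000 Gbps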

  11. Data Intensive Grids Now: Large Scale Production • Efficient sharing of distributed heterogeneous compute and storage resources • Virtual Organizations and Institutional resource sharing • Dynamic reallocation of resources to target specific problems • Collaboration-wide data access and analysis environments • Grid solutions NEED to be scalable & robust • Must handle many petabytes per year • Tens of thousands of CPUs • Tens of thousands of jobs • Grid solutions presented here are supported in part by the GriPhyN, iVDGL, PPDG, EDG, and DataTAG projects • We are learning a lot from these current efforts • For example: 1M events processed using VDT, Oct.-Dec. 2002

  12. Beyond Production: Web Services for Ubiquitous Data Access and Analysis by a Worldwide Scientific Community • Web Services: easy, flexible, platform-independent access to data (Object Collections in Databases) • Well-adapted to use by individual physicists, teachers & students • SkyQuery Example: JHU/FNAL/Caltech, with Web Service based access to astronomy surveys • Can be individually or simultaneously queried via Web interface • Simplicity of interface hides considerable server power (from stored procedures etc.) • This is a “Traditional” Web Service, with no user authentication required (a minimal query sketch follows below)
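  To illustrate the “traditional” web-service pattern, the sketch below sends an unauthenticated HTTP query to a survey service and prints the matched objects. The endpoint, query syntax and response format are hypothetical placeholders for illustration only, not the actual SkyQuery interface:

      import urllib.parse
      import urllib.request

      # Hypothetical cross-survey positional query, in the spirit of SkyQuery.
      query = "SELECT o.ra, o.dec FROM SDSS:PhotoObj o WHERE o.ra BETWEEN 180 AND 181"
      url = "http://skyservice.example.org/query?" + urllib.parse.urlencode({"q": query})

      try:
          with urllib.request.urlopen(url, timeout=30) as resp:     # placeholder host, guarded below
              for line in resp.read().decode().splitlines():
                  print(line)                                       # one matched object per line (assumed format)
      except OSError as err:
          print("query failed (placeholder endpoint):", err)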

  13. COJAC (CMS ORCA Java Analysis Component): Java3D, Objectivity, JNI, Web Services. Demonstrated Caltech to Rio de Janeiro and Chile in 2002.

  14. LHC Distributed CM: HENP Data Grids Versus Classical Grids • Grid projects have been a step forward for HEP and LHC: a path to meet the “LHC Computing” challenges • “Virtual Data” Concept: applied to large-scale automated data processing among worldwide-distributed regional centers • The original Computational and Data Grid concepts are largely stateless, open systems: known to be scalable • Analogous to the Web • The classical Grid architecture has a number of implicit assumptions • The ability to locate and schedule suitable resources, within a tolerably short time (i.e. resource richness) • Short transactions; relatively simple failure modes • HEP Grids are data-intensive and resource constrained • Long transactions; some long queues • Schedule conflicts; policy decisions; task redirection • A lot of global system state to be monitored and tracked

  15. Current Grid Challenges: Secure Workflow Management and Optimization • Maintaining a Global View of Resources and System State • Coherent end-to-end System Monitoring • Adaptive Learning: new algorithms and strategies for execution optimization (increasingly automated) • Workflow: Strategic Balance of Policy Versus Moment-to-moment Capability to Complete Tasks • Balance High Levels of Usage of Limited Resources Against Better Turnaround Times for Priority Jobs (an illustrative scoring sketch follows below) • Goal-Oriented Algorithms; Steering Requests According to (Yet to be Developed) Metrics • Handling User-Grid Interactions: Guidelines; Agents • Building Higher Level Services, and an Integrated, Scalable User Environment for the Above
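  One way to picture the “strategic balance” above is a scoring function that rewards policy priority and long waits while still favoring small jobs when resources are heavily loaded. This is purely an illustrative sketch; the weights, fields and metric are assumptions, not an algorithm from the talk:

      from dataclasses import dataclass

      @dataclass
      class Job:
          name: str
          priority: float       # policy-assigned priority, 0..1
          wait_hours: float     # time already spent queued
          cpu_hours: float      # estimated work

      def score(job, site_load, w_priority=0.5, w_wait=0.3, w_fill=0.2):
          # Higher score = dispatch sooner. Rewards priority and long waits,
          # and favors small jobs when the site is already heavily loaded.
          fill_bonus = (1.0 - site_load) / (1.0 + job.cpu_hours)
          return w_priority * job.priority + w_wait * min(job.wait_hours / 24, 1.0) + w_fill * fill_bonus

      queue = [Job("skim", 0.9, 2, 50), Job("mc-prod", 0.3, 30, 500), Job("user-plot", 0.6, 1, 2)]
      for job in sorted(queue, key=lambda j: score(j, site_load=0.8), reverse=True):
          print(f"{job.name:10s} score={score(job, 0.8):.2f}")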

  16. Distributed System Services Architecture (DSSA): CIT/Romania/Pakistan [Diagram: station servers interconnected through lookup/discovery services, with service listeners, remote notification, registration and proxy exchange.] • Agents: Autonomous, Auto-discovering, self-organizing, collaborative • “Station Servers” (static) host mobile “Dynamic Services” • Servers interconnect dynamically; form a robust fabric in which mobile agents travel, with a payload of (analysis) tasks • Adaptable to Web services: OGSA; and many platforms • Adaptable to Ubiquitous, mobile working environments • Managing Global Systems of Increasing Scope and Complexity, In the Service of Science and Society, Requires A New Generation of Scalable, Autonomous, Artificially Intelligent Software Systems
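  The station-server / lookup pattern can be reduced to a toy, in-process registry to show the flow: servers advertise a capability, and a mobile task is handed to whichever server matches. All names here are invented for illustration; the real DSSA uses Jini discovery and mobile agents, not this simplification:

      class LookupService:
          def __init__(self):
              self._registry = {}                      # capability -> list of station servers

          def register(self, capability, server):
              self._registry.setdefault(capability, []).append(server)

          def discover(self, capability):
              return self._registry.get(capability, [])

      class StationServer:
          def __init__(self, name):
              self.name = name

          def run(self, task):
              print(f"{self.name} executing: {task}")

      lookup = LookupService()
      for name in ("caltech-ss1", "bucharest-ss1", "nust-ss1"):    # invented host names
          lookup.register("analysis", StationServer(name))

      # An "agent" carrying an analysis payload picks a matching server and runs there.
      lookup.discover("analysis")[0].run("histogram muon-pt selection")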

  17. MonALISA: A Globally Scalable Grid Monitoring System, by I. Legrand (Caltech) • Deployed on the US CMS Grid • Agent-based dynamic information / resource discovery mechanism • Talks with other monitoring systems • Implemented in Java/Jini; SNMP; WSDL / SOAP with UDDI • Part of a Global “Grid Control Room” Service

  18. MONARC SONN: 3 Regional Centres “Learning” to Export Jobs (Day 9) [Simulation snapshot, optimized at Day 9: CERN (30 CPUs), Caltech (25 CPUs) and NUST (20 CPUs), linked at 0.8-1.2 MB/s with 150-200 ms RTT; mean efficiencies <E> of 0.83, 0.73 and 0.66.]
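  The essence of letting each centre “learn” where to export work can be sketched as scoring remote centres by expected completion time, including the cost of shipping the input over the WAN. The formula and numbers below are illustrative stand-ins, not the MONARC/SONN simulation itself:

      # Pick an export target by estimated completion time:
      # WAN transfer of the input + a crude queue/backlog estimate + the job itself.
      centres = {
          # name: (cpus, queued_jobs, link_MB_per_s)  -- CPU counts from the slide, the rest assumed
          "CERN":    (30, 45, 1.0),
          "CALTECH": (25, 10, 1.2),
          "NUST":    (20,  5, 0.8),
      }

      def estimated_hours(input_gb, cpu_hours, cpus, queued, link_mb_s):
          transfer_h = input_gb * 1024 / link_mb_s / 3600
          queue_h = queued * cpu_hours / cpus
          return transfer_h + queue_h + cpu_hours

      job_input_gb, job_cpu_hours = 20, 4
      best = min(centres, key=lambda c: estimated_hours(job_input_gb, job_cpu_hours, *centres[c]))
      print("export job to:", best)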

  19. NSF ITR: Globally EnabledAnalysis Communities • Develop and build Dynamic Workspaces • Build Private Grids to support scientific analysis communities • Using Agent Based Peer-to-peer Web Services • Construct Autonomous Communities Operating Within Global Collaborations • Empower small groups of scientists (Teachers and Students) to profit from and contribute to int’l big science • Drive the democratization of science via the deployment of new technologies

  20. Private Grids and P2P Sub-Communities in Global CMS

  21. 14,600 Host Devices; 7,800 Registered Users in 64 Countries; 45 Network Servers; Annual Growth 2 to 3X

  22. An Inter-Regional Center for Research, Education and Outreach, and CMS CyberInfrastructure • Foster FIU and Brazil (UERJ) Strategic Expansion Into CMS Physics Through Grid-Based “Computing” • Development and Operation for Science of International Networks, Grids and Collaborative Systems • Focus on Research at the High Energy frontier • Developing a Scalable Grid-Enabled Analysis Environment • Broadly Applicable to Science and Education • Made Accessible Through the Use of Agent-Based (AI) Autonomous Systems; and Web Services • Serving Under-Represented Communities • At FIU and in South America • Training and Participation in the Development of State of the Art Technologies • Developing the Teachers and Trainers • Relevance to Science, Education and Society at Large • Develop the Future Science and Info. S&E Workforce • Closing the Digital Divide
