Caltech HEP: Next Generation Networks, Grids and Collaborative Systems for Global VOs
Harvey B. Newman, California Institute of Technology
Cisco Visit to Caltech, October 8, 2003
Large Hadron Collider (LHC), CERN, Geneva: 2007 Start
• pp collisions at √s = 14 TeV, L = 10^34 cm^-2 s^-1
• 27 km tunnel straddling Switzerland & France
• First beams: April 2007; physics runs: from Summer 2007
• Experiments: CMS (+ TOTEM) and ATLAS: pp, general purpose, plus heavy ions (HI); ALICE: HI; LHCb: B-physics
• Design Reports: Computing Fall 2004; Physics Fall 2005
CMS: Higgs at LHC — Higgs to Two Photons; Higgs to Four Muons (Full CMS Simulation)
• General purpose pp detector; well-adapted to lower initial luminosity
• Caltech work on the crystal ECAL for precise e and γ measurements; Higgs physics
• Precise all-silicon tracker: 223 m^2
• Excellent muon ID and precise momentum measurements (Tracker + standalone muon system)
• Caltech work on forward muon reconstruction & trigger, XDAQ for slice tests
LHC: Higgs decay into 4 muons (tracker only); 1000X the LEP data rate. 10^9 events/sec, selectivity: 1 in 10^13 (like picking out 1 person in a thousand world populations).
The CMS Collaboration Is Progressing
• 2000+ physicists & engineers, 159 institutions, 36 countries
• Members include: Armenia, Austria, Belarus, Belgium, Brazil, Bulgaria, China, China (Taiwan), Croatia, Cyprus, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, India, Italy, Korea, Pakistan, Poland, Portugal, Russia, Slovak Republic, Spain, Switzerland, Turkey, UK, Ukraine, USA, Uzbekistan, and CERN
• New in US CMS: FIU, Yale; South America: UERJ (Brazil)
LHC Data Grid Hierarchy: Developed at Caltech
• CERN/Outside resource ratio ~1:2; Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1
• Online system (~PByte/sec raw) feeds the CERN Tier 0+1 center (PBs of disk; tape robot) at ~100-1500 MBytes/sec
• Tier 1 national centers (e.g. FNAL, IN2P3, INFN, RAL) linked at ~2.5-10 Gbps
• Tier 2 regional centers at ~2.5-10 Gbps; Tier 3 institute servers (physics data cache, 0.1 to 10 Gbps); Tier 4 workstations
• Tens of petabytes by 2007-8; an exabyte ~5-7 years later
• Emerging vision: a richly structured, global dynamic system
Next Generation Networks and Grids for HEP Experiments
• Providing rapid access to event samples and analyzed physics results drawn from massive data stores: from Petabytes in 2003, ~100 Petabytes by 2007-8, to ~1 Exabyte by ~2013-15
• Providing analyzed results with rapid turnaround, by coordinating and managing large but LIMITED computing, data handling and NETWORK resources effectively
• Enabling rapid access to the data and the collaboration, across an ensemble of networks of varying capability
• Advanced integrated applications, such as Data Grids, rely on seamless operation of our LANs and WANs, with reliable, monitored, quantifiable high performance
• Worldwide analysis: data explored and analyzed by thousands of globally dispersed scientists, in hundreds of teams
2001 Transatlantic Net WG Bandwidth Requirements [*]
[*] See http://gate.hep.anl.gov/lprice/TAN. The 2001 LHC requirements outlook now looks very conservative in 2003.
Production BW Growth of Int'l HENP Network Links (US-CERN Example)
• Rate of progress >> Moore's Law:
• 9.6 kbps analog (1985)
• 64-256 kbps digital (1989-1994) [X 7 - 27]
• 1.5 Mbps shared (1990-3; IBM) [X 160]
• 2-4 Mbps (1996-1998) [X 200-400]
• 12-20 Mbps (1999-2000) [X 1.2k - 2k]
• 155-310 Mbps (2001-2) [X 16k - 32k]
• 622 Mbps (2002-3) [X 65k]
• 2.5 Gbps (2003-4) [X 250k]
• 10 Gbps (2005) [X 1M]
• A factor of ~1M over 1985-2005 (a factor of ~5k during 1995-2005); a quick arithmetic check follows below
• HENP has become a leading applications driver, and also a co-developer of global networks
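As a sanity check on the bracketed growth factors, a minimal arithmetic sketch (my own illustration, not from the slides; the link labels are shorthand for the stages listed above):

```python
# Growth factors for the US-CERN link, each expressed relative to the
# 9.6 kbps analog link of 1985.
links_bps = {
    "1985 analog":        9.6e3,
    "1995 ~2 Mbps":       2e6,
    "2002-3 622 Mbps":    622e6,
    "2003-4 2.5 Gbps":    2.5e9,
    "2005 10 Gbps":       10e9,
}

base = links_bps["1985 analog"]
for label, rate in links_bps.items():
    print(f"{label:18s} {rate/1e6:10.3f} Mbps  factor vs 1985: {rate/base:,.0f}")

# 10 Gbps / 9.6 kbps ~ 1.04 million, matching "a factor of ~1M over 1985-2005";
# 10 Gbps / ~2 Mbps (1995) gives the quoted factor of ~5k for 1995-2005.
```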
HEP is Learning How to Use Gbps Networks Fully: Factor of 25-100 Gain in Max. Sustained TCP Throughput in 15 Months, on Some US + Transatlantic Routes
• 9/01: 105 Mbps with 30 streams SLAC-IN2P3; 102 Mbps with 1 stream CIT-CERN
• 1/09/02: 190 Mbps for one stream shared on two 155 Mbps links
• 5/20/02: 450-600 Mbps SLAC-Manchester on OC12 with ~100 streams
• 6/1/02: 290 Mbps Chicago-CERN, one stream on OC12 (modified kernel)
• 9/02: 850, 1350, 1900 Mbps Chicago-CERN with 1, 2, 3 GbE streams on a 2.5G link
• 11/02 [LSR]: 930 Mbps in 1 stream California-CERN, and California-AMS; FAST TCP: 9.4 Gbps in 10 flows California-Chicago
• 2/03 [LSR]: 2.38 Gbps in 1 stream California-Geneva (99% link utilization)
• 5/03 [LSR]: 0.94 Gbps IPv6 in 1 stream Chicago-Geneva
• Fall 2003 goal: 6-10 Gbps in 1 stream over 7-10,000 km (10G link); further LSRs
The single-stream entries show why standard TCP struggles at these scales; a loss-rate estimate follows below.
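A rough sense of the difficulty comes from the standard Mathis et al. estimate for loss-limited TCP throughput, BW ≈ (MSS/RTT)·(1.22/√p). The sketch below is my own illustration: the ~180 ms RTT and 9000-byte segments are assumed values typical of a transatlantic path with jumbo frames, not figures from the talk.

```python
from math import sqrt

# Mathis et al.: BW ~ (MSS/RTT) * (1.22 / sqrt(p)).  Invert for p, the
# packet-loss rate a single standard-TCP stream could tolerate at 10 Gbps.
mss_bits = 9000 * 8      # assumed 9 kB "jumbo frame" segments
rtt_s    = 0.180         # assumed ~180 ms California-Geneva round trip
target   = 10e9          # 10 Gbps single-stream goal

p = (1.22 * mss_bits / (rtt_s * target)) ** 2
print(f"Tolerable loss rate: {p:.1e}")   # ~2e-9: a few lost packets per billion
```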
FAST TCP: Baltimore/Sunnyvale
• RTT estimation with a fine-grain timer; delay monitoring in equilibrium
• Pacing: reducing burstiness
• Fast convergence to equilibrium; fair sharing and fast recovery
• Measurements (11/02, SC2002): standard packet size; utilization averaged over > 1 hr on a 4000 km path
• 88-95% average utilization with 1, 2, 7, 9 and 10 flows; 8.6 Gbps aggregate, 21.6 TB transferred in 6 hours
A sketch of the delay-based window update follows below.
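For reference, the delay-based window update at the heart of FAST (as later published by Jin, Wei and Low) has roughly the form sketched here. This is a simplified illustration, not the production implementation; the parameter values alpha and gamma are purely illustrative.

```python
def fast_window_update(w, rtt, base_rtt, alpha=200.0, gamma=0.5):
    """One step of a FAST-style delay-based congestion window update.

    w        : current window (packets)
    rtt      : most recent smoothed round-trip time
    base_rtt : minimum observed RTT (propagation-delay estimate)
    alpha    : target number of packets queued in the path (illustrative)
    gamma    : smoothing factor in (0, 1]
    """
    target = (base_rtt / rtt) * w + alpha            # equilibrium point
    return min(2 * w, (1 - gamma) * w + gamma * target)

# Illustrative use: the window ramps up quickly while queueing delay is small
# (rtt ~ base_rtt) and approaches the equilibrium w = alpha / (1 - base_rtt/rtt)
# over many RTTs, rather than oscillating on packet loss as Reno does.
w = 10.0
for _ in range(20):
    w = fast_window_update(w, rtt=0.181, base_rtt=0.180)
print(round(w))
```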
10GigE Data Transfer: Internet2 LSR
On Feb. 27-28, 2003, a terabyte of data was transferred in 3700 seconds by S. Ravot of Caltech between the Level3 PoP in Sunnyvale near SLAC and CERN, through the TeraGrid router at StarLight, from memory to memory, as a single TCP/IP stream at an average rate of 2.38 Gbps (using large windows and 9 kB "jumbo frames"). This beat the former record by a factor of ~2.5, and used the US-CERN link at 99% efficiency. A window-size check follows below.
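The "large windows" are dictated by the bandwidth-delay product of the path. A back-of-the-envelope check (my own illustration; the ~180 ms round trip is an assumed, typical Sunnyvale-Geneva value):

```python
# Bandwidth-delay product: the TCP window needed to keep a long fat pipe full.
rate_bps = 2.38e9     # achieved single-stream rate
rtt_s    = 0.180      # assumed ~180 ms Sunnyvale <-> Geneva round trip

window_bytes = rate_bps * rtt_s / 8
print(f"Required window: ~{window_bytes/2**20:.0f} MiB")    # ~51 MiB

# Versus the classic 64 KiB limit without TCP window scaling:
print(f"A 64 KiB window caps throughput at "
      f"{64*1024*8/rtt_s/1e6:.1f} Mbps on this path")        # ~2.9 Mbps
```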
“Private” Grids: Structured P2P Sub-Communities in Global HEP
HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps
Continuing the trend: ~1000 times bandwidth growth per decade; we are rapidly learning to use multi-Gbps networks dynamically.
HENP Lambda Grids: Fibers for Physics
• Problem: extract "small" data subsets of 1 to 100 Terabytes from 1 to 1000 Petabyte data stores
• Survivability of the HENP global Grid system, with hundreds of such transactions per day (circa 2007), requires that each transaction be completed in a relatively short time.
• Example: take 800 seconds to complete the transaction (see the check below). Then:
  Transaction Size (TB)    Net Throughput (Gbps)
  1                        10
  10                       100
  100                      1000 (capacity of fiber today)
• Summary: providing switching of 10 Gbps wavelengths within ~3-5 years, and Terabit switching within 5-8 years, would enable "Petascale Grids with Terabyte transactions", to fully realize the discovery potential of major HENP programs, as well as other data-intensive fields.
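The table follows directly from the 800-second target; a one-line arithmetic check (my own illustration):

```python
# Net throughput needed to move a transaction of `size_tb` terabytes in 800 s.
for size_tb in (1, 10, 100):
    gbps = size_tb * 8e12 / 800 / 1e9   # TB -> bits, divided by 800 s
    print(f"{size_tb:3d} TB in 800 s  ->  {gbps:5.0f} Gbps")
```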
The Move to OGSA and then Managed Integration Systems
Increased functionality and standardization over time:
• Custom solutions →
• Globus Toolkit (X.509, LDAP, FTP, ...): de facto standards; GGF: GridFTP, GSI →
• Open Grid Services Architecture (Web services + ...): GGF: OGSI, ... (+ OASIS, W3C); multiple implementations, including the Globus Toolkit →
• App-specific services and ~integrated systems: stateful; managed
Dynamic Distributed Services Architecture (DDSA)
• "Station Server" services-engines at sites host "Dynamic Services"; auto-discovering, collaborative
• Lookup/discovery services, remote notification, registration and proxy exchange link the station servers
• Servers interconnect dynamically; they form a robust fabric in which mobile agents travel, with a payload of (analysis) tasks
• Service agents: goal-oriented, autonomous, adaptive
• Maintain state: automatic "event" notification
• Adaptable to Web services: OGSA; many platforms & working environments (also mobile)
• See http://monalisa.cacr.caltech.edu and http://diamonds.cacr.caltech.edu
• Caltech / UPB (Romania) / NUST (Pakistan) collaboration
A conceptual sketch of the register/discover/notify cycle follows below.
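To make the register/lookup/notify cycle concrete, here is a deliberately minimal, self-contained sketch. It is purely illustrative: the real DDSA services are Java/Jini based, and none of the class or method names below come from that code.

```python
class LookupService:
    """Toy stand-in for a DDSA lookup/discovery service."""
    def __init__(self):
        self.stations = []        # registered station proxies
        self.listeners = []       # callbacks for remote notification

    def register(self, proxy):
        self.stations.append(proxy)
        for notify in self.listeners:     # remote notification of the new peer
            notify(proxy)

    def subscribe(self, callback):
        self.listeners.append(callback)


class StationServer:
    """Toy station server: registers itself and discovers peers."""
    def __init__(self, site, lookup):
        self.site = site
        lookup.subscribe(self.on_peer)    # ask to be told about new stations
        lookup.register(self)             # announce this station

    def on_peer(self, proxy):
        if proxy is not self:
            print(f"{self.site}: discovered station at {proxy.site}")


# Two stations finding each other through the lookup service:
lookup = LookupService()
caltech = StationServer("Caltech", lookup)
cern = StationServer("CERN", lookup)   # -> "Caltech: discovered station at CERN"
```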
MonALISA: A Globally Scalable Grid Monitoring System
• By I. Legrand (Caltech) et al.
• Monitors clusters and networks
• Agent-based dynamic information / resource discovery mechanisms
• Implemented in Java/Jini and SNMP; WSDL/SOAP with UDDI
• Global system optimizations
• > 50 sites and growing; being deployed in Abilene through the Internet2 End-to-End Performance Initiative (E2Epi)
• MonALISA (Java) 3D interface
[Map: National Lambda Rail footprint across the US]
UltraLight Collaboration: http://ultralight.caltech.edu
• Caltech, UF, FIU, UMich, SLAC, FNAL, MIT/Haystack, CERN, UERJ (Rio), NLR, CENIC, UCAID, Translight, UKLight, NetherLight, UvA, UCLondon, KEK, Taiwan
• Cisco, Level(3)
• First integrated packet-switched and circuit-switched hybrid experimental research network; leveraging transoceanic R&D network partnerships
• NLR wave: 10 GbE (LAN-PHY) wave across the US; (G)MPLS managed
• Transatlantic optical paths; extensions to Japan, Taiwan, Brazil
• End-to-end monitoring; realtime tracking and optimization; dynamic bandwidth provisioning
• Agent-based services spanning all layers of the system, from the optical cross-connects to the applications
Grid Analysis Environment: R&D Led by Caltech HEP
• Building a GAE is the "acid test" for Grids, and is crucial for the LHC experiments
• Large, diverse, distributed community of users
• Support for hundreds to thousands of analysis tasks, shared among dozens of sites
• Widely varying task requirements and priorities
• Need for priority schemes, robust authentication and security
• Operation in a severely resource-limited and policy-constrained global system, dominated by collaboration policy and strategy for resource usage and priorities
• GAE is where the physics gets done: where physicists learn to collaborate on analysis, across the country and across world regions
Grid Enabled Analysis: User View of a Collaborative Desktop
• Physics analysis requires varying levels of interactivity, from "instantaneous response" to "background" to "batch mode"
• Requires adapting the classical Grid "batch-oriented" view to a services-oriented view, with tasks monitored and tracked
• Use Web Services, leveraging wide availability of commodity tools and protocols: adaptable to a variety of platforms
• Implement the Clarens Web Services layer as mediator between authenticated clients and services, as part of the CAIGEE architecture
• Clarens presents a consistent analysis environment to users, based on WSDL/SOAP or XML RPCs, with PKI-based authentication for security (see the client sketch below)
• Clients (browser, ROOT, IGUANA, PDA, shell) reach external services such as the Storage Resource Broker, CMS ORCA/COBRA, ATLAS DIAL, GriPhyN VDT, cluster schedulers and MonALISA, through Clarens services for VO management, file access, monitoring, authentication, authorization, key escrow and logging
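As an illustration of the client side of such a layer, the sketch below uses Python's standard XML-RPC client against a hypothetical Clarens-style endpoint. The server URL and the method names are invented for the example and are not the actual Clarens API; the PKI step simply presents the user's grid certificate over TLS.

```python
import ssl
import xmlrpc.client

# Hypothetical Clarens-style server URL, for illustration only.
SERVER_URL = "https://clarens.example.org:8443/clarens"

# PKI-based authentication: present the user's grid certificate over TLS.
ctx = ssl.create_default_context()
ctx.load_cert_chain(certfile="usercert.pem", keyfile="userkey.pem")

server = xmlrpc.client.ServerProxy(SERVER_URL, context=ctx)

# A typical interaction: browse a dataset catalogue, then submit an analysis task.
datasets = server.file.list("/store/higgs/4mu")            # hypothetical method
task_id = server.analysis.submit({"dataset": datasets[0],
                                  "executable": "myAnalysis",
                                  "priority": "normal"})   # hypothetical method
print("submitted task", task_id)
```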
VRVS (Version 3): Meeting in 8 Time Zones
• Participants shown: Caltech (US), SLAC (US), AMPATH (US), Canada, Brazil, CERN (CH), RAL (UK), Pakistan, KEK (JP); VRVS on Windows
• 73 reflectors deployed worldwide; users in 83 countries
Caltech HEP Group: CONCLUSIONS
Caltech has been a leading inventor/developer of systems for Global VOs, spanning multiple technology generations:
• International wide area networks since 1982; global role from 2000
• Collaborative systems (VRVS) since 1994
• Distributed databases since 1996
• The Data Grid Hierarchy and Dynamic Distributed Systems since 1999
• Work on advanced network protocols from 2000
• A focus on the Grid-enabled Analysis Environment for data-intensive science since 2001
• Strong HEP/CACR/CS-EE partnership [Bunn, Low]
Driven by the search for new physics at the TeV energy scale at the LHC:
• Unprecedented challenges in access, processing, and analysis of Petabyte to Exabyte data; and policy-driven global resource sharing
Broad applicability within and beyond science: managed, global systems for data-intensive and/or realtime applications.
Cisco site team: many apparent synergies with the Caltech team in areas of interest, technical goals and development directions.
U.S. CMS is Progressing: 400+ Members, 38 Institutions
• Caltech has led the US CMS Collaboration Board since 1998; 3rd term as Chair through 2004
• New in 2002/3: FIU, Yale
Physics Potential of CMS: We Need to Be Ready on Day 1
At L0 = 2x10^33 cm^-2 s^-1 (a quick check follows below):
• 1 day ~ 60 pb^-1
• 1 month ~ 2 fb^-1
• 1 year ~ 20 fb^-1
LHCC: The CMS detector is well optimized for LHC physics. To fully exploit the physics potential of the LHC for discovery we will start with a "COMPLETE"* CMS detector; in particular a complete ECAL from the beginning, for the low-mass H→γγ channel (MH = 130 GeV).
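The integrated-luminosity figures follow from the instantaneous luminosity and an assumed live time. A quick check (my own illustration; the ~10^7 s effective year and ~10^6 s effective month are conventional assumptions, and the per-day duty factor is chosen to match the slide's 60 pb^-1):

```python
# Integrated luminosity at L0 = 2e33 cm^-2 s^-1.
# Units: 1 pb^-1 = 1e36 cm^-2, 1 fb^-1 = 1e39 cm^-2.
L0 = 2e33                                        # cm^-2 s^-1

live_seconds = {
    "1 day   (~35% duty factor)": 0.35 * 86400,  # assumed duty factor
    "1 month (~1e6 s live)":      1e6,           # conventional assumption
    "1 year  (~1e7 s live)":      1e7,           # conventional assumption
}

for label, t in live_seconds.items():
    int_lumi_fb = L0 * t / 1e39
    print(f"{label:28s} -> {int_lumi_fb*1000:7.0f} pb^-1  ({int_lumi_fb:.1f} fb^-1)")
# ~60 pb^-1 per day, ~2 fb^-1 per month, ~20 fb^-1 per year, matching the slide.
```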
Caltech Role: Precision e/γ Physics with CMS — H⁰→γγ in the CMS Precision ECAL
• Crystal quality in mass production
• Precision laser monitoring
• Study of calibration physics channels: inclusive J/ψ, Υ, W, Z
• Realistic H⁰→γγ background studies: 2.5 M events
• Signal/background optimization: γ/jet separation
• Vertex reconstruction with associated tracks
• Photon reconstruction: pixels + ECAL + tracker
• Optimization of tracker layout
• Higher-level trigger on isolated γ
• ECAL design: crystal sizes cost-optimized for γ/jet separation
CMS SUSY Reach
• The LHC could establish the existence of SUSY, and study the masses and decays of SUSY particles
• The cosmologically interesting region of the SUSY space could be covered in the first weeks of LHC running
• The 1.5 to 2 TeV mass range for squarks and gluinos could be covered within one year at low luminosity
HCAL Barrels Done: Installing HCAL Endcap and Muon CSCs in SX5
• 36 muon CSCs successfully installed on YE-2,3; average rate 6/day (planned 4/day); cabling + commissioning underway
• HE-1 complete; HE+ will be mounted in Q4 2003
UltraLight: Proposed to the NSF/EIN Program — http://ultralight.caltech.edu
• First "hybrid" packet-switched and circuit-switched optical network
• Trans-US wavelength riding on NLR: LA-SNV-CHI-JAX
• Leveraging advanced research & production networks: USLIC/DataTAG, SURFnet/NetherLight, UKLight, Abilene, CA*net4
• Dark fiber to CIT, SLAC, FNAL, UMich; Florida Light Rail
• Intercontinental extensions: Rio de Janeiro, Tokyo, Taiwan
• Three flagship applications:
  • HENP: TByte to PByte "block" data transfers at 1-10+ Gbps
  • eVLBI: real-time data streams at 1 to several Gbps
  • Radiation oncology: GByte image "bursts" delivered in ~1 second
• A traffic mix presenting a variety of network challenges
UltraLight: An Ultra-scale Optical Network Laboratory for Next Generation Science — http://ultralight.caltech.edu
• Ultrascale protocols and MPLS: classes of service used to share the primary 10G wave efficiently
• Scheduled or sudden "overflow" demands handled by provisioning additional wavelengths: GE, N*GE, and eventually 10 GE
• Use path diversity, e.g. across the Atlantic and via Canada
• Move to multiple 10G wavelengths (leveraged) by 2005-6
• Unique feature: agent-based, end-to-end monitored, dynamically provisioned mode of operation
• Agent services span all layers of the system, communicating application characteristics and requirements to the protocol stacks, MPLS class provisioning and the optical cross-connects
• Dynamic responses help manage traffic flow
History – One Large Research Site
• Much of the traffic: SLAC ↔ IN2P3/RAL/INFN, via ESnet + France, and Abilene + CERN
• Current traffic up to ~400 Mbps; projections: 0.5 to 24 Tbps by ~2012
VRVS Core Architecture
• VRVS combines the best of all standards and products in one unique architecture
• Multi-platform and multi-protocol: VRVS Web user interface; collaborative applications; clients using SIP (?), Mbone tools, H.323, H.320, MPEG, QuickTime 4.0 & 5.0
• VRVS reflectors (unicast/multicast) with QoS, running over the Real Time Protocol (RTP/RTCP) and the network layer (TCP/UDP/IP)
MONARC/SONN: 3 Regional Centres Learning to Export Jobs (Day 9)
• Three centres: CERN (30 CPUs), CALTECH (25 CPUs), NUST (20 CPUs), connected by 0.8-1.2 MB/s links with 150-200 ms RTT
• Node efficiencies at day 9: <E> = 0.83, 0.73, 0.66
• Simulations for strategy and system services development; building the LHC Computing Model, with a focus on the new persistency
• I. Legrand, F. van Lingen
GAE Collaboration Desktop Example: Four-screen Analysis Desktop
• 4 flat panels, 5120 x 1024, driven by a single server and a single graphics card
• Allows simultaneous work on:
  • Traditional analysis tools (e.g. ROOT)
  • Software development
  • Event displays (e.g. IGUANA)
  • MonALISA monitoring displays; other "Grid Views"
  • Job-progress views
  • Persistent collaboration (e.g. VRVS; shared windows)
  • Online event or detector monitoring
  • Web browsing, email
GAE Components & Services
• VO authorization/management
• Software install/config tools
• Virtual Data System
• Data Service Catalog (metadata)
• Replica Management Service
• Data Mover/Delivery Service [NEW]
• Planners (abstract; concrete)
• Job Execution Service
• Data Collection Services: couple analysis selections/expressions to datasets/replicas
• Estimators
• Events; strategic error handling; adaptive optimization
A Grid-Based Analysis Task's Life (GAE Workshop: Components and Services; GAE Task Lifecycle) — see the sketch after this list:
• Authentication
• DATA SELECTION: query/dataset selection/??; session start; establish slave/server configuration; data placement; resource broker for resource assignment, or static configuration; availability/cost estimation; launch masters/slaves/Grid execution services
• ESTABLISH TASK: initiate & software specification/install; execute (with dynamic job control); report status (logging/metadata/partial results); task completion (cleanup, data merge/archive/catalog)
• Task end / task save; LOOP to ESTABLISH TASK; LOOP to DATA SELECTION
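The lifecycle above is essentially a nested loop over data selection and task establishment. The sketch below is my own paraphrase of that control flow: every function is a trivial stub standing in for a GAE service, and none of the names are real GAE interfaces.

```python
# Control-flow paraphrase of the GAE task lifecycle; all functions are stubs.
def authenticate(user):        print("authenticate", user); return {"user": user}
def select_dataset(session):   print("select dataset");     return "/store/higgs/4mu"
def broker_resources(dataset): print("broker resources");   return {"slaves": 4}
def estimate_cost(ds, cfg):    print("estimate cost");      return 800   # seconds
def launch_services(cfg):      print("launch masters/slaves/execution services")
def establish_task(ds, cfg):   print("establish task");     return {"dataset": ds}
def execute(task):             print("execute with dynamic job control")
def report_status(task):       print("report status: logging/metadata/partial results")
def complete(task):            print("complete: cleanup, merge/archive/catalog")

def run_analysis_session(user, selections=1, tasks_per_selection=2):
    session = authenticate(user)
    for _ in range(selections):                    # LOOP to DATA SELECTION
        dataset = select_dataset(session)
        config = broker_resources(dataset)         # or a static configuration
        estimate_cost(dataset, config)             # availability / cost estimation
        launch_services(config)
        for _ in range(tasks_per_selection):       # LOOP to ESTABLISH TASK
            task = establish_task(dataset, config)
            execute(task)
            report_status(task)
            complete(task)
    print("task end / task save")

run_analysis_session("physicist")
```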
Grid Enabled Analysis Architecture — Michael Thomas, July 2003
HENP Networks and Grids; UltraLight
• The network backbones and major links used by major HENP projects advanced rapidly in 2001-2: to the 2.5-10 Gbps range in 15 months, much faster than Moore's Law, continuing a trend of a factor ~1000 improvement per decade
• Network costs continue to fall rapidly; the transition to a community-owned and operated infrastructure for research and education is beginning (NLR, USAWaves)
• HENP (the Caltech/DataTAG/SLAC/LANL team) is learning to use 1-10 Gbps networks effectively over long distances; unique Fall demos: up to 10 Gbps flows over 10k km
• A new HENP and DOE roadmap: Gbps to Tbps links in ~10 years
• UltraLight: a hybrid packet-switched and circuit-switched network with ultrascale protocols, MPLS and dynamic provisioning, sharing and augmenting NLR and international optical infrastructures
• May be a cost-effective model for future HENP and DOE networks