Cyberinfrastructure • March 12, 2008 NSF Reverse Site Visit (DRAFT)
CI Outline • Achievements • Optical Networking Infrastructure • Tier-3 Facility & FIU Participation in Grid Initiatives • Deployment of Tools • Proposed Activities • Enhancing the Inter-Regional Network Infrastructure • Deploying and Integrating Cyberinfrastructure Tools • Enhancing CHEPREO’s Tier3 Computing Facility • Proposed Milestones • Cyberinfrastructure Support Personnel
ACHIEVEMENT Optical Networking Infrastructure • Active Equipment • 2 Cisco ONS 15454 optical muxes are in full production operation • Located at AMPATH in Miami and the ANSP POP in Sao Paulo • Leveraged by the NSF WHREN-LILA project, which supports the international connection from the U.S. to South America
ACHIEVEMENT Current Active Equipment • CHEPREO optical muxes terminate the international link • Support traffic flows to and from Brazil’s Tier2s: SPRACE (Sao Paulo) and HEPGrid (Rio) • (Network diagram: NLR, SPRACE, HEPGrid)
ACHIEVEMENT International Network Connection • WHREN-LILA project • Support for Brazil’s distributed Tier-2 • Bandwidth Utilization
BACKGROUND WHREN-LILA IRNC Award #0441095 • 5-year NSF Cooperative Agreement • Connectivity to Brazil is supported by a coalition effort through the WHREN-LILA projects • Florida International University (IRNC awardee) • Corporation for Education Network Initiatives in California (CENIC) • Project support from the Academic Network of Sao Paulo (award #2003/13708-0) • CLARA, Latin America • CUDI, Mexico • RNP, Brazil • REUNA, Chile • Links Interconnecting Latin America (LILA) • Improve connectivity in the Americas • Western-Hemisphere Research and Education Networks (WHREN) • Coordinating body • Leverage participants’ network resources • Enable collaborative research and advance education
BACKGROUND WHREN-LILA Connections • 2.5Gbps circuit + dark fiber segment • U.S. landings in Miami and San Diego • Latin America landings in Sao Paulo and Tijuana
AtlanticWave • AtlanticWave is provisioning a 10GigE wave to support a distributed international exchange and peering fabric along the Atlantic coast of North and South America, following the GLIF GOLE model • AtlanticWave will connect the key exchange points on the U.S. East Coast: • International Exchange Points MANLAN in NYC and AMPATH in Miami • MAX gigapop and NGIX-East in Washington, DC • SoX gigapop in Atlanta • A-Wave is an integral component of the NSF IRNC WHREN-LILA proposal to create an open distributed exchange and transport service along the Atlantic rim • A-Wave partners include SURA, FIU-AMPATH, IEEAF, FLR, MAX, SLR/SoX, Internet2/MANLAN, and the Academic Network of Sao Paulo (ANSP) • HEP will be using this infrastructure (more in the proposed section)
Western-Hemisphere International Exchange Points • Collaboration with TransLight and CANARIE to extend connectivity to StarLight and PacificWave • International Exchange Points at Sao Paulo, Miami, Washington DC, NYC, Chicago, Seattle, LA • Exchange and Peering capabilities with national and international networks
CHEPREO-LILA network infrastructure • CHEPREO funding made possible the LILA link capacity upgrade to 2.5Gbps in 2006 • Brazil was able to participate in the CMS Tier 2 Milestones Plan • CMS Tier2s demonstrated data transfers to Tier 1s using 50% of the network link capacity • LILA infrastructure configured to support U.S.-Brazil HEP bandwidth requirements
Network Infrastructure Investments to Benefit CHEPREO • NSF network infrastructure investments in CHEPREO and WHREN have influenced international investments • The result has been a 10-to-1 return on investment for NSF dollars • Continued NSF funding for CHEPREO and WHREN will sustain and fortify Brazil’s participation as a member of the U.S. CMS collaboration • Brazil is planning to invest in a network bandwidth capacity upgrade to the U.S., combined with WHREN, of up to 10Gbps • Brazil’s investment will leverage the NSF investment in network infrastructure by 100%, directly benefiting CHEPREO • Investments: $2.9M NSF IRNC / CHEPREO • $11.4M EU ALICE • $3.6M CLARA • $2.8M Brazil • $2.3M other Latin American countries
GridUNESP São Paulo State University: 24 campuses
GridUNESP • Central Cluster: ~60% for CMS = Tier 1 • 7 x Secondary Clusters
Current Configuration • 244 cores / 2.00 GHz each (dual CPU, dual core) • 112 kSI2K (SpecInt 2000) • 44.5 TB of storage
Configuration for 2009 (year end) • 780 cores / 2.4 GHz each (dual CPU, quad core) • 420 kSI2K (SpecInt 2000) • 200 TB of storage
Brazil’s Distributed Tier-2 Facility – 1 Gbps Infrastructure
ACHIEVEMENT FIU Tier-3 at the NAP of the Americas
ACHIEVEMENT FIU Tier-3 Center • Tier-3 Centers in the CMS computing model • Primarily employed in support of the local CMS physics community • But some also participate in CMS production activities • Hardware & manpower requirements are non-trivial • Requires sufficient hardware to make it worthwhile • A grid-enabled gatekeeper and other grid services are also required • The FIU Tier-3 is deployed to: • Provide services and resources for local physicists • CMS analysis & computing • Education Research and Outreach Group • Cyberinfrastructure: serves as a “reference implementation” of an operational grid-enabled resource for the FIU CI community at large • FIU Grid community • CHEPREO Computer Science groups
ACHIEVEMENT FIU Tier-3 at a Glance • ROCKS-based meta-cluster consisting of: • Grid-enabled computing cluster • Approx. 20 dual-Xeon boxes • Service nodes • User login node, frontend, webservers, frontier/squid server, development… • Local CMS interactive analysis • Purchased with FIU startup funds • A single 8-core server with 16GB RAM • A large 3ware-based 16 TB fileserver • A Computing Element site on the OSG production grid • A production CE even before the OSG (Grid3) • Supports all of the OSG VOs • Maintained with the latest version of the OSG software cache
ACHIEVEMENT FIU Tier-3 Usage at a Glance • Current usage: since Nov. 2007 • About 40K hrs logged through the OSG Grid gatekeeper • CMS, LIGO, nanoHUB, OSGedu… VOs have used the site through this period • About 85% of that was utilized by cmsProd during CSA07 • We generated about 60K events out of the 60M worldwide effort • We were one of 2 or 3 Tier-3s that participated in the CSA07 worldwide effort!
ACHIEVEMENT Tier3 Team at FIU • CHEPREO/CIARA funded positions • Micha Niskin: Lead systems administrator • Ernesto Rubi: Networking • CHEPREO undergraduate fellows: Working on deploying CMS and storage services • Ramona Valenzula • David Puldon • Ricardo Leante
Proposed Activities • Enhancing the Inter-Regional Network Infrastructure • Deploying and Integrating Cyberinfrastructure Tools • Enhancing CHEPREO’s Tier3 Computing Facility • Proposed Milestones • Cyberinfrastructure Support Personnel
Enhancing the Inter-Regional Network Infrastructure • Challenges and Issues • Limitations of the Cisco ONS 15454 network equipment • CHEPREO’s bandwidth requirements on the international network connection are growing • Opportunities • Proposed Solution
Network Equipment Limitations • Loss of access to international bandwidth • Inefficient mapping of Ethernet to SDH circuits • Mapping GigE to Packet-Over-SONET (POS) ports leaves some bandwidth unavailable • No mechanism to provision and manage POS circuits • Circuits must be pre-provisioned by the carrier • This is a significant problem because the current bandwidth to Sao Paulo costs ~$147,000 per month • Network engineers have expended significant effort finding workaround solutions
Bandwidth Estimates 2008 • CHEPREO’s bandwidth requirements are growing • Upgrade of the metro networks in Sao Paulo and Rio to 10G will enable multiple GigE connections • Consider the following scenario for SPRACE by end of 2008 • Typical Tier2↔Tier1 data transfers estimated at 5 TB/day downwards (T1→T2) and 1 TB/day upwards (T2→T1) (CMS Computing Project - Technical Design Report) • 60 MB/s downwards and 12 MB/s upwards, or 480 Mbps + 100 Mbps ~= 600 Mbps for SPRACE-Fermilab transfers on a daily basis by the end of 2008 • If 2 TB/day (small) data set transfers, with 3 simultaneous T2↔T1 transfers, then we will need ~200 Mbps per data set = ~600 Mbps • Consider DZero reprocessing, which will need to sustain data transfers on the order of 20 MB/s = 160 Mbps • SPRACE will need ~800 Mbps by end of 2008. Estimate the same number for HEPgrid; i.e., bandwidth projection estimated at 1.6 Gb/s
Bandwidth Estimates 2009 • Consider the following scenario for SPRACE and HEPgrid for 2009, as the LHC evolves towards full luminosity • Typical Tier2↔Tier1 data transfers will certainly increase • SPRACE and HEPgrid each connected on 10G ports • If 10 TB/day downwards and 2 TB/day upwards, • then 1.2 Gb/s will be the typical transfer rate between a single Brazil T2 and Fermilab; 2.4 Gb/s for both T2s • If 4 TB/day data set transfers, with 5 simultaneous T2↔T1 transfers, • then 50 MB/s * 5 = 2 Gb/s • SPRACE and HEPgrid will each need 2 Gb/s by end of 2009, for a total estimate of 4 Gb/s
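The per-site rates quoted on the two bandwidth-estimate slides above follow from a simple daily-volume-to-sustained-rate conversion. Below is a minimal sketch of that arithmetic, using the volumes from the slides and assuming an even spread over 24 hours and decimal terabytes.

```python
def tb_per_day_to_mbps(tb_per_day: float) -> float:
    """Convert a daily transfer volume (decimal TB) to the sustained rate in Mbps,
    assuming the volume moves evenly over 24 hours at 8 bits per byte."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / (24 * 3600) / 1e6

# 2008 scenario (per Brazilian Tier-2): 5 TB/day down + 1 TB/day up to/from Fermilab
t1_t2_2008 = tb_per_day_to_mbps(5) + tb_per_day_to_mbps(1)   # ~560 Mbps (slide rounds to ~600 Mbps)
# The 2008 slide then adds data-set transfers and DZero reprocessing to reach
# ~800 Mbps per site, i.e. ~1.6 Gb/s for SPRACE + HEPgrid together.

# 2009 scenario (per Tier-2): 10 TB/day down + 2 TB/day up
t1_t2_2009 = tb_per_day_to_mbps(10) + tb_per_day_to_mbps(2)  # ~1110 Mbps (slide quotes ~1.2 Gb/s)

print(f"T1<->T2 component: 2008 ~{t1_t2_2008:.0f} Mbps, 2009 ~{t1_t2_2009:.0f} Mbps per Tier-2")
```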
Networking Opportunities • ANSP and RNP are working with WHREN to increase bandwidth • ANSP will allocate a 1.2Gbps circuit from Sao Paulo to Santiago to Los Angeles by est. August 1, 2008 • Half the capacity will be dedicated to research; the entire circuit will be available for bursting • RNP is seeking approval for a 10Gbps upgrade to the Miami-Sao Paulo LILA-East link • Brazil’s investment is the result of the investments the NSF and FAPESP have made in WHREN and CHEPREO • Brazil’s success in upgrading the link is unlikely without NSF investments, and technically impossible without the proposed CHEPREO equipment
12-to-1 ROI for the NSF • Projected 5-year total investment (in millions): • NSF IRNC: $5.6 • EU ALICE: $19.4 • CLARA: $9.0 • Brazil-ANSP: $13.6 • Brazil-RNP: $6.6 • Other Latin American countries: $5.8 • Total: $60.0 • CHEPREO and the WHREN coalition have defined a roadmap for the LILA East to continue to grow to support the U.S.-Latin America science research and education communities
Network Infrastructure Investments to Benefit CHEPREO • In collaboration with WHREN, RNP and ANSP of Brazil are planning to invest in bandwidth capacity upgrades to the U.S. • Provides RedCLARA with additional shared network infrastructure to reach the U.S. and Europe • Santiago-Sao Paulo 155Mbps shared network infrastructure for astronomy and Latin America through RedCLARA • Success would leverage the NSF investment in network infrastructure by 100%, which would benefit CHEPREO
Networking Proposed Solution • Upgrade optical equipment in Miami and Sao Paulo • Integrate this equipment into the production and research/experimental milestones for CHEPREO Physics, and National and International Education Outreach • Leverage western-hemisphere infrastructure of IRNC links and AtlanticWave to establish lightpaths between end-points (Brazil-CERN, Brazil-FNAL, Brazil-Caltech)
Western-Hemisphere Network Infrastructure • 10GigE wave from NYC to JAX over NLR • 10GigE wave from JAX to MIA over FLR • Layer2 peering fabric extended to Sao Paulo and Chicago • Access to 2 IRNC links at layer2
U.S.-Brazil CMS Collaborations: Networking • WHREN facilitates access to NSF IRNC links • U.S.-Latin America (WHREN-LILA) • U.S.-Europe (TransLight/StarLight) • Access to IRNC links by Brazil’s Tier2s lessens the burden on the U.S. Tier1 • IRNC links are facilitating a division of labor to augment U.S. Tier1 and Tier2 capabilities by including/integrating Brazil’s Tier2 facilities, providing both human and machine resources • A project is underway to establish two 1GigE vlans connecting Brazil’s Tier2s to CERN using WHREN-LILA, AtlanticWave, CaveWave and TransLight/StarLight
Deployment of Cyberinfrastructure Tools • MonALISA: monitoring system (Caltech) • Distributed service for monitoring, control, and global optimization of complex systems • Deployed at more than 340 sites • ROOTlets: • Make ROOT, the widely-used HEP analysis framework, available to lightweight client environments • Latency tolerant • Scalable to the worldwide grid • Allow independent execution of ROOT code • FDT: Fast Data Transfer • Capable of reading and writing at disk speed over wide area networks using standard TCP • Work is underway with FNAL and developers at DESY on integrating FDT with the CMS storage system dCache to enhance dataset transfer performance
MonALISA: Monitoring Grids, Networks, Compute Nodes, Running Jobs, Processes (screenshots: running jobs, VRVS, ALICE VO accounting, network throughput, network topology, job lifelines, job statistics)
Dynamic Network Path Allocation & Automated Dataset Transfer (MonALISA Distributed Service System) • The user issues a transfer command (e.g. >mlcopy A/fileX B/path/); an optical path is made available, the interfaces are configured, the data transfer starts, and the transfer is monitored in real time • Application data otherwise follows the regular IP path over the Internet; the MonALISA service monitors and controls the end hosts (A, B) and the optical switch via TL1 • The LISA agent on each host sets up the network interfaces, TCP stack, kernel parameters and routes, and tells the application which interface to use (e.g. “use eth1.2, …”) • The system detects errors on the active light path and automatically recreates the path in less than the TCP timeout (< 1 second)
ROOTlets • Basic approach: • Allow users to execute ROOT analysis code in a sandbox. Servlet = ROOTlet • Many instances of ROOT can run on a cluster • Service container provides • Input parameter vetting • Access control/user mapping • Logging • Job control • Loosely coupled components: • ROOT client • Compare with PROOF, which is tightly coupled • One or more service hosts with vanilla ROOT installed • Service host may optionally be a cluster head node • ROOTlets run either as ordinary processes on the service host or as batch jobs on cluster nodes • ROOTlet service adds value beyond simple remote job submission: • Monitoring of running jobs • Allows file up/download to the job sandbox • Multiple clients: ROOT itself, browser, scripts
ROOTlets Phase I Operationally • A physicist at a Tier-3, using ROOT on GBytes of ntuples, loads the ROOTlet plugin, connects to Clarens, and sends the analysis code (Analysis.C/.h files) over HTTP/HTTPS • CherryPy creates a ROOTlet at a Tier-2 (TierN / TeraGrid) site and sends it the .C/.h files • The ROOTlet executes the analysis code against the site’s ~10s of TBytes of data and creates a high-statistics output file • The ROOT client at the Tier-3 fetches and plots the data
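To make the flow above concrete, here is a rough client-side sketch. The service URL, endpoint paths and response format are hypothetical placeholders standing in for the Clarens/CherryPy ROOTlet interface described on the slide; they are not the project’s actual API.

```python
"""Client-side sketch of the Phase I flow above. All endpoints are hypothetical."""
import time
import urllib.request
from pathlib import Path

SERVER = "https://tier2.example.edu/rootlet"      # hypothetical Clarens/CherryPy endpoint

def submit(analysis_c: str, analysis_h: str) -> str:
    """Upload the user's .C/.h analysis code and return the new ROOTlet's job id."""
    payload = Path(analysis_c).read_bytes() + b"\n--SPLIT--\n" + Path(analysis_h).read_bytes()
    req = urllib.request.Request(f"{SERVER}/create", data=payload, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode().strip()

def wait_and_fetch(job_id: str, out_path: str) -> None:
    """Poll the ROOTlet's status, then download the high-statistics output file."""
    while True:
        with urllib.request.urlopen(f"{SERVER}/status/{job_id}") as resp:
            if resp.read().decode().strip() == "done":
                break
        time.sleep(10)                            # plain polling; Phase II replaces this with pub/sub
    with urllib.request.urlopen(f"{SERVER}/fetch/{job_id}") as resp, open(out_path, "wb") as out:
        out.write(resp.read())

job_id = submit("Analysis.C", "Analysis.h")       # physicist's analysis code at the Tier-3
wait_and_fetch(job_id, "histograms.root")         # plot the fetched output locally with ROOT
```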
ROOTlets: futures • Phase I (now) → Phase II: DOE STTR project with Deep Web Technologies (Abe Lederman) and Indiana (Geoffrey Fox) “The next generation of particle accelerators will produce more data than has been gathered up to now by all of humankind. This project will use advanced Internet technology to develop scalable data analysis architecture for enabling basic scientists to understand the fundamental particles in nature, or for Homeland Security …” • Upgrade to Publish/Subscribe • Avoids pure Web services limitations: request/response system, needs polling, one-to-one comms only • Use instead “NaradaBrokering” • Asynchronous message service “bus” with multiple transport protocols: TCP, UDP, Multicast, SSL, HTTP and HTTPS, etc. • Publish/subscribe: hierarchically arranged “topics” • (Phase II diagram: ROOTlet Repositories, Locator Agents)
NaradaBrokering: Beyond Services • The pure web services approach has some inherent drawbacks: • It is a request-response system; there is no way for the server to contact the client unless polled • One-to-one communication only; no way to broadcast to multiple clients unless polled • Started a collaboration with the Indiana University messaging group • Scalable, open-source messaging system called “NaradaBrokering” http://www.naradabrokering.org/ • Provides an asynchronous message service “bus” with multiple transport protocols: TCP (blocking, non-blocking), UDP, Multicast, SSL, HTTP and HTTPS, Parallel TCP Streams • Allows the various components in the ROOTlet system to communicate without continuous polling • Publish/subscribe system with hierarchically arranged “topics” • Project funded through a DOE STTR research grant, now in Phase II
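To illustrate why publish/subscribe removes the polling in the client sketch above, here is a minimal in-process broker with hierarchically arranged topics. It only demonstrates the pattern; it is not NaradaBrokering’s actual (Java) API, and the topic names are invented.

```python
"""Minimal sketch of hierarchical-topic publish/subscribe (pattern only, not NaradaBrokering)."""
from collections import defaultdict
from typing import Callable

class Broker:
    def __init__(self):
        self._subs = defaultdict(list)            # subscribed topic prefix -> list of callbacks

    def subscribe(self, topic: str, callback: Callable[[str, str], None]) -> None:
        """Register a callback; subscribing to '/rootlets/jobs' also matches '/rootlets/jobs/1234'."""
        self._subs[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        """Push the message to every subscriber whose topic is a prefix of the published topic."""
        for sub_topic, callbacks in self._subs.items():
            if topic == sub_topic or topic.startswith(sub_topic + "/"):
                for cb in callbacks:
                    cb(topic, message)

broker = Broker()
broker.subscribe("/rootlets/jobs", lambda t, m: print(f"[{t}] {m}"))
broker.publish("/rootlets/jobs/1234", "status=running")   # server pushes; no client polling needed
```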
ROOTlets Phase II - Repositories (diagram: ROOTlet Repositories)
FDT Integration with Mass Storage (dCache) (I. Narsky, F. Khan) • Developing use of FDT as a high-performance dataset (file-collection) mover for dCache • Benefits: • Upgrades data transfer performance & reliability for CMS and ATLAS • Matched to circuit-oriented services under development in US LHCNet and ESnet • Approach 1: dCache POSIX interface • Allows direct FDT interaction with the storage system (few application mods) • Functionality/performance tests underway • Approach 2: dCache “adapter” (diagram: source dCache → LAN → intermediate disk store → WAN (FDT) → target client → LAN → remote dCache) • Tested between the Caltech and UFlorida Tier2s • Caveat: requires a high-performance server at each site • Eventually will use streaming
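A rough orchestration sketch of the “adapter” approach follows: the file collection is staged out of the source dCache onto the intermediate disk store over the LAN, then shipped across the WAN with FDT. The host names, paths and exact dccp/FDT command lines are assumptions to be checked against the local installations, not the project’s actual adapter code.

```python
"""Sketch of the dCache 'adapter' data path; hosts, paths and CLI options are placeholders."""
import subprocess
from pathlib import Path

STAGING = Path("/staging")                      # intermediate disk store on the LAN side
REMOTE_HOST = "fdt.remote-site.example"         # FDT server assumed to be running at the target site

def stage_from_dcache(pnfs_paths):
    """Copy each dataset file out of the source dCache onto the local staging disk (dccp over the LAN)."""
    for p in pnfs_paths:
        subprocess.run(["dccp", p, str(STAGING / Path(p).name)], check=True)

def push_with_fdt(remote_dir="/staging/incoming"):
    """Ship all staged files across the WAN with a basic FDT client invocation (check FDT docs for flags)."""
    files = [str(f) for f in STAGING.iterdir()]
    subprocess.run(["java", "-jar", "fdt.jar", "-c", REMOTE_HOST, "-d", remote_dir] + files,
                   check=True)

if __name__ == "__main__":
    stage_from_dcache(["/pnfs/example.edu/data/cms/dataset/file1.root"])   # placeholder PNFS path
    push_with_fdt()
    # The target site would then import the files from its intermediate store into its own dCache.
```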
HEP SC07 Results: 80+ Gbps With a Rack of Servers (11/14/07) • 40 G in, 40 G out • 88 Gbps peak, 80+ Gbps sustainable for hours, with packet loss
Enhancing CHEPREO’s Tier-3 Computing Facility: Proposed Activities
Tier-3 Computing Facility • In support of the Open Science Grid • Still plan to involve our group with OSG integration and validation activities • Participate in the integration testbed • Help carry out testing of new OSG releases • Help debug and troubleshoot new grid middleware deployments and applications • This work is currently manpower limited • Our systems administrator is busy with new hardware deployments, procurement, and training new CHEPREO fellows • In support of CMS computing: production activities • Not yet involved in CCRC08 or CSA08: • Will require new worker nodes with 1.0+ GB of RAM/core to participate • 10 new dual quad-core nodes + new gatekeeper • Specs have been defined; bids will be submitted • Expect quotes by end of March • Will require additional services for CMS production • Storage Element: we eventually plan to deploy our own dCache-based SE, but for now our cmssoft installation points to our local Tier-2 at UF • FRONTIER/squid is now a must for participation: use existing Xeon hardware; CHEPREO fellows are now working on this task
Tier-3 Computing Facility • In support of local CMS interactive analysis activities: Issue: how to access CMS data from FIU; several solutions are being investigated, all involving close coordination with our CHEPREO Tier-2 partners at UF and Caltech • Download data to a Tier-2 and then download to FIU via FDT • Deployment, utilization and benchmarking of the FDT tool within CMS’ Distributed Data Management system • With UF, download data to a WAN-enabled Lustre filesystem • Tests have already demonstrated line-speed access with Lustre over the LAN at UF between UF’s Tier-2 and HPC • Testing underway on an isolated testbed with UF’s HPC group • Applied kernel patches and reconfiguring the network route to the FIU test server • Requires special access to the HPC facility; unlikely to be a general CMS-wide solution • L-Store: • Access CMS data stored in L-Store depots in the Tennessee region and beyond • Vanderbilt is working on L-Store/CMSSW integration and we would like to help in the testing when available • We would also like to deploy our own L-Store depot on-site with Vanderbilt-supplied hardware currently at FIU
Networking Proposed Milestones • Integrate and deploy next-generation dynamic circuit network provisioning and end-to-end managed network services • Adopt best practices from UltraLight, US LHCNet, ESnet, and others • Establish and test mission-oriented point-to-point circuits over the UltraLight/CHEPREO-WHREN-LILA/TransLight IRNC infrastructure • Extend UltraLight to Brazil using WHREN-LILA/AtlanticWave and other available infrastructure • Part of the US LHC networking strategy to support transatlantic networking, with UltraLight supporting networking to the US LHC universities, including CHEPREO partners in Latin America • Implement next-generation SDH protocols to improve efficiency and management of layer1 and layer2 services • Deploy and test control plane tools and services on the CHEPREO-UltraLight research testbed for application-provisioned dynamic circuits
Milestones (cont.) • Tier3 Computing Facility • Build a new CyberEnvironment • Train FIU students, high school teachers and students in the effective use of the cyberenvironment, to participate in discoveries at the LHC