300 likes | 414 Views
Paul Sheldon Vanderbilt University. Evolving Data Logistics Needs of the LHC Experiments. Outline. To meet its computing and storage needs (and socio/political and funding needs), the LHC experiments deployed a globally distributed “grid” computing model.
E N D
Paul Sheldon Vanderbilt University Evolving Data Logistics Needs of the LHC Experiments
Outline • To meet its computing and storage needs (and socio/political and funding needs), the LHC experiments deployed a globally distributed “grid” computing model. • Successfully demonstrated grid concepts in production at an extremely large scale – enabled significant science! • Short term (2 years) and longer term (10 years) projects will require them to significantly evolve their computing model. • Wide area distributed storage • Agile, intellegent network based solutions: DYNES and ANSE • Thanks to ArturBarczyk, Ken Bloom, Eric Boyd, OliGutsche, Ian Fisk, Harvey Newman, Jim Shank, Jason Zurawski, and others for slides and ideas. Emerging Data Logistics Needs of the LHC Expts
A High Profile Success Emerging Data Logistics Needs of the LHC Expts
A High Profile Success The LHC would not have been successful without grid computing, data logistics, and robust high performance networks Emerging Data Logistics Needs of the LHC Expts
Distributed Compute Resources and People CMS Collaboration: 38 Countries, >163 Institutes Emerging Data Logistics Needs of the LHC Expts
Distributed Compute Resources and People • LHC Experiments decided early on that it was not possible or even desirable to put all or even most computing resources at CERN: Sociology, Politics, Funding • Physics is done in small groups, which are themselves geographically distributed • To maximize the quality and rate of scientific discovery, all physicists must have equal ability to access and analyze the experiment's data Emerging Data Logistics Needs of the LHC Expts
Distributed Compute Resources and People • LHC Experiments decided early on that it was not possible or even desirable to put all or even most computing resources at CERN: Sociology, Politics, Funding • Physics is done in small groups, which are themselves geographically distributed • To maximize the quality and rate of scientific discovery, all physicists must have equal ability to access and analyze the experiment's data • Adopt a grid-like computing model, use those pieces • of grid concepts and tools that were • “ready” and needed. Emerging Data Logistics Needs of the LHC Expts
Tiered Computing Model Tier 0 • LHC raw data logged to the “Tier 0” compute center at CERN. • T0 does initial processing, verification and data quality tests. Emerging Data Logistics Needs of the LHC Expts
Tiered Computing Model Not all Tier 1’s are shown Tier 1 Tier 1’s 20 Gbs dedicated network Tier 1 Tier 0 Tier 1’s Data is then distributed to Tier 1 centers, the top level centers for each region of the world (US CMS Tier 1 is at Fermilab). These process the data to create “data products” that physicists will use Emerging Data Logistics Needs of the LHC Expts
Tier 2 and Tier 3 Centers • 8 Regional Tier 2’s in the US, 53 worldwide. • Tier 3’s serve individual Universities • Tier 2 and 3’s provide the computing and storage used by physicists to analyze the data & produce physics. • Physicists work in small groups, but those groups may be globally distributed. Data Flow Emerging Data Logistics Needs of the LHC Expts
Analysis Projects are Mostly Self-Organized • Worker nodes at each T2/3 site present a batch system • Access through grid interface/certs (no local access) • User analysis workflows built and submitted by “grid enabled” software framework (CRAB, PanDA) • breaks tasks into jobs – decides where to run them – monitors them – retrieves the output when they are done. • Data tethering: originally jobs could only run at a site if they were hosting the required data locally • Evolving model (more later) is relaxing this requirement Emerging Data Logistics Needs of the LHC Expts
200K – 300K LHC jobs launched/day How is it Working? CPU Hours (HS06h) CMS Tier 1 1 HS06h ~ 1 GFLOP hour http://w3.hepix.org/benchmarks/doku.php/ CPU Hours (HS06h) CMS Tier 2 Emerging Data Logistics Needs of the LHC Expts
Data Movement • Some data is strategically placed based on high level “management” decisions • Users can also request datasets be placed at “friendly sites” where resources have been made available to their analysis group. (They can whitelist those sites in CRAB/PanDA.) • System for Organized Transfers (PhEDEx in CMS)… • Destination based transfer requests (transfer dataset X to site A) • System picks out source sites based on link quality and load (several source sites might be used) • PhEDExschedules GridFTP transfers between sites through several layers of software • Software layers protect sites, enable multiple streams and take care of authentication (Globus tools, SRM, FTS, etc.) Emerging Data Logistics Needs of the LHC Expts
Aggregate Bandwidth on all Tier N Links 2012 How is it Working? ATLAS AVG: ~5 GB/s Throughput (MB/S) Combined: 180 PB/year CMS 2012 AVG: ~1 GB/s Throughput (MB/S) Emerging Data Logistics Needs of the LHC Expts
Model Has Evolved • Originally strictly hierarchical • In addition to strategic pre-placement of data sets… • Meshed data flows: Any site can use any other site to get data • Dynamic data caching: Analysis sites will pull datasets from other sites “on demand” • Remote data access: Jobs executing locallyaccess data at remote sites in quasi-real time Emerging Data Logistics Needs of the LHC Expts
Near Term Data Requirments: LHC Run 2 • LHC Run 2 (begins 2015): Data xfer volume 2-4 times larger • Energy increase 8 to 13 TeV – more complex events, larger event sizes • Trigger rate increase from 300-400 Hz to 1 kHz • Dynamic data placement and cache release • replicate samples in high demand automatically • Release cache of replicas automatically when demand decreases • Establish CMS data federation for all CMS files on disk • Allow remote access of files with local caching as job runs • processing can start before samples arrive at destination site • Higher demand for sustained network bandwidth, fill gaps btwn bursts • Increase diversity of resources: get access to more resources Emerging Data Logistics Needs of the LHC Expts
10 Year Projections • Requirements in 10 years? Look 10 years back… Source: Ian Fisk and Jim Shank Emerging Data Logistics Needs of the LHC Expts
Fisk and Shank 10 Year Projections • The processing has increased by a factor of 30 in capacity • This is essentially what would be expected from a Moore’s law increase with a 2 year cycle • Says we spent similar amounts • Storage and networking have both increased by factor of 100 • 10 times trigger and 10 times event size • In their judgment, future programs will likely show similar growth • “How to make better use of resources as the technology changes?” Fisk and Shank, Snowmass Planning Exercise for High Energy Physics Emerging Data Logistics Needs of the LHC Expts
Fisk/Shank View of the Future Emerging Data Logistics Needs of the LHC Expts
…Fisk/Shank View of Future Emerging Data Logistics Needs of the LHC Expts
…Fisk/Shank View of Future Emerging Data Logistics Needs of the LHC Expts
DYNES: Addressing Growing/Evolving Network Needs • NSF MRI-R2: DYnamicNEtworkSystem (DYNES) • A nationwide cyber-instrument spanning ~40 US universities and ~14 Internet2 connectors • Extends Internet2/ESnet Dynamic Circuit tools (OSCARS, …) • Who is it? • Project Team: Internet2, Caltech, Univ. of Michigan, and Vanderbilt Univ. • Community of regional networks and campuses • LHC, astrophysics community, OSG, WLCG, other virtual organizations • What are the goals? • Support large, long-distance scientific data flows in the LHC, other data intensive science (LIGO, Virtual Observatory, LSST,…), and the broader scientific community • Build a distributed virtual instrument at sites of interest to the LHC but available to R&E community generally Emerging Data Logistics Needs of the LHC Expts
Why DYNES? • LHC exemplifies growing demand for research bandwidth • LHC “Tiered” computing infrastructure already encompasses 140+ sites moving data at an average rate of ~5 GB/s • LHC data volumes and transfer rates are expected to expand by an order of magnitude over the next few years • US LHCNet, for example, is expected to reach 280-400 Gbps by 2014 between its points of presence in NYC, Chicago, CERN and Amsterdam • Network usage on this scale can only be accommodated with planning, an appropriate architecture, and nationwide community involvement • By the LHC groups at universities and labs • By campuses, regional and state networks connecting to Internet2 • By ESnet, US LHCNet, NSF/IRNC, major networks in US & Europe Emerging Data Logistics Needs of the LHC Expts
DYNES Mission/Goals • Provide access to Internet2’s dynamic circuit network in a manageable way, with fair-sharing • Based on a ‘hybrid’ network architecture utilizing both routed and circuit based paths. • Enable such circuits w/ bandwidth guarantees across multiple US network domainsand to Europe • Will require scheduling services at some stage • Build a community with high throughput capability using standardized, common methods Emerging Data Logistics Needs of the LHC Expts
Why Dynamic Circuits? • Internet2’s and ESnet’s dynamic circuits services provide: • Increased effective bandwidth capacity, and reliability of network access, by isolating traffic from standard IP “many small flows” traffic • Guaranteed bandwidth as a service by building a system to automatically schedule and implement virtual circuits • Improved ability of scientists to access network measurement data for network segments end-to-end via perfSONARmonitoring • Static “nailed-up” circuits will not scale. • GP network firewalls incompatible with enabling large-scale science network dataflows • Implementing many high capacity ports on traditional routers would be expensive Emerging Data Logistics Needs of the LHC Expts
DYNES Agent will provide the functionality to request a circuit instantiation, initiate and managethe data transfer, and terminate the dynamically provisioned resources. • user requests via a DYNES Transfer URL • DYNES provided equipment at end sites and regionals (Inter-Domain Controllers, Switches, FDT Servers) Site Z DYNES Dataflow Dynamic Circuit Site A Emerging Data Logistics Needs of the LHC Expts
DYNES Topology Based on Applications Rcvd from Potential Sites Emerging Data Logistics Needs of the LHC Expts
ANSE: Integrating DYNES into LHC Computing Frameworks • NSF CC-NIE: Advanced Network Services for Expts (ANSE) • Project Team: Caltech, Michigan, Texas-Arlington, & Vanderbilt • Goal: Enable strategic workflow planning including network capacity as well as CPU and storage as a co-scheduled resource • Path Forward: • Integrate advanced network-aware tools with the mainstream production workflows of ATLAS and CMS • Use in-depth monitoring and network provisioning • Complex workflows: a natural match and a challenge for SDN • Exploit state of the art progress in high throughput long distance data transport, network monitoring and control Emerging Data Logistics Needs of the LHC Expts
ANSE Methodology • Use agile, managed bandwidth for tasks with levels of priority along with CPU and disk storage allocation. • define goals for time-to-completion, with reasonable chance of success • define metrics of success, such as the rate of work completion, with reasonable resource usage efficiency • define and achieve “consistent” workflow • Dynamic circuits a natural match • Process-Oriented Approach • Measure resource usage and job/task progress in real-time • If resource use or rate of progress is not as requested/planned, diagnose, analyze and decide if and when task re-planning is needed • Classes of work: defined by resources required, estimated time to complete, priority, etc. Emerging Data Logistics Needs of the LHC Expts
Summary and Conclusions • Successful globally distributed computing model has been extremely important for LHC’s success • Have intentionally not been bleeding-edge, but have certainly validated the distributed/grid model • For the future, CPU may scale sufficiently but we can’t afford to grow storage by a factor of 100 • Agile, large, intelligent network-based systems are key • Leads to less bursty, more sustained use of networks • Projects such as DYNES and ANSE are essential first steps as the LHC evolves to what Harvey Newman calls a data intensive Content Delivery Network Emerging Data Logistics Needs of the LHC Expts