
Lambda Station

Lambda Station. Matt Crawford, Fermilab co-PI: Don Petravick, Fermilab co-PI: Harvey Newman, Caltech. HEP Computing. Labs plus University Community Vast ensembles of commodity equipment Something like a petabyte of IDE disk Storage system to storage system transfer


Presentation Transcript


  1. Lambda Station • Matt Crawford, Fermilab • co-PI: Don Petravick, Fermilab • co-PI: Harvey Newman, Caltech

  2. HEP Computing • Labs plus University Community • Vast ensembles of commodity equipment • Something like a petabyte of IDE disk • Storage system to storage system transfer • Refresh of 200 TB of state at universities • Structured production, “chaotic” analysis

  3. HEP Networking • Office of High Energy Physics funds LHCnet (OC192 triangle: Starlight ↔ CERN ↔ MANLAN) • Interested in switched optical networking • UltraLight (Caltech) • UltraScience Net (ORNL) • OSCARS MPLS tunnels (ESnet: FNAL ↔ BNL, etc.) • FNAL-CERN 875 MB/s SS-SS service challenge • Interest in, testing of, and following of improvements to TCP at high bandwidth × delay • Given the directions of HEP computing, the ends of “pipes” are likely to be locally, competently engineered networks.
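
To give a feel for why TCP behaviour at a high bandwidth × delay product matters, the short sketch below computes how much data must be in flight to keep a path full at the 875 MB/s rate quoted on this slide. The 120 ms round-trip time is an assumed, illustrative value for a transatlantic path; it is not a figure from the slides.

```python
def bandwidth_delay_product(rate_bytes_per_s: float, rtt_s: float) -> float:
    """Return the number of bytes that must be in flight to fill the path."""
    return rate_bytes_per_s * rtt_s


RATE_BYTES_PER_S = 875e6   # 875 MB/s, the service-challenge rate on the slide
ASSUMED_RTT_S = 0.120      # assumption: illustrative transatlantic round-trip time

bdp = bandwidth_delay_product(RATE_BYTES_PER_S, ASSUMED_RTT_S)
print(f"about {bdp / 1e6:.0f} MB in flight")
```

Under these assumptions a single TCP stream would need a window on the order of 100 MB, which is one reason improvements to TCP on such paths attract attention.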

  4. Problem statement • Experiments and applications now running, or starting soon, will benefit from data movement capabilities now available only on bleeding-edge networks. • These systems are connected to production site networks. Duplicating site infrastructure to connect them to special-purpose networks is an expense to be avoided if possible. • Multihoming the endpoints to multiple networks is complicated and expensive and it (nearly) precludes graceful failover when one path is lost. • Applications (and operating systems) should not have to be re-customized for every new network technology or high-performance path.

  5. Additional complications • Rates are not predictable for real data sources and sinks. • Memory-to-memory is somewhat deterministic, but disk-to-disk has several uncontrolled variables. • Applications may use multiple streams for maximum exploitation of high-speed links. Lambda Station must be able to deal in aggregates. • Straggler flows persist after bulk of transfer has completed, and continued use of high-volume path may be wasteful at that point. • Aggressive protocols for the wide area may have negative impacts on the last mile (site or site’s “uplink”) network.
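
The points about aggregates and straggler flows can be illustrated with a minimal sketch. It sums the measured rates of all streams belonging to one transfer and flags the moment when the aggregate has dropped so far below the reserved rate that holding the high-volume path may be wasteful. The names `FlowSample`, `aggregate_rate`, `only_stragglers_remain` and the 10% threshold are hypothetical and chosen only for illustration; the slides do not specify a policy.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FlowSample:
    flow_id: str
    rate_bps: float  # measured rate of this stream over the last interval


def aggregate_rate(samples: Iterable[FlowSample]) -> float:
    """Treat all streams of one transfer as a single aggregate."""
    return sum(s.rate_bps for s in samples)


def only_stragglers_remain(samples: Iterable[FlowSample],
                           reserved_bps: float,
                           threshold: float = 0.10) -> bool:
    """True when the aggregate uses less than `threshold` of the reserved rate.

    The threshold is an assumed policy knob, not something the slides define.
    """
    return aggregate_rate(samples) < threshold * reserved_bps


bulk = [FlowSample(f"stream-{i}", 1.0e9) for i in range(8)]
tail = [FlowSample("stream-7", 0.2e9)]
print(only_stragglers_remain(bulk, reserved_bps=10e9))  # bulk phase
print(only_stragglers_remain(tail, reserved_bps=10e9))  # one straggler left
```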

  6. Lambda Station • Function • Schedule use of one or more reservable network paths • Arrange for traffic to be forwarded onto such paths

  7. Interfaces to other systems • To application (or to manual request system) • To authentication/authorization infrastructure • To site’s internal network (dynamic reconfiguration of packet forwarding rules) • Operate at any granularity, down to single flows • Site’s border/connection point to reservable path • Peer site’s Lambda Station • Talk to advanced WANs, through network operator-defined setup protocol, as needed* • Monitoring, accounting, status reporting

  8. Block Diagram

  9. Client application interface • Application describes the traffic which is to be routed over an alternative path. • Traffic selectors: 6-tuples [ IP version, {src CIDR(s)}, {dst CIDR(s)}, protocol, {src port(s)}, {dst port(s)} ] • Transfer rate, total volume, duration, direction • Earliest desired start • LS and host agree on packet-selection method; we lean toward DSCP. • LS informs application of actual BW allocated and setup status. • Host or LS should inform the other of early termination, if it occurs.
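
The 6-tuple traffic selector can be modelled as a small data structure. The sketch below is one possible representation, not the Lambda Station wire format: the class name `TrafficSelector`, its field names, and the convention that an empty port set means "any port" are all assumptions made for illustration. The addresses used are reserved documentation ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class TrafficSelector:
    ip_version: int              # 4 or 6
    src_cidrs: Tuple[str, ...]   # one or more source prefixes
    dst_cidrs: Tuple[str, ...]   # one or more destination prefixes
    protocol: str                # e.g. "tcp"
    src_ports: FrozenSet[int]    # assumption: empty set means any port
    dst_ports: FrozenSet[int]    # assumption: empty set means any port

    def _in_any(self, addr: str, cidrs: Tuple[str, ...]) -> bool:
        a = ip_address(addr)
        return a.version == self.ip_version and any(
            a in ip_network(c) for c in cidrs
        )

    def matches(self, src: str, dst: str, protocol: str,
                src_port: int, dst_port: int) -> bool:
        """Return True if a packet with these header fields is selected."""
        return (
            protocol == self.protocol
            and self._in_any(src, self.src_cidrs)
            and self._in_any(dst, self.dst_cidrs)
            and (not self.src_ports or src_port in self.src_ports)
            and (not self.dst_ports or dst_port in self.dst_ports)
        )


selector = TrafficSelector(
    ip_version=4,
    src_cidrs=("198.51.100.0/24",),
    dst_cidrs=("192.0.2.0/24",),
    protocol="tcp",
    src_ports=frozenset(),
    dst_ports=frozenset({2811}),
)
print(selector.matches("198.51.100.7", "192.0.2.9", "tcp", 40000, 2811))
```

A full request would also carry the transfer rate, total volume, duration, direction and earliest start listed on the slide; those are omitted here to keep the sketch short.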

  10. Site network interface • Configure local site’s internal routing to divert traffic to the alternate path. • Graceful teardown – resume normal internal routing before WAN path is torn down. • Different versions of this module will deal with different varieties of site network. • Each site might plug in its own scripts.

  11. Site-edge router interface • Graceful setup – Enable the reserved WAN path before internal routing directs traffic onto it. • ACL may be in effect on this device to prevent unauthorized use. • ACL very likely to be in effect with respect to incoming traffic from the WAN. • At some sites, this is a path which bypasses firewalls!
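
The ordering rules on the two slides above (enable the WAN path before diverting traffic; restore normal routing before tearing the WAN path down) can be sketched as a context manager. This is an illustration of the ordering only. The `alternate_path` function and the `Recorder` stand-in are hypothetical; a real module would drive routers and the WAN reservation system instead of appending to a list.

```python
from contextlib import contextmanager


@contextmanager
def alternate_path(wan, site):
    wan.enable()                # graceful setup: reserved WAN path comes up first
    try:
        site.divert()           # only then steer internal traffic onto it
        try:
            yield
        finally:
            site.restore()      # graceful teardown: normal routing resumes first
    finally:
        wan.release()           # the WAN path is torn down last


class Recorder:
    """Hypothetical stand-in for a WAN or site-network module; it only logs calls."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def __getattr__(self, action):
        # Any undefined method (enable, divert, ...) just records its own name.
        return lambda: self.log.append(f"{self.name}.{action}")


log = []
with alternate_path(Recorder("wan", log), Recorder("site", log)):
    log.append("transfer")
print(log)
```

Because the teardown steps sit in `finally` clauses, the same order is followed when the transfer ends early or raises an error.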

  12. LS-to-LS protocol • Exchange traffic selectors • Coordinate setup & teardown • Verify path continuity • Implies that LS can communicate simultaneously over reserved and commodity network paths. • Inform of early traffic termination

  13. Advanced WAN interface • Multiple flavors of high-performance WANs are anticipated. • Some WANs may require forwarding state to be created before use. • Some may have their own reservation system, which end systems need not learn to use if they reserve through Lambda Station instead. • Lambda Station’s WAN module will parameterize and adapt to each sort of WAN, providing an abstract view. • DOE UltraScience Net, ESnet, LHCnet, UltraLight.

  14. Requirements for Production • Robustness • LS must enable production systems to make trial use of advanced networks, and cleanly restore default forwarding behavior upon completion or path failure. • Monitoring • Lambda Station must present its own state and history. • Currently it serves this info through its web server. • Investigating MonALISA (OSG component). • Accounting • In many environments, different sub-organizations share the network resource. LS must gather usage information to support accounting.
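
As a minimal sketch of the accounting requirement, the function below totals bytes moved over reserved paths per sub-organization. The record layout (a group name paired with a byte count) and the group names are assumptions for illustration; the slides do not describe how Lambda Station stores usage data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def usage_by_group(records: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum transferred bytes per sub-organization."""
    totals: Dict[str, int] = defaultdict(int)
    for group, nbytes in records:
        totals[group] += nbytes
    return dict(totals)


# Hypothetical usage records: (sub-organization, bytes transferred)
records = [("group-a", 4_000), ("group-b", 1_500), ("group-a", 2_500)]
print(usage_by_group(records))
```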

  15. Provide sample integration • With storage systems that are components of the USCMS software and computing project. • These currently are: • Managed storage elements. • SRM / GridFTP protocols. • Now implementing LS client calls in SRM/dCache.

  16. Current status • Release 1.0 – today. • A stable, usable snapshot of a work in progress. • Based on Perl with SOAP::Lite • Dynamically reconfigures site routers to send traffic over alternate paths • End system applied DSCP tags to special-treatment flows. • Traffic path varied cleanly – unnoticed by application; hiccups in throughput at each change.
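
For the DSCP tagging done by the end system, the sketch below shows how an application could mark an IPv4 socket using the standard `IP_TOS` socket option. The DSCP occupies the upper six bits of the former TOS byte, hence the shift by two. The helper name `tag_socket_with_dscp` and the value 46 are illustrative assumptions; in the scheme described here the actual code point would be chosen by the client or directed by Lambda Station. IPv6 sockets use a different option and are not covered.

```python
import socket


def tag_socket_with_dscp(sock: socket.socket, dscp: int) -> int:
    """Mark outgoing IPv4 packets of `sock` with the given DSCP (0-63).

    Returns the TOS byte value that was written.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits (0-63)")
    tos = dscp << 2  # DSCP sits above the two ECN bits
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return tos


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    tos_value = tag_socket_with_dscp(sock, 46)  # 46 is an arbitrary example value
    print(tos_value)
finally:
    sock.close()
```

Site routers can then match on the DSCP field to decide which packets follow the alternate path, without inspecting addresses or ports.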

  17. Path switching effects

  18. Deployment Scenarios
      Client capabilities: identifying high-impact traffic
        1. Specify src & dst address groups, but no more.
        2. Specify src and/or dst ports as well as addresses.
        3. Apply DSCP label selected by client.
        4. Apply DSCP label as directed by Lambda Station.
      Client capabilities: Lambda Station integration level
        1. Lambda Station called manually via web interface.
        2. SOAP call by wrapper around client application.
        3. SOAP calls from within the client application.
      Site network capabilities
        1. Static router config w/ fixed PBR based on DSCP.
        2. Router ACLs activated and inactivated by LS.
        3. Lambda Station constructs and applies ACLs for PBR.

  19. Directions • Next version being built on Apache Axis • probably will use jClarens • WSDL is sure to evolve • IPv6 support is mere placeholder as yet • Adding support for Force10 site routers • Looking forward to speaking to your lightpath WAN directly!

  20. Summary • Lambda Station’s role in data-intensive science is to dynamically connect production end-systems to advanced high-performance wide-area networks. • Bring the systems to the network • Bring the network to the systems • Prototyping has shown the feasibility of using dynamically selected network paths for traffic between production site networks.
