Proposed future direction for CHEETAH • Malathi Veeraraghavan, University of Virginia • August 23, 2006 • Outline • Strategy discussion: • What's our goal for the CHEETAH network: • eScience network or a scalable GP network? • Bandwidth sharing mode: • Book-Ahead (BA) or Immediate-Request (IR)? • Tactical aspects: • Network evolution • Networking software modules • Application software modules • Interconnection to HOPI/DRAGON
Observation • "Many e-science experiments are unique applications that involve collaboration among a handful of facilities. As a result, networks supporting these experiments are optimized to provide maximum throughput to a few facilities, as opposed to moderate throughput to millions of users, which is the raison d'etre for commercial networks."
eScience networks • eScience network requirements • Number of users is small • Hard to achieve high utilization; also not important • Overprovision the network to keep the call blocking rate low • We can then focus on creating software that lets scientists automatically create high-speed application-specific topologies: AST, UCLP, OSCARS, USN scheduler, BRUW • Bandwidth-sharing algorithms are of less concern
General-purpose commercial networks • Have to be scalable: large number of users • Metcalfe's law: the value of a network grows as the square of the number of users • High utilization is an important goal • Low call blocking probability or low waiting time for resources • Focus on efficient bandwidth-sharing algorithms
Circuit/VC service on GP commercial networks • Just for ISPs/enterprise admins: • needs similar to eScience • router-to-router circuits • limited number of users • high-bandwidth, long-held circuits • low price not a high priority • need BA mode of bandwidth sharing • For end users • large number of users • can only offer moderate BW and limited call holding times • IR mode of sharing becomes feasible
BW sharing modes in circuit/VC networks • m is the link capacity expressed in channels, e.g., if 1 Gbps circuits are assigned on a 10 Gbps link, m = 10 • Mean waiting time is proportional to mean call holding time • Can afford a queueing-based solution if calls are short • Large m (moderate per-call throughput): immediate-request with call blocking + retries ("call queueing"), e.g., video, gaming • Small m (high per-call throughput), short calls ("bank teller"): immediate-request with delayed-start times, e.g., file transfers • Small m, long calls ("doctor's office"): book-ahead
Impact of increasing m at different values of link utilization Ud • [Figure: probability Pq that an arriving call finds all m circuits busy vs. m, the link capacity expressed in channels; small m corresponds to high-rate per-call circuits, large m to low-rate per-call circuits; marked point: m = 10, Pq = 41%] • Offered load: call arrival rate / call departure rate
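The curve in this figure can be reproduced with the Erlang C formula, assuming an M/M/m model (Poisson call arrivals, exponential holding times, delayed calls wait). The sketch below is that standard formula, not code from the CHEETAH project. With m = 10 and an offered load of 8 Erlangs (Ud = 80%) it gives Pq of about 0.41, matching the marked point; whether the figure used exactly these parameters is an assumption.

```python
def erlang_b(m: int, load: float) -> float:
    """Erlang B blocking probability, computed with the standard stable recursion."""
    b = 1.0
    for n in range(1, m + 1):
        b = load * b / (n + load * b)
    return b


def erlang_c(m: int, load: float) -> float:
    """Pq: probability that an arriving call finds all m circuits busy (M/M/m queue).

    load is the offered load in Erlangs (call arrival rate / call departure rate);
    the queue is stable only when load < m.
    """
    if m < 1 or not 0.0 <= load < m:
        raise ValueError("need m >= 1 and 0 <= load < m")
    b = erlang_b(m, load)
    return m * b / (m - load * (1.0 - b))


if __name__ == "__main__":
    # Pq at a fixed link utilization Ud as the number of channels m grows.
    for ud in (0.8, 0.9):
        for m in (10, 50, 100):
            print(f"Ud={ud:.0%}  m={m:3d}  Pq={erlang_c(m, ud * m):.3f}")
```

At a fixed Ud, Pq falls as m grows, which is why low-rate per-call circuits (large m) suit immediate-request sharing.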
Impact of mean call holding time • [Figure: mean waiting time for delayed calls at Ud = 90%; parameters: number of ports aggregating traffic onto the link, per-host call-generation rate]
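Why waiting time scales with holding time: under the same M/M/m assumptions (FCFS waiting, exponential holding times with mean h), a call that has to wait does so for an exponential time with mean h / (m(1 - Ud)). A minimal sketch follows; the helper that derives Ud from N ports assumes the aggregate arrival rate is N times the per-host call-generation rate, and both function names are hypothetical.

```python
def mean_wait_delayed(m: int, utilization: float, mean_holding_s: float) -> float:
    """Mean wait of calls that do have to wait, in an M/M/m FCFS queue.

    A delayed call waits an exponential time with rate m*mu - lambda, so the mean is
    1 / (m*mu*(1 - rho)) = mean_holding / (m * (1 - rho)).
    """
    if m < 1 or not 0.0 <= utilization < 1.0:
        raise ValueError("need m >= 1 and utilization in [0, 1)")
    return mean_holding_s / (m * (1.0 - utilization))


def utilization_from_ports(n_ports: int, per_host_rate: float,
                           mean_holding_s: float, m: int) -> float:
    """Ud when N ports each generate calls at per_host_rate calls/s (assumed Poisson)."""
    return n_ports * per_host_rate * mean_holding_s / m


if __name__ == "__main__":
    # At Ud = 90% and m = 10, a delayed call waits on average one full holding time.
    for h in (1.0, 60.0, 3600.0):
        print(f"h={h:7.1f} s  mean wait if delayed={mean_wait_delayed(10, 0.9, h):8.1f} s")
```

Because the wait is linear in h, long-held high-rate calls lead to long waits, while short calls can tolerate queueing.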
Main findings of analysis • Two key parameters: m and mean call holding time • If m is small (per-circuit BW is high) • and mean call holding time is large • then need BA to avoid long waiting times • and mean call holding time is small (file transfers) • then use "call queueing" • If m is large, switch hardware costs increase • N, the number of aggregation ports, is high • the level of demultiplexing is high • Moderate m: best choice
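The mode-selection logic from the BW sharing matrix can be written as a small decision function. This is a sketch only: the analysis gives no numeric cut-offs, so LARGE_M and SHORT_CALL_S below are hypothetical values chosen for illustration.

```python
from enum import Enum


class Mode(Enum):
    IR_BLOCKING_RETRIES = "immediate-request with call blocking + retries"
    IR_DELAYED_START = "immediate-request with delayed start"
    BOOK_AHEAD = "book-ahead"


# Hypothetical cut-offs: the slides give no numeric thresholds.
LARGE_M = 50              # channels per link at or above which m counts as "large"
SHORT_CALL_S = 15 * 60.0  # mean holding time at or below which calls count as "short"


def choose_mode(m: int, mean_holding_s: float) -> Mode:
    """Pick a bandwidth-sharing mode from m and the mean call holding time."""
    if m >= LARGE_M:
        # Moderate per-call rate: waits are short, so block and let callers retry.
        return Mode.IR_BLOCKING_RETRIES
    if mean_holding_s <= SHORT_CALL_S:
        # High per-call rate but short calls (e.g., file transfers): delay their start.
        return Mode.IR_DELAYED_START
    # High per-call rate and long calls: waits would be long, so book ahead.
    return Mode.BOOK_AHEAD


if __name__ == "__main__":
    print(choose_mode(192, 300.0).value)    # moderate-rate circuits, e.g., video
    print(choose_mode(10, 120.0).value)     # 1 Gb/s circuits, short file transfers
    print(choose_mode(10, 8 * 3600).value)  # 1 Gb/s circuits held for hours
```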
Support for the BA mechanism of bandwidth sharing • Since RSVP-TE does not have parameters for BA calls (call duration, start time), this mode is not implemented in switch controllers • Need an external scheduler to manage bandwidth into the future • Easiest to make it centralized - one per domain • Cannot utilize the BW management software implemented in switch controllers as part of GMPLS control-plane software • The BA mode is necessary for high-BW, long-held calls
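A minimal sketch of what a centralized per-domain BA scheduler has to track: for every link, the bandwidth already booked in each future time slot. The slot granularity, class names and link names are hypothetical, and real schedulers (OSCARS, USN, etc.) differ. The point illustrated is that a request specifies start time and duration, exactly the parameters RSVP-TE lacks, and is admitted only if every link on the path has room for the whole interval.

```python
from dataclasses import dataclass, field


@dataclass
class LinkCalendar:
    """Bandwidth booked on one link per future time slot (hypothetical slot size)."""
    capacity_gbps: float
    booked: dict = field(default_factory=dict)  # slot index -> Gb/s already booked

    def free(self, start: int, end: int) -> float:
        """Smallest free bandwidth over slots [start, end)."""
        return min(self.capacity_gbps - self.booked.get(t, 0.0) for t in range(start, end))

    def add(self, start: int, end: int, bw: float) -> None:
        for t in range(start, end):
            self.booked[t] = self.booked.get(t, 0.0) + bw


class BookAheadScheduler:
    """Centralized per-domain scheduler holding the calendar for every link."""

    def __init__(self, link_capacities: dict):
        self.links = {name: LinkCalendar(cap) for name, cap in link_capacities.items()}

    def reserve(self, path: list, start: int, end: int, bw: float) -> bool:
        """Book bw on every link of path for slots [start, end), or book nothing."""
        if end <= start:
            raise ValueError("end must be after start")
        # Check all links before booking any, so a rejection leaves no partial state.
        if any(self.links[link].free(start, end) < bw for link in path):
            return False
        for link in path:
            self.links[link].add(start, end, bw)
        return True
```

Usage: `BookAheadScheduler({"raleigh-atlanta": 10.0}).reserve(["raleigh-atlanta"], 0, 4, 6.0)` books 6 Gb/s for slots 0 to 3; a later overlapping request for 5 Gb/s would be refused.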
Support for the IR mechanism of bandwidth sharing • Switches have built-in (G)MPLS control-plane software (RSVP-TE/OSPF-TE) • Bandwidth management is part of RSVP-TE switch controller software • Hence it is distributed bandwidth management • Need to limit call holding time - reminders for renewals and automatic release • Moderate-to-high per-call bandwidth
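To make the IR requirements concrete, here is a hypothetical sketch of the local admission decision a switch controller makes for one link, extended with the bounded holding time the slide calls for: each admitted call gets a lease, is reminded before it expires, and is released automatically unless renewed. This does not reproduce RSVP-TE behavior; RSVP-TE carries no holding-time parameter, so the lease logic is an assumed add-on.

```python
class IRLink:
    """Immediate-request admission on one link, with bounded holding times.

    A call is admitted now or blocked (the caller may retry later); each admitted
    call holds a lease of max_hold_s that must be renewed or it is released.
    """

    def __init__(self, capacity_gbps: float, max_hold_s: float, reminder_s: float):
        self.capacity_gbps = capacity_gbps
        self.max_hold_s = max_hold_s
        self.reminder_s = reminder_s
        self.calls = {}  # call_id -> (bandwidth in Gb/s, lease expiry time in s)

    def release_expired(self, now: float) -> list:
        """Automatically release calls whose lease has run out."""
        expired = [cid for cid, (_, exp) in self.calls.items() if exp <= now]
        for cid in expired:
            del self.calls[cid]
        return expired

    def request(self, call_id: str, bw: float, now: float) -> bool:
        self.release_expired(now)
        in_use = sum(b for b, _ in self.calls.values())
        if in_use + bw > self.capacity_gbps:
            return False  # blocked: no queueing in this mode
        self.calls[call_id] = (bw, now + self.max_hold_s)
        return True

    def due_reminders(self, now: float) -> list:
        """Calls whose lease ends within reminder_s; these should get a renewal reminder."""
        return [cid for cid, (_, exp) in self.calls.items() if exp - now <= self.reminder_s]

    def renew(self, call_id: str, now: float) -> bool:
        self.release_expired(now)
        if call_id not in self.calls:
            return False  # already released; the caller must request again
        bw, _ = self.calls[call_id]
        self.calls[call_id] = (bw, now + self.max_hold_s)
        return True
```

Because each switch runs this check on its own links, bandwidth management stays distributed, as with RSVP-TE.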
To implement BA, IR, or both? • Implement only BA • Develop and "standardize" protocols for scheduler-to-scheduler signaling for interdomain circuits (one centralized scheduler per domain) • Implement scheduler and test with other networks • Create software tools to enable scientists and ISP/enterprise admins to visualize network topologies and request appropriate circuits/VCs • High-BW, long-held: Therefore AAA is a must • Path being pursued by DRAGON, USN, OSCARS, UCLP
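The slides leave the scheduler-to-scheduler protocol open. One simple structure, sketched here purely as an assumption, is hold-and-rollback: the request visits each domain's scheduler in path order, each one holds bandwidth for the requested interval, and if any domain refuses, the earlier holds are released. Class and function names are hypothetical; message formats, timeouts and the AAA step are omitted.

```python
class PeerScheduler:
    """One domain's book-ahead scheduler, reduced to a single interdomain capacity."""

    def __init__(self, domain: str, capacity_gbps: float):
        self.domain = domain
        self.capacity_gbps = capacity_gbps
        self.booked = {}  # time slot -> Gb/s held

    def hold(self, start: int, end: int, bw: float) -> bool:
        if any(self.booked.get(t, 0.0) + bw > self.capacity_gbps for t in range(start, end)):
            return False
        for t in range(start, end):
            self.booked[t] = self.booked.get(t, 0.0) + bw
        return True

    def release(self, start: int, end: int, bw: float) -> None:
        for t in range(start, end):
            self.booked[t] -= bw


def reserve_interdomain(chain: list, start: int, end: int, bw: float) -> bool:
    """Ask each domain's scheduler in path order; undo earlier holds if one refuses."""
    held = []
    for sched in chain:
        if not sched.hold(start, end, bw):
            for done in reversed(held):
                done.release(start, end, bw)
            return False
        held.append(sched)
    return True
```

Usage: `reserve_interdomain([PeerScheduler("cheetah", 10.0), PeerScheduler("hopi", 1.0)], 0, 3, 2.0)` fails in the second domain and leaves nothing booked in the first.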
Opportunity missed if the whole optical testbed community only experiments with BA • What opportunity? • Enable the creation of large-scale circuit/VC networks with moderate-rate circuits that can support a brand-new class of applications • economic value for the telecom industry • A "reservations-oriented" mode of networking to complement today's connectionless Internet • much as airlines complement roadways • Could prove useful to FIND, GENI, and net-neutrality studies • Alternative pricing models for bandwidth
What "brand new class of applications?" • Video, video, video • Gaming • Remote software access + Sync. storage • Async storage • Multimedia (large) files in web sites
Video applications • Improve quality of conferencing, telephony, surveillance, entertainment and distance-learning by a significant degree • Expend bandwidth for a higher-quality, lower latency, multi-camera, auto-movement, auto-mixing experience • Make the "flat world" flatter • Energy savings/environmental benefits • Moderate bandwidth - IR with call blocking/retries
Gaming applications • Current gamers buy personal graphics cards • Players talk of "lag" caused by differences in graphics processing speeds • Moderate-speed circuits can enable a new class of games in which rapidly-changing scenes are possible • compare movies in which multiple story lines keep scenes changing vs. gaming scenes • Players connect to graphics servers • Data transferred is not GL commands, but rather rendered bits (doable?) • Moderate bandwidth - IR with call blocking/retries
Remote software access/sync storage • Remote software access • Reduce computer administration cost • Personal computers vs. machine rooms • I loaded 22 new applications on my new laptop • Instead: connect and run! • Virtual Computing Laboratory: Mladen Vouk, NCSU • Synchronous storage access • Disaster recovery • Moderate bandwidth - IR with call blocking/retries
Asynchronous storage • Asynchronous storage depots will lower costs for • backups • disaster recovery • Need for increased storage grows with multimedia files • High bandwidth, short calls • IR with delayed start
Larger files in web sites • Multimedia files in web sites • Imagine the use of video/audio files in all sorts of web sites instead of ASCII • My own course PPT files: I use audio sparingly because of bandwidth • Think assembly instructions for electric fans, furniture • Kinesthetic learning - show me a video • Think hotel web pages • Show me exactly where the beach is relative to my room; do I have a balcony - saying it in text format is one thing; seeing it in a video format quite another! • Content distribution network & web caching • High bandwidth, short calls • IR with delayed start
Are all these "high"-BW apps just a matter of increasing BW of links in the current Internet? • No • The socialistic mode of bandwidth sharing on the Internet discourages individual investment in network bandwidth • Age-old question: • should we pay for bandwidth with tax dollars - "free" for the whole community? • "Tragedy of the commons" (Tanenbaum) • should we create a network where individuals can pay for bandwidth on congested links more directly? - think higher-toll HOV lanes
What does all this mean? • Let's build a scalable circuit/VC network in which bandwidth is shared in IR mode • Scalability will create "Metcalfe's value" • Provides an opportunity to finally recoup our investment in (G)MPLS technologies • standards creation effort • implementation: Cisco, Juniper, Sycamore, Movaz • Assign at least a few of the optical testbeds that we are investing in now to study whether this IR mode of bandwidth sharing can help with our understanding of net-neutrality, economic growth, FIND questions • IR is more natural in the data world, unlike airlines, where BA is the norm
Argument: IR is just a "now" in BA • BA and IR cannot coexist without some form of bandwidth partitioning • BA allows for high-BW, long-duration calls • IR calls will suffer a high call blocking rate if supported through BA scheduler (the "add-now-as-an-option-in-scheduler" solution) • Should you admit an IR call if it arrives a few seconds before start time of a BA call and hope it completes before the BA call start time, or reject the call and waste bandwidth?
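The admission dilemma above can be quantified under an assumption the slides do not make: if IR holding times are exponentially distributed with mean h, a call admitted g seconds before a BA start time finishes in time with probability 1 - exp(-g/h). The function names and the 95% threshold below are hypothetical. The sketch shows that such a conservative rule admits almost nothing when g is small relative to h, which is the wasted-bandwidth side of the trade-off.

```python
import math


def prob_finishes_before(gap_s: float, mean_holding_s: float) -> float:
    """P(holding time <= gap), assuming exponentially distributed holding times."""
    if gap_s <= 0.0:
        return 0.0
    return 1.0 - math.exp(-gap_s / mean_holding_s)


def admit_ir_before_ba(gap_s: float, mean_holding_s: float, min_prob: float = 0.95) -> bool:
    """Admit the IR call only if it very likely ends before the BA call must start."""
    return prob_finishes_before(gap_s, mean_holding_s) >= min_prob


if __name__ == "__main__":
    for gap in (5.0, 60.0, 600.0):
        p = prob_finishes_before(gap, 60.0)
        print(f"gap={gap:5.0f} s  P(done in time)={p:.3f}  admit={admit_ir_before_ba(gap, 60.0)}")
```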
CHEETAH and TSI • The CHEETAH network solves only part of the TSI problem • Other problems • Cray computer I/O problem • Local-area connectivity within NCSU • If the CHEETAH project were a production solution to support TSI, we should spend money to solve these two problems for TSI • But as an experimental, short-lived networking project, where should we focus?
Outline • Strategy discussion: • What's our goal for the CHEETAH network: • eScience network or a scalable GP network? • Bandwidth sharing: • Book-Ahead (BA) or Immediate-Request (IR)? • Tactical aspects: • CHEETAH network evolution • Networking software modules • Application software modules • Interconnection to HOPI/DRAGON
Network evolution to support IR • Current CHEETAH network only supports 10 circuits per OC192 link • remember: IR mode does not work well when m, the link capacity in channels, is small (i.e., 10) • Reclaim the OC1-crossconnect capability of the SN16Ks, currently used only for 1Gbps circuits • Has three advantages • supports higher m; better for IR • GMPLS standards-based signaling • Call setup delay: 166ms for two-hop instead of 1.5sec!
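How much the higher m helps can be estimated with the M/M/m model (an assumption, as in the earlier analysis). An OC-192 link carries 192 STS-1 (OC-1) channels, so switching OC-1 circuits raises m per link from 10 toward 192. The sketch compares both cases at the same utilization; Ud = 90% and the 60 s mean holding time are hypothetical defaults.

```python
def erlang_c(m: int, load: float) -> float:
    """Probability that an arriving call finds all m channels busy (M/M/m), via Erlang B."""
    b = 1.0
    for n in range(1, m + 1):
        b = load * b / (n + load * b)
    return m * b / (m - load * (1.0 - b))


def compare(ud: float = 0.9, mean_holding_s: float = 60.0) -> dict:
    """Pq and mean wait of delayed calls: 1 Gb/s circuits (m=10) vs OC-1 circuits (m=192)."""
    results = {}
    for m in (10, 192):
        pq = erlang_c(m, ud * m)
        wait = mean_holding_s / (m * (1.0 - ud))  # mean wait, given the call is delayed
        results[m] = (pq, wait)
    return results


if __name__ == "__main__":
    for m, (pq, wait) in compare().items():
        print(f"m={m:3d}  Pq={pq:.3f}  mean wait if delayed={wait:.1f} s")
```

Both the chance of waiting and the mean wait of delayed calls drop sharply at m = 192, which is the sense in which higher m is "better for IR".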
Network evolution options • Four options: • VLAN-enabled NICs + VLSR for SN16K • 15454 with VLSR • IP router with VLSR • Ethernet switch with VLSR
Example: web caching application • [Figure: CHEETAH VLANs interconnecting VT, UVa (C'ville, VA), Duke, UNC, NCSU (Raleigh, NC), UTK, ORNL, Gatech and UGa (Atlanta, GA); hosts mvsut6, wukong, zelda3, zelda4] • Xiuduan Fang, xf4c@virginia.edu • Bob Gisiger, rwg5f@virginia.edu
VLAN-enabled NICs + SN16K VLSR • SN16K has data-plane support to map a sub-Gb/s VLAN on an Ethernet port to a corresponding number of OC1s on a SONET port • But, it does not have control-plane support for this type of circuit • Even GMPLS support for the GbE port mapping to a 21-OC1 VCAT signal is an experimental release just for CHEETAH usage • Because GMPLS support for such hybrid circuits is non-standard • Can implement our own (non-standard) solution as a VLSR • But, goal is to use off-the-shelf switches with GMPLS support to demonstrate IR mode
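The "corresponding number of OC1s" for a given Ethernet rate is simple arithmetic. The sketch assumes 48.384 Mb/s of payload per STS-1 member of a virtually concatenated group (the STS-1-Xv / VC-3-Xv figure) and ignores GFP framing overhead. Under that assumption it reproduces the 21-OC1 VCAT signal for GbE and shows that a 10 to 15 Mb/s video stream fits in a single OC-1.

```python
import math

# Assumed payload per STS-1 member of an STS-1-Xv VCAT group (GFP overhead ignored).
STS1_PAYLOAD_MBPS = 48.384


def oc1s_needed(rate_mbps: float) -> int:
    """Smallest number of OC-1s whose combined payload carries rate_mbps."""
    if rate_mbps <= 0:
        raise ValueError("rate must be positive")
    return math.ceil(rate_mbps / STS1_PAYLOAD_MBPS)


if __name__ == "__main__":
    for label, rate in (("video stream", 15.0), ("sub-Gb/s VLAN", 100.0), ("GbE", 1000.0)):
        print(f"{label:14s} {rate:7.1f} Mb/s -> {oc1s_needed(rate)} OC-1(s)")
```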
15454/VLSR at each PoP • Make the 15454 serve as the intermediary between Ethernet NICs in hosts and SONET based SN16Ks at CHEETAH PoPs • A 15454 VLSR could be useful for other projects, UCLP, Ultralight • Cisco has no plans to implement a GMPLS control-plane engine for the 15454 • Two problems: • Non-standard solution for hybrid circuits • VLAN ID continuity requirement • Cannot support partial-path circuits
IP router/VLSR at each PoP • Use channelized OCxx SONET interfaces to connect the IP router to the SN16K • Connect web caches to the router • Have routers initiate pure SONET circuit setup • Use PBR (policy-based routing) or just ordinary routing table updates to map flows to different OCxx circuits; support multiple circuits from one web cache
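Supporting multiple circuits from one web cache requires a rule that assigns each flow to one circuit. The sketch below shows the idea in Python rather than as router configuration (actual PBR syntax is vendor-specific and not shown): hashing the 5-tuple keeps every packet of a flow on the same circuit, avoiding reordering, while different flows spread across the available circuits. The CRC-32 hash and the circuit names are illustrative choices, not part of the CHEETAH design.

```python
import zlib
from typing import NamedTuple


class Flow(NamedTuple):
    """Fields a router policy might match on (the usual 5-tuple)."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def pick_circuit(flow: Flow, circuits: list) -> str:
    """Map a flow to one of several circuits, always the same one for a given flow."""
    if not circuits:
        raise ValueError("no circuits available")
    key = "|".join(str(part) for part in flow).encode()
    return circuits[zlib.crc32(key) % len(circuits)]


if __name__ == "__main__":
    circuits = ["oc1-a", "oc1-b", "oc1-c"]  # hypothetical circuit names
    for port in range(40000, 40005):
        flow = Flow("10.0.0.1", "10.0.1.1", 6, port, 80)
        print(port, "->", pick_circuit(flow, circuits))
```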
[Figure: CHEETAH wide-area network. PoPs at Raleigh (MCNC), Atlanta (SOX/SLR) and ORNL, each with an SN16000 (GbE/10GbE card, OC192 card, control card) and attached end hosts, interconnected by OC-192 links (via NLR/SLR/NCREN). NCSU connects to the Raleigh PoP via NCREN; UVa (via Vortex/HOPI), CUNY (via Nysernet/HOPI) and GaTech also attach to the network.]
CHEETAH evolution to support sub-Gb/s circuits • [Figure: the same PoPs (Raleigh, Atlanta, ORNL) with SN16000s (OC192 cards, control card) interconnected by OC-192 links (via NLR/SLR/NCREN); UVa, CUNY, NCSU, GaTech and ORNL attach over GbE links; end hosts at each PoP]
IP router/VLSR at each PoP • Can support end-to-end circuits • web caching • CDN servers • video apps at 10-15Mbps - map to one OC1 • storage depots • Has the potential to support PPCs (partial-path circuits) • Place router with VLSR in enterprises at edge of GbE cheetah access link
Ethernet switch/VLSR at each PoP • Does not help with the problems noted in today's Gb/s circuit use of the SN16K • long call setup delays: 1.5sec • non-standard solution • high per-circuit BW • Using an Ethernet switch/VLSR at an enterprise (e.g. CUNY) requires all VLANs sharing 1Gbps CHEETAH access link to be switched to the same exit SN16K. • Even worse, m=1 if whole 1Gb/s link used for a circuit
Software modules required • Networking software: • CVLSR for IP router • CTCP code to support multiple simultaneous flows • Application software: • Add CHEETAH API to web caching squid software • Write software for video apps • CDN and storage software
Equipment required • IP routers with channelized SONET cards with GA GMPLS UNI implementation • need one for ORNL PoP • if we can partner with SOX in ATL, NCREN in Raleigh, MAX in McLean, purchase channelized OC192 cards • IP routers with GbE blades for V. Tech and UVA • If NC 454 transponders are unavailable, purchase transponders for DC-Raleigh NLR link - since HOPI doesn't have this link • Colocation costs at NLR McLean and Raleigh
Interconnect CHEETAH and HOPI • Through IP routers • With our IP router/VLSR combo, set up router-to-router SONET OC1 circuits via CHEETAH and router-to-router VLAN virtual circuits through HOPI. At routers, do PBR mapping for flows or just update routing tables • This means packets go back up to the IP layer between the two networks
CHEETAH-HOPI interconnection • [Figure: web caches attached to both CHEETAH (SONET) and the HOPI network (VLAN, MPLS, 10GbE), with sites at McLean, VA, NC, TN and GA; HOPI network diagram courtesy of Rick Summerhill]
HOPI and web caching • Seems like a good match • Rick's black cloud experiment - same as web caching • Exercises "hybrid" goal of HOPI • Small per-circuit BW possible with VLANs
Connection to DRAGON • Spoke with Jerry Sobieski, Aug. 12, 2006 • He said DRAGON PoPs have Ethernet VLAN switches • Therefore, can use similar IP router demarcation points to interconnect CHEETAH/HOPI to DRAGON
Conclusions • We'd like to • enable and demonstrate general-purpose apps using circuit/VC service with scalability as a key goal • support IR mode of bandwidth sharing with limited per-call bandwidth and limited call holding times • call blocking with retries • delayed start