Building a connection-oriented internet • Outline • Problem statement • CHEETAH: an NSF-funded experimental project • Research problems • The “internet” name in the title Malathi Veeraraghavan Univ. of Virginia
Problem statement • How do we add on a complementary connection-oriented internet to the existing (connectionless) Internet? • To allow end host applications to make a reservation for bandwidth • Anywhere from 10Mbps to 10Gbps
Motivation • Analogy with transportation networks • “connectionless” roadways • no reservation made • “fair” but delay/jitter uncontrolled • “connection-oriented” airlines • reservation based • better guarantees on delay/jitter • Call to make a reservation (if only for part of the distance: airport-to-airport) • No facility today for an end host application to request and reserve bandwidth as needed
Motivation: eScience • One 36-hour supernova simulation creates 1TB of data (Figure labels: NCSU; Supercomputers)
Use Internet2 for 1TB download • Bottleneck link rate from NCSU to ORNL via Internet2/ESnet is 2.5 Gbps • But Prof. Blondin at NCSU only sees 300-400 Mbps. Why?
Two reasons • Disk access limitations at end hosts • TCP/IP bandwidth sharing mode • IP networks are connectionless, i.e., no reservations • New transfers can simply start up • this impacts bandwidth available to ongoing transfers • “Socialistic” resource sharing (Figure: hosts attached to IP routers; plot of number of bytes sent vs. time, showing a changing rate)
Pros and cons of approach • It allows an ongoing transfer to enjoy bandwidth released by other transfers as they complete • On the other hand, as new transfers start up within the duration of the 1TB transfer, its rate decreases
Coming back to the problem statement • How do we provide scientists a rate-guaranteed connection for file transfers? • At first glance, file transfers look like an ideal app. for high-speed circuits • no burstiness • can use as much bandwidth as given
One answer: use optical fibers and circuit based gateways • Guaranteed rates + high bandwidth • Gateways available that can crossconnect a Gigabit Ethernet port to an equivalent-rate time-division or wavelength-division multiplexed signal dynamically • Signaling engine: dynamic call setup/release • Setup connection (make reservation) • Transfer file • Release connection (release resources) (Figure: hosts attach through Gigabit Ethernet interfaces to circuit based gateways; gateway labels: Gigabit Ethernet interface card; time-division or wavelength-division multiplexing optical interface card; control)
Networks deployed based on this thinking (related work) • Canada’s Canarie: CA*net4 • OptIPuter project: UCSD and Chicago • Chicago: OMNInet (Starlight PoP) • Dragon: DC area network • Netherlands: SURFnet • UK: UKlight • DOE UltraScience network
Our NSF-funded project: CHEETAH • Circuit-switched High-Speed End-to-End Transport arcHitecture • Participants: • Malathi Veeraraghavan, UVA • Nagi Rao, Bill Wing, Tony Mezzacappa, ORNL • John Blondin, NCSU • Ibrahim Habib, CUNY • $3.5M project for three years, 2004-2007 Acknowledgment: NSF EIN grant ANI-0335190
Adding Cheetah into existing optical NC university network (Figure labels: 15454 nodes; 10 Gb/s transponder; Duke; UNC; NCSU; MCNC; optical fiber; Wavelength Division Mux/Demux (WDM); Internet2 connection; Qwest PoP; Level(3) PoP; Centaur Lab (John Blondin’s cluster computer); circuit-based gateway; NLR 15808; National LambdaRail. Figure credit: Mark Johnson, MCNC, smj 10-28-04)
National LambdaRail Affordable prices for 10Gbps lambdas
Cheetah network (Figure labels: sites NCSU, MCNC/NLR, SOX Atlanta, ORNL; NC; 10 Gbps lambdas; NLR WDM; GaTech WDM; OC192; to DC: Dragon; to cluster computer; to Cray; circuit-based gateways, each with a GbE/10GbE card, OC192 card(s) and a control card that implements sig. protocols; GbE/10GbE Ethernet switches; 10GbE)
Connecting Cheetah to Dragon and Ultrascience networks (Figure labels: Dragon; DOE Ultrascience network (ORNL); Cheetah)
All this is fun, but • What are the research problems? • What happens if the call gets blocked? • Bandwidth sharing modes • Low load performance • Scheduled vs. immediate-request • Long paths vs. short paths • Mismatch between multitasking end hosts and TDM circuits
What happens if the call gets blocked? • In TCP/IP networks, your new transfer just joins in, perhaps receives small BW until some other transfers complete • In circuit-switched networks, your call is accepted or rejected • If rejected, what then? • Call queueing – wastes resources • because of multiple links Acknowledgment: NSF ITR grant ANI-0312376
Practical answer: Leverage presence of Internet path and fall back to it • Use second NICs at hosts for circuit connectivity, leaving primary NIC for Internet access • Attempt circuit setup • If rejected, fall back to using TCP/IP • Should we attempt a circuit setup for ALL file transfers? (Figure: two paths available between End host I and End host II: the connectionless Internet and the circuit-switched network)
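The attempt-then-fall-back behaviour on this slide can be sketched as follows. This is only an illustration: the four callables (`setup_circuit`, `send_on_circuit`, `release_circuit`, `send_on_tcp`) are hypothetical placeholders, not part of any CHEETAH software, and a rejected call is assumed to be reported as `None`.

```python
from typing import Callable, Optional


def transfer_with_fallback(
    file_size_bytes: int,
    setup_circuit: Callable[[], Optional[object]],    # hypothetical: returns a handle, or None if the call is blocked
    send_on_circuit: Callable[[object, int], None],   # hypothetical: sends over the second NIC
    release_circuit: Callable[[object], None],        # hypothetical: releases the reserved resources
    send_on_tcp: Callable[[int], None],               # hypothetical: sends over the primary NIC
) -> str:
    """Attempt a circuit; if the call is rejected, use the TCP/IP path."""
    circuit = setup_circuit()
    if circuit is None:
        # Call blocked: fall back to the connectionless Internet path.
        send_on_tcp(file_size_bytes)
        return "tcp"
    try:
        send_on_circuit(circuit, file_size_bytes)
    finally:
        # Always release the circuit, even if the transfer raised.
        release_circuit(circuit)
    return "circuit"
```

Passing the network operations in as callables keeps the sketch independent of any particular signaling or socket API.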
Expected delay on TCP/IP path • Main factors: • Round-Trip Time (RTT) – main Tprop • Prob. of packet loss on IP path, p • Bottleneck link rate • Throughput B(p): approximately reciprocal of expected delay • Other terms: • Wmax: receiver window size • b = 2 (ACK-every-other-segment) • T0: initial time-out • J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP Throughput: A Simple Model and its Empirical Validation,” Proc. of ACM SIGCOMM 98, Aug. 31 - Sep. 4, Vancouver Canada, pp. 303-314.
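The throughput expression itself did not survive on this slide. The sketch below uses the approximate formula from the cited Padhye et al. paper, with the slide's symbols p, RTT, Wmax, b and T0. The 1460-byte segment size and the function names are assumptions made here, and the bottleneck link rate that the slide lists as a factor is not modelled.

```python
import math


def pftk_throughput(p: float, rtt_s: float, t0_s: float,
                    wmax_segments: float, b: int = 2) -> float:
    """Approximate steady-state TCP throughput B(p), in segments per second."""
    if p <= 0:
        # No loss: the rate is limited only by the receiver window.
        return wmax_segments / rtt_s
    # Term for losses detected by duplicate ACKs.
    loss_term = rtt_s * math.sqrt(2 * b * p / 3)
    # Term for losses that lead to retransmission time-outs.
    timeout_term = (t0_s * min(1.0, 3 * math.sqrt(3 * b * p / 8))
                    * p * (1 + 32 * p ** 2))
    return min(wmax_segments / rtt_s, 1.0 / (loss_term + timeout_term))


def expected_tcp_delay_s(file_bytes: int, p: float, rtt_s: float, t0_s: float,
                         wmax_segments: float, mss_bytes: int = 1460) -> float:
    """Rough transfer delay: number of segments divided by B(p).

    mss_bytes = 1460 is an assumption, not a value from the slides.
    """
    segments = math.ceil(file_bytes / mss_bytes)
    return segments / pftk_throughput(p, rtt_s, t0_s, wmax_segments)
```

With everything else fixed, raising either p or RTT lowers B(p) and so raises the estimated delay.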
Input parameters plus the time to transfer a 1GB file and a 1TB file (Table columns: loss; round-trip prop. delay; mean TCP delays) • Impact of propagation delay • Impact of packet loss rate • Low impact of bottleneck link rate in wide-area networks
Delays incurred in using an end-to-end circuit • Circuit setup delay + File transfer delay • msig: message length; rs: signaling link rate • Loads: rsig and rsp: sig. link and processor • Tsp: signaling protocol processing delay • k: number of switches; Tprop: r.t. prop. delay • f: file size; rc: circuit rate Acknowledgment: NSF ANI grant 0087487 for hardware signaling
Should the application attempt a circuit setup or not? • Mean delay if a circuit setup is attempted • Pb: call blocking probability in the circuit-switched network • If circuit setup fails, fall back to Internet path
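The mean-delay expression is missing from this slide. One plausible form, given here as an assumption rather than as the authors' exact formula, weights the circuit path by (1 - Pb) and the fallback path by Pb. The term `t_fail_s` (time lost before the rejection is known) and both function names are hypothetical.

```python
def mean_delay_with_attempt(pb: float, t_setup_s: float,
                            t_circuit_transfer_s: float,
                            t_fail_s: float, t_tcp_s: float) -> float:
    """Expected delay when a circuit setup is attempted first.

    With probability (1 - pb) the call is accepted and the file goes over
    the circuit; with probability pb it is blocked and the transfer falls
    back to the TCP/IP path after wasting t_fail_s.
    """
    accepted = t_setup_s + t_circuit_transfer_s
    blocked = t_fail_s + t_tcp_s
    return (1 - pb) * accepted + pb * blocked


def should_attempt_circuit(pb: float, t_setup_s: float,
                           t_circuit_transfer_s: float,
                           t_fail_s: float, t_tcp_s: float) -> bool:
    """Attempt the circuit only if it lowers the expected delay."""
    return mean_delay_with_attempt(
        pb, t_setup_s, t_circuit_transfer_s, t_fail_s, t_tcp_s) < t_tcp_s
```

Under this model, small files on short paths favour plain TCP/IP because the setup delay dominates, which is the intuition behind a crossover file size.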
Numerical results: link rate = 1Gbps (Plots: Tprop = 0.1ms; Tprop = 50ms)
Crossover file sizes • When rc = 100Mbps and Tprop = 0.1ms • When rc = 1Gbps and Tprop = 0.1ms • When Tprop = 50ms, always attempt a circuit
Utilization considerations • Example: in the 50ms scenario, if we transfer a 100KB file over a 100Mbps path, transfer time is only 8ms. Circuit utilization is 8/(50+8) = 13.8% • Two opposing factors • If the crossover file size (beyond which circuit setup is attempted) is increased • per-circuit utilization increases • traffic load decreases (Pareto distribution of file sizes), which means aggregate utilization decreases
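The per-circuit utilization arithmetic in the example can be reproduced with a short calculation. The sketch assumes, as the slide's example does, that the 50 ms is overhead during which the circuit is held but carries no data; the function names are illustrative.

```python
def transfer_time_s(file_bytes: float, rate_bps: float) -> float:
    """Time to push file_bytes through a circuit of rate_bps."""
    return file_bytes * 8 / rate_bps


def circuit_utilization(file_bytes: float, rate_bps: float,
                        overhead_s: float) -> float:
    """Fraction of the circuit holding time spent carrying data."""
    t = transfer_time_s(file_bytes, rate_bps)
    return t / (overhead_s + t)


# The slide's example: a 100KB file, a 100Mbps circuit, 50 ms overhead.
t_ms = transfer_time_s(100_000, 100e6) * 1000
u = circuit_utilization(100_000, 100e6, 0.050)
print(f"transfer time = {t_ms:.0f} ms, utilization = {u:.3f}")
```

Larger files amortize the same overhead over a longer transfer, which is why raising the crossover file size raises per-circuit utilization.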
For a 1% call blocking probability (Pb = 0.01) • r: traffic load • m: number of circuits • Pb: call blocking probability • ua: aggregate utilization • Assuming file size follows Pareto distribution • Define fractional offered load • r = 1: m = 4, ua = 24.8% • r = 10: m = 17, ua = 58.2% • r = 100: m = 117, ua = 84.6%
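The utilization figures on this slide are consistent with ua = r(1 - Pb)/m, i.e. carried load divided by the number of circuits. The sketch below checks that arithmetic. It also computes an Erlang B blocking probability for each (r, m) pair; that loss model is an assumption made here, since the slide does not say how m was derived.

```python
def erlang_b(load_erlangs: float, circuits: int) -> float:
    """Erlang B blocking probability, via the standard recursion."""
    b = 1.0
    for n in range(1, circuits + 1):
        b = load_erlangs * b / (n + load_erlangs * b)
    return b


def aggregate_utilization(load_erlangs: float, circuits: int, pb: float) -> float:
    """Carried load per circuit: r * (1 - Pb) / m."""
    return load_erlangs * (1 - pb) / circuits


# (r, m) pairs as read from the slide; Pb is the nominal 1% target.
for r, m in [(1, 4), (10, 17), (100, 117)]:
    ua = aggregate_utilization(r, m, 0.01)
    print(f"r={r:>3} m={m:>3} ua={ua:.1%} ErlangB={erlang_b(r, m):.4f}")
```

The pattern is the usual trunking gain: for a fixed blocking target, larger circuit pools run at much higher utilization.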
Plot of utilization u with rc = 100Mbps, k = 20 • For 50ms paths, set a crossover file size • When load is low, operate at a high blocking rate (Curves: Pb = 0.3; Pb = 0.01)
Research problems • What are the research problems? • What happens if the call gets blocked? • Bandwidth sharing modes • Low load performance • Scheduled vs. immediate-request • Long paths vs. short paths • Mismatch between multitasking end hosts and TDM circuits
Connection-oriented bandwidth sharing for file transfers • This is a new problem • Real-time (interactive) audio-video applications generate data at a certain rate (constant or variable) • implication: application requests the required bandwidth from the network, and answer is binary (accept or reject); multiple classes • File transfers: “any” bandwidth that the network can provide could be acceptable • implication: application requests a MAX bandwidth, but the answer can be multi-level
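The difference between a binary answer and a multi-level answer can be made concrete with a toy admission function. Everything here is hypothetical: the function names, the idea that the network offers a fixed list of rate levels, and the rule of granting the highest feasible level are illustrative assumptions, not the project's algorithm.

```python
from typing import Sequence


def admit_binary(requested_mbps: int, available_mbps: int) -> int:
    """Session-style request: grant the exact rate or reject (0)."""
    return requested_mbps if requested_mbps <= available_mbps else 0


def admit_multilevel(max_mbps: int, available_mbps: int,
                     levels_mbps: Sequence[int]) -> int:
    """File-transfer request: grant the highest offered level that fits.

    The level must not exceed the application's MAX or the free capacity.
    Returns 0 when no level fits.
    """
    ceiling = min(max_mbps, available_mbps)
    feasible = [rate for rate in levels_mbps if rate <= ceiling]
    return max(feasible, default=0)
```

With 60 Mbps free, a 100 Mbps session request is rejected outright, while a file transfer with a 100 Mbps MAX can still be granted a lower level.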
Fixing the bandwidth for the transfer could be a bad thing: low load problem • Circuit switch with capacity C: each of the N transfers is allocated C/N capacity; the lone remaining transfer continues with capacity allocation C/N • Packet switch with capacity C: each of the N transfers gets C/N capacity; the lone remaining transfer enjoys full capacity C • Varying bandwidth list scheduling algorithm • uses knowledge of file size to make varying bandwidth allocations for transfer • catch: requires circuit switches to be reprogrammed multiple times within lifetime of a transfer (circuit)
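The low load problem can be illustrated by comparing finish times when N transfers start together on a link of capacity C. This is an idealized sketch: the packet switch is modelled as perfect equal sharing among the transfers still active, which real TCP only approximates, and the function names are illustrative.

```python
from typing import List, Sequence


def finish_times_fixed(sizes: Sequence[float], capacity: float) -> List[float]:
    """Circuit switch: every transfer keeps its C/N share for its whole life."""
    n = len(sizes)
    return [size * n / capacity for size in sizes]


def finish_times_shared(sizes: Sequence[float], capacity: float) -> List[float]:
    """Packet switch: active transfers always split C equally.

    Sizes must be given in ascending order so transfers finish in order.
    """
    times = []
    now = 0.0
    done = 0.0  # amount each still-active transfer has already sent
    remaining = len(sizes)
    for size in sizes:
        # Each active transfer runs at capacity / remaining until this one ends.
        now += (size - done) * remaining / capacity
        times.append(now)
        done = size
        remaining -= 1
    return times


sizes = [1.0, 2.0, 3.0]
print(finish_times_fixed(sizes, 1.0))   # fixed C/N allocation
print(finish_times_shared(sizes, 1.0))  # released bandwidth is reused
```

The largest transfer gains the most from sharing, since it inherits the bandwidth released by the others. A varying-bandwidth scheduler tries to recover that gain on circuits.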
Scheduled vs. immediate-request calls • Session type requests: • long holding times (2 hours) • specific rate • remote visualizations • scientists participate in sessions • best served with an advance reservation • File transfer requests: • file sizes provided not holding times • max rate specified but any rate can be allocated • scientists not involved; just computers • Large files (e.g. 1 TB on 1 Gbps takes 2.2 hours) • should be handled in scheduled mode • should we allocate 10Gbps and finish in 800 sec? • immediate-request? or scheduled? • Small files (e.g. 1 GB on 1 Gbps takes 8 sec) • should be handled in immediate-request mode
Specific research activities • Scheduling • Design and compare algorithms for scheduling session-type and large file transfer requests • Use preemption and repositioning of large file transfer requests • Immediate-request call admission algorithms • Use Markov Decision Process (MDP) tools to balance fairness and overall throughput • Long-path and short-path calls • Large files (high-BW) and short files (low-BW) calls • Multi-level answer rather than binary accept/reject • Both with Fixed bandwidth and Varying bandwidth
Research problems • What are the research problems? • What happens if the call gets blocked? • Bandwidth sharing modes • Low load performance • Scheduled vs. immediate-request • Long paths vs. short paths • Mismatch between multitasking end hosts and TDM circuits
Mismatch between multitasking end hosts and TDM circuits • Variability in sender: • other processes (e.g. Matlab) + disk access (disk head location) • Variability in receiver: if buffer not emptied out, data loss occurs (Figure: at each end host, the file transfer and Matlab processes run in user space; filesystem and network protocols sit in the kernel; the network card connects to the circuit-switched network)
Effects of mismatch in nature of circuits and nature of hosts • Choose a high circuit rate and receive buffer can overflow causing losses • impacts delay + utilization (retransmissions) • Choose a low circuit rate and delay can be high • If sending rate is not matched exactly with circuit rate • circuit lies idle; utilization impacted
Fixed Rate Transport Protocol (FRTP) • Set up a circuit at a carefully chosen rate • Send data at that rate • hard to meter out data at a fixed rate from a multitasking sender when that rate is high (Linux system time granularity: 10ms) • No changes of sending rate • i.e., no flow control or congestion control • Packet losses recovered through retransmissions • no timers needed, just negative ACKs • because of in-sequence delivery
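Two points on this slide lend themselves to a small sketch: how coarse a 10 ms timer is at high rates, and why in-sequence delivery lets the receiver detect loss from sequence gaps alone. The function names and the integer sequence numbering are assumptions for illustration, not FRTP's actual packet format.

```python
from typing import Iterable, List


def bytes_per_tick(rate_bps: float, tick_ms: int = 10) -> float:
    """Data a sender must emit per timer tick to average rate_bps."""
    return rate_bps * tick_ms / 8000


def nacks_for_stream(received_seqs: Iterable[int]) -> List[int]:
    """Sequence numbers to negatively acknowledge.

    A circuit delivers packets in order, so a jump past the next expected
    number means the skipped packets were lost. No timer is needed.
    """
    expected = 0
    nacks: List[int] = []
    for seq in received_seqs:
        if seq > expected:
            nacks.extend(range(expected, seq))  # gap: these were lost
        if seq >= expected:
            expected = seq + 1
        # seq < expected is a retransmission filling an earlier gap
    return nacks


print(bytes_per_tick(1e9))                         # burst per 10 ms tick at 1 Gbps
print(nacks_for_stream([0, 1, 2, 5, 6, 3, 8]))     # gaps at 3, 4 and 7
```

At 1 Gbps a 10 ms tick corresponds to a burst of over a megabyte, so the data leaves in bursts rather than at a smooth fixed rate.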
Experimental results • Circuit rate 200 Mbps: circuit utilization 90%, relative transfer delay 1.7 • Circuit rate 590 Mbps: circuit utilization 62%, relative transfer delay 1.0
Current work • Experimenting with RT schedulers to schedule file transfer task in a set rhythm • Experimenting with file systems to characterize file write time to collect data to then determine circuit rate and receive buffer size
Back to the outline • Outline • Problem statement • CHEETAH: an NSF-funded experimental project • Research problems • The “internet” name in the title
Connection-oriented networks • Many flavors • Circuit switched • Time Division Multiplexed (SONET) • Equipment vendors: Sycamore, Ciena • Network: Cheetah, UltraScience Net, CA*net 4 • Wavelength Division Multiplexed (WDM) • Equipment vendors: Movaz, Calient, LambdaOptical • Network: Dragon, OMNInet, Internet2 HOPI
Connection-oriented networks • Many flavors • Packet switched • Multiprotocol Label Switching (MPLS) • Equipment vendors: Cisco, Juniper • Network: Internet2, ESnet • Virtual Local Area Network (VLAN) • Equipment vendors: Dell, Intel, Foundry, Extreme • Network: Enterprise local area networks • Just need to “enable” connection-oriented network through already deployed boxes
Bandwidth sharing problem in heterogeneous network • Request for 30Mbps connection • Switch granularity • Problem: • Tradeoff of fairness and utilization becomes more difficult when these crossconnect granularities are considered (Figure: nodes a to f; link labels 1, 150Mbps; 1, 10Mbps; 5, 100Mbps; 1, 50Mbps; 1, 500Mbps; 2, 50Mbps; 1, 50Mbps; 2, 30Mbps; switch granularity labels 1Mbps, 51Mbps, 10Gbps)
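The effect of crossconnect granularity on a 30Mbps request can be sketched by rounding the request up to a whole number of units at each switch. The granularities are the ones labelled in the figure. The rounding rule and the function name are assumptions about how such switches would reserve capacity.

```python
def reserved_mbps(request_mbps: int, granularity_mbps: int) -> int:
    """Capacity a switch must reserve: the request rounded up to whole units."""
    units = -(-request_mbps // granularity_mbps)  # ceiling division
    return units * granularity_mbps


request = 30
for granularity in (1, 51, 10_000):  # 1Mbps, 51Mbps, 10Gbps
    reserved = reserved_mbps(request, granularity)
    print(f"granularity {granularity:>6} Mbps: reserve {reserved:>6} Mbps, "
          f"efficiency {request / reserved:.1%}")
```

A path that crosses a coarse-grained switch ties up far more capacity than the request needs, which is what complicates the fairness versus utilization tradeoff.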
Interconnecting these networks • Tricky business! • Involves many levels of interworking protocols • User (data) plane • Signaling protocols (for connection setup/release) • Routing protocols (for reachability, topology, loading data dissemination)
But • We need to solve this internetworking problem for a true connection-oriented service to flourish! Acknowledgment: DOE grant
Summary • Rich new set of research problems • Experimental challenges aplenty! • Real opportunity to deploy a network • Web site: http://cheetah.cs.virginia.edu