Scaling routers: Where do we go from here?
HPSR, Kobe, Japan, May 28th, 2002
Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
www.stanford.edu/~nickm
[Chart] Router capacity: ×2.2 every 18 months. Moore's law: ×2 every 18 months. DRAM access rate: ×1.1 every 18 months.
Router vital statistics
• Cisco GSR 12416: capacity 160 Gb/s, power 4.2 kW, 19" rack, 6 ft tall, 2 ft deep
• Juniper M160: capacity 80 Gb/s, power 2.6 kW, 19" rack, 3 ft tall, 2.5 ft deep
[Chart] Internet traffic: ×2 per year. Router capacity: ×2.2 every 18 months. The gap compounds to roughly 5×.
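A back-of-the-envelope check on how quickly those two curves diverge (a sketch assuming both growth rates stay constant; the horizon is whatever the loop finds):

```python
# Compound the two growth rates from the chart and see how fast the gap opens.
traffic_per_yr = 2.0                  # traffic: x2 every 12 months
capacity_per_yr = 2.2 ** (12 / 18)    # capacity: x2.2 every 18 months ~= x1.69/yr

gap, years = 1.0, 0
while gap < 5.0:                      # how long until a 5x shortfall?
    gap *= traffic_per_yr / capacity_per_yr
    years += 1
print(f"~{years} years to a {gap:.1f}x gap")   # roughly a decade at these rates
```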
Fast (large) routers
• Big POPs need big routers. [Figure: a POP built from a few large routers vs. a POP built from many smaller routers]
• Interfaces: price > $200k, power > 400 W.
• About 50-60% of interfaces are used for interconnection within the POP.
• The industry trend is toward a single large router per POP.
Job of the router architect
• For a given set of features: maximize capacity within a given power and volume budget.
Mind the gap
• Operators are unlikely to deploy 5 times as many POPs, or to make them 5 times bigger with 5 times the power consumption.
• Our options:
  • Make routers simpler
  • Use more parallelism
  • Use more optics
Make routers simple
• We tell our students that Internet routers are simple: all a router does is make a forwarding decision, update a header, and forward packets to the correct outgoing interface.
• But I don't understand them anymore:
  • the list of required features is huge and still growing,
  • the software is complex and unreliable,
  • the hardware is complex and power-hungry.
Router linecard
[Block diagram: OC192c linecard — optics, physical layer, framing & maintenance, packet processing with lookup tables, buffer management & scheduling with buffer & state memory, scheduler]
• 30M gates
• 2.5 Gbits of memory
• 1 m²
• $25k cost, $200k price
Things that slow routers down
• 250 ms of buffering
  • Requires off-chip memory, more board space, pins and power.
• Multicast
  • Affects everything! Complicates design, slows deployment.
• Latency bounds
  • Limit pipelining.
• Packet sequence
  • Limits parallelism.
• Small internal cell size
  • Complicates arbitration.
• DiffServ, IntServ, priorities, WFQ, etc.
• Others: IPv6, drop policies, VPNs, ACLs, DoS traceback, measurement, statistics, …
An example: packet processing
[Chart] CPU instructions available per minimum-length packet, since 1996.
Reducing complexity: Conclusion
• Need aggressive reduction in the complexity of routers.
• Get rid of irrelevant requirements and irrational tests.
• It is not clear who has the right incentive to make this happen.
• Otherwise, be prepared for core routers to be replaced by optical circuit switches.
Mind the gap
• Operators are unlikely to deploy 5 times as many POPs, or to make them 5 times bigger with 5 times the power consumption.
• Our options:
  • Make routers simpler
  • Use more parallelism
  • Use more optics
Use more parallelism
• Parallel packet buffers
• Parallel lookups
• Parallel packet switches
• Things that make parallelism hard:
  • maintaining packet order,
  • making throughput guarantees,
  • making delay guarantees,
  • latency requirements,
  • multicast.
Parallel Packet Switches
[Diagram: N external ports, each at rate R, spread traffic bufferlessly across k parallel packet-switch planes]
Characteristics
• Advantages:
  • k× the memory bandwidth,
  • k× the lookup/classification rate,
  • k× the routing/classification table size.
• With appropriate algorithms:
  • packets remain in order,
  • 100% throughput,
  • delay guarantees (at least in theory).
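A minimal sketch of the spreading idea behind these claims, not the actual PPS algorithms from the talk: cells of each flow are dealt round-robin across k slower planes, tagged with per-flow sequence numbers, and re-sequenced at the egress. The class, the sequence-number scheme, and the FIFO planes are illustrative assumptions.

```python
from collections import defaultdict

class ParallelPacketSwitch:
    """Toy model: spread each flow's cells round-robin over k planes,
    then restore order at the egress with per-flow sequence numbers."""
    def __init__(self, k=4):
        self.k = k
        self.planes = [[] for _ in range(k)]   # stand-ins for k slower switches
        self.rr = defaultdict(int)             # per-flow round-robin pointer
        self.tx = defaultdict(int)             # next sequence number to assign
        self.rx = defaultdict(int)             # next sequence number to release
        self.pending = defaultdict(dict)       # egress re-sequencing buffer

    def ingress(self, flow, cell):
        seq, plane = self.tx[flow], self.rr[flow]
        self.tx[flow] += 1
        self.rr[flow] = (plane + 1) % self.k
        self.planes[plane].append((flow, seq, cell))

    def egress(self):
        """Planes may deliver in any relative order; release each flow's
        cells strictly by sequence number."""
        out = []
        for plane in self.planes:
            while plane:
                flow, seq, cell = plane.pop(0)
                self.pending[flow][seq] = cell
                while self.rx[flow] in self.pending[flow]:
                    out.append((flow, self.pending[flow].pop(self.rx[flow])))
                    self.rx[flow] += 1
        return out

pps = ParallelPacketSwitch()
for i in range(10):
    pps.ingress("flowA", f"cell{i}")
print(pps.egress())   # cells emerge in order despite the k parallel paths
```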
Mind the gap
• Operators are unlikely to deploy 5 times as many POPs, or to make them 5 times bigger with 5 times the power consumption.
• Our options:
  • Make routers simpler
  • Use more parallelism
  • Use more optics
All-optical routers don't make sense
• A router is a packet switch, and so requires:
  • a switch fabric,
  • per-packet address lookup,
  • large buffers for times of congestion.
• Packet processing and buffering are infeasible with optics:
  • a typical 10 Gb/s router linecard has 30 Mgates and 2.5 Gbits of memory.
• Research problem: how to optimize the architecture of a router that uses an optical switch fabric?
100Tb/s optical router: Stanford University Research Project
• Collaboration: 4 professors at Stanford (Mark Horowitz, Nick McKeown, David Miller and Olav Solgaard) and our groups.
• Objective:
  • Determine the best way to incorporate optics into routers.
  • Push technology hard to expose new issues: photonics, electronics, system design.
• Motivating example: the design of a 100 Tb/s Internet router.
  • Challenging but not impossible (~100× current commercial systems).
  • It identifies some interesting research problems.
100Tb/s optical router
[Diagram: 625 electronic linecards (each handling line termination, IP packet processing, and packet buffering, with 40 Gb/s external interfaces) connect at 160-320 Gb/s to a central optical switch; arbitration is via request/grant. 100 Tb/s = 625 × 160 Gb/s]
Research Problems
• Linecard
  • Memory bottleneck: address lookup and packet buffering.
• Architecture
  • Arbitration: computation complexity.
• Switch fabric
  • Optics: fabric scalability and speed.
  • Electronics: switch control and link electronics.
  • Packaging: the three-surface problem.
160Gb/s Linecard: Packet Buffering
[Diagram: queue manager with on-chip SRAM and off-chip DRAMs, 160 Gb/s in and out]
• Problem: the packet buffer needs the density of DRAM (40 Gbits) and the speed of SRAM (one packet every 2 ns).
• Solution:
  • A hybrid design uses on-chip SRAM and off-chip DRAM.
  • Identified optimal algorithms that minimize the size of the SRAM (12 Mbits).
  • Precisely emulates the behavior of a 40 Gbit, 2 ns SRAM.
• klamath.stanford.edu/~nickm/papers/ieeehpsr2001.pdf
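A toy sketch of the hybrid idea (the real algorithms and the 12 Mbit sizing are in the linked paper; the batch size B and the class below are illustrative): the slow DRAM is touched only in wide batches, while small SRAM caches absorb the per-packet reads and writes.

```python
from collections import deque

B = 8  # cells per SRAM<->DRAM transfer (illustrative; the paper derives sizing)

class HybridQueue:
    """One queue: small SRAM caches at tail and head, bulk storage in DRAM,
    with DRAM accessed only in batches of B cells."""
    def __init__(self):
        self.tail_sram = deque()   # arriving cells accumulate here
        self.dram = deque()        # holds whole batches of B cells
        self.head_sram = deque()   # departures are served from here

    def enqueue(self, cell):
        self.tail_sram.append(cell)
        if len(self.tail_sram) >= B:               # amortize the slow DRAM write
            self.dram.append([self.tail_sram.popleft() for _ in range(B)])

    def dequeue(self):
        if not self.head_sram:
            if self.dram:                          # one wide, slow DRAM read
                self.head_sram.extend(self.dram.popleft())
            elif self.tail_sram:                   # short queue: bypass DRAM
                self.head_sram.append(self.tail_sram.popleft())
        return self.head_sram.popleft() if self.head_sram else None

q = HybridQueue()
for i in range(20):
    q.enqueue(i)
print([q.dequeue() for _ in range(20)])   # 0..19, in arrival order
```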
The Arbitration Problem
• A packet-switch fabric is reconfigured for every packet transfer.
• At 160 Gb/s, a new IP packet can arrive every 2 ns.
• The configuration is picked to maximize throughput and not waste capacity.
• Known algorithms are too slow.
Approach
• We know that a crossbar with VOQs and uniform Bernoulli i.i.d. arrivals gives 100% throughput under the following scheduling algorithms:
  • Pick a permutation uniformly at random from all permutations.
  • Pick a permutation uniformly at random from a set of N permutations in which each input-output pair (i, j) is connected exactly once.
  • From the same set, repeatedly cycle through the fixed sequence of N different permutations.
• Can we make non-uniform, bursty traffic uniform "enough" for the above to hold? (A simulation of the third scheme follows.)
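A quick sanity-check simulation of that third scheme, assuming the N permutations are the cyclic shifts (one standard choice of a set in which every (i, j) pair appears exactly once); N, the offered load, and the horizon are illustrative.

```python
import random

N, LOAD, T = 8, 0.95, 100_000
voq = [[0] * N for _ in range(N)]      # voq[i][j]: cells at input i for output j
served = arrived = 0

for t in range(T):
    # uniform Bernoulli i.i.d. arrivals at offered load LOAD
    for i in range(N):
        if random.random() < LOAD:
            voq[i][random.randrange(N)] += 1
            arrived += 1
    # cycle through the N cyclic-shift permutations:
    # at time t, input i is connected to output (i + t) % N
    for i in range(N):
        j = (i + t) % N
        if voq[i][j]:
            voq[i][j] -= 1
            served += 1

print(f"throughput: {served / arrived:.3f}")   # tends to 1.0: 100% throughput
```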
2-Stage Switch
[Diagram: N external inputs → first spanning set of permutations → internal (midstage) inputs → second spanning set of permutations → N external outputs]
• Recently shown to have 100% throughput.
• Mild conditions: weakly mixing arrival processes.
• C.-S. Chang et al.: http://www.ee.nthu.edu.tw/~cschang/PartI.pdf
Problem: Unbounded Mis-sequencing
[Diagram: two cells of the same flow take different midstage paths through the two-stage switch and can leave out of order]
• Side-note: mis-sequencing is maximized when arrivals are uniform.
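A toy model of how the overtaking happens, assuming both stages cycle through the same cyclic-shift permutations (the shift schedule and the one-slot stage delay are illustrative):

```python
N, DEST = 4, 3   # two back-to-back cells of one flow: input 0 -> output 3

# At time t, each stage connects port i to port (i + t) % N.
for t, cell in [(0, "cell0"), (1, "cell1")]:
    mid = (0 + t) % N                  # stage 1 spreads by arrival time alone
    arrive = t + 1                     # slot at which the cell sits midstage
    wait = (DEST - mid - arrive) % N   # slots until stage 2 links mid -> DEST
    print(f"{cell}: midstage port {mid}, departs at t={arrive + wait}")

# Prints: cell0 departs at t=3, cell1 at t=2 -- the later cell overtakes the
# earlier one. This is the mis-sequencing the next slide's algorithm prevents.
```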
Preventing Mis-sequencing
[Diagram: two-stage switch with large congestion buffers, plus small coordination buffers running the 'FFF' algorithm]
• The Full Frames First (FFF) algorithm:
  • keeps packets in order, and
  • guarantees delay within a bound of the optimum.
• Infocom'02: klamath.stanford.edu/~nickm/papers/infocom02_two_stage.pdf
Example: Optical 2-Stage Switch
[Diagram: linecards, each with lookup and buffer, connected through the optical fabric in two phases (phase 1, phase 2)]
• Idea: use a single stage twice.
Example: Passive Optical 2-Stage "Switch"
[Diagram: ingress linecards 1..n → midstage linecards 1..n → egress linecards 1..n, every link running at rate R/N]
• It is helpful to think of it as spreading rather than switching.
2-Stage spreading
[Diagram: first spreading stage → buffer stage → second spreading stage]
Passive Optical Switching
[Diagram: an integrated AWGR or diffraction-grating-based wavelength router connects ingress linecards 1..n to midstage linecards 1..n, and midstage linecards to egress linecards 1..n]
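This is why the midstage "switch" can be completely passive: an AWGR delivers input i on wavelength j to a fixed output, so an ingress linecard "switches" simply by tuning its transmit wavelength. The cyclic mapping below is the usual textbook illustration, not a claim about the specific device in the project.

```python
N = 4  # ports and wavelengths

def awgr_output(port, wavelength):
    """Cyclic wavelength routing: no configuration, no moving parts;
    the output is a fixed function of (input port, wavelength)."""
    return (port + wavelength) % N

# Ingress linecard 1 reaches any midstage card just by picking a wavelength:
for lam in range(N):
    print(f"input 1, wavelength {lam} -> output {awgr_output(1, lam)}")
```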
100Tb/s Router
[Diagram: racks of 160 Gb/s linecards connected by optical links to the optical switch fabric]
[Diagram: inside a rack of 160 Gb/s linecards — each linecard has lookup logic, a queue manager, on-chip SRAM, and multiple off-chip DRAMs]
Additional Technologies
• Demonstrated or in development:
  • chip-to-chip optical interconnects with total power dissipation of several mW,
  • demonstration of wavelength-division-multiplexed chip interconnect,
  • integrated laser modulators,
  • 8 Gsample/s serial links,
  • low-power variable-power-supply serial links,
  • integrated arrayed-waveguide routers.
Mind the gap
• Operators are unlikely to deploy 5 times as many POPs, or to make them 5 times bigger with 5 times the power consumption.
• Our options:
  • Make routers simpler
  • Use more parallelism
  • Use more optics
Some predictions about core Internet routers
• The need for more capacity for a given power and volume budget will mean:
  • Fewer functions in routers:
    • little or no optimization for multicast,
    • continued overprovisioning, leading to little or no support for QoS, DiffServ, …
  • Fewer unnecessary requirements:
    • mis-sequencing will be tolerated,
    • latency requirements will be relaxed.
  • Less programmability in routers, and hence no network processors.
  • Greater use of optics to reduce power in the switch.
What I believe is most likely
• The need for capacity and reliability will mean widespread replacement of core routers with transport switching based on circuits:
  • Circuit switches have proved simpler, more reliable, lower power, higher capacity, and lower cost per Gb/s. Eventually, this is going to matter.
  • The Internet will evolve into edge routers interconnected by a rich mesh of WDM circuit switches.