Challenges and Chances in Network Reliability
Zhaobo Zhang, Huawei Technologies (USA), 2014-09-11
Outline • Background of IP networks • System reliability • Causes of unreliable networks • Potential directions
Background • Fast growth • Computers/mobile devices; ISPs (regional, backbone); IXPs (Internet exchange points) • Primary source of information sharing & communication • Various applications • Data, voice, video conferencing, P2P • High demands • QoS, reliability, efficiency • [Figure: growth scale from hundreds to thousands, millions, and billions]
[Figure: map of the Internet in 2010 from The Opte Project by Barrett Lyon, which seeks to make an accurate representation of the Internet using visual graphics.]
Network System Reliability • Metrics • Quality of service • Connectivity, end-to-end (E2E) delay, E2E packet loss rate • Network topology, service level agreement (SLA) • Availability = MTBF/(MTBF + MTTR) • MTBF: mean time between failures; MTTR: mean time to repair • e.g., 99.999% ("five nines") availability means about 5.26 min of downtime per year • Verification • Through fault insertion testing (FIT) and field data
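The availability formula on this slide, and the downtime figure it implies, can be checked with a few lines of Python. This is a minimal sketch assuming a 365-day year; the function names and the MTBF/MTTR values in the usage lines are hypothetical.

```python
# Availability from MTBF/MTTR, and the annual downtime a given availability implies.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a 365-day year


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(avail: float) -> float:
    """Expected downtime per year, in minutes, for availability `avail`."""
    if not 0.0 <= avail <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Five nines: (1 - 0.99999) * 525,600 = 5.256, i.e. about 5.26 minutes per year.
    print(f"{annual_downtime_minutes(0.99999):.2f} min/year")
    # Hypothetical device with MTBF = 100,000 h and MTTR = 1 h.
    a = availability(100_000, 1)
    print(f"A = {a:.6f}, downtime = {annual_downtime_minutes(a):.2f} min/year")
```

The second example shows why MTTR matters as much as MTBF: a one-hour repair time against a 100,000-hour MTBF already lands at roughly five nines.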
Causes of unreliable networks • IP connectivity errors • Unstable transmission, throughput overflow, delay, network security threats, IP resource management • Network misconfiguration • Network topology loops, non-optimal paths, duplex mismatch, protocol unawareness • Software • Version/patch conflicts, logic misconfiguration, device driver bugs • Environment • Cable/fiber cuts, device damage, electrical noise, power outages • Hardware • Power/clock faults, logic aging, RAM failures, soft errors
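As one concrete case of misconfiguration, a topology loop can be caught before deployment by checking whether the link set contains a cycle. The sketch below uses union-find; the input format (a list of bidirectional links between named switches) and the function name are assumptions, and it treats every redundant path as a loop, i.e. it assumes no loop-prevention protocol such as STP is blocking ports.

```python
# Detect a topology loop (cycle) in a list of bidirectional links using union-find.
from typing import Hashable, Iterable, Optional, Tuple

Link = Tuple[Hashable, Hashable]


def find_loop_link(links: Iterable[Link]) -> Optional[Link]:
    """Return the first link that closes a cycle, or None if the topology is loop-free."""
    parent: dict = {}

    def find(node):
        parent.setdefault(node, node)
        # Path halving: point each visited node at its grandparent to keep trees shallow.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return (a, b)  # a and b were already connected, so this link forms a loop
        parent[root_a] = root_b  # merge the two connected components
    return None


if __name__ == "__main__":
    print(find_loop_link([("s1", "s2"), ("s2", "s3")]))
    print(find_loop_link([("s1", "s2"), ("s2", "s3"), ("s3", "s1")]))
```

Reporting the closing link, rather than just "loop found", points the operator at a concrete port pair to inspect.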
Potential Direction 1 • Reliability-aware hardware design • Redundancy: RAM, links, NPUs, boards • Built-in smart logic • Monitor misbehavior (e.g., increasing delay) and raise early alerts • Monitor traffic; balance traffic/heat to slow aging; automatically reroute around defective logic • [Figure: board with NPUs and RAM plus spare units (shown in orange) under smart monitoring logic]
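The "monitor misbehavior, early alert" idea can be sketched as a delay monitor that learns a baseline with an exponentially weighted moving average (EWMA) and flags samples far above it. The class, its parameters (smoothing factor, 1.5x threshold, warm-up length), and the microsecond unit are hypothetical choices, not a description of any shipping hardware logic.

```python
# Early-warning delay monitor: flag samples that rise well above a learned baseline.
from dataclasses import dataclass, field
from typing import List


@dataclass
class DelayMonitor:
    alpha: float = 0.1      # EWMA smoothing factor (hypothetical default)
    threshold: float = 1.5  # alert when delay > threshold * baseline (hypothetical)
    warmup: int = 5         # samples used to seed the baseline before alerting
    baseline: float = 0.0
    samples_seen: int = 0
    alerts: List[int] = field(default_factory=list)  # 1-based indices of alerting samples

    def observe(self, delay_us: float) -> bool:
        """Record one delay sample; return True if it should raise an early alert."""
        self.samples_seen += 1
        if self.samples_seen <= self.warmup:
            # Seed the baseline with a running mean during warm-up.
            self.baseline += (delay_us - self.baseline) / self.samples_seen
            return False
        alert = delay_us > self.threshold * self.baseline
        if alert:
            self.alerts.append(self.samples_seen)
        else:
            # Only normal samples update the baseline, so an abrupt fault is not absorbed into it.
            self.baseline = (1 - self.alpha) * self.baseline + self.alpha * delay_us
        return alert


if __name__ == "__main__":
    monitor = DelayMonitor()
    for sample in [10, 10, 10, 10, 10, 10, 11, 25, 10]:
        monitor.observe(sample)
    print(monitor.alerts)
```

In a design like the one on the slide, an alert would be the trigger for rerouting traffic or switching to a spare unit.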
Potential Direction 2 • Data mining & automated processes • Learn from historical data to guide current/next-generation design, verification, introduction, and debug • Sources: design spec, verification list, fault database, FIT results, FMEA (failure mode and effects analysis), field-return data, field failure cases, failure cases, test & component statistics
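A simple first step in mining a fault database or field-return data is a Pareto ranking: count failures per component and keep the few components that account for most of them. The record schema (a dict with a "component" field), the function name, and the 80% coverage cut-off are assumptions for illustration.

```python
# Pareto ranking of failure records: which components account for most failures?
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pareto_ranking(records: Iterable[Dict[str, str]], key: str = "component",
                   coverage: float = 0.8) -> List[Tuple[str, int]]:
    """Rank causes by count; stop once the top entries cover `coverage` of all failures."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    ranked: List[Tuple[str, int]] = []
    covered = 0
    for name, n in counts.most_common():  # highest count first
        ranked.append((name, n))
        covered += n
        if covered / total >= coverage:
            break
    return ranked


if __name__ == "__main__":
    # Hypothetical field-return records.
    field_returns = (
        [{"component": "RAM"}] * 5
        + [{"component": "power"}] * 3
        + [{"component": "clock"}, {"component": "optics"}]
    )
    print(pareto_ranking(field_returns))
```

The top entries are natural candidates for added redundancy or extra FIT coverage in the next design.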
Potential Direction 3 • [Image: meme in which Wikipedia ("I know everything!"), Google ("I have everything!"), Facebook ("I know everybody") and the Internet ("Without me you are all nothing!") boast, and Electricity gets the last word] • Big data, big network, big infrastructure, BIG power • Power consumption control • Low-power design • Dynamic control: sleep mode, turning off SerDes and MAC • Thermal control • Heat is an enemy of devices • Rule of thumb: for every 10 °C rise in temperature, chemical reaction rates roughly double, which speeds up device degradation • [Figure: 2% of global energy usage]
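The 10 °C doubling rule can be turned into a rough acceleration factor, 2^((T - T_ref)/10). This sketch assumes the rule of thumb applies uniformly; it is a simplification of the Arrhenius relationship, and real failure mechanisms each have their own temperature sensitivity, so the 10 °C step is a parameter rather than a constant. The function names are hypothetical.

```python
# Rule-of-thumb thermal acceleration: rates double for every `doubling_step_c` degrees C.


def acceleration_factor(t_ref_c: float, t_c: float, doubling_step_c: float = 10.0) -> float:
    """Rate multiplier when temperature rises from t_ref_c to t_c (degrees C)."""
    return 2.0 ** ((t_c - t_ref_c) / doubling_step_c)


def relative_lifetime(t_ref_c: float, t_c: float, doubling_step_c: float = 10.0) -> float:
    """Lifetime at t_c as a fraction of lifetime at t_ref_c, assuming degradation scales with the rate."""
    return 1.0 / acceleration_factor(t_ref_c, t_c, doubling_step_c)


if __name__ == "__main__":
    for t in (40, 50, 60, 70):
        print(f"{t} C: rate x{acceleration_factor(40, t):.0f}, "
              f"lifetime x{relative_lifetime(40, t):.3f}")
```

For example, running a part at 60 °C instead of 40 °C gives a factor of 4 under this rule, i.e. roughly a quarter of the expected lifetime, which is why thermal control appears alongside power control on this slide.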
Potential Direction 4 • Fault-tolerant control-layer design/testing • SDN & OpenFlow • Decouple network control and forwarding functions • Directly programmable network control • The controller performs design validation as part of configuring the network, and this validation helps eliminate manual configuration errors • [Figure: SDN architecture: application layer (business applications), SDN control layer (network services), infrastructure layer]
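To make "design validation as part of configuring the network" concrete, the sketch below checks a proposed forwarding plan for one destination before it would be installed: every switch must reach the egress switch without a loop or a black hole. The next-hop dictionary and the function name are hypothetical; this is not the OpenFlow API, only the kind of check a controller could run.

```python
# Validate a per-destination forwarding plan: no loops and no black holes.
from typing import Dict, List, Optional

# next_hop[switch] = neighbor that switch forwards to for one destination;
# the egress switch delivers locally, so its own entry is not followed.
NextHops = Dict[str, Optional[str]]


def validate_forwarding(next_hop: NextHops, egress: str) -> List[str]:
    """Return a list of problems; an empty list means the plan passes validation."""
    problems: List[str] = []
    for start in next_hop:
        seen = set()
        node = start
        while node != egress:
            if node in seen:
                problems.append(f"loop reachable from {start}")
                break
            seen.add(node)
            if node not in next_hop or next_hop[node] is None:
                problems.append(f"black hole at {node} (from {start})")
                break
            node = next_hop[node]
    return problems


if __name__ == "__main__":
    plan = {"s1": "s2", "s2": "s3", "s3": None}      # s3 is the egress switch
    bad_plan = {"s1": "s2", "s2": "s1", "s3": None}  # s1 and s2 point at each other
    print(validate_forwarding(plan, "s3"))
    print(validate_forwarding(bad_plan, "s3"))
```

Because the check runs on the controller's intended state, a mistake is rejected before any switch is reprogrammed, which is the advantage the slide attributes to centralized, programmable control.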