15-441 Computer Networking Network Management
Introduction • We have spent a lot of time on network protocols • This lecture is about networks • What comes to mind when you think of networks? • Devices (switches, routers, repeaters) • Links (WiFi, SONET, Ethernet, T1, etc.) • Interface cards • Topology
What Does a “Device” Look Like? [Figure: switch chassis with labeled fan tray, port cards, fabric cards, SCPs, switching shelf area, power modules, and fan and filter trays]
Switching Shelf Components [Figure: SCP, switching fabric, and port cards]
Switch Control Processor (SCP) [Figure: front panel with RS-232 serial port, NMI / RESET buttons, power LEDs, Ethernet port, NEXT / SELECT buttons, display LED, and system LEDs]
Logical Diagram of the Switch [Figure: physical slots 1 to 14, with Fabric #1 to Fabric #4, SCP X, SCP Y, and positions labeled 1A/B, 2A/B, 3A/B, 4A/B, 1C/D, 2C/D, 3C/D, 4C/D]
Documentation Maybe you’ve asked, “How do you keep track of it all?”... Document, document, document…
Documentation Basics, such as documenting your switches... • What is each port connected to? • Can be a simple text file with one line for every port in a switch: • health-switch1, port 1, Room 29 – Director’s office • health-switch1, port 2, Room 43 – Receptionist • health-switch1, port 3, Room 100 – Classroom • health-switch1, port 4, Room 105 – Professor’s Office • ….. • health-switch1, port 25, uplink to health-backbone • This information might be made available to your network staff and help desk staff via a wiki, software interface, etc. • Remember to label your ports!
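The one-line-per-port text file is easy to load into a lookup table. The following is a minimal sketch, assuming every line follows the "switch, port N, description" shape shown on the slide; the names `PortEntry`, `parse_port_line`, and `load_port_map` are hypothetical and not part of the lecture.

```python
from typing import Dict, NamedTuple, Tuple


class PortEntry(NamedTuple):
    switch: str
    port: int
    description: str


def parse_port_line(line: str) -> PortEntry:
    """Parse one 'switch, port N, description' line (assumed format)."""
    # Split on the first two commas only, so the description may contain commas.
    switch, port_field, description = (part.strip() for part in line.split(",", 2))
    # The port field looks like "port 25"; keep only the number.
    port = int(port_field.split()[1])
    return PortEntry(switch, port, description)


def load_port_map(text: str) -> Dict[Tuple[str, int], str]:
    """Build a (switch, port) -> description lookup from the documentation file."""
    port_map: Dict[Tuple[str, int], str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = parse_port_line(line)
        port_map[(entry.switch, entry.port)] = entry.description
    return port_map


SAMPLE = """\
health-switch1, port 1, Room 29 - Director's office
health-switch1, port 2, Room 43 - Receptionist
health-switch1, port 25, uplink to health-backbone
"""

port_map = load_port_map(SAMPLE)
print(port_map[("health-switch1", 25)])
```

A help desk tool could use the same lookup to answer "what is plugged into port 2?" without anyone walking to the wiring closet.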
Documentation: Labeling Nice…
Example Backbone Network Architecture [Figure: topology with edge switches, edge routers, backbone routers, and ATM]
Why Multiple Types of Devices? • Core routers are much more expensive than edge routers • A router port is much more expensive than a switch port • How can the same network goal be achieved while minimizing the number of expensive devices? • Edge switches aggregate traffic to share an edge router access port • Core switches reduce the number of core router ports while still achieving a full logical mesh • Edge routers hold fewer routes than core routers
Management Network • A network completely separate from the “production” network that provides a means of monitoring and controlling the “production” network without using it. • A “backdoor” to all network devices • Serial Connections (T1’s) • Ethernet (Telnet directly to device) • Console (Telnet through MC router)
Management Network [Figure: routers MC1 (3640) and MT1 (7204), hubs HUB1 to HUB3 on serial links S1 to S3, a dialup modem with hub phone numbers, UUNET Fairfax (FFX), WILPAK, and WCOM frame relay]
Network Management Example • A typical problem • People are complaining that Netflix performance was bad last night • Where do you begin? • Where is the problem? • What is the problem? • What is the solution? • You may have different perspectives depending on who you are • Netflix engineer • Comcast engineer • A user at home
Where to Start? • With proper management tools and procedures in place, you may already have the answer • Consider some possibilities • 1. What configuration changes were made overnight? • 2. Have you received a device fault notification indicating the issue? • 3. Have you detected a security breach? • 4. Has your performance baseline predicted this behavior on an increasingly congested network link?
What Do You Need? • An accurate database of your network’s topology, configuration, and performance • A solid understanding of the protocols and models used in communication between your management server and the managed devices • Methods and tools that allow you to interpret and act upon gathered information
FCAPS: Five Areas of Network Management • Fault management • Configuration management • Accounting management • Performance management • Security management
Fault Management • When a fault occurs • Determine “exactly” where the fault is • Isolate the rest of the network from the failure • Reconfigure or modify the network to minimize the impact on operation • Repair or replace the failed components
Configuration Management • Configuration management is concerned with • Initializing a network • Gracefully shutting down part or all of the network • Maintaining, adding, and updating the relationships among components and the status of components themselves during network operation
Accounting Management • Network managers track the use of network resources by end user or end-user class • An end user or group of end users may be abusing its access privileges and burdening the network at the expense of other users • End users may be making inefficient use of the network, and the network manager can assist in changing procedures to improve performance • It is easier for the network manager to plan for network growth if end-user activity is known in sufficient detail
Performance Management • What is the level of capacity utilization? • Is there excessive traffic? • Has throughput been reduced to unacceptable levels? • Are there bottlenecks? • Is response time increasing?
Security Management • Managing information protection and access control facilities • Generating, distributing, and storing encryption keys • Passwords and other authorization or access control information must be maintained and distributed • Monitoring and controlling access to computer networks and to all or part of the network management information • Security management involves the collection, storage, and examination of audit records and security logs • It also covers the enabling and disabling of these logging facilities
How Network Management Differs from Network Control • A human operator is the user of network management • Stable storage is the fundamental building block for network management • Configuration files • Log files or databases • What to measure and then log? • What granularity? • How much overhead?
Simple Network Management Protocol (SNMP) • A set of standards for network management • a protocol • a database schema or structure specification • a set of data objects • throughput, pkt counts, errors, CPU load, temperature, .. • for multi-vendor, interoperable network management • used across a broad spectrum of device types: end systems, bridges, switches, routers and telecommunications equipment • TCP/IP based • Hundreds of tools built on top of SNMP
Network Management Systems (NMS) • NMS is a collection of tools for network monitoring and control • Designed to view the entire network as a unified architecture • addresses and labels assigned to each point • specific attributes of each element and link known to the system • Single operator interface with a powerful but user-friendly set of commands • a minimal amount of separate equipment (hardware/software) is necessary • NMS software resides in the host computers and communications processors (bridges, routers)
Network Monitoring • Coarse-grained monitoring • Counters as aggregate statistics • # of packets on a link • # of bytes on a link • # errors on a link • Keep packet-level statistics • Used in SNMP • Fine-grained monitoring • Examine (and potentially log) each packet and its timing • Challenge to control the overhead • Hard to store, transfer, and process every packet over the entire duration of network operation • Various techniques have been invented
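Aggregate counters only become useful once two readings are compared. The following is a minimal sketch of turning two byte-counter samples into a link utilization figure, assuming a 32-bit counter that wraps around at most once between polls; the function names and the numbers in the example are illustrative, not taken from the lecture.

```python
COUNTER32_MODULUS = 2 ** 32  # assumed counter width: 32 bits


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter readings, tolerating one wraparound."""
    return (current - previous) % modulus


def link_utilization(prev_bytes: int, curr_bytes: int,
                     interval_s: float, link_bps: float) -> float:
    """Fraction of link capacity used between two polls of a byte counter."""
    bits_sent = counter_delta(prev_bytes, curr_bytes) * 8
    return bits_sent / (interval_s * link_bps)


# Hypothetical readings: 1,250,000 bytes in 10 s on a 10 Mbit/s link.
utilization = link_utilization(0, 1_250_000, 10.0, 10_000_000)
print(f"utilization: {utilization:.0%}")
```

The polling interval is the granularity knob: a longer interval lowers overhead but hides short bursts and makes an undetected double wraparound more likely.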
Flow Monitoring • Flow monitoring (e.g., Cisco Netflow) • Statistics about groups of related packets (e.g., same IP/TCP headers and close in time) • Recording header information, counts, and time • More detail than SNMP, less overhead than every packet capture
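Grouping packets into flows can be sketched as a dictionary keyed by header fields. This is a simplified illustration, not Cisco's implementation: it assumes a flow is identified by the 5-tuple of addresses, ports, and protocol, and it ignores the "close in time" timeout that a real exporter would use to end a flow. `Packet`, `FlowRecord`, and `aggregate_flows` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    length: int


FlowKey = Tuple[str, str, int, int, int]


@dataclass
class FlowRecord:
    packet_count: int = 0
    byte_count: int = 0
    start: float = 0.0
    end: float = 0.0


def aggregate_flows(packets: Iterable[Packet]) -> Dict[FlowKey, FlowRecord]:
    """Record header information, counts, and times per flow."""
    flows: Dict[FlowKey, FlowRecord] = {}
    for pkt in packets:
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)
        record = flows.get(key)
        if record is None:
            # First packet of this flow: remember when it started.
            record = flows[key] = FlowRecord(start=pkt.timestamp, end=pkt.timestamp)
        record.packet_count += 1
        record.byte_count += pkt.length
        record.end = pkt.timestamp
    return flows


demo_packets = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 1234, 80, 6, 100),
    Packet(0.5, "10.0.0.1", "10.0.0.2", 1234, 80, 6, 200),
    Packet(0.7, "10.0.0.3", "10.0.0.2", 999, 53, 17, 60),
]
demo_flows = aggregate_flows(demo_packets)
print(len(demo_flows), "flows")
```

The memory cost is one record per flow rather than one entry per packet, which is where the saving over full packet capture comes from.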
Cisco Netflow • Basic output: “Flow record” • Most common version is v5 • Latest version is v10 (RFC 3917) • Current version (10) is being standardized in the IETF (template-based) • More flexible record format • Much easier to add new flow record types [Figure: routers in the core network export packets of approximately 1500 bytes, each carrying 20-50 flow records and sent more frequently if traffic increases, to a collector (PC) for collection and aggregation] Slide courtesy of Nick Feamster
Flow Record Contents Basic information about the flow… • Source and destination IP address and port • Packet and byte counts • Start and end times • ToS, TCP flags …plus, information related to routing • Next-hop IP address • Source and destination AS • Source and destination prefix Slide courtesy of Nick Feamster
Sampled Netflow • Packet sampling before flow creation • 1-out-of-m sampling of individual packets (e.g., m=100) • Creation of flow records over the sampled packets • Reducing overhead • Avoid per-packet overhead on (m-1)/m packets • Accuracy? • Missing many of the small flows
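The trade-off between overhead and accuracy can be seen in a small simulation. The following is a minimal sketch, assuming each packet is sampled independently with probability 1/m and that per-flow packet counts are estimated by scaling the sampled counts by m; the function names and the traffic mix are illustrative assumptions.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Iterable


def sampled_flow_counts(packet_flow_ids: Iterable[Hashable], m: int,
                        rng: random.Random) -> Counter:
    """Count packets per flow, looking at roughly 1 out of every m packets."""
    sampled: Counter = Counter()
    for flow_id in packet_flow_ids:
        # Only sampled packets incur flow-table work.
        if rng.random() < 1.0 / m:
            sampled[flow_id] += 1
    return sampled


def estimate_packet_counts(sampled: Counter, m: int) -> Dict[Hashable, int]:
    """Scale sampled counts back up to estimate the true per-flow counts."""
    return {flow_id: count * m for flow_id, count in sampled.items()}


# Hypothetical traffic: one large flow and many three-packet flows.
traffic = ["big"] * 10_000
for i in range(200):
    traffic += [f"small-{i}"] * 3

estimates = estimate_packet_counts(
    sampled_flow_counts(traffic, 100, random.Random(0)), 100)
small_seen = sum(1 for flow_id in estimates if flow_id != "big")
print("large-flow estimate:", estimates.get("big", 0))
print("small flows seen:", small_seen, "of 200")
```

With m=100 a three-packet flow is seen with probability of only about 3%, so most small flows never appear in the output, while the large flow's estimate stays close to its true size.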
Sampled Netflow • Sample packets at random, aggregate into flows • Flow = packets with the same pattern (source and destination address and ports) • Flow reports hold a counter per flow ID • Estimate: FSD, entropy, heavy hitters, changes, superspreaders, … [Figure: packet stream sampled into a table of flow IDs and counters]
Hash-Based Flow Sampling • Hash the packet header to a flow ID in [0, Max] • Compute hash, log if in range (e.g., hash range [3,10]) • Flow memory holds (flow, #pkts) counters • Pick flows at random; not biased by flow size • Good for “communication” patterns [Figure: IP header fields (version, IHL, TOS, length, identification, flags, offset, TTL, protocol, checksum, source and destination IP address, source and destination port) feeding a hash function and flow memory]
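The "compute hash, log if in range" rule can be sketched as follows. This is an illustration under stated assumptions: the flow key is a tuple of addresses and ports, the hash is the first four bytes of SHA-256 (chosen only so the example is deterministic), and the range bounds are picked arbitrarily. `flow_hash` and `hash_sample` are hypothetical names.

```python
import hashlib
from collections import Counter
from typing import Iterable, Tuple

HASH_MAX = 2 ** 32 - 1  # hash values fall in [0, HASH_MAX]

FlowKey = Tuple[str, str, int, int]


def flow_hash(flow_key: FlowKey) -> int:
    """Map a flow key to a stable integer in [0, HASH_MAX]."""
    digest = hashlib.sha256(repr(flow_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_sample(packets: Iterable[FlowKey], lo: int, hi: int) -> Counter:
    """Count packets only for flows whose hash lands in [lo, hi]."""
    flow_memory: Counter = Counter()
    for key in packets:
        if lo <= flow_hash(key) <= hi:
            flow_memory[key] += 1
    return flow_memory


web = ("10.0.0.1", "10.0.0.2", 1234, 80)
tls = ("10.0.0.3", "10.0.0.4", 40000, 443)
stream = [web] * 5 + [tls] * 2

# Sampling half of the hash space selects about half of all flows.
half = hash_sample(stream, 0, HASH_MAX // 2)
print(dict(half))
```

Every packet of a flow hashes to the same value, so a flow is either logged in full or not at all, and a one-packet flow is as likely to be picked as a million-packet flow. That is why the selection is not biased by flow size.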
Sample and Hold Algorithm • If the flow is already logged, update its counter • Otherwise, sample the packet with probability p • If the sampled packet starts a new flow, create a counter • Flow memory holds (flow, #pkts) • Accurate counts of large flows • Good for “volume” queries [Figure: packet stream updating flow memory]
What do network operators care about? • Respect resource constraints • High flow coverage • Provide network-wide goals • Low data mgmt overhead • flow = same src-dst, ports, proto • flow report = flow + pkt/byte counters [Figure: routers send flow reports to applications at the Network Operations Center]
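The sample-and-hold rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming packets arrive as a sequence of flow identifiers and that the sampling decision is an independent coin flip with probability p; `sample_and_hold` is a hypothetical name.

```python
import random
from typing import Dict, Hashable, Iterable


def sample_and_hold(packet_flow_ids: Iterable[Hashable], p: float,
                    rng: random.Random) -> Dict[Hashable, int]:
    """Return flow memory mapping flow -> packets counted since it was logged."""
    flow_memory: Dict[Hashable, int] = {}
    for flow_id in packet_flow_ids:
        if flow_id in flow_memory:
            # Flow already logged: every later packet is counted ("hold").
            flow_memory[flow_id] += 1
        elif rng.random() < p:
            # New flow and the packet was sampled: create its counter.
            flow_memory[flow_id] = 1
    return flow_memory


# Hypothetical traffic: one large flow interleaved with fifty 1-packet flows.
stream = []
for i in range(50):
    stream += ["big"] * 100 + [f"tiny-{i}"]

memory = sample_and_hold(stream, 0.05, random.Random(7))
print("big flow counted:", memory.get("big", 0), "of 5000")
print("tiny flows logged:", sum(1 for f in memory if f != "big"))
```

A large flow is very likely to be caught within its first few dozen packets, after which every packet is counted, so its count is only slightly low. Small flows are mostly missed, which keeps the flow memory small and makes the method suited to "volume" queries.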
Not A Solved Problem • Routers cannot record every packet/flow • Constraints: CPU, Memory, Bandwidth • Resource constraints don’t go away! • Network demands scale even as routers become more powerful
Summary of Key Concepts • Two keywords in network management • Network: not just the protocols • Management: human being has goals to achieve • First step in network management • Modeling and documenting all details of the network • Key difference from network control • Files and databases are fundamental building blocks • Five key areas of network management • FCAPS • SNMP and Netflow are just starting points • Many challenges remain • Opex dominates Capex • More scientific/systematic approach needed