GINS: The GARR Network Monitoring System
Giovanni Cesaroni, GARR
EUMEDCONNECT2 Training – Rome, 22-25 June 2009
Agenda
PART 1: GINS description
• NOC Tools Motivation
• Required Functionality
• Monitoring Environment
• Statistics Examples
• Visualization
• Reports
• Slicing
• Traffic Flows Analysis
• Work in progress
PART 2: Let’s code the Network Monitoring!
• SNMP in action
• BGP, OSPF, MPLS, IPv6
PART 3: RRD World
• RRD in action
• How to avoid losing data
GARR Network
• 43 POPs (University and Research Centre)
• PEERING: 76 Gbps
• 52.5 Gbps vs GEANT2
• 10G + 2.5G IP Access
• 3*10GE E2E links
• 9*1GE E2E links
• 3x2.5 Gbps IP Transit
  • 2 Milan + 1 Rome
• 7x1 Gbps + 10 Gbps National PEERING
• BackBone Capacity: ~110 Gbps
• 7 TLC Operators
  • Telecom Italia
  • Infracom (ex Autostrade TLC)
  • Fastweb
  • Interoute (ex Eurostrada)
  • WIND
  • BT-Italia (ex Albacom)
  • COLT-Telecom
• 3 International IP Carriers
  • Global Crossing
  • Telia
  • Level3
• Access Capacity: ~60 Gbps
  • Starting from 2M to 10G
• N. Access Links: 500
• N. Backbone Links: 62
• E2E Capacity: ~40 Gbps
  • from 1G to 10G
GOALS
• Provide the NOC, Operations and Planning staff with all the tools needed to do their work as well as possible
• Monitor user site connectivity
• Check the status of the services at each level of the network
  • service-oriented approach (not metric-oriented)
• Integrate monitoring services
• Automate tool configuration
• Give easy access to the information
• Automatic generation of fault and performance reports
The goal is not to manage the control plane, but to have full control of the network.
GINS Architecture
[Diagram components: GARR Network; GARR NOC; Measurements Storage (MySQL & RRD); Consistency Tools; Robots; GINS Monitoring Tools; GINS Visualization Tools; GARR-DB: Network Database (Network Structure, MySQL)]
GARR-DB: the Information System
[Diagram labels: Aggregate; Logical “circuit” (IP link, MPLS LSP, lambda service, etc.); physical object; User Site; segments; physical circuit; GARR Backbone; eq; physical objects; GARR Domain]
Administrative and technical information!
SW tools used by GINS
• Scheduler: Cron
• Reports: PHP, Jpgraph, HTMLDOC
• Data visualization: PHP, HTML, Javascript, Ajax, SVG
• Data storage: MySQL, File, RRD
• Data management: AWK, Bash, PHP, RRDtools
• ~5500 RRD files
• Data acquisition: MRTG, SNMP polls, ping
• Network
NOC in action
[Diagram labels: Alarms; APM; Trouble Ticket; TLC NOC; GARR NOC; GARR Backbone; End Site]
GINS at a glance
Main functionalities
• Network monitoring
• Statistics acquisition
• Trouble Ticket System
• Fault and Performance Reports
Monitoring Services
• Lambda
• SDH/SONET
• MPLS
• IPv4, IPv6
• OSPF, BGP
• E2E
• Multicast Beacons
• Equipment
Statistics Services
• IPv4, IPv6, Multicast traffic
• Physical interface errors
• Routers CPU
• Premium IP
• SDH/SONET errors
• Backbone weathermap
• Uncompressed Statistics
Monitoring services
• GINS detects and defines the status of different services on the basis of the information gathered from the network. Monitoring is supported for the following service classes:
• IPv4 and IPv6 [service status, input errors and output drops on physical interfaces]
  • end-user site
  • backbone interface
• IP Multicast Beacons [service status]
• Routing protocols:
  • OSPF [link costs]
  • BGP [peering status, adv/rec routes]
• SDH/Sonet [SDH/Sonet errors]
  • router interface on leased lines
• Lambda [service status, optical equipment port status]
• MPLS [MPLS LSP status]
• E2E [E2E service status]
  • defined as the stitching of multiple intra-domain and inter-domain links
Statistics services
• GINS stores performance measurement data and provides:
• Traffic Statistics
  • IPv4 and IPv6, Multicast for end user sites and backbone
  • Aggregate
  • Peering
  • Premium IP
  • Uncompressed Statistics
• Sonet/SDH errors on leased lines
• Router CPU load and temperature
Other services
• GINS includes a Trouble Ticket System which is highly customized for the GARR operations procedures. In particular, it manages user service, leased line and PoP tickets.
• Fault and performance reports:
  • User monthly and yearly reports (HTML and PDF)
  • User fault report and circuit availability
  • Uncompressed traffic statistics (IP BW usage, 95th percentile, etc.)
  • Carrier fault report and circuit availability (HTML and PDF)
• Monitored physical devices:
  • Juniper: J6350, M7i, M10, M20, M320
  • Cisco: 12xxx, 17xx, 18xx, 2xxx, 3750, 72xx, 75xx
  • ADVA: FSP3000
  • Metrobility: R4000, R5000
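The circuit availability figure mentioned in the fault reports can be illustrated with a small calculation. This is a minimal sketch assuming availability is simply the share of the reporting period not covered by outages; the slides do not state the formula GINS uses, and the function name and outage values are hypothetical.

```python
def availability_pct(period_minutes, outage_minutes):
    """Return availability as a percentage of the reporting period.

    Assumption: availability = (period - total downtime) / period.
    """
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    # Cap downtime at the period length so the result never goes negative.
    down = min(sum(outage_minutes), period_minutes)
    return 100.0 * (period_minutes - down) / period_minutes


# Hypothetical 30-day month with two outages of 400 and 32 minutes.
month = 30 * 24 * 60
print(f"availability: {availability_pct(month, [400, 32]):.2f}%")
```

Outage durations would come from the trouble tickets in practice; here they are hard-coded for illustration.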
Monitoring
Who is the target user of monitoring UIs?
The NOC & the Operation Staff, private access
Control Panel and IP Monitoring BGP Alarms & Monitoring • E2E Monitoring, Lambda & MPLS • Other Services
Monitor Control Panel
NOC Interface (1/2): links status
[Screenshot callouts: Last action; Trouble ticket; Telnet; Traffic in/out; End Site Info]
NOC Interface (2/2): other services and quick ticket management
End Site Info
[Screenshot callouts: Trouble Tickets; Traffic; Interface Errors]
Physical Interface Input Errors and Output Drops
2 Mbps link: it is going to be upgraded to a Gbps link in the next few days!
E2E Monitoring
• Status of the “domain segment”
• Status of the Interdomain Link
• Aggregate status of the “domain link”
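One way to derive an aggregate status from the per-segment statuses listed on this slide is a "worst status wins" rule. This is only a sketch under that assumption: the slides do not describe the rule GINS applies, and the status names and severity ordering below are hypothetical.

```python
# Hypothetical status values, ordered from least to most severe.
SEVERITY = {"up": 0, "unknown": 1, "down": 2}


def aggregate_status(segment_statuses):
    """Return the most severe status among the stitched segments.

    Assumption: an E2E link is only "up" if every segment is "up".
    """
    if not segment_statuses:
        return "unknown"
    return max(segment_statuses, key=lambda status: SEVERITY[status])


# A domain link stitched from two domain segments and one interdomain link.
print(aggregate_status(["up", "up", "down"]))
```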
E2E Stitching Monitoring
[Diagram labels: IP; MPLS LSP; 10GE; Lambda]
GINS vs GN2 E2E CU
[Diagram labels: E2Emon; Switch & DFN; GARR NOC; GN2 E2E CU; GINS data aggregation; E2Emon XML schema; GARR archive; GN2:JRA4]
MPLS Monitoring
• MUPBED: one e2e connection
Information on:
1- LSP1
2- L2 connection
[Diagram labels: GINS MPLS Service; TLAB; GN2; GN2IT; TO; GN2DFN; LSP2; GARR; SNMP Polls; LSP1; LSP3; MI1; DFN; FF; MI2; TSystem]
MPLS Monitoring: MUPBED case
[Screenshot callouts: LSP Status; E2E L2 inter-domain status]
BGP monitoring
[Screenshot callouts: Peer status & prefixes information; Alarms]
SONET Alarms (RFC 2558)
Statistics
• Common statistics sets, different types of representation
• Online Network Status
• Other Services
Long Term Analysis
• Traffic, input errors & output drops
• CPU load & temperature
• Router aggregate traffic & peaks
Example of temperature statistics
In such cases I’d like to be alerted by email, SMS, phone and voice!!!
The backbone weathermap
[Weathermap detail callouts: Ticket info; OSPF cost; Router CPU temperature; Traffic load. Values shown in the screenshot: 25, 20, 615M]
[Weathermap detail callouts: Ticket info; Traffic load]
How it works
[Diagram labels: Weathermap; Merge; HTML dynamic map; SVG image; Generate; Convert; PNG image; Network Measurements Storage; Network Database]
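The "Generate" step that produces an SVG image from the stored measurements might look roughly like the following. This is an illustrative sketch only: the slides do not show the GINS generator, so the function names, color thresholds, coordinates and the 1 Gbps link capacity are all assumptions.

```python
def load_color(utilization):
    """Map link utilization (0.0 to 1.0) to a color. Thresholds are hypothetical."""
    if utilization < 0.5:
        return "green"
    if utilization < 0.8:
        return "orange"
    return "red"


def svg_link(x1, y1, x2, y2, traffic_bps, capacity_bps):
    """Return an SVG <line> element colored by the link's traffic load."""
    utilization = traffic_bps / capacity_bps
    return (
        f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" '
        f'stroke="{load_color(utilization)}" stroke-width="4"/>'
    )


def svg_map(links, width=400, height=200):
    """Wrap one <line> per link in a standalone SVG document."""
    body = "\n".join(svg_link(*link) for link in links)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">\n{body}\n</svg>'
    )


# One hypothetical link carrying 615 Mbps on an assumed 1 Gbps circuit.
weathermap = svg_map([(20, 100, 380, 100, 615e6, 1e9)])
print(weathermap)
```

In a real pipeline the link endpoints would come from the network database and the traffic values from the measurements storage; the resulting SVG could then be converted to PNG and merged into the HTML map, as the slide's labels suggest.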
Fault & Performance Reports
• Who is the target user for network reports?
• What kind of reports are provided?
1- Network users, end sites
• fault and availability reports of the services
• historical traffic data
Fault & Performance Reports: UI
[Screenshot callouts: monthly report; 95th percentile; Uncompressed statistics; GARR User]
User monthly and yearly PDF Reports
• Introduction
• Faults and availability
• Monthly and yearly traffic statistics
~1,000 report pages per month
~50 MB disk space per month
Uncompressed Traffic Statistics, monthly view
[Graph callouts: 95th percentile; 5 minutes]
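The 95th percentile over 5-minute samples shown on this slide can be computed as sketched below. The sketch assumes the nearest-rank definition (sort the samples and take the smallest value that at least 95% of samples do not exceed); the slides do not say which method GINS uses, and the function name and sample data are hypothetical.

```python
def percentile_95(samples_bps):
    """Return the 95th percentile of traffic samples (nearest-rank method)."""
    if not samples_bps:
        raise ValueError("no samples")
    ordered = sorted(samples_bps)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical data: 100 five-minute samples of 1, 2, ..., 100 Mbps.
samples = [mbps * 1_000_000 for mbps in range(1, 101)]
print(percentile_95(samples))
```

Because the percentile needs every individual sample, it has to be computed from uncompressed data rather than from averaged archives.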
Uncompressed Traffic Statistics, yearly view
[Graph callout: Monthly values]
Historical data 2005!!
Fault & Performance Reports
• Who is the target user for network reports?
• What kind of reports are provided?
1- Network users, end sites
• fault and availability reports of the services
• historical traffic data
2- Network planning staff
• to extrapolate traffic trends for future network planning
GARR Traffic Trends 30.67 Gbps 3.84 Gbps
Traffic Evolution
• GLOBAL INTERNET: r ~ 1.4/y
• NATIONAL INTERNET: r ~ 1.6/y
• E2E RESEARCH TRAFFIC: r ~ 2.0/y
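To see what these rates imply for planning, the sketch below projects traffic forward and computes doubling times. It assumes that r is a multiplicative growth factor per year (traffic after t years = current traffic × r^t); the slide does not define r explicitly, and the function names are hypothetical.

```python
import math


def project_traffic(current_gbps, yearly_factor, years):
    """Project traffic assuming compound growth by yearly_factor each year."""
    return current_gbps * yearly_factor ** years


def doubling_time_years(yearly_factor):
    """Return the number of years needed for traffic to double."""
    if yearly_factor <= 1.0:
        raise ValueError("traffic is not growing")
    return math.log(2) / math.log(yearly_factor)


rates = [("global Internet", 1.4), ("national Internet", 1.6), ("E2E research", 2.0)]
for label, r in rates:
    growth = project_traffic(1.0, r, 5)
    print(f"{label}: x{growth:.1f} in 5 years, "
          f"doubles every {doubling_time_years(r):.1f} years")
```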
Latency Measurements
http://oss.oetiker.ch/smokeping/ by Tobias Oetiker
Latency Measurements
• Round Trip Time fluctuations
• Packet Loss percentage
[Diagram labels: Fping probe; End Site; Server]
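The two quantities on this slide, packet loss percentage and round trip time fluctuations, can be derived from a batch of probe results as sketched below. The input format (a list of RTTs in milliseconds, with None for a lost probe) and the function name are assumptions for illustration; this does not reproduce how SmokePing or fping report their results.

```python
def summarize_probes(rtts_ms):
    """Summarize one probe round: loss percentage and min/avg/max RTT.

    rtts_ms holds one entry per probe sent; None marks a lost probe.
    """
    sent = len(rtts_ms)
    if sent == 0:
        raise ValueError("no probes sent")
    received = [rtt for rtt in rtts_ms if rtt is not None]
    loss_pct = 100.0 * (sent - len(received)) / sent
    if not received:
        # Every probe was lost, so there is no RTT to report.
        return {"loss_pct": loss_pct, "min": None, "avg": None, "max": None}
    return {
        "loss_pct": loss_pct,
        "min": min(received),
        "avg": sum(received) / len(received),
        "max": max(received),
    }


# Hypothetical round of four probes, one of them lost.
print(summarize_probes([10.0, None, 20.0, 30.0]))
```

The spread between the minimum and maximum RTT in each round gives a simple view of the fluctuations.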
Slices
GARR-DB: Network Database, description of the infrastructure
• Temporary infrastructures
Homer’s dream is just:
• Network Labs
• Temporary research projects
• Infrastructures requiring monitoring only
• Dedicated monitoring systems (users or projects)
Slices: Dedicated monitoring systems
• Administrator requirements:
  • Easy to manage
  • Replicable
• User requirements:
  • Quick and easy setup
  • Traffic statistics
  • Weathermaps
  • Alarms
Slices
• Slice link, description and status
• MRTG log status
• Access policy
• URL
• Slice status (on, off)
• Status of MRTG CFG generation (red if disabled)
• Cronjob status (red if disabled)
Slices
Traffic Flows Analysis
• Suite: Nfsen/Nfdump by Peter Haag
• Based on the NetFlow protocol
Traffic Flows Analysis, architecture overview
[Diagram labels: www; Nfsen; Nfdump; RRDs; User; Nfdump (CLI); NetFlow, data export, sampling; Nfcapd; Network; Raw data]
• Daily numbers:
  • ~2000 flows/s export
  • sampling 1:1000
  • ~40 MB to 1.6 GB per router (raw data)
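The figures above can be put in perspective with some quick arithmetic. The sketch assumes that counts observed under 1:N sampling are scaled back by multiplying by N, which is the usual first-order estimate; the slides do not state how GINS or Nfsen treat sampled data, and the function names and the sampled byte count are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def flows_per_day(flows_per_second):
    """Return the number of exported flow records per day at a steady rate."""
    return flows_per_second * SECONDS_PER_DAY


def estimate_total(sampled_count, sampling_rate=1000):
    """Scale a packet or byte count seen under 1:N sampling to an estimated total."""
    return sampled_count * sampling_rate


print(f"{flows_per_day(2000):,} flow records per day")
# Hypothetical flow record reporting 1500 sampled bytes.
print(f"~{estimate_total(1500):,} bytes estimated before sampling")
```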