Study and Implementation of an SLA Infrastructure for the Pan-European Network GEANT-2
Diploma Thesis
Service Level Agreement
A Service Level Agreement (SLA) is a formal definition of the relationship between a service provider and its customer. An SLA can be defined and used in any industry. It specifies what the customer can expect from the provider, the obligations of both the customer and the provider, the performance, availability and security objectives of the service, and the procedures to be followed to ensure compliance with the SLA. SLAs are often used when corporations outsource functions that lie outside their own core competencies to third-party service providers.
Service Level Agreement
A service level agreement typically contains the following information:
• A description of the nature of the service to be provided
• The expected performance level of the service, specifically its reliability and responsiveness
• The procedure for reporting problems with the service
• The time frame for response and problem resolution
• The process for monitoring and reporting the service level
• The consequences for the service provider not meeting its obligations
• Escape clauses and constraints
Not every contract contains all of these components, but a good SLA gives an overview of what can go wrong with the provided service and attempts to cover those situations in the agreement.
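The measurable parts of the components listed above (expected performance, monitoring, consequences) can be captured in a simple data structure so that conformance can be checked automatically. The following is a minimal Python sketch, not the thesis implementation: every class name, field name, threshold and measurement value in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceLevelObjective:
    """One measurable target, e.g. 'one-way delay at most 20 ms' (hypothetical)."""
    metric: str                    # hypothetical metric name, e.g. "owd_ms"
    threshold: float               # limit the measured value must respect
    higher_is_better: bool = False # True for throughput, False for delay/loss

    def is_met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.threshold
        return measured <= self.threshold


@dataclass
class ServiceLevelAgreement:
    """Hypothetical container mirroring the SLA components listed above."""
    service_description: str
    objectives: List[ServiceLevelObjective]
    problem_reporting_procedure: str
    response_time_hours: float
    monitoring_process: str
    penalties: str
    exclusions: List[str] = field(default_factory=list)  # escape clauses

    def violations(self, measurements: Dict[str, float]) -> List[str]:
        """Metrics whose measured value breaks its objective (unmeasured ones are skipped)."""
        return [
            o.metric
            for o in self.objectives
            if o.metric in measurements and not o.is_met(measurements[o.metric])
        ]


sla = ServiceLevelAgreement(
    service_description="IP connectivity between two NREN sites",
    objectives=[
        ServiceLevelObjective("owd_ms", 20.0),
        ServiceLevelObjective("loss_pct", 0.1),
        ServiceLevelObjective("throughput_mbps", 90.0, higher_is_better=True),
    ],
    problem_reporting_procedure="ticket to the NOC",
    response_time_hours=4.0,
    monitoring_process="owping and bwctl every 5 minutes",
    penalties="service credit",
)
print(sla.violations({"owd_ms": 2.1, "loss_pct": 0.0, "throughput_mbps": 85.0}))
# ['throughput_mbps']
```

Keeping the direction of each objective explicit (`higher_is_better`) avoids mixing up "lower is better" metrics such as delay and loss with "higher is better" ones such as throughput.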
Network Performance Metrics
• One-Way Delay (OWD)
• Round-Trip Time (RTT)
• Delay Variation ("Jitter") and RTT Variation
• Packet Loss
  – Interface errors and drops
• Maximum Transmission Unit (MTU)
• Path MTU
• Link Utilization
  – IP bandwidth utilization and achievable TCP throughput
One-Way Delay
[Figures: One-Way Delay; per-hop One-Way Delay]
Delay Variation ("Jitter")
Delay Variation
• Describes the level of disturbance of packet arrival times
• Comparison to an "ideal" arrival pattern
IP Delay Variation Metric (IPDV) (RFC 3393)
• Computed from the delays of equally-sized packets, because delay depends on packet size due to serialization delay
• Critical for real-time applications (audio/video)
Caused by:
• Queuing on routers (especially on CPU-based router architectures)
• Collision avoidance (shared Ethernet)
• Link-level retransmission (802.11 wireless LANs)
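IPDV compares the one-way delays of a selected pair of packets. The minimal Python sketch below pairs consecutive packets, which is one common selection (RFC 3393 allows others); the delay values are hypothetical. It also shows a useful property: a constant clock offset between sender and receiver is added to every delay and therefore cancels in the difference.

```python
from typing import List


def ipdv(one_way_delays_ms: List[float]) -> List[float]:
    """IP delay variation between consecutive packets: D(i) - D(i-1)."""
    return [cur - prev for prev, cur in zip(one_way_delays_ms, one_way_delays_ms[1:])]


# Hypothetical one-way delays (ms) of five consecutive, equally-sized packets
delays = [2.0, 2.5, 2.1, 6.0, 2.2]
print([round(v, 2) for v in ipdv(delays)])   # [0.5, -0.4, 3.9, -3.8]

# Adding a constant 100 ms clock offset leaves the IPDV values unchanged
shifted = [d + 100.0 for d in delays]
print([round(v, 2) for v in ipdv(shifted)])  # [0.5, -0.4, 3.9, -3.8]
```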
Round-Trip Time & Packet Loss
Round-Trip Time (RTT)
• One-way delay from A to B + one-way delay from B to A + time for B to respond to A
Packet Loss
One-way Packet Loss Metric for IPPM (RFC 2680)
Caused by:
• Congestion: severe congestion overflows queues and leads to packet drops (gradual or in bursts)
• Errors: corruption, packets modified in transit (noisy lines etc.), checksum failure at the receiving end
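The RTT decomposition and the distinction between gradual and bursty loss can be made concrete with a small sketch. This is illustrative Python, not part of the thesis tools; the function names and the session data are hypothetical, and packets are identified by sequence number as in one-way measurement tools.

```python
from typing import Iterable


def rtt_ms(owd_ab_ms: float, owd_ba_ms: float, turnaround_ms: float) -> float:
    """RTT as defined above: A->B one-way delay + B->A one-way delay + B's response time."""
    return owd_ab_ms + owd_ba_ms + turnaround_ms


def loss_ratio(sent: Iterable[int], received: Iterable[int]) -> float:
    """Fraction of sent sequence numbers that never arrived (duplicates ignored)."""
    sent_set = set(sent)
    if not sent_set:
        return 0.0
    return len(sent_set - set(received)) / len(sent_set)


def longest_loss_burst(sent: Iterable[int], received: Iterable[int]) -> int:
    """Length of the longest run of consecutive lost packets (bursty vs. gradual loss)."""
    got = set(received)
    longest = run = 0
    for seq in sorted(set(sent)):
        run = 0 if seq in got else run + 1
        longest = max(longest, run)
    return longest


# Hypothetical session: 100 packets, a burst of 3 losses plus 1 isolated loss
sent = range(100)
received = [s for s in sent if s not in (10, 11, 12, 50)]
print(f"RTT = {rtt_ms(1.2, 2.1, 0.05):.2f} ms")
print(loss_ratio(sent, received), longest_loss_burst(sent, received))  # 0.04 3
```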
Packet Reordering, MTU & Performance
Packet Reordering
Caused by:
• Alternative routes
• Router internal parallelism
• Packet size
Maximum Transmission Unit (MTU)
Common MTUs:
• 1500 bytes (Ethernet, 802.11 WLAN)
• 4470 bytes (FDDI, common default for POS and serial links)
• 9000 bytes (Internet2 and GÉANT convention, limit of some Gigabit Ethernet adapters)
• 9180 bytes (ATM, SMDS)
Path MTU
• The MTU supported by a path
• The minimum of the MTUs of the links along the path
Performance
• TCP/STCP applications may suffer a performance impact
• Real-time media applications experience more serious problems
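The path MTU rule above is simply a minimum over the links of the path. The Python sketch below applies it to a hypothetical path and then derives the TCP payload per full-sized IPv4 segment, assuming 20-byte IPv4 and TCP headers without options plus the 12 bytes taken by the TCP timestamp option. With a 1500-byte MTU this gives the 1448-byte MSS reported in the BWCTL/iperf output of this document.

```python
from typing import Iterable

IPV4_HEADER = 20        # bytes, IPv4 header without options
TCP_HEADER = 20         # bytes, TCP header without options
TCP_TIMESTAMP_OPT = 12  # bytes: 10-byte timestamp option padded to 12


def path_mtu(link_mtus: Iterable[int]) -> int:
    """The path MTU is the smallest MTU among the links along the path."""
    mtus = list(link_mtus)
    if not mtus:
        raise ValueError("path has no links")
    return min(mtus)


def tcp_payload_per_segment(mtu: int, timestamps: bool = True) -> int:
    """Bytes of TCP payload in a full-sized IPv4 segment for a given MTU."""
    overhead = IPV4_HEADER + TCP_HEADER + (TCP_TIMESTAMP_OPT if timestamps else 0)
    return mtu - overhead


# Hypothetical path: jumbo frames at both ends, POS and standard Ethernet hops in between
pmtu = path_mtu([9000, 4470, 1500, 9000])
print(pmtu, tcp_payload_per_segment(pmtu))  # 1500 1448
```

A single 1500-byte hop therefore caps every segment on an otherwise jumbo-frame path, which is why a mismatched MTU hurts TCP throughput.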
Results Requirements
• Be able to monitor the deployed services:
  – IPv4/IPv6
  – Multicast/unicast
  – IP QoS
  – VPN/point-to-point connections
• Emulate behavior close to that of the application being used
• Different tools are used within each network:
  – Need to abstract the data from the type of measurement tool used, through a well-defined interface
  – Interoperability between tools
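The requirement to abstract the data from the measurement tool can be met with a common interface that every tool adapter implements. The Python sketch below illustrates the idea only: the `MeasurementTool` interface, the record fields and both adapters are hypothetical, and the adapters return canned values instead of running bwctl or owping.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Measurement:
    """Tool-independent record of one measured value."""
    metric: str        # e.g. "throughput_mbps", "owd_median_ms"
    value: float
    source: str        # host that sent the probes
    destination: str   # host that received them
    tool: str          # which tool produced the value


class MeasurementTool(ABC):
    """Common interface every (hypothetical) tool adapter implements."""
    name: str = "abstract"

    @abstractmethod
    def measure(self, source: str, destination: str) -> List[Measurement]:
        ...


class FakeThroughputTool(MeasurementTool):
    """Stand-in adapter with a canned value; a real one would run bwctl."""
    name = "fake-bwctl"

    def measure(self, source: str, destination: str) -> List[Measurement]:
        return [Measurement("throughput_mbps", 93.3, source, destination, self.name)]


class FakeDelayTool(MeasurementTool):
    """Stand-in adapter with a canned value; a real one would run owping."""
    name = "fake-owamp"

    def measure(self, source: str, destination: str) -> List[Measurement]:
        return [Measurement("owd_median_ms", 2.1, source, destination, self.name)]


def collect(tools: List[MeasurementTool], source: str, destination: str) -> Dict[str, float]:
    """Run every tool and merge the results, regardless of which tool produced them."""
    results: Dict[str, float] = {}
    for tool in tools:
        for m in tool.measure(source, destination):
            results[m.metric] = m.value
    return results


print(collect([FakeThroughputTool(), FakeDelayTool()], "hostA", "hostB"))
# {'throughput_mbps': 93.3, 'owd_median_ms': 2.1}
```

Consumers such as a database writer or an SLA checker only depend on `Measurement`, so a tool can be replaced without touching them.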
Measurement Tools
• Traceroute-like tools:
  – traceroute, MTR, PingPlotter, lft, tracepath, traceproto
• Bandwidth measurement tools:
  – pchar, Iperf, bwctl, Netperf, RUDE/CRUDE, ttcp, NDT, DSL Reports
• Active measurement boxes:
  – DFN/GEANT2 HADES (formerly IPPM)
  – RIPE TTM
  – RENATER QoSMetrics
• Passive measurement tools:
  – SNMP device polling: MRTG, Cricket
  – NetFlow-based: flow-tools etc.
  – Packet capture and analysis tools: tcpdump, Wireshark/Ethereal, jnettop
Measurement Tools
• OWD, OWPL, IPDV, traceroute: DFN IPPM
  – IPv4, IPv6, IP QoS, on-demand
  – But also RIPE TTM for IPv4 and IPv6
  – http://www-win.rrze.uni-erlangen.de/ippm/
• TCP/UDP throughput: I2 BWCTL/iperf
  – IPv4, IPv6, on-demand
  – http://abilene.internet2.edu/observatory/data-views.html
• IP link utilization, link capacity, interface errors, interface drops: from existing DB
  – IPv4, IPv6 (multicast?)
  – On-demand
• NetFlow: under investigation
  – IPv4, IPv6
  – Info (working document): http://monstera.man.poznan.pl/wiki/index.php/JRA1_D3.4_Flow_Monitoring
• Packet capture tools: HW: 10 Gbps DAG cards, SW: Scampi framework
  – Info (working document): http://monstera.man.poznan.pl/wiki/index.php/Passive
• FYI: Global Performance Measurement Points directory
  – Info: http://e2epi.internet2.edu/pipes/pmp/pmp-dir.html
PerfSONAR System
• perfSONAR (PERformance focused Service Oriented Network monitoring ARchitecture)
• A joint effort of GÉANT2-JRA1, Internet2 and ESnet
• The solution is deployed and further developed in:
  – the European research backbone GÉANT
  – the connected European National Research and Education Networks (NRENs)
  – Internet2's Abilene network
  – ESnet (Energy Sciences Network, US)
  – RNP (Brazilian NREN)
• Open-source development, also for other interested networks
• The name reflects the choice of a Service Oriented Architecture
PerfSONAR System: The Choice of Service Oriented Architecture
• Reasons for a Service Oriented Architecture in the middle layer ("Service Layer"):
  – A large task can be split into independent "services"
  – Services can be developed separately
  – Easier to maintain afterwards
  – Services can be added/dropped at runtime
  – Flexibility of deployment (e.g. an NREN may use the GÉANT Lookup Service to advertise its services)
  – Different implementations are possible (e.g. using different programming languages)
PerfSONAR System: Services
• Measurement Point Service (MP)
• Measurement Archive Service (MA)
• Lookup Service (LS)
  – Allows the client to discover the existing services and other LS services
  – Dynamic: services register themselves with the LS and advertise their capabilities; they can also leave, or be removed if a service goes down
• Authentication Service (GN2-JRA5)
  – Authentication functionality for the framework
  – Users can have several roles; authorisation is based on the user's roles
  – Trust relationships between networks
• Transformation Service
  – Transforms the data (aggregation, concatenation, correlation, translation, etc.)
• Topology Service
  – Makes network topology information available to the framework
  – Finds the closest MP and provides topology information for visualisation tools
• Resource Protection Service
  – Arbitrates the consumption of limited resources
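The dynamic registration behavior of the Lookup Service can be illustrated with a toy in-memory registry. This Python sketch is not the perfSONAR LS API: the class, its methods and the example URLs are hypothetical, and the time-to-live (keep-alive) mechanism used to drop services that go down is an assumption chosen to illustrate "removed if a service gets down".

```python
import time
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Registration:
    url: str
    service_type: str        # e.g. "MP" or "MA"
    capabilities: Set[str]   # e.g. {"owd", "throughput"}
    expires_at: float        # registration lapses unless renewed before this time


class LookupService:
    """Toy lookup service: registrations carry a time-to-live (keep-alive)."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._entries: Dict[str, Registration] = {}

    def register(self, url: str, service_type: str, capabilities: Iterable[str],
                 now: Optional[float] = None) -> None:
        """Add or renew a service; re-registering extends its lifetime."""
        now = time.time() if now is None else now
        self._entries[url] = Registration(url, service_type, set(capabilities), now + self.ttl)

    def deregister(self, url: str) -> None:
        """A service leaving voluntarily."""
        self._entries.pop(url, None)

    def find(self, capability: str, service_type: Optional[str] = None,
             now: Optional[float] = None) -> List[str]:
        """URLs of live services offering a capability, optionally of one type."""
        now = time.time() if now is None else now
        # Services that stopped renewing are presumed down and removed.
        self._entries = {u: r for u, r in self._entries.items() if r.expires_at > now}
        return sorted(
            r.url for r in self._entries.values()
            if capability in r.capabilities
            and (service_type is None or r.service_type == service_type)
        )


ls = LookupService(ttl_seconds=300)
ls.register("http://mp1.example.net/owamp", "MP", ["owd"], now=0)
ls.register("http://ma1.example.net/archive", "MA", ["owd", "throughput"], now=0)
print(ls.find("owd", now=100))               # both services
print(ls.find("throughput", "MA", now=100))  # only the archive
ls.register("http://ma1.example.net/archive", "MA", ["owd", "throughput"], now=200)  # keep-alive
print(ls.find("owd", now=400))               # only the archive renewed its registration
```

The explicit `now` parameter makes the expiry logic testable without waiting for real time to pass.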
BWCTL
[Figures: general case; local bwctld unavailable]
BWCTL: Throughput Measurement
What is it?
• A resource allocation and scheduling daemon for arbitration of iperf tests
• bwctl controls throughput tests by adding resource allocation and scheduling policy controls
Problem statement
• Users want to verify the available bandwidth from their site to another site
Methodology
• Verify the available bandwidth from each endpoint to points in the middle to determine the problem area
Implementation
Applications:
• bwctld daemon
• bwctl client
Built upon a protocol abstraction library that:
• Supports one-off applications
• Allows authentication/policy hooks to be incorporated
BWCTL: Throughput Measurement
Metrics
• Throughput (Mbps)
Parameters
• "interval": the report interval (bwctl option -i)
• "protocol": either udp or tcp; default is tcp (bwctl option -u for udp)
• "bufferSize": size of the read/write buffer (bwctl option -l)
• "windowSize": size of the TCP window / UDP socket receive buffer (bwctl option -w)
• "duration": duration of the test; default is 10 seconds (bwctl option -t)
• "bandwidth": limits the UDP send rate (bwctl option -b)
• "ToS": specifies the ToS bits (bwctl option -S)
• "login": if authentication is needed
• "password": ditto
Methods
• On-demand testing with a PHP-based BWCTL client (web GUI)
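A web GUI like the one above ultimately has to turn these parameters into a bwctl command line. The Python sketch below uses the option letters listed on the slide. Two assumptions are labeled: the `-c <host>` option for the receiving host is taken from the bwctl manual page as we recall it, not from the slide, and the unit and suffixes accepted by `-b` should be checked in the bwctl documentation. The sketch only builds the argument list and does not run a test.

```python
import shlex
from typing import Dict, List, Union

# Parameter name -> bwctl option, as listed on the slide above.
OPTION_FOR = {
    "interval": "-i",
    "bufferSize": "-l",
    "windowSize": "-w",
    "duration": "-t",
    "bandwidth": "-b",
    "ToS": "-S",
}


def bwctl_args(receiver: str, params: Dict[str, Union[str, int]]) -> List[str]:
    """Translate the web-GUI parameters into a bwctl argument list.

    ASSUMPTION: '-c <host>' selects the receiving host (bwctl man page, not
    the slide). login/password have no option on the slide and are not mapped.
    """
    args = ["bwctl", "-c", receiver]
    if params.get("protocol", "tcp") == "udp":
        args.append("-u")          # tcp is the default, so only udp needs a flag
    for name, flag in OPTION_FOR.items():
        if name in params:
            args += [flag, str(params[name])]
    return args


cmd = bwctl_args("147.102.13.77", {"protocol": "udp", "duration": 10, "interval": 1,
                                   "bandwidth": 50000000})
print(shlex.join(cmd))  # bwctl -c 147.102.13.77 -u -i 1 -t 10 -b 50000000
```

Building an argument list, rather than a shell string, lets the list be passed directly to `subprocess.run` without shell quoting issues when the values come from a web form.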
BWCTL
[ 15] local 147.102.13.77 port 5001 connected with 147.102.13.75 port 5001
[ ID] Interval Transfer Bandwidth
[ 15] 0.0-10.0 sec 116957184 Bytes 93343208 bits/sec
[ 15] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
bwctl: stop_exec: 3469448542.020009
[ 5] local 147.102.13.75 port 5001 connected with 147.102.13.77 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 116957184 Bytes 93538771 bits/sec
[ 5] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
bwctl: stop_exec: 3469448541.018142
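To store results like the ones above in a database, the summary lines have to be parsed. The following Python sketch extracts the reported bandwidth from the iperf summary lines shown above and converts it to Mbit/s (10^6 bit/s). The regular expression is written against this sample output only; other iperf versions or report formats (for example kbit/s units) would need a different pattern.

```python
import re
from typing import List

# iperf summary line, e.g. "[ 15] 0.0-10.0 sec 116957184 Bytes 93343208 bits/sec"
SUMMARY = re.compile(
    r"\[\s*(\d+)\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+(\d+)\s+Bytes\s+(\d+)\s+bits/sec"
)


def throughputs_mbps(output: str) -> List[float]:
    """Reported bandwidth of every iperf summary line, in Mbit/s."""
    return [int(m.group(5)) / 1e6 for m in SUMMARY.finditer(output)]


sample = """
[ 15] 0.0-10.0 sec 116957184 Bytes 93343208 bits/sec
[ 15] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
[ 5] 0.0-10.0 sec 116957184 Bytes 93538771 bits/sec
"""
print([round(x, 2) for x in throughputs_mbps(sample)])  # [93.34, 93.54]
```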
OWAMP
What is it?
• OWD, or "one-way ping":
  – A control protocol
  – A test protocol
  – A sample implementation of both
Why the OWAMP protocol?
• Find problems in the network:
  – Congestion usually happens in one direction first
  – Routing (asymmetric, or just changes)
  – SNMP polling intervals mask high queue levels that active probes can show
• There have been many implementations of one-way delay measurement over the years (Surveyor, RIPE…)
  – The problem has been interoperability
• http://www.ietf.org/internet-drafts/draft-ietf-ippmowdp-014.txt (the draft was later published as RFC 4656)
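Why SNMP polling can mask congestion becomes clearer with numbers. Counter-based pollers such as MRTG report the traffic transferred during a polling interval divided by its length, so a short saturation period disappears into the average. The Python sketch below uses a hypothetical 100 Mbit/s link, an assumed 5-minute polling interval and an invented load pattern; probes sent during the saturated seconds would see the queues, the 5-minute average would not.

```python
LINK_CAPACITY_MBPS = 100
POLL_INTERVAL_S = 300  # assumed 5-minute polling interval

# Hypothetical per-second load: 10 Mbit/s background, saturated for 6 seconds
load_mbps = [10] * POLL_INTERVAL_S
for t in range(120, 126):
    load_mbps[t] = LINK_CAPACITY_MBPS

# What a counter-based poller derives: octets counted in the interval / interval length
octets = sum(rate * 1_000_000 // 8 for rate in load_mbps)  # one sample per second
avg_mbps = octets * 8 / POLL_INTERVAL_S / 1_000_000
print(f"5-minute average: {avg_mbps:.1f} Mbit/s "
      f"({100 * avg_mbps / LINK_CAPACITY_MBPS:.1f}% utilisation)")

# What the poller cannot see: seconds at full capacity, when queues build up
saturated = sum(1 for r in load_mbps if r >= LINK_CAPACITY_MBPS)
print(f"seconds at full capacity: {saturated}")
```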
OWAMP
OWAMP Control protocol
• Supports authentication and authorization
• Used to configure tests:
  – Endpoint-controlled port numbers
  – Extremely configurable send schedule
  – Configurable packet sizes
• Used to start/stop tests
• Used to retrieve results
  – Provisions for dealing with partial session results
OWAMP Test protocol
• Packets can be "open", "authenticated" or "encrypted"
Sample implementation
Applications:
• owampd daemon
• owping client
Built upon a protocol abstraction library that:
• Supports one-off applications
• Allows authentication/policy hooks to be incorporated
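As we read RFC 4656, an unauthenticated ("open") OWAMP test packet starts with a 4-byte sequence number, an 8-byte timestamp in NTP-style format (32-bit seconds since 1900 plus a 32-bit fraction) and a 2-byte error estimate, followed by padding. The Python sketch below packs such a header and computes the one-way delay at the receiver; the field layout should be checked against the RFC before relying on it, and the timestamps are hypothetical. The delay is only meaningful if both clocks are synchronized, which is why the sender also transmits an error estimate.

```python
import struct
from typing import Tuple

NTP_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01


def to_timestamp(unix_s: float) -> Tuple[int, int]:
    """Unix time -> (32-bit seconds since 1900, 32-bit binary fraction)."""
    ntp = unix_s + NTP_UNIX_OFFSET
    secs = int(ntp)
    return secs, int((ntp - secs) * 2**32)


def from_timestamp(secs: int, frac: int) -> float:
    return secs - NTP_UNIX_OFFSET + frac / 2**32


def pack_test_packet(seq: int, send_unix_s: float, synced: bool,
                     scale: int, multiplier: int, padding: int = 0) -> bytes:
    """Sequence number, timestamp and error estimate in network byte order."""
    secs, frac = to_timestamp(send_unix_s)
    # Error estimate: S bit (clock synchronized), Z bit (0 here), 6-bit scale, 8-bit multiplier
    error = (int(synced) << 15) | ((scale & 0x3F) << 8) | (multiplier & 0xFF)
    return struct.pack("!IIIH", seq, secs, frac, error) + bytes(padding)


def receive(packet: bytes, recv_unix_s: float) -> Tuple[int, float, float]:
    """Return (sequence number, one-way delay in ms, sender error estimate in ms)."""
    seq, secs, frac, error = struct.unpack("!IIIH", packet[:14])
    owd_ms = (recv_unix_s - from_timestamp(secs, frac)) * 1000.0
    scale, multiplier = (error >> 8) & 0x3F, error & 0xFF
    err_ms = multiplier * 2.0 ** (scale - 32) * 1000.0  # multiplier * 2^-32 * 2^scale s
    return seq, owd_ms, err_ms


pkt = pack_test_packet(seq=7, send_unix_s=1260459282.0, synced=True,
                       scale=22, multiplier=1, padding=16)
seq, owd, err = receive(pkt, recv_unix_s=1260459282.0021)
print(len(pkt), seq, round(owd, 1), err)  # 30 7 2.1 0.9765625
```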
OWAMP
--- owping statistics from [dhcp-75.netmode.ece.ntua.gr]:59382 to [147.102.13.77]:35770 ---
SID: 93660d4dcecb984136ad1d045d58ef75
first: 2009-12-10T17:54:42.364
last: 2009-12-10T17:54:52.998
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = -1.31/-1.2/-0.642 ms, (err=2.86 ms)
one-way jitter = 0.1 ms (P95-P50)
TTL not reported
no reordering
--- owping statistics from [147.102.13.77]:56641 to [dhcp-75.netmode.ece.ntua.gr]:51684 ---
SID: 93660d4bcecb98413cf85be0ccbf222f
first: 2009-12-10T17:54:42.386
last: 2009-12-10T17:54:53.041
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 1.72/2.1/13.9 ms, (err=2.86 ms)
one-way jitter = 6.3 ms (P95-P50)
TTL not reported
no reordering
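The owping summaries above can be parsed into numbers for storage, and the jitter figure is defined as the 95th percentile minus the median (P95-P50) of the one-way delays. The negative delays in the first direction are most likely a clock-offset artifact: their magnitude stays within the reported error bound (err=2.86 ms). The Python sketch below is written against this sample output only. It uses nearest-rank percentiles, which may differ from owping's exact percentile method.

```python
import math
import re
from typing import Dict, List

STATS = re.compile(
    r"(?P<sent>\d+) sent, (?P<lost>\d+) lost \((?P<loss_pct>[\d.]+)%\), (?P<dups>\d+) duplicates\s+"
    r"one-way delay min/median/max = (?P<min>-?[\d.]+)/(?P<median>-?[\d.]+)/(?P<max>-?[\d.]+) ms,\s*"
    r"\(err=(?P<err>[\d.]+) ms\)\s+one-way jitter = (?P<jitter>[\d.]+) ms"
)


def parse_owping(output: str) -> List[Dict[str, float]]:
    """One dict of summary numbers per direction found in owping output."""
    return [{k: float(v) for k, v in m.groupdict().items()} for m in STATS.finditer(output)]


def jitter_p95_p50(delays_ms: List[float]) -> float:
    """95th percentile minus median of one-way delays (nearest-rank percentiles)."""
    if not delays_ms:
        raise ValueError("no delay samples")
    ordered = sorted(delays_ms)

    def percentile(p: int) -> float:
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return percentile(95) - percentile(50)


sample = """100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 1.72/2.1/13.9 ms, (err=2.86 ms)
one-way jitter = 6.3 ms (P95-P50)"""
print(parse_owping(sample))
print(jitter_p95_p50([float(d) for d in range(1, 21)]))  # 9.0
```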
Implementation
• The BWCTL and OWAMP daemons (bwctld and owampd) are set up to run constantly in the background, listening for and accepting incoming measurement connections
• cron schedules owping and bwctl measurements to a specific target host every 5 minutes
• Measurement data are collected and stored in an RRD database and in a MySQL database
• A graphical user interface served by Apache Tomcat displays the latest measurement results and lets the user dynamically select the measurement date
• The behavior prediction algorithm of RRDTool is used to predict future measurement behavior and to ensure SLA conformance
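RRDTool's behavior prediction is based on Holt-Winters forecasting (its HWPREDICT archives), which tracks a level, a trend and a seasonal component. The Python sketch below is a simplified stand-in, not RRDTool's implementation: it uses Holt's double exponential smoothing (level and trend only, no seasonal term) on 5-minute samples and reports when the forecast first exceeds an SLA limit. The smoothing factors, the delay samples and the 20 ms limit are hypothetical.

```python
from typing import List


def holt_forecast(series: List[float], alpha: float, beta: float, horizon: int) -> List[float]:
    """Holt's double exponential smoothing: smoothed level plus linear trend.

    Unlike RRDtool's Holt-Winters prediction, there is no seasonal term here.
    """
    if len(series) < 2:
        raise ValueError("need at least two samples")
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def first_violation_step(forecast: List[float], limit: float) -> int:
    """1-based number of steps until the forecast first exceeds the SLA limit, or 0."""
    for step, value in enumerate(forecast, start=1):
        if value > limit:
            return step
    return 0


# Hypothetical median one-way delays (ms), one sample every 5 minutes
owd = [10.0, 11.0, 12.0, 13.0, 14.0]
forecast = holt_forecast(owd, alpha=0.5, beta=0.5, horizon=12)
step = first_violation_step(forecast, limit=20.0)   # hypothetical SLA limit
print(f"SLA limit of 20 ms predicted to be exceeded in {step * 5} minutes")
```

A seasonal model such as RRDTool's is better suited to traffic with a daily pattern, because a pure trend model extrapolates the daily ramp-up as if it would continue indefinitely.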