T-Systems View on the Relationship of TE & Resilience. Planned contribution to Work Package 2 Deliverable D27 Introduction/Conclusion M. Düser, M. Jäger, R. Hülsermann, and F.-J. Westphal T-Systems International GmbH.
Motivation
• Traffic engineering (TE) and resilience are essential aspects of network operation
• But: to be distinguished from network planning

Network planning (mostly offline)
• Network-wide optimization of resources subject to QoS and resilience requirements
• Resilience: protection prior to failure
• Time constraints relaxed

TE during network operation (mostly online)
• Adaptation/re-optimization due to traffic variability or failures during operation
• Resilience: restoration following a failure
• Not necessarily globally optimal!
• Today: often offline, too
• State-/time-/event-dependent
• Time-critical (as fast as possible)

WP2
Detailed ITU-D view of TE
[Figure: block diagram of the TE tasks, with a feedback loop]
• Demand characterization: traffic modeling, traffic measurement, traffic forecasting
• Quality of service objectives: QoS requirements, end-to-end QoS objectives, allocation to network components
• Control & dimensioning: traffic control, dimensioning
• Monitoring: performance monitoring
Speed of the feedback loop determines overall reconfiguration speed
Revised Definitions
Traffic Engineering is the sum of all functions in a network that are applied to provide Quality of Service (QoS) whilst minimizing the required network resources/costs during network operation.
Resilience is the ability of a network to prevent a service from being disrupted, or to resume an interrupted service following a failure, using a variety of mechanisms, which trade off the speed of service recovery against the required additional resources/costs (Quality of Resilience, QoR).
To keep in mind:
• There is a customer's and a network operator's perspective!
• Not all services may be resumed (e.g. best-effort)
• To operators, resilience is part of TE (both carried out by the very same operators [today], control plane [future])
TE and resilience through overprovisioning?
• Overprovisioning of bandwidth may be the simplest approach (e.g. Ch. Diot, Sprint, 2003)
• Assumptions:
  • Per-flow monitoring is impossible in the core
  • Overprovisioning scales (unlike complex TE algorithms)
  • Bandwidth overprovisioning is a result of planning cycles (18 months)
  • Bandwidth overprovisioning is required for resilience purposes, which rely on a bandwidth surplus during failure-free operation
TE and resilience through overprovisioning?
• But the approach to simply overprovision fails to consider:
  • Need for explicit end-to-end QoS guarantees and SLAs
  • Choice: overprovisioning vs. TE (but not both)
  • Operators require geographically restricted QoS (e.g. for locally congested links)
  • TE is particularly useful in the event of a failure, when resources are scarce
    • Sprint: 50% of failures take more than 1 minute to recover
  • Overprovisioning is not necessarily the most cost-efficient solution
    • The customer needs to be made aware of the cost associated with bandwidth provisioning
  • Increasingly dynamic operation on L2/L1 to be considered
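The dependence of resilience on a bandwidth surplus can be made concrete with a little arithmetic. The sketch below is illustrative only and not taken from the slides: it assumes a bundle of parallel links of equal capacity whose traffic can be redistributed freely after a failure, and the function names are hypothetical.

```python
def max_safe_utilization(n_links: int) -> float:
    """Highest failure-free utilization per link such that the remaining
    links can absorb the traffic of one failed link.

    Assumption: n_links parallel links of equal capacity, traffic can be
    spread freely over the survivors.
    """
    if n_links < 2:
        raise ValueError("at least two links are needed to survive a failure")
    return (n_links - 1) / n_links


def survives_single_failure(loads: list[float], capacity: float) -> bool:
    """True if the total load still fits after any one link has failed."""
    surviving_capacity = (len(loads) - 1) * capacity
    return sum(loads) <= surviving_capacity


# Two parallel links must each stay at or below 50% load.
print(max_safe_utilization(2))
# 4 + 4 units fit on one surviving 10-unit link; 6 + 6 units do not.
print(survives_single_failure([4, 4], 10), survives_single_failure([6, 6], 10))
```

Under these assumptions, half of the capacity of a two-link bundle sits idle during failure-free operation, which is the cost side of the overprovisioning argument.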
Multi-Service Aspects

Traffic Engineering
• Per-CoS metrics: Quality of Service (QoS)
• Maintaining CoS metrics:
  • Latency & jitter
  • Loss and loss variation
  • Service availability
• Policing per CoS
• Statistics-driven
• Performance monitoring

Resilience
• Per-CoS resilience: Quality of Resilience (QoR)
• Dedicated protection for high quality (e.g. voice)
• Shared protection/pre-emption of best-effort
• Event-driven (restoration)
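Pre-emption of best-effort traffic in favour of protected traffic can be sketched as follows. This is a minimal illustration under stated assumptions, not a mechanism described in the slides: the class labels, the `Flow` type, and the largest-first pre-emption order are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    cos: str          # hypothetical labels: "voice" or "best_effort"
    bandwidth: float


def admit_rerouted(capacity: float, flows: list[Flow], rerouted: float):
    """Try to fit `rerouted` units of protected traffic onto a link.

    Best-effort flows are pre-empted (largest first, an arbitrary choice)
    until the rerouted demand fits. Returns (fits, names_of_preempted_flows).
    """
    free = capacity - sum(f.bandwidth for f in flows)
    preempted = []
    best_effort = sorted(
        (f for f in flows if f.cos == "best_effort"),
        key=lambda f: f.bandwidth,
        reverse=True,
    )
    for flow in best_effort:
        if free >= rerouted:
            break
        preempted.append(flow.name)
        free += flow.bandwidth
    return free >= rerouted, preempted


link_flows = [Flow("v1", "voice", 4), Flow("be1", "best_effort", 3),
              Flow("be2", "best_effort", 2)]
print(admit_rerouted(10, link_flows, 3))
```

In this example the link has 1 unit free, so one best-effort flow is dropped to make room for 3 units of rerouted protected traffic, while the voice flow is never touched.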
Multi-Layer Aspects

Traffic Engineering
• Today: TE per layer
  • L3: load balancing, DiffServ, RSVP
  • L2: ATM CoS, Eth CoS?
  • L1: SDH SLAs
• Overlay, augmented, peer-to-peer model
• Translation/adaptation of metrics from layer to layer
• GMPLS will enable TE across layers
• Load balancing
• Identify suitable points for performance monitoring (not on all layers)

Resilience
• Today: resilience per layer
  • L4: re-transmission, load adjustment
  • L3: re-routing
  • L2: ATM
  • L1: SDH protection, LSP re-routing
• Overlay, augmented, peer-to-peer model
• Potential inter-layer conflicts (choice of timescales) need to be resolved
• Complexity of algorithms
• Shared information on path/link/node/interface status
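One common way to resolve timescale conflicts between layers is bottom-up escalation with hold-off timers: each layer waits until the layers below have had time to recover before it reacts. The slides do not prescribe this approach; the sketch below is an assumption-laden illustration, and the recovery times in the example are made-up values, not measurements.

```python
def hold_off_schedule(layers: list[tuple[str, int]]) -> dict[str, int]:
    """Compute a hold-off time (ms) per layer for bottom-up escalation.

    `layers` is ordered from the lowest layer upwards, each entry being
    (layer_name, worst_case_recovery_ms). A layer starts its own recovery
    only after all layers below it have had their full recovery time.
    """
    schedule = {}
    wait_ms = 0
    for name, recovery_ms in layers:
        schedule[name] = wait_ms
        wait_ms += recovery_ms
    return schedule


# Illustrative values only.
print(hold_off_schedule([("L1", 50), ("L2", 200), ("L3", 1000)]))
```

The design choice here trades recovery speed for stability: higher layers react later, but two layers never try to repair the same failure at once.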
Multi-Domain Aspects

Traffic Engineering
• Overlay, augmented, peer-to-peer model
• e2e SLAs vs. per-domain SLAs: QoS consistency?
• Load balancing
• Suitable points for performance monitoring
• New applications: VPNs

Resilience
• Overlay, augmented, peer-to-peer model
• e2e vs. per-domain resilience: QoR consistency?
• Determined by max. allowable reaction time/network diameter
• Forwarding/sharing of resilience status information
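The gap between per-domain and end-to-end SLAs can be illustrated by composing per-domain figures along a path. The sketch below is not from the slides: it assumes the domains are traversed in series, fail independently, and contribute additive delay, and all numbers are hypothetical.

```python
import math


def e2e_availability(per_domain: list[float]) -> float:
    """End-to-end availability of domains in series.

    Assumption: domain failures are independent, so availabilities multiply.
    """
    return math.prod(per_domain)


def e2e_delay_ms(per_domain_ms: list[float]) -> float:
    """End-to-end delay as the sum of per-domain delays (assumed additive)."""
    return sum(per_domain_ms)


# Hypothetical path: regional -> core -> regional.
availability = e2e_availability([0.999, 0.9999, 0.999])
delay = e2e_delay_ms([5, 20, 5])
print(f"availability={availability:.6f}, delay={delay} ms")
```

Under these assumptions the end-to-end availability is lower than that of any single domain, so per-domain SLAs that each look adequate may still miss an end-to-end target.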
Multi-Domain Protection
[Figure: two panels, each showing a core domain between two regional domains; labels: protection path, signaling required, dedicated interconnection point]
1.1) Multi-domain protection: SNCP
1.2) Multi-domain protection: end-to-end with dual-homing
Multi-Domain Restoration
[Figure: two panels, each showing a core domain between two regional domains]
2.1) Multi-domain restoration per domain, dedicated interconnection point
2.2) Multi-domain restoration, multiple interconnection points
Multi-Domain Mix of Restoration and Protection
[Figure: three interconnected domains; labels: sparse mesh, 1+1 protected; dense mesh, restoration; dense mesh, restoration]
3.1) Most complex situation
Convergence Through Dynamic Operation
[Figure: layer diagram; labels: CP, physical layer, L3 TE (load balancing)]
• Multi-layer/-domain convergence
• Convergence of TE & resilience (dyn. operation, consider timescales)
Summary & Conclusions
• TE and resilience are both part of network operation
• Overprovisioning is simple, but is it sufficient? (not finally answered…)
• TE and resilience adapt the network resources to changes in the traffic statistics
• Paths of evolution:
  • Today: many tools/algorithms for TE, resilience operated offline (static)
  • Future: GMPLS will enable increasingly online (dynamic) network operation
  • VPN services
• Required tools/algorithms do converge
  • But: true convergence only if traffic variability is on the same timescale as protection (e.g. OBS/OPS, WP3)
  • But: time of operation restricts the algorithms to be used
• Separate, per-layer TE & resilience gradually replaced by a single TE & resilience strategy per CoS across multiple layers