Challenges and Solutions for OAM in Point-to-Multipoint MPLS

Adrian Farrel, Old Dog Consulting Ltd.; Zafar Ali, Cisco Systems, Inc.

Outline • P2P LSP Ping • Extending P2P LSP Ping for P2MP LSPs • P2P Traceroute • Traceroute for P2MP LSP Trees • Pro-active Connection Verification • Summary

P2P LSP Ping: In a Nutshell [Figure: an LSP across R1, R2, R3, and R4. The MPLS Echo Request (header with SA = source address and DA = 127/8, plus the FEC) is shown on each hop under labels 32, 50, and 19; an MPLS Echo Reply is returned.] • Use the same label stack as the LSP so that the Echo Request flows in band with the LSP under test. • The destination address in the IP header of the Echo Request is a 127/8 address so that it is not forwarded at the destination. • The Echo Reply takes its destination IP address and port from the source address and port of the Echo Request. It is returned as IP or MPLS traffic.

MPLS LSP Ping Essentials • MPLS LSP Ping messages are UDP-encapsulated • Echo Requests contain basic fields and TLVs • Most important TLV is the Target FEC Stack • Identifies the LSP being tested • LDP IPv4/6 prefix • RSVP IPv4/6 session and sender template • VPN IPv4/6 prefix • BGP labeled IPv4 prefix, etc.
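
As a rough illustration of how a Target FEC Stack might be carried, the sketch below encodes a nested TLV in Python. It assumes the generic layout of a 2-byte type, a 2-byte length, and a value padded to a 4-byte boundary, and it assumes type code 1 for both the Target FEC Stack TLV and the LDP IPv4 prefix sub-TLV. Those codes and the helper names `encode_tlv` and `ldp_ipv4_target_fec` are illustrative assumptions, so check them against RFC 4379 before relying on them.

```python
import socket
import struct

# Assumed type codes; verify against RFC 4379 before use.
TLV_TARGET_FEC_STACK = 1
SUBTLV_LDP_IPV4_PREFIX = 1


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV: 2-byte type, 2-byte length, value zero-padded to 4 bytes."""
    padding = (-len(value)) % 4
    return struct.pack("!HH", tlv_type, len(value)) + value + b"\x00" * padding


def ldp_ipv4_target_fec(prefix: str, prefix_len: int) -> bytes:
    """Build a Target FEC Stack TLV holding a single LDP IPv4 prefix sub-TLV."""
    value = socket.inet_aton(prefix) + struct.pack("!B", prefix_len)
    sub_tlv = encode_tlv(SUBTLV_LDP_IPV4_PREFIX, value)
    return encode_tlv(TLV_TARGET_FEC_STACK, sub_tlv)


tlv = ldp_ipv4_target_fec("10.0.0.1", 32)
print(tlv.hex())
```

The outer length covers the padded sub-TLV, which is why the sub-TLV is encoded first and then wrapped.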

P2P LSP Ping: Detecting a Broken LSP [Figure: an LSP across R1, R2, R3, and R4 with a break marked "LSP Broken". The MPLS Echo Request (header with SA = source address and DA = 127/8, plus the FEC) is shown under labels 50 and 19; an MPLS Echo Reply is returned.] • The Echo Request is addressed to a 127/8 address so it is not forwarded if the LSP is broken. It is delivered locally and causes an Echo Reply. • If the LSP is broken on a link, the error is detected through lack of response.

P2P LSP Ping: Detecting a Misrouted LSP [Figure: routers R1, R2, R3, and R4 with two LSPs, LSP 1 and LSP 2.] • LSP Ping is initiated from R1 through LSP 1. • Owing to an error on R2, all data intended for LSP 1 is switched into LSP 2. This includes the Echo Request. • R4 examines the Target FEC Stack in the received Echo Request and recognizes that it is not the intended recipient. It sends an Echo Reply with an error code.
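
The check performed by the receiving LSR can be sketched as a lookup of the Target FEC against the FECs for which the router is an egress. The `Fec` type, the `ReturnCode` names, and `validate_echo_request` are hypothetical and chosen only for this illustration; real implementations use the numeric return codes defined for LSP Ping.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class ReturnCode(Enum):
    # Symbolic stand-ins for the protocol's numeric return codes (assumption).
    EGRESS_FOR_FEC = "replying router is an egress for the FEC"
    NO_MAPPING_FOR_FEC = "replying router has no mapping for the FEC"


@dataclass(frozen=True)
class Fec:
    kind: str  # e.g. "ldp-ipv4" or "rsvp-ipv4"
    key: str   # prefix or session identifier, kept opaque here


def validate_echo_request(local_egress_fecs: Set[Fec], target: Fec) -> ReturnCode:
    """Compare the Target FEC of a received Echo Request with local egress state."""
    if target in local_egress_fecs:
        return ReturnCode.EGRESS_FOR_FEC
    return ReturnCode.NO_MAPPING_FOR_FEC


# R4 is the egress of LSP 2 only; the misrouted request names LSP 1.
lsp1 = Fec("rsvp-ipv4", "LSP 1")
lsp2 = Fec("rsvp-ipv4", "LSP 2")
r4_egress_fecs = {lsp2}
print(validate_echo_request(r4_egress_fecs, lsp1).value)
```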

P2MP LSP Ping: In a Nutshell [Figure: a P2MP LSP over routers R1 to R7 with per-hop labels 5, 31, 52, 60, 17, and 23. The MPLS echo-req carries the P2MP LSP Identifier; MPLS Echo Replies are returned.] • Reuses existing LSP Ping mechanisms • Echo Request messages follow the same data path that normal MPLS packets would traverse (including packet replication) • Main differences are in LSP Identification/Target FEC Stack

Identifying P2MP LSPs • LSPs still identified using the Target FEC Stack • MPLS-TE LSPs identified by Session and Sender Template • Just like P2P, but fields have slightly different meanings • P2MP ID used instead of Destination Address • Mirrors the differences in RSVP-TE • Sub-Group ID is not used • Multicast LDP LSPs identified by multicast FEC • Root address and opaque value • Just as used in multicast LDP
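
The two identification schemes can be pictured as two small record types. The sketch below is an assumption-laden illustration: the field names follow the slide (P2MP ID in place of the destination address, no Sub-Group ID, root address plus opaque value), while the class names `P2mpRsvpTeFec` and `MulticastLdpFec` and the exact field set are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class P2mpRsvpTeFec:
    # Session part: the P2MP ID takes the place of the P2P destination address.
    p2mp_id: int
    tunnel_id: int
    extended_tunnel_id: str
    # Sender Template part: the Sub-Group ID is deliberately left out.
    sender_address: str
    lsp_id: int


@dataclass(frozen=True)
class MulticastLdpFec:
    root_address: str
    opaque_value: bytes


TargetFec = Union[P2mpRsvpTeFec, MulticastLdpFec]


def same_lsp(a: TargetFec, b: TargetFec) -> bool:
    """Two FECs name the same LSP only if type and every field match."""
    return a == b


te_fec = P2mpRsvpTeFec(p2mp_id=100, tunnel_id=7, extended_tunnel_id="192.0.2.1",
                       sender_address="192.0.2.1", lsp_id=1)
mldp_fec = MulticastLdpFec(root_address="192.0.2.1", opaque_value=b"\x00\x01")
print(same_lsp(te_fec, mldp_fec))
```

Because the records are frozen, they can also serve as dictionary keys when an LSR looks up its state for the LSP named in an Echo Request.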

Detecting a Broken P2MP LSP [Figure: a P2MP LSP over routers R1 to R7 with per-hop labels 5, 60, 17, and 23. The MPLS echo-req carries the P2MP LSP Identifier; MPLS Echo Replies are returned.] • The Echo Request is addressed to a 127/8 address so it is not forwarded if the LSP is broken. It is delivered locally and causes an Echo Reply. • If the LSP is broken on a link, the error is detected through lack of response.

Detecting a Misrouted P2MP LSP [Figure: a P2MP LSP over routers R1 to R7, plus R8 outside the tree, with per-hop labels 31, 5, 66, 60, 17, and 23. The MPLS echo-req carries the P2MP LSP Identifier; MPLS Echo Replies are returned.] • Owing to a broken LFIB or incorrect replication at R2, the Echo Request reaches R8 • R8 recognizes that it is NOT an egress for the "Target (P2MP) FEC" in the Echo Request and sends an Echo Reply with an error code

P2MP LSP Ping: Issues and Challenges • If you send a Ping to the whole P2MP tree you will get an Echo Reply from each leaf • The number of egresses (leaves) in a P2MP tree can be tens, hundreds, or even thousands • The initiator (the ingress) may become swamped • The network around the initiator may be swamped • UDP rate limiting is also recommended • Lost Echo Replies lead to false negatives • Other traffic may be adversely affected • Solutions • Ping a single target • Jitter the responses

Egress Filtering [Figure: a P2MP LSP over routers R1 to R7 with per-hop labels 31, 5, 52, 60, 17, and 23. The MPLS echo-req carries the P2MP LSP Identifier and the Egress Identifier; an MPLS Echo Reply is returned.] • Echo Request is in-band so still reaches all egresses • Target egress is identified by P2MP Egress Identifier TLV • Only the target egress sends an Echo Reply • Can target one egress or whole tree
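
The egress-side decision can be sketched in a few lines. This is a minimal illustration under stated assumptions: `should_reply` is a hypothetical helper, addresses are plain strings, and a missing Egress Identifier is taken to mean that the whole tree is targeted.

```python
from typing import Optional


def should_reply(local_address: str, is_egress: bool,
                 egress_identifier: Optional[str]) -> bool:
    """Decide whether this LSR answers a P2MP Echo Request (ping mode)."""
    if not is_egress:
        return False  # in ping mode only egresses respond
    if egress_identifier is None:
        return True   # no filter present: every egress of the tree replies
    return egress_identifier == local_address


# The request reaches every egress in band, but only the target answers.
print([addr for addr in ("192.0.2.5", "192.0.2.6", "192.0.2.7")
       if should_reply(addr, True, "192.0.2.6")])
```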

Jittered Responses [Figure: a P2MP LSP over routers R1 to R7 with per-hop labels 31, 5, 52, 60, 17, and 23. The MPLS echo-req carries the P2MP LSP Identifier and the Jitter Range TLV; MPLS Echo Replies are returned.] • Optional procedure initiated by the ingress • Jitter Range is specified by the ingress in the Echo Jitter TLV in the Echo Request • Egress sends its Echo Reply at a randomly selected time within the Jitter Range interval
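
To see why jitter helps, the following sketch draws a random delay per egress and counts the largest number of replies that fall into one short time window. The uniform distribution, the 1000 leaves, the 10-second range, and the helper names are all assumptions made for this illustration; the source only says that the time is randomly selected within the Jitter Range.

```python
import random
from typing import Dict, Iterable


def reply_delay_ms(jitter_range_ms: int, rng: random.Random) -> float:
    """Pick the delay an egress waits before sending its Echo Reply."""
    return rng.uniform(0, jitter_range_ms)


def peak_replies_per_window(delays_ms: Iterable[float], window_ms: int) -> int:
    """Largest number of replies landing in any single window of window_ms."""
    counts: Dict[int, int] = {}
    for delay in delays_ms:
        bucket = int(delay // window_ms)
        counts[bucket] = counts.get(bucket, 0) + 1
    return max(counts.values())


rng = random.Random(42)   # fixed seed so the run is repeatable
leaves = 1000             # hypothetical tree size
jitter_range_ms = 10_000  # hypothetical Jitter Range chosen by the ingress

delays = [reply_delay_ms(jitter_range_ms, rng) for _ in range(leaves)]
unjittered = [0.0] * leaves

print(peak_replies_per_window(unjittered, 100))  # every reply in one window
print(peak_replies_per_window(delays, 100))      # replies spread over the range
```

With these numbers the average load drops to about ten replies per 100 ms window, at the cost of waiting up to the full Jitter Range for the last reply.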

P2P LSP Traceroute [Figure: an LSP across R1, R2, R3, R4, and R5; probes are shown with TTL=1 at the LSRs where they expire.] • Traceroute is used for hop-by-hop fault localization as well as path tracing • MPLS Echo Requests are sent with increasing TTL to "probe" the LSP from upstream LSRs • The Echo Request is forwarded as normal if TTL > 1 • When the TTL expires, the Echo Request is passed to the control plane • The control plane checks that the LSR is indeed a transit LSR for this P2P MPLS LSP • The reply contains the label and interface for reaching the downstream router, in the Downstream Mapping TLV
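
The probing loop can be sketched as a small simulation. Everything here is hypothetical: the `Hop` record, the labels, and the interface names stand in for a real LSP, and `probe` models the network by returning the LSR at which a probe with a given TTL expires.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hop:
    name: str
    out_label: Optional[int]      # None at the egress
    out_interface: Optional[str]  # None at the egress


def probe(path: List[Hop], ttl: int) -> Optional[Hop]:
    """Return the LSR where a probe with this TTL expires, or None if none does."""
    # path[0] is the ingress, so TTL=1 expires at the first downstream LSR.
    if ttl < 1 or ttl >= len(path):
        return None
    return path[ttl]


def traceroute(path: List[Hop], max_ttl: int = 255) -> List[Tuple[int, str, Optional[int], Optional[str]]]:
    """Send probes with increasing TTL and collect the downstream mapping of each hop."""
    replies = []
    for ttl in range(1, max_ttl + 1):
        hop = probe(path, ttl)
        if hop is None:
            break  # no reply: in a real network this is where the fault lies
        replies.append((ttl, hop.name, hop.out_label, hop.out_interface))
        if hop.out_label is None:
            break  # the egress has answered, so the trace is complete
    return replies


lsp = [Hop("R1", 32, "ge-0/0/1"), Hop("R2", 50, "ge-0/0/2"), Hop("R3", 19, "ge-0/0/3"),
       Hop("R4", 41, "ge-0/0/4"), Hop("R5", None, None)]
for reply in traceroute(lsp):
    print(reply)
```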

P2MP Traceroute: In a Nutshell [Figure: a P2MP tree over routers R1 to R8 probed with TTL 1, 2, and 3. MPLS Echo Requests are replicated down the tree; MPLS Echo Replies with Downstream Mapping TLVs are annotated B=1, E=0; B=0, E=0; and B=0, E=1 (bud node).] • Similar to P2P Traceroute with the following differences • Echo Request replicated onto all branches with identical TTL • A branch node may need to identify more than one downstream interface and label • Helpful to identify branch and bud nodes • Branch node is identified by the B-flag • Bud node is identified by the E-flag

P2MP Traceroute: Challenges • Multiple downstream interfaces/labels • Multiple Downstream Mapping TLVs already allowed (ECMP) • Scalability worse than for simple LSP Ping • Note that in IP multicast the traceroute is from destination to source • This might not be viable in MPLS since the previous MPLS hop might not be on the IP path to the ingress • Response jittering still available • Egress filtering can be used • Does a transit node know that it is on the path to the target egress? • Need to correlate the Echo Replies at the ingress (to identify branches in the P2MP tree) • New B and E flags identify branch and bud nodes • Downstream Mapping TLVs help • But correlating Echo Replies to construct the tree is still hard

Egress Filtering [Figure: an MPLS Echo Request carrying an Egress Identifier probes a P2MP LSP over routers R1 to R6, shown with TTL=1 at each LSR where it expires; MPLS Echo Replies with a Downstream Mapping TLV are returned.] • Egress filtering is possible in P2MP RSVP-TE • An LSR only responds if it lies on the path of the P2MP LSP to the egress identified by the P2MP Egress Identifier TLV • Possible because RSVP-TE identifies the destinations • Egress filtering is NOT possible for multicast LDP • A transit LSR of a multicast LDP LSP is unable to determine whether it lies on the path to any one destination • Unfiltered (full tree) traceroute is possible for all LSPs
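
The difference between the two signaling protocols can be sketched as follows. The function `transit_should_reply`, the protocol strings, and the choice to raise `FilteringNotSupported` for a filtered multicast LDP probe are assumptions made for illustration; the source states only that such filtering is not possible, not how an LSR should react.

```python
from typing import Dict, Optional, Set


class FilteringNotSupported(Exception):
    """Raised when an LSR cannot tell whether it is on the path to one egress."""


def transit_should_reply(protocol: str,
                         destinations_by_branch: Dict[str, Set[str]],
                         target_egress: Optional[str]) -> bool:
    """Decide whether a transit LSR answers a traceroute probe whose TTL expired here."""
    if target_egress is None:
        return True  # unfiltered (full tree) traceroute works for all LSPs
    if protocol == "rsvp-te":
        # RSVP-TE signals each destination, so the LSR knows what lies downstream.
        return any(target_egress in dests for dests in destinations_by_branch.values())
    if protocol == "mldp":
        # Multicast LDP state carries no per-leaf information.
        raise FilteringNotSupported("multicast LDP transit LSRs do not know the leaves")
    raise ValueError(f"unknown protocol: {protocol}")


branches = {"if0": {"R5"}, "if1": {"R6"}}
print(transit_should_reply("rsvp-te", branches, "R6"))
```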

Correlating Traceroute Responses [Figure: traceroute of all egresses in a P2MP tree over routers R1 to R8 with TTL 1, 2, and 3. MPLS Echo Replies with Downstream Mapping TLVs are annotated B=1, E=0, {R3, R4}; B=0, E=0, {R6}; and B=0, E=1, {R7, R8} (bud node).] • Problem is that traceroute for the whole tree will return many responses to the ingress • Hard to work out which LSP hops belong where in the tree • Solution has three components • Indicate branch/bud status using flags (mandatory) • Indicate outgoing interface and label in Downstream Mapping TLV (mandatory) • List the destinations reachable through each outgoing interface/label (optional and only for RSVP-TE) • Achieved using new Downstream Mapping Multipath Information
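
One way the ingress might use the optional per-interface destination lists is sketched below. The topology, the `Reply` record, and `path_to` are hypothetical and are not taken from the figure: each reply maps a downstream neighbor to the egresses reachable through it, and the ingress follows those lists hop by hop.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reply:
    responder: str
    branch: bool                      # B flag
    bud: bool                         # E flag
    downstream: Dict[str, List[str]]  # downstream neighbor -> egresses reachable through it


def path_to(egress: str, ingress: str, replies: List[Reply]) -> List[str]:
    """Walk from the ingress toward one egress using the per-branch destination lists."""
    by_node = {r.responder: r for r in replies}
    path = [ingress]
    node = ingress
    while node != egress:
        reply = by_node.get(node)
        if reply is None:
            raise LookupError(f"no reply from {node}")
        next_hop = next((nbr for nbr, dests in reply.downstream.items()
                         if egress in dests), None)
        if next_hop is None:
            raise LookupError(f"{node} lists no branch toward {egress}")
        path.append(next_hop)
        node = next_hop
    return path


# Hypothetical tree: R1 -> R2 -> {R3, R4}, R4 -> R6 (bud) -> R7.
replies = [
    Reply("R1", False, False, {"R2": ["R3", "R6", "R7"]}),  # the ingress's own state
    Reply("R2", True, False, {"R3": ["R3"], "R4": ["R6", "R7"]}),
    Reply("R4", False, False, {"R6": ["R6", "R7"]}),
    Reply("R6", False, True, {"R7": ["R7"]}),
]
print(path_to("R7", "R1", replies))
```

Without the destination lists, which are optional and available only for RSVP-TE, the ingress has to fall back on matching downstream interfaces and labels between replies.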

Connection Verification Probing • A new approach to the scalability problem • Particularly useful for pro-active fault detection • A new Connection Verification LSP Ping message is sent by the ingress • Each destination responds to say the CV process has been started • Each destination expects to receive a new CV message within a specific time period • Non-receipt causes the destination to raise an alarm • Local action and Echo Response message • Process can be enabled and disabled by ingress • Can also be applied to P2P LSPs
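
The destination-side behavior described above resembles a watchdog timer. The sketch below is a loose illustration only: `CvWatchdog`, its method names, and the use of plain numeric timestamps are assumptions, and the actual message formats and timers are defined by the Connection Verification work, not here.

```python
from typing import Optional


class CvWatchdog:
    """Tracks CV messages at one destination and reports when one is overdue."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s  # period within which the next CV message is expected
        self.enabled = False
        self.last_seen: Optional[float] = None

    def on_cv_message(self, now: float, enable: bool = True) -> None:
        """Record a CV message; the ingress can also use it to disable the process."""
        self.enabled = enable
        self.last_seen = now if enable else None

    def alarm(self, now: float) -> bool:
        """True when the process is enabled and no CV message arrived in time."""
        if not self.enabled or self.last_seen is None:
            return False
        return now - self.last_seen > self.interval_s


watchdog = CvWatchdog(interval_s=3.0)
watchdog.on_cv_message(now=0.0)   # ingress starts the CV process
watchdog.on_cv_message(now=2.5)   # next CV message arrives in time
print(watchdog.alarm(now=5.0), watchdog.alarm(now=6.0))
```

Because the destination raises the alarm itself, the ingress does not have to collect a reply from every leaf on every probe, which is where the scaling benefit comes from.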

References • RFC 4687: Operations and Management (OAM) Requirements for Point-to-Multipoint MPLS Networks • draft-ietf-mpls-p2mp-lsp-ping: Detecting Data Plane Failures in Point-to-Multipoint Multiprotocol Label Switching (MPLS) - Extensions to LSP Ping (work in progress) • RFC 4379: Detecting Multi-Protocol Label Switched (MPLS) Data Plane Failures [MPLS LSP Ping] • draft-ietf-mpls-rsvp-te-p2mp: Extensions to RSVP-TE for Point to Multipoint TE LSPs (work in progress) • draft-ietf-mpls-ldp-p2mp: Label Distribution Protocol Extensions for Point-to-Multipoint and Multipoint-to-Multipoint Label Switched Paths (work in progress) • draft-swallow-mpls-mcast-cv: Connectivity Verification for Multicast Label Switched Paths (work in progress)

Summary • LSP Ping and Traceroute functions for P2MP MPLS LSPs build on established P2P technology • Objective is to test LSPs periodically or in response to faults • Detect and isolate faults • Scalability is a big concern • LSP tree may have thousands of egresses • Jittered responses ease the problem of the ingress being swamped • Egress filtering allows targeting of a single egress • Not possible for traceroute of multicast LDP LSPs • Scalability and security requirements call for rate limiting, but that can lead to false negatives • New work on pro-active fault detection using the Connection Verification message • Multipoint-to-multipoint LSPs not currently addressed

Questions? Adrian Farrel adrian@olddog.co.uk Zafar Ali zali@cisco.com
