Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)
IETF 70 Routing Research Group (RRG)
Fred L. Templin
fred.l.templin@boeing.com
MTU Determination Problem
[Figure: end-to-end path in which the Original Source (MTU=64KB) and the Final Destination (EMTU_R=64KB) sit in edge networks joined by a tunnel from the Tunnel Near-End to the Tunnel Far-End (EMTU_R=8KB) across the Internet/Enterprise Network/MANET/etc.; links along the path have MTUs of 64KB, 9KB, 4KB and 2KB, and the tunnel MTU is unknown (MTU=??).]
Tunnel MTU Issues (1)
• IPv4 path MTU discovery has limitations for tunnels:
  • ICMPv4 “packet too big” (PTB) messages are dropped by middleboxes; the result is an undiagnosable black hole
  • PTB messages returned to the tunnel near-end (TNE) can’t be translated into PTBs to send back to the original source
  • PTB messages are easily forged by off-path attackers
  • It does not work in the presence of multi-MTU subnets, i.e., the last-hop router cannot know the MTU of the tunnel far-end (TFE)
• CHALLENGE: TNE CANNOT BLINDLY ADMIT BIG PACKETS INTO THE TUNNEL WITH DF=1
Tunnel MTU Issues (2)
• Unmitigated IPv4 fragmentation is harmful:
  • Existing TNEs have no way of knowing the Effective MTU to Receive (EMTU_R) of the TFE
  • Existing TNEs have no way of knowing the reassembly timeout value used by the TFE
  • Fragmenting middleboxes incur slow-path processing
  • The TNE has no way of controlling NATs that rewrite ip_id
  • IP fragment misassociations at the TFE can cause undetected data corruption
• CHALLENGE: TNE CANNOT BLINDLY SEND BIG PACKETS INTO THE TUNNEL WITH DF=0
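To see why fragment misassociation is a practical concern, the sketch below estimates how quickly the 16-bit IPv4 ip_id field wraps. It assumes a sender that fragments every packet, uses one ip_id per packet toward the same TFE, and sends at full link rate; the 1 Gb/s link, 1500-byte packets and 30 s reassembly timeout are illustrative values, not figures from the slides. If the ip_id space wraps before the TFE's reassembly timeout expires, a stale fragment can be joined with a fragment of a newer packet that reuses the same ip_id.

```python
IP_ID_SPACE = 2 ** 16  # the IPv4 Identification field is 16 bits


def ip_id_wrap_seconds(link_bps: float, packet_bytes: int) -> float:
    """Seconds for a sender to cycle through every ip_id value, assuming one
    ip_id per packet and back-to-back packets at the full link rate."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return IP_ID_SPACE / packets_per_second


def misassociation_possible(link_bps: float, packet_bytes: int,
                            reassembly_timeout_s: float) -> bool:
    # If ip_id values repeat before stale fragments are discarded, the TFE
    # may reassemble fragments that belong to different packets.
    return ip_id_wrap_seconds(link_bps, packet_bytes) < reassembly_timeout_s


if __name__ == "__main__":
    t = ip_id_wrap_seconds(1e9, 1500)
    print(f"{t:.3f} s")                              # 0.786 s
    print(misassociation_possible(1e9, 1500, 30.0))  # True
```

Under these assumptions the ip_id space cycles dozens of times within a single reassembly timeout, which is the situation the DF=0 challenge above warns about.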
Goals
• Robust support for packets of various sizes
• Maximize Packet Delivery Ratio (PDR)
• Manage fragmentation if necessary
• Avoid in-the-network fragmentation
• Avoid reassembly misassociations at the TFE
• Coexist with end-to-end MTU determination
• Support larger MTUs
Solution: SPRITE-MTU
• UDP Echo service for tunnel MTU discovery
• Soft state management to track tunnel parameters (per RFC2003)
• Explicit Congestion Notification (ECN) for robust operation over tunnels with small MTUs
• Improves operating conditions for end-to-end path MTU determination (RFC4821)
• RESULT: DISCOVERS TUNNEL MTU AND MINIMIZES NUMBER OF FRAGMENTS PER PACKET (PREFERABLY DOWN TO 1)
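The slide names a UDP Echo service for tunnel MTU discovery but does not say how probe sizes are chosen. One plausible strategy, assumed here and not taken from the draft, is a binary search over probe sizes. `echo_ok` is a hypothetical callback standing in for "send a DF=1 UDP probe of this size to the TFE's echo service and report whether the echo came back".

```python
from typing import Callable


def probe_tunnel_mtu(echo_ok: Callable[[int], bool], low: int, high: int) -> int:
    """Largest size in [low, high] for which echo_ok(size) is True.

    Assumes echo_ok(low) succeeds and that success is monotonic in size:
    every size up to the true MTU is echoed, every larger size is not.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if echo_ok(mid):
            low = mid        # probe of this size came back: MTU >= mid
        else:
            high = mid - 1   # probe lost or not echoed: MTU < mid
    return low


if __name__ == "__main__":
    # Simulated TNE->TFE path that carries packets up to 1400 bytes.
    print(probe_tunnel_mtu(lambda size: size <= 1400, 1280, 9000))  # 1400
```

A real prober would also need timeouts and retries, since a missing echo can be caused by congestion rather than by the MTU.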
Relevant Elements of Normative Specifications
• RFC2003 (IPv4-in-IPv4 Encapsulation)
  • Basic encapsulation/decapsulation specifications
  • Inner packet fragmentation when DF=0 and the packet is larger than the TFE’s EMTU_R
  • Setting of DF
  • Tunnel soft state
  • Sending a packet while also returning a PTB
• RFC4213 (IPv6-in-IPv4 Encapsulation)
  • Basic encapsulation/decapsulation specifications
  • Conceptual sending algorithm
  • “Configuration knob” threshold for determining when an outer packet is fragmentable
Configuration Knob for Fragmentable Outer Packets
• Two purposes: 1) avoid TFE receive buffer overrun, 2) avoid/minimize fragmentation on the TNE->TFE path
• Below the threshold, admit packets into the tunnel without returning PTBs (the TFE may need to reassemble)
• Above the threshold, admit packets into the tunnel and return a PTB if the packet is larger than the cached MTU
• Minimums are 1280 bytes for IPv6 (MUST) and 576 bytes for IPv4 (SHOULD)
• May be set to larger values based on knowledge of: 1) the TFE’s EMTU_R, 2) other encapsulations that may occur on the TNE->TFE path
• Ideally, push the configuration knob up to 1480 (or better yet 1500), but this is not always possible
Setting the Configuration Knob (Assuming ENCAPS=20)
• 1280: safest option
• 1280 – ~1380: probably safe for most paths
• 1380 – 1480: safe only if there is little or no additional encapsulation
• 1480 – 1500: safe only if the path has a larger-than-1500 MTU and the TFE has a larger-than-minimum EMTU_R
• Optimizing down to the byte level is not always possible
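The ranges above follow from simple arithmetic: with a 20-byte outer IPv4 header (ENCAPS=20), a 1480-byte inner packet becomes a 1500-byte outer packet. The hypothetical helper `choose_knob` below turns that arithmetic into a rule of thumb, assuming the TNE knows (or is configured with) the TNE->TFE path MTU, the TFE's EMTU_R, and the size of any additional encapsulations on the path. It is a sketch, not the draft's algorithm.

```python
ENCAPS = 20            # outer IPv4 header size assumed on the slide
IPV6_MIN_KNOB = 1280   # minimum for IPv6 inner packets (MUST)
IPV4_MIN_KNOB = 576    # minimum for IPv4 inner packets (SHOULD)


def choose_knob(inner_is_ipv6: bool, path_mtu: int, tfe_emtu_r: int,
                extra_encaps: int = 0) -> int:
    """Hypothetical helper: largest inner-packet threshold whose encapsulated
    size fits both the TNE->TFE path MTU and the TFE's EMTU_R."""
    floor = IPV6_MIN_KNOB if inner_is_ipv6 else IPV4_MIN_KNOB
    ceiling = min(path_mtu, tfe_emtu_r) - ENCAPS - extra_encaps
    # Never go below the protocol minimum, even if the TFE must then reassemble.
    return max(floor, ceiling)


if __name__ == "__main__":
    print(choose_knob(True, 1500, 1500))                    # 1480
    print(choose_knob(True, 1500, 1500, extra_encaps=100))  # 1380
```

The second call shows how 100 bytes of additional encapsulation pulls the threshold down to the 1380 boundary of the "probably safe for most paths" range.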
Setting DF
• Set DF=1 in all packets larger than the threshold
• Set DF=1 even if the TNE fragments the packet before sending it into the tunnel
• MAY set DF=0 to increase PDR and avoid spurious PTBs, but if so, must use pacing and/or soft state feedback to manage fragmentation
Sending Big Packets into the Tunnel
• If the packet is no larger than the tunnel’s probed MTU (initially set to the configuration threshold), send it into the tunnel with DF=1
• If the packet is larger, send it into the tunnel with DF=1, but also send a PTB back to the source
• Sending the packet increases PDR and also allows end-to-end MTU determination (RFC4821) to determine the actual MTU
• Sending the PTB alerts RFC4821 nodes that there *may* be an MTU restriction
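Taken together, the knob, DF and PTB rules on the preceding slides suggest a per-packet decision at the TNE. The sketch below assumes that all sizes are inner-packet sizes, that `probed_mtu` starts at the knob value, and that packets at or below the knob are sent fragmentable (DF=0), following the description of RFC4213's configuration knob as the threshold for when an outer packet is fragmentable. The optional DF=0-with-pacing mode for larger packets is not modeled, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SendDecision:
    df: int         # DF bit for the outer IPv4 header
    send_ptb: bool  # also return a PTB to the original source?


def tne_send_decision(inner_size: int, knob: int, probed_mtu: int) -> SendDecision:
    """Hypothetical TNE rule; probed_mtu is the tunnel MTU discovered so far."""
    if inner_size <= knob:
        # At or below the threshold: fragmentable outer packet, no PTB.
        # The TFE may need to reassemble.
        return SendDecision(df=0, send_ptb=False)
    if inner_size <= probed_mtu:
        # Above the threshold but known to fit: send unfragmentable.
        return SendDecision(df=1, send_ptb=False)
    # Larger than the probed MTU: still send it (raises PDR and lets RFC4821
    # probing find the real MTU), but warn the source with a PTB.
    return SendDecision(df=1, send_ptb=True)
```

For example, with a knob of 1280 and a probed MTU of 1480, a 1400-byte packet goes out with DF=1 and no PTB, while a 1500-byte packet goes out with DF=1 and also triggers a PTB.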
What if it Might be Fragmenting?
• Institute pacing until the path MTU to the TFE is probed
• If the probed size is no smaller than the configuration threshold, relax pacing
• If the probed size is smaller than the configuration threshold, or no probes are returned, synchronize soft state with the TFE
• Worst case: fast links with small MTUs on the TNE->TFE path (the TFE’s reassembly must be carefully monitored)
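The three cases above map onto a small decision function. In this sketch `probed_size` is the largest probe echoed so far (None if none has come back yet, in the same units as the knob), and `probes_timed_out` is a hypothetical flag meaning the TNE has given up waiting for probe replies; both are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class TneAction(Enum):
    PACE = "pace"             # keep pacing: path MTU to the TFE not yet known
    RELAX = "relax"           # probed MTU covers the configuration threshold
    SYNC_SOFT_STATE = "sync"  # rely on soft-state feedback from the TFE


def pacing_action(probed_size: Optional[int], knob: int,
                  probes_timed_out: bool) -> TneAction:
    if probed_size is None:
        # No probe echoed yet: keep pacing, or fall back to soft state
        # once the TNE concludes that no probes will come back.
        return TneAction.SYNC_SOFT_STATE if probes_timed_out else TneAction.PACE
    if probed_size >= knob:
        return TneAction.RELAX
    # The path cannot carry threshold-sized packets unfragmented.
    return TneAction.SYNC_SOFT_STATE
```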
Soft State Management Protocol
• TNE creates soft state and sends an initial sprite to the TFE, using the TFE’s on-link link-local address as the destination
  • TNE is asking the TFE to synchronize state
• TFE sends a reply using its current sprite address as the source
  • No soft state is created yet, to avoid buffer attacks
• TNE sends a sprite using the TFE’s current sprite address as the destination
• TFE creates soft state and begins monitoring received packets
• TNE and TFE continuously exchange sprites while packets are actively using the tunnel
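The exchange above can be sketched as two small state machines. Addresses are opaque strings, and the `Sprite` record carries only a source and destination; the real sprite format, timers, and the continuous exchange while the tunnel is in use are not modeled, and all class and field names are hypothetical. The property the sketch shows is that the TFE creates no per-TNE state in response to the first sprite, only once the TNE sends to the address it learned from the reply.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Sprite:
    src: str
    dst: str


@dataclass
class TunnelFarEnd:
    link_local: str   # on-link link-local address
    sprite_addr: str  # current sprite address
    peers: Set[str] = field(default_factory=set)  # TNEs with soft state here

    def receive(self, sprite: Sprite) -> Optional[Sprite]:
        if sprite.dst == self.link_local:
            # Synchronization request: reply from the sprite address but
            # create no soft state yet, to avoid buffer attacks.
            return Sprite(src=self.sprite_addr, dst=sprite.src)
        if sprite.dst == self.sprite_addr:
            # The TNE used the address from our reply: create soft state
            # and begin monitoring packets from this TNE.
            self.peers.add(sprite.src)
            return Sprite(src=self.sprite_addr, dst=sprite.src)
        return None  # not addressed to this TFE


@dataclass
class TunnelNearEnd:
    addr: str
    tfe_sprite_addr: Optional[str] = None
    synchronized: bool = False

    def initial_sprite(self, tfe_link_local: str) -> Sprite:
        # TNE (re)creates its soft state and asks the TFE to synchronize.
        self.tfe_sprite_addr = None
        self.synchronized = False
        return Sprite(src=self.addr, dst=tfe_link_local)

    def receive(self, reply: Sprite) -> Optional[Sprite]:
        if self.tfe_sprite_addr is None:
            # First reply: learn the TFE's current sprite address.
            self.tfe_sprite_addr = reply.src
            return Sprite(src=self.addr, dst=self.tfe_sprite_addr)
        if reply.src == self.tfe_sprite_addr:
            self.synchronized = True
        return None
```

Running the handshake with `tne = TunnelNearEnd("tne")` and `tfe = TunnelFarEnd("tfe-link-local", "tfe-sprite")` leaves `tfe.peers` empty after the first reply and containing `"tne"` only after the second sprite.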
Sprite-mtu Checksum
• The “sprite-mtu checksum” sums every 10th byte of the packet using the Fletcher-16 algorithm
• While synchronized, the TNE includes a trailing sprite-mtu checksum
• The TFE verifies the checksum and discards the packet if it does not match
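A minimal sketch of the checksum and its use at both tunnel endpoints follows. Fletcher-16 is the standard two-sum algorithm modulo 255; which bytes count as "every 10th byte" (here offsets 0, 10, 20, ...), the 2-byte big-endian trailer, and the helper names are assumptions, since the slide does not give the exact layout.

```python
import struct
from typing import Optional


def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums modulo 255, packed as (sum2 << 8) | sum1."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


def sprite_mtu_checksum(packet: bytes, stride: int = 10) -> int:
    # Assumed sampling: bytes at offsets 0, stride, 2*stride, ...
    return fletcher16(packet[::stride])


def append_checksum(packet: bytes) -> bytes:
    # TNE: append the checksum as an assumed 2-byte big-endian trailer.
    return packet + struct.pack("!H", sprite_mtu_checksum(packet))


def verify_and_strip(frame: bytes) -> Optional[bytes]:
    # TFE: return the packet if the trailer matches, else None (discard).
    if len(frame) < 2:
        return None
    packet, (expected,) = frame[:-2], struct.unpack("!H", frame[-2:])
    return packet if sprite_mtu_checksum(packet) == expected else None


if __name__ == "__main__":
    print(hex(fletcher16(b"abcde")))                 # 0xc8f0
    frame = append_checksum(bytes(range(100)))
    print(verify_and_strip(frame) is not None)       # True
    bad = bytearray(frame); bad[10] ^= 0x01          # sampled byte
    print(verify_and_strip(bytes(bad)) is None)      # True: discarded
    bad = bytearray(frame); bad[5] ^= 0x01           # unsampled byte
    print(verify_and_strip(bytes(bad)) is not None)  # True: not detected
```

The last two checks show the effect of sampling: a bit flip in a sampled byte is caught, while the same flip in an unsampled byte passes.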
Explicit Congestion Notification
• TNE sets the ECT(0) or ECT(1) codepoint in its sprites
• When the TFE detects incorrect sprite-mtu checksums, it begins setting the CE codepoint in its sprite replies
• TNE institutes pacing while receiving sprite replies with the CE codepoint
• TNE relaxes pacing when the CE codepoint is no longer set
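The ECN feedback loop can be sketched with the RFC 3168 codepoints, which occupy the two low-order bits of the IPv4 TOS byte (Not-ECT 00, ECT(1) 01, ECT(0) 10, CE 11). How the TFE derives its reply's TOS from the sprite it received, and the `TnePacer` class, are assumptions for illustration; the pacing mechanism itself (rate, timers) is left out.

```python
from enum import IntEnum


class Ecn(IntEnum):
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def ecn_of(tos: int) -> Ecn:
    return Ecn(tos & 0b11)


def tfe_reply_tos(request_tos: int, bad_checksums_seen: bool) -> int:
    """TFE (assumed behavior): echo the sprite's TOS, marking CE while it is
    seeing sprite-mtu checksum failures from this TNE."""
    if bad_checksums_seen and ecn_of(request_tos) != Ecn.NOT_ECT:
        return (request_tos & ~0b11) | Ecn.CE
    return request_tos


class TnePacer:
    """TNE: pace while sprite replies carry CE, relax once they stop."""

    def __init__(self) -> None:
        self.pacing = False

    def on_sprite_reply(self, reply_tos: int) -> bool:
        self.pacing = ecn_of(reply_tos) == Ecn.CE
        return self.pacing
```

For example, a sprite sent with ECT(0) (TOS 0x02) comes back as 0x03 (CE) while the TFE sees bad checksums, which switches the pacer on; the next reply without CE switches it off.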
Futures
• IEEE 802.3as Frame Expansion
  • Larger-than-1500 MTUs for 802.3 links
  • May allow setting the configuration threshold to > 1500
• Larger EMTU_Rs for tunnel endpoints (up to 2KB)
• Gigabit Ethernet 9KB jumbo frames
• Widespread use of sprite-mtu
• Widespread use of RFC4821
TODO
• Some encapsulations are dangerous with any level of outer fragmentation, e.g., Teredo (IPv6/UDP/IPv4)
  • NATs rewrite ‘ip_id’
  • ‘ip_id’ collisions occur when multiple nodes behind a NAT talk to the same TFE
  • Solution: “UDP Fragmentation for Teredo” (draft to be written)
• Use ICMP echo request/reply as a fallback if the TFE does not implement sprite-mtu (is it worth it?)