AAA Transport Issues

Draft-ietf-aaa-transport-00.txt • http://www.drizzle.com/~aboba/AAA/AAA_transport.ppt • Bernard Aboba, Barney Wolff, Dave Mitton

  2. Outline • Goals and objectives • Introduction • AAA proxy bestiary • Congestion control principles • Summary

  3. Goals and Objectives • To understand how AAA protocols interact with the transport layer • To understand the transport behavior of AAA protocols running over UDP, TCP and SCTP • Useful to understand behavior of existing protocols (RADIUS, TACACS+) as well as DIAMETER • To understand the transport behavior of proxy systems • Transport layer proxies, Store & Forward proxies, Routing proxies, Re-direct proxies exhibit different transport behavior • Implications for other proxied protocols (SIP, DNS) as well as AAA

  4. Introduction • AAA protocol exchanges • Transport connection usage • Firewall issues • The “Mice” problem • Application driven vs. Network driven • Reliable versus unreliable transport traces

  5. AAA Protocol Exchanges • Single request/response • Simple authentication/authorization exchanges (NAS initiated) • Accounting exchanges (NAS initiated) • Multiple request/response • EAP exchanges (NAS initiated) • Unsolicited server messages • Request/response initiated by server

  6. Transport Connection Usage • Implementation experience (RADIUS) • Some implementations use a single socket for NAS-AAA server communication • Some implementations use a socket per port! • Implications • Possible for AAA to use congestion-friendly transport in a non-congestion-friendly way • Pipelining desirable • No need for NAS to wait for a Response before sending another Request • May make use of a single connection more palatable • Congestion Manager support desirable • Enables separate connections to share information with each other and with application layer
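
The pipelining point on this slide can be illustrated with a minimal sketch: the NAS keeps a table of outstanding requests keyed by identifier, so it can send a second Request before the first Response arrives and still match responses that come back out of order. The class name `PipelinedNas` and its methods are hypothetical, and the one-octet identifier space is an assumption borrowed from RADIUS.

```python
class PipelinedNas:
    """Tracks outstanding requests so responses can arrive in any order."""

    ID_SPACE = 256  # assumption: one-octet identifier, as in RADIUS

    def __init__(self):
        self.pending = {}   # identifier -> request payload
        self._next_id = 0

    def send(self, payload):
        """Register a request; return its identifier, or None if all are in use."""
        for _ in range(self.ID_SPACE):
            ident = self._next_id
            self._next_id = (self._next_id + 1) % self.ID_SPACE
            if ident not in self.pending:
                self.pending[ident] = payload
                return ident
        return None

    def on_response(self, ident):
        """Match a response to its request; return the payload or None if unknown."""
        return self.pending.pop(ident, None)


nas = PipelinedNas()
first = nas.send("auth user-a")
second = nas.send("auth user-b")   # sent without waiting for the first response
matched = nas.on_response(second)  # responses may come back out of order
```

The sketch covers only request/response bookkeeping; it places no limit on how many requests are in flight, which is the job of the congestion window.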

  7. Firewall Issues • Designing a firewall for an AAA server is not hard • Only allow DNS and AAA traffic between the NASes and the AAA server on the AAA port • Typically don’t need layer 7 filtering • What about denial of service attacks? • RADIUS vulnerable to DoS attacks • Bogus client can send large number of Requests • Server must validate User-Password or CHAP-Password attribute, Message-Authenticator attribute if present • Strong cryptography support increases DoS vulnerability • Solution • Need port-specific rate limiting on router or built-in TCP/SCTP DoS protection

  8. The “Mice” Problem • Many NASes (thousands) can converse with a single AAA server or proxy • Traffic from a single NAS may be light, but traffic close to the server/proxy may be substantial • Result can be packet loss in a router near the server, or buffer overflow within the server itself • Data traffic can also compete with AAA traffic near the NAS

  9. The Proxy Congestion Problem • Bottleneck may be between AAA proxy and a particular AAA server • DinkyLink.org AAA server has trans-oceanic 56 Kbps Internet connection • Proxy may be overloaded at the application layer (many NAS requests) • NAS can’t sense proxy-AAA server bottleneck since it has no transport layer connection to the AAA server • Result: NAS sends more requests than proxy can forward, proxy send buffer fills up • NAS can’t differentiate reasons for poor application-layer response experienced with the proxy • Result: NAS switches to another proxy inappropriately, re-transmits request at the application layer, etc. • Proxy needs a way to communicate status to the NAS (Unable to forward, No Response, Busy) • Diagram link labels: one 56 Kbps link, three 10 Mbps links

  10. Application Driven vs. Network Driven • AAA protocol exchanges typically application driven • Definition: time between exchanges larger than RTT • Examples • 48 port NAS, session time of 20 minutes • Authentication & Accounting Request every 25 seconds • Traffic assuming 1500 octet packets: 480 bps • Total traffic, assuming 4 Kbps per port: 192 Kbps • 2048 port NAS, session time of 10 minutes • Authentication & Accounting Request every 293 ms • Traffic assuming 1500 octet packets: 41 Kbps • Total traffic, assuming 4 Kbps per port: 8.2 Mbps • AAA exchanges can also be network driven • Employees come to work in the morning, log on to the network • NAS reboots, users log on again • After network partition, NAS sends stored accounting records
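
The figures in the two examples follow from simple arithmetic, sketched below. The sketch assumes, as the slide appears to, one 1500 octet AAA packet per request interval and one session per port; the function name `nas_load` is hypothetical.

```python
def nas_load(ports, session_seconds, packet_octets=1500, per_port_kbps=4):
    """Return (seconds between AAA requests, AAA traffic in bps, user traffic in Kbps)."""
    # With one session per port, a new request is due every session_time / ports.
    interval_s = session_seconds / ports
    # One packet of packet_octets per interval.
    aaa_bps = packet_octets * 8 / interval_s
    # User data traffic across all ports.
    total_kbps = ports * per_port_kbps
    return interval_s, aaa_bps, total_kbps


small = nas_load(ports=48, session_seconds=20 * 60)
large = nas_load(ports=2048, session_seconds=10 * 60)
```

In both cases AAA traffic is a small fraction of the user traffic the NAS carries (480 bps against 192 Kbps, and about 41 Kbps against about 8.2 Mbps).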

  11. Transport Parameter Validation Issues • CWND, RTT/RTO (NAS-proxy) and RTT/RTO (proxy-AAA server) estimates not valid for application-driven scenarios typical of AAA • Multiple RTTs may elapse between packets • 48 port NAS ~ 200 RTT • 2048 port NAS ~ 2 RTT • CWND can open without being fully utilized • CWND validation • RFC 2581 recommends slow-start after an idle interval larger than the RTO • RFC 2861 recommends increasing the congestion window only if it was full when the ACK arrived; congestion window reduced by half once per RTO • Ssthresh not reduced • Remaining issue: RTT/RTO validation
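
A minimal sketch of the window decay described on this slide: the congestion window is halved once for each full RTO that passes without traffic, down to a restart window. This illustrates the idea only and is not a complete RFC 2861 implementation (ssthresh handling and the "window was full" test are left out). The function name, the 1 segment restart window, and the 1 second RTO in the examples are assumptions.

```python
def decay_cwnd(cwnd, idle_s, rto_s, restart_window=1):
    """Halve cwnd (in segments) once per full RTO of idle time.

    The result never drops below restart_window.
    """
    halvings = int(idle_s // rto_s)
    for _ in range(halvings):
        if cwnd <= restart_window:
            break
        cwnd = max(cwnd // 2, restart_window)
    return cwnd


# 48 port NAS: about 25 s between requests, far longer than the assumed 1 s RTO.
after_long_idle = decay_cwnd(cwnd=16, idle_s=25.0, rto_s=1.0)
# Busy NAS: the next request goes out before a single RTO has elapsed.
after_short_idle = decay_cwnd(cwnd=16, idle_s=0.5, rto_s=1.0)
```

With inter-request gaps of many RTOs the window collapses to the restart window before each exchange, which is why lightly loaded AAA connections tend to stay in slow start.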

  12. Reliable Transport Protocol Trace (NAS and AAA Server) • Trace: Auth: Request, Response/ACK, ACK; Accounting: Accounting Start, Response/ACK, ACK, Accounting Stop, Response/ACK, ACK • Notes: • 8 packets if same port used for auth and accounting, 9 otherwise • Server typically piggybacks ACK with Auth Response unless it’s really overloaded • ACK of Auth Response can’t piggyback on Accounting Start if Accounting uses a different port • Long delay between auths means that previous RTT/CWND estimates not valid between transactions • Long delay between Accounting Start/Stop means RTT/CWND typically no longer valid within a single transaction • Since Responses are ACK’d at the transport layer, an app layer Response ACK would not add additional packets due to piggybacking

  13. Reliable Transport Protocol Trace (Routing Proxy) (NAS and Routing Proxy) • Trace: Auth: Request, ACK, Response/ACK, ACK; Accounting: Accounting Start, ACK, Response/ACK, ACK, Accounting Stop, ACK, Response/ACK, ACK • Notes: • Proxy may send delayed ACK if home server is sufficiently far away (proxy-server RTT > 100 ms) • 11 packets in worst case if same port used for auth and accounting, 12 otherwise

  14. UDP (RADIUS) Protocol Trace (NAS and AAA Server/Routing Proxy) • Trace: Auth: Request, Response; Accounting: Accounting Start, Accounting Response, Accounting Stop, Accounting Response • Notes: • 6 packets in worst case • Retransmission behavior undefined (no RTT/RTO measurement) • Failover/failback behavior undefined • Transport doesn’t self-clock hop-by-hop OR end-to-end • Accounting response represents a transport layer ACK, not an app layer ACK (no error messages)
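
The packet counts quoted on the three trace slides (8 or 9, 11 or 12, and 6) can be checked by writing each trace out as a list of packets. This sketch assumes the packet order implied by the diagrams, and that sharing a port saves exactly one packet because the ACK of the Auth Response rides on the Accounting Start; the names are hypothetical.

```python
RELIABLE_DIRECT = [
    "Auth Request", "Response/ACK", "ACK",
    "Accounting Start", "Response/ACK", "ACK",
    "Accounting Stop", "Response/ACK", "ACK",
]

RELIABLE_ROUTING_PROXY = [
    "Auth Request", "ACK", "Response/ACK", "ACK",
    "Accounting Start", "ACK", "Response/ACK", "ACK",
    "Accounting Stop", "ACK", "Response/ACK", "ACK",
]

UDP_RADIUS = [
    "Auth Request", "Response",
    "Accounting Start", "Accounting Response",
    "Accounting Stop", "Accounting Response",
]


def packet_count(trace, same_port=False):
    """Count packets; on a shared port the ACK before Accounting Start is piggybacked."""
    count = len(trace)
    if same_port:
        start = trace.index("Accounting Start")
        if start > 0 and trace[start - 1] == "ACK":
            count -= 1
    return count
```

UDP has no transport ACKs to piggyback, so its count is the same with or without a shared port.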

  15. AAA Proxy Bestiary • Routing proxies • Re-direct proxies • Store and Forward proxies • Transport layer proxies

  16. Routing Proxy (Auth Only) • Diagram (NAS, Routing Proxy, AAA Server), messages in order: Auth Request, ACK, Auth Request, Response/ACK, ACK, Response/ACK, ACK • Notes: • Routing proxy means that transport dynamics are hop-by-hop • Transport self-clocks hop-by-hop but not necessarily end-to-end • End-to-end transport dynamics depend on details of proxy buffer management (back pressure) • AAA server can often piggyback ACK with Response if it is not overloaded • Proxy may send delayed ACK to Auth Request if AAA server is sufficiently far away

  17. Store & Forward Proxy (Accounting Only) • Diagram (NAS, S&F Proxy, Accounting Server), messages in order: Accounting Start, Response/ACK, Accounting Start, ACK, Response/ACK, ACK • Notes: • Store and Forward proxies only used for accounting • S&F proxy means that transport dynamics are completely hop-by-hop • No issues with end-to-end self-clocking • Store and Forward proxy can often piggyback ACK with Response if it is not overloaded, since no forwarding need occur before responding • Store and Forward proxies are a bad idea since the NAS is fooled into believing that it has received an app layer ACK when this is not the case; NAS may delete accounting record from non-volatile storage • If Store and Forward proxy stores accounting messages in memory or has moving parts while NAS does not, result can be lower reliability

  18. Re-Direct Proxy (Auth Only) • Diagram (NAS, Re-Direct Proxy, AAA Server), messages in order: Auth Request, Redirect/ACK, ACK, Auth Request, Response/ACK, ACK • Notes: • Redirect means that transport dynamics are end-to-end • Implication: TCP/SCTP transport self-clocks both hop-by-hop and end-to-end • Redirect proxy can typically piggyback ACK with Redirect if Redirect table kept in memory • Redirect ACK cannot be piggybacked with second Auth Request since they go to different destinations • AAA server can often piggyback ACK with Response if not overloaded • Since the Response will be ACK’d anyway, an application layer ACK of the Response will not add to the packet count

  19. Transport Layer Proxy (Auth Only) • Diagram (NAS, Transport Proxy, AAA Server), messages in order: Request, Request, Response/ACK, Response/ACK, ACK, ACK • Notes: • NAS has separate transport connection for each realm, must know about realms • Several types of transport proxy; type shown is “transparent” • With “transparent” Transport layer proxy, transport layer dynamics are end-to-end • Result: End-to-end self-clocking • If AAA server sends piggyback’d Response/ACK, so will proxy (no proxy-originated delayed ACKs) • Resembles behavior of RADIUS proxies (minus the final ACK)!

  20. Congestion Control Principles • Conservation of packets • Failover and failback • Self-clocking

  21. “Conservation of Packets” • Once you’ve reached the end of the window, don’t send more packets until you have evidence that original packets are no longer transiting the network • Packets received by destination (ACK) • Packets lost (Timeout, triplicate ACK) • Self-clocking occurs when sending rate is limited to rate at which ACKs are received
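
A minimal sketch of the conservation rule: the sender counts packets in flight and refuses to send once the window is full, until an ACK or a detected loss shows that a packet has left the network. The class and method names are hypothetical, and the window is fixed for simplicity.

```python
class WindowedSender:
    """Allows a new packet only while fewer than `window` are in flight."""

    def __init__(self, window):
        self.window = window
        self.in_flight = 0

    def can_send(self):
        return self.in_flight < self.window

    def send(self):
        """Return True if the packet may go out, False if the window is full."""
        if not self.can_send():
            return False
        self.in_flight += 1
        return True

    def on_ack(self):
        # Evidence that a packet was received by the destination.
        self._left_network()

    def on_loss(self):
        # Evidence that a packet was lost (timeout or triplicate ACK).
        self._left_network()

    def _left_network(self):
        if self.in_flight > 0:
            self.in_flight -= 1


sender = WindowedSender(window=2)
results = [sender.send() for _ in range(3)]  # third attempt is refused
sender.on_ack()
after_ack = sender.send()                    # the ACK frees one slot
```

Because each new packet is released by an arriving ACK, the sending rate follows the ACK arrival rate, which is the self-clocking behavior the last bullet describes.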

  22. “Conservation of Packets” Applied to AAA: Failover/Failback • Notes: • NAS should not re-transmit to Proxy 1 until RTO (NAS-proxy1) has elapsed, or triplicate ACKs received • NAS should not failover to Proxy 2 until n × RTO (NAS-proxy1) has elapsed • NAS cannot handle failover from AAA server 1 to 2 because it does not estimate RTO (NAS-AAA server 1) • AAA proxy 1 should not failover to AAA Server 2 until n × RTO (proxy1-server1) has elapsed • Not easy to implement failover/failback with TCP • Diagram label: Control Volume
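
The timing rules in these notes can be sketched as a small decision function, assuming the NAS tracks the time since it sent a request to Proxy 1 and has an RTO estimate for that path. The function name and the default n = 3 are assumptions; the slide does not give a value for n, and the triplicate ACK trigger is left out.

```python
def nas_action(elapsed_s, rto_s, n=3):
    """Decide what the NAS may do about an unanswered request to Proxy 1.

    elapsed_s: seconds since the request was sent
    rto_s:     current RTO estimate for the NAS-proxy1 path
    n:         number of RTOs to wait before failover (assumed value)
    """
    if elapsed_s >= n * rto_s:
        return "failover"      # switch to Proxy 2
    if elapsed_s >= rto_s:
        return "retransmit"    # resend to Proxy 1
    return "wait"              # the packet may still be in the network


decisions = [nas_action(t, rto_s=1.0) for t in (0.5, 1.5, 3.0)]
```

The same function applies to the proxy's choice between AAA Server 1 and AAA Server 2, using the proxy1-server1 RTO instead.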

  23. Self-Clocking • Source: V. Jacobson, “Congestion Avoidance and Control,” ACM SIGCOMM ’88, Vol. 18, No. 4, August 1988

  24. Self-Clocking w/Proxies • Diagram: NAS, Proxy, AAA Server; buffer labels: Receive buffer, Send buffer, Send buffer, Receive buffer • Goal: NAS can’t advance window until it receives an application layer ACK from the AAA server • Unless send and receive buffers are coupled, no self-clocking!

  25. Hop-by-Hop vs. End-to-End Self Clocking • Proxy systems consist of two transport connections • NAS-proxy transport connection • Proxy-server transport connection • TCP/SCTP provides hop-by-hop self-clocking • NAS will only advance the window as it receives ACKs from proxy • Proxy will only advance the window as it receives ACKs from the AAA Server • Only transport, re-direct proxy types guarantee end-to-end self-clocking • Transport proxies: splice together two hop-by-hop connections to simulate end-to-end transport dynamics • Re-direct proxies: connection is end-to-end after initial re-direct

  26. Hop-by-Hop vs. End-to-End (cont’d) • Hop-by-hop congestion avoidance does not prevent proxy congestion in other proxy types • Store & Forward proxies completely decouple the NAS-Proxy connection from the Proxy-Server connection (BAD!) • Routing proxies do not automatically propagate congestion signals between receive and send buffers • Micro level self-clocking not possible • Macro level coupling via “back pressure” requires multiple NAS-proxy connections for proper granularity • Uber-macro level: application-layer error messages • Conclusion • Transport dynamics with proxies at best equal to end-to-end case • TCP/SCTP transport not sufficient for end-to-end self-clocking with routing or store & forward proxies

  27. Solutions • Don’t use proxies • Can we ban S&F proxies altogether? • Use re-directs • Use transport proxies • Looks like a single transport connection with micro scale coupling (individual ACKs) • Requires extensive application/transport integration • Routing proxies • Application layer error messages • Simplest solution • Does this completely address the proxy congestion issue? • Macro scale coupling (window) between receive and send buffers (“backpressure”) • Only empty receive buffer as fast as send buffer empties • Requires separate connections/streams for each realm • Without individual connections/streams, no way to enable self-clocking on a per-path basis • Problem: with n connections, initial slow-start window is effectively n or 2n • Too complex to implement?
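
The "backpressure" option for routing proxies can be sketched as a per-realm pair of queues in which the proxy drains its receive side only while the send side has room. The class `RealmQueue`, its method names, and the capacity value are hypothetical, and real transport buffers are reduced here to in-memory queues.

```python
from collections import deque


class RealmQueue:
    """Per-realm coupling of a proxy's receive and send buffers."""

    def __init__(self, send_capacity):
        self.send_capacity = send_capacity
        self.recv = deque()   # requests received from the NAS, not yet forwarded
        self.send = deque()   # requests forwarded, awaiting the server's ACK

    def on_receive(self, msg):
        self.recv.append(msg)

    def pump(self):
        """Move requests to the send side only while it has room."""
        moved = 0
        while self.recv and len(self.send) < self.send_capacity:
            self.send.append(self.recv.popleft())
            moved += 1
        return moved

    def on_server_ack(self):
        """The server acknowledged the oldest forwarded request."""
        if self.send:
            self.send.popleft()


q = RealmQueue(send_capacity=2)
for i in range(5):
    q.on_receive(f"req-{i}")
first_pump = q.pump()    # only two fit on the send side
q.on_server_ack()
second_pump = q.pump()   # one slot freed, one more request moves
```

Requests that stay on the receive side keep the NAS-proxy window from advancing, which is how the slow server's pace reaches the NAS. One queue per realm is needed because a single shared queue would let one slow realm stall all the others.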

  28. Application Layer Error Messages • “Busy”: Proxy/Server too busy to handle additional requests, NAS should failover requests to another proxy/server • “Forwarding”: Proxy has located AAA server, but timely response is not forthcoming; NAS should wait for final response • “Can’t Locate”: Proxy can’t locate the AAA server for the indicated realm; NAS should reject access • “Failover”: Proxy has tried primary server, is failing over to secondary server; NAS should reset app layer timers, not attempt failover to secondary proxy • “Can’t Forward”: Proxy has tried both primary and secondary AAA servers with no response; NAS should reject access • “Processing”: Server cannot provide an immediate response to the request; NAS should wait for final response
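
The six messages and the NAS reaction each one calls for can be written as a lookup table. The enum and dictionary names are hypothetical; the message names and actions are taken from the slide.

```python
from enum import Enum


class ProxyStatus(Enum):
    BUSY = "Busy"
    FORWARDING = "Forwarding"
    CANT_LOCATE = "Can't Locate"
    FAILOVER = "Failover"
    CANT_FORWARD = "Can't Forward"
    PROCESSING = "Processing"


# What the NAS should do on receiving each status message.
NAS_ACTION = {
    ProxyStatus.BUSY: "fail over requests to another proxy/server",
    ProxyStatus.FORWARDING: "wait for final response",
    ProxyStatus.CANT_LOCATE: "reject access",
    ProxyStatus.FAILOVER: "reset app layer timers; do not fail over to secondary proxy",
    ProxyStatus.CANT_FORWARD: "reject access",
    ProxyStatus.PROCESSING: "wait for final response",
}
```

Only "Busy" tells the NAS to move to another proxy; the other five ask it either to keep waiting or to give up on the request.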

  29. AAA Reliable Transport “Profile” • What is a transport profile? A recommendation on how to use transport within AAA • Efficiency • Persistent connections/pipelining • Nagle algorithm enabled • AAA packets often smaller than MSS • Useful for transport layer batching when packets are spaced close together, but… • Typically no additional packets for NAS to send in response to AAA server/proxy ACK • CWND validation • RFC 2861 • With high inter-packet spacings, RTT measurements made so infrequently that network conditions may change between measurements • Don’t let CWND build as a result of (now) invalid measurements, decay it instead • Result: CWND=1 or 2 most of the time, AAA operates in perpetual “slow start” • Congestion Manager • Draft-ietf-ecm-cm-03.txt • Enables multiple AAA connections to share state with each other, and possibly with the application as well • May be helpful for failover/failback
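
For the "Nagle algorithm enabled" item, a minimal sketch using the standard sockets API: Nagle is on by default for TCP sockets and is switched off by setting `TCP_NODELAY`, so a profile-conforming client leaves that option clear. The function names are hypothetical, and the socket is created but never connected.

```python
import socket


def make_aaa_socket():
    """Create a TCP socket with the Nagle algorithm left enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 0 keeps Nagle on; this is the default, set here to state intent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
    return sock


def nagle_enabled(sock):
    """Nagle is enabled exactly when TCP_NODELAY is clear."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0
```

A caller would create the socket once, keep the connection open, and pipeline requests over it, in line with the persistent-connection item on the same slide.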

  30. Preliminary Recommendations • TCP: Feasible • Recommended practice: Nagle algorithm enabled, Congestion window validation, Congestion Manager • More work needed on failover/failback • SCTP: Feasible • Recommended practice: Nagle algorithm enabled, Congestion window validation, Congestion Manager • Failover features, built-in support for multiple streams • More work needed on failback • UDP: More investigation needed • Only marginally fewer packets than TCP/SCTP, except where proxy-server RTT > delayed ACK timer • Probably can only offer simple windowing (CWND=1, 2) without heading down a slippery slope • Would require per-realm RTT, RTO logic for failover/failback (congestion manager)

  31. Summary • TCP/SCTP feasible for use with AAA • More work needed on failover/failback • Which transport(s) should be mandatory? • Reliable Transport “profile” recommended • Persistent connections/pipelining • Nagle algorithm enabled • Congestion window validation • RTO validation • Congestion Manager • UDP transport needs more investigation • Proxies complicate analysis of AAA transport behavior • End-to-end congestion avoidance not guaranteed in proxy environments, even when reliable transport is utilized • Microscopic self-clocking difficult in routing proxies • Application layer error messages recommended • Use of re-direct proxies encouraged

