draft-ietf-ecm-cm-01.txt
The Congestion Manager
Hari Balakrishnan (MIT LCS), Srinivasan Seshan (CMU)
http://nms.lcs.mit.edu/
CM architecture
• Integrates congestion management across all applications (transport protocols & user-level apps)
• Exposes API for application adaptation, accommodating ALF applications
• This draft: sender-only module
[Figure: HTTP, RTP/RTCP, NNTP, TCP1, TCP2, SCTP, and UDP streams sharing the Congestion Manager API above IP]
48th IETF (Pittsburgh) ECM WG
Outline
• Draft overview (“tutorial” for slackers!)
• Terminology
• System components
• Abstract CM API
• Applications
• Issues for discussion
Assumptions & terminology
• Application: any protocol that uses the CM
• Well-behaved application: incorporates application-level receiver feedback, e.g., TCP (ACKs), RTP (RTCP RRs), …
• Stream
  • Group of packets sharing the 5-tuple [src_addr, src_port, dst_addr, dst_port, ip_proto]
• Macroflow
  • Group of streams sharing the same congestion control and scheduling algorithms (a “congestion group”)
Architectural components
• API to streams on a macroflow
• CM scope is per-macroflow; not on the data path
• Congestion controller algorithm MUST be TCP-friendly (see Floyd document)
• Scheduler apportions bandwidth to streams
[Figure: CM comprising a congestion controller and a scheduler]
Congestion Controller
• One per macroflow
• Addresses two issues:
  • WHEN can the macroflow transmit?
  • HOW MUCH data can be transmitted?
• Uses app notifications to manage state
  • cm_update() from streams
  • cm_notify() from IP output whenever a packet is sent
• Standard API for scheduler interoperability
  • query(), notify(), update()
• A large number of controllers are possible
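A minimal sketch of how one such controller might answer the WHEN/HOW MUCH questions, assuming a simple TCP-style AIMD policy. The struct layout and function names (`cc_query`, `cc_notify`, `cc_update`) are hypothetical illustrations of the query()/notify()/update() interface, not the draft's actual definitions:

```c
#include <stddef.h>

/* Hypothetical per-macroflow controller state (names not from the draft). */
typedef struct {
    size_t cwnd;        /* congestion window, bytes */
    size_t ssthresh;    /* slow-start threshold, bytes */
    size_t outstanding; /* bytes sent but not yet reported on */
    size_t mss;         /* segment size, bytes */
} cm_cc_state;

/* WHEN / HOW MUCH: bytes the macroflow may transmit right now. */
size_t cc_query(const cm_cc_state *cc) {
    return cc->cwnd > cc->outstanding ? cc->cwnd - cc->outstanding : 0;
}

/* notify(): IP output reports that nsent bytes left the host. */
void cc_notify(cm_cc_state *cc, size_t nsent) {
    cc->outstanding += nsent;
}

/* update(): app feedback; a TCP-friendly AIMD reaction as one possibility. */
void cc_update(cm_cc_state *cc, size_t nrecd, int lost) {
    if (cc->outstanding >= nrecd) cc->outstanding -= nrecd;
    else cc->outstanding = 0;
    if (lost) {
        cc->ssthresh = cc->cwnd / 2;             /* multiplicative decrease */
        cc->cwnd = cc->ssthresh > cc->mss ? cc->ssthresh : cc->mss;
    } else if (cc->cwnd < cc->ssthresh) {
        cc->cwnd += nrecd;                       /* slow start */
    } else {
        cc->cwnd += cc->mss * nrecd / cc->cwnd;  /* congestion avoidance */
    }
}
```

Any TCP-friendly algorithm with this shape could be dropped in behind the same three calls, which is what makes "a large number of controllers" possible.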
Scheduler
• One per macroflow
• Addresses one issue:
  • WHICH stream on the macroflow gets to transmit
• Standard API for congestion controller interoperability
  • schedule(), query_share(), notify()
• This does not presume any scheduler sophistication
• A large number of schedulers are possible
Sharing
• All streams on a macroflow share congestion state
• What should the granularity of a macroflow be?
  • [Discussed at the November ’99 IETF]
• Default is all streams to a given destination address
• Grouping & ungrouping API allows this to be changed by an application
Abstract CM API
• State maintenance
• Data transmission
• Application notification
• Querying
• Sharing granularity
State maintenance
• stream_info is a platform-dependent data structure containing [src_addr, src_port, dst_addr, dst_port, ip_proto]
• cm_open(stream_info) returns a stream ID, sid
• cm_close(sid) SHOULD be called at the end
• cm_mtu(sid) gives the path MTU for the stream
• Add a call for sid ---> stream_info (so other applications can query too)
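A toy sketch of this state-maintenance surface, including the proposed reverse sid ---> stream_info lookup. The struct layout, the fixed-size table, and `cm_stream_info` are illustrative assumptions; the draft only specifies the abstract calls:

```c
#include <stdint.h>
#include <stddef.h>

/* The draft's 5-tuple; this concrete layout is hypothetical. */
typedef struct {
    uint32_t src_addr, dst_addr;
    uint16_t src_port, dst_port;
    uint8_t  ip_proto;
} stream_info;

#define CM_MAX_STREAMS 16

static stream_info cm_streams[CM_MAX_STREAMS];
static int cm_in_use[CM_MAX_STREAMS];

/* cm_open(): allocate a stream ID (sid) for this 5-tuple; -1 on failure. */
int cm_open(const stream_info *info) {
    for (int sid = 0; sid < CM_MAX_STREAMS; sid++) {
        if (!cm_in_use[sid]) {
            cm_streams[sid] = *info;
            cm_in_use[sid] = 1;
            return sid;
        }
    }
    return -1;
}

/* cm_close(): SHOULD be called when the stream is done with the CM. */
void cm_close(int sid) {
    if (sid >= 0 && sid < CM_MAX_STREAMS) cm_in_use[sid] = 0;
}

/* The proposed reverse lookup (sid -> stream_info); NULL if unknown. */
const stream_info *cm_stream_info(int sid) {
    return (sid >= 0 && sid < CM_MAX_STREAMS && cm_in_use[sid])
         ? &cm_streams[sid] : NULL;
}
```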
Data transmission
• Two API modes, neither of which buffers data
• Accommodates ALF-oriented applications
• Callback-based
  • Application controls WHAT to send at any point in time
Callback-based transmission
[Figure: application calls cm_request(); the CM answers with the cmapp_send() callback]
• Useful for ALF applications
  • TCP too
• On a callback, decide what to send (e.g., a retransmission), independent of previous requests
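The request/callback handshake can be sketched as below. The grant bookkeeping, the callback registry, and `cm_grant_one` are hypothetical scaffolding; the point is that the CM decides WHEN the callback fires, while the application decides WHAT to send inside it:

```c
/* Hypothetical cmapp_send() callback plumbing (sketch, not the kernel API). */
typedef void (*cmapp_send_cb)(int sid);

#define NSTREAMS 4
static cmapp_send_cb cm_callbacks[NSTREAMS];
static int cm_grants[NSTREAMS];   /* pending transmission requests per stream */

/* Application asks for one transmission opportunity on stream sid. */
void cm_request(int sid, cmapp_send_cb cb) {
    cm_callbacks[sid] = cb;
    cm_grants[sid]++;
}

/* CM decides the macroflow may send: fire the stream's callback.
 * The app then picks new data or a retransmission, its choice. */
void cm_grant_one(int sid) {
    if (cm_grants[sid] > 0 && cm_callbacks[sid]) {
        cm_grants[sid]--;
        cm_callbacks[sid](sid);
    }
}

/* Example application callback: just count the sends it was granted. */
static int sent_count;
static void example_send(int sid) { (void)sid; sent_count++; }
```

Because nothing is buffered between cm_request() and cmapp_send(), an ALF application (or a TCP) can defer the new-data-vs-retransmission decision to the last possible moment.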
Synchronous transmission
• For applications that transmit off a (periodic) timer loop
  • Send callbacks wreck the timing structure
• Use a different callback
  • First, register rate and RTT thresholds: cm_setthresh() per stream
  • cmapp_update(newrate, newrtt, newrttdev) when values change
• Application adjusts period, packet size, etc.
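As a hedged illustration of the "application adjusts period" step: an audio sender with a fixed codec packet size could recompute its inter-packet interval whenever cmapp_update() delivers a new rate. The struct and the period formula are assumptions for the sketch, not part of the draft:

```c
/* Sketch: a timer-driven app reacting to cmapp_update(newrate, ...).
 * Names and the period policy here are hypothetical. */
typedef struct {
    unsigned pkt_bytes;  /* fixed packet size chosen by the codec */
    unsigned period_us;  /* inter-packet interval, microseconds */
} sync_app;

void on_cmapp_update(sync_app *app, unsigned newrate_bps) {
    if (newrate_bps == 0) return;     /* keep old period on a bogus rate */
    /* period = packet size / rate: 8 * bytes * 1e6 / (bits per second) */
    app->period_us = (unsigned)((8ULL * app->pkt_bytes * 1000000ULL)
                                / newrate_bps);
}
```

The timer loop itself never changes; only its period (or the packet size) does, which is why this mode preserves the app's timing structure where send callbacks would not.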
Application notification
• Tell the CM about successful transmissions and congestion
• cm_update(sid, nrecd, nlost, lossmode, rtt)
  • nrecd and nlost count packets since the last cm_update call
  • lossmode specifies the type of congestion as a bit-vector: CM_PERSISTENT, CM_TRANSIENT, CM_ECN
• Should we define more specifics?
Notification of transmission
• cm_notify(stream_info, nsent) from the IP output routine
  • Allows the CM to estimate outstanding bytes
• Each cmapp_send() grant has an expiration
  • max(RTT, CM_GRANT_TIME)
• If an app decides NOT to send on a grant, it SHOULD call cm_notify(stream_info, 0)
• The CM congestion controller MUST be robust to broken or crashed apps that forget to do this
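The expiration rule above can be sketched in a few lines. The constant value, struct, and function names are assumptions; only the max(RTT, CM_GRANT_TIME) rule comes from the draft:

```c
/* Grant-expiration sketch: a cmapp_send() grant lapses after
 * max(RTT, CM_GRANT_TIME), so a silent or crashed app cannot
 * pin the congestion window open forever. */
#define CM_GRANT_TIME_US 500000UL   /* hypothetical floor, 500 ms */

typedef struct {
    unsigned long issued_us;  /* timestamp when the grant was issued */
    unsigned long rtt_us;     /* current smoothed RTT estimate */
} cm_grant;

unsigned long grant_lifetime_us(const cm_grant *g) {
    return g->rtt_us > CM_GRANT_TIME_US ? g->rtt_us : CM_GRANT_TIME_US;
}

/* If the app neither sent nor called cm_notify(stream_info, 0),
 * the CM reclaims the grant once it expires. */
int grant_expired(const cm_grant *g, unsigned long now_us) {
    return now_us - g->issued_us > grant_lifetime_us(g);
}
```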
Querying
• cm_query(sid, rate, srtt, rttdev) fills in the values
  • Note: the CM may not maintain rttdev, so consider removing this?
• An invalid or non-existent estimate is signaled by a negative value
Sharing granularity
• cm_getmacroflow(sid) returns the macroflow identifier
• cm_setmacroflow(mflow_id, sid) sets the macroflow for a stream
  • If mflow_id is -1, a new macroflow is created
  • Iteration over flows allows grouping
  • Each call overrides the previous macroflow association
• This API sets grouping, not sharing policy
  • Such policy is scheduler-dependent
  • Examples include proxy destinations, client prioritization, etc.
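The grouping semantics above (create-on-minus-one, override-on-repeat) fit in a small sketch. The table-backed implementation and id allocation are hypothetical; only the call shapes come from the draft:

```c
/* Macroflow grouping sketch: cm_setmacroflow() overrides a stream's
 * previous association; mflow_id == -1 asks for a fresh macroflow. */
#define MAX_SIDS 8

static int sid_to_mflow[MAX_SIDS];
static int next_mflow = 1;   /* hypothetical id allocator */

int cm_getmacroflow(int sid) { return sid_to_mflow[sid]; }

int cm_setmacroflow(int mflow_id, int sid) {
    if (mflow_id == -1) mflow_id = next_mflow++;  /* create new macroflow */
    sid_to_mflow[sid] = mflow_id;                 /* overrides prior group */
    return mflow_id;
}
```

Grouping several streams is then just iteration: create one macroflow with -1, then pass the returned id for each remaining sid.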
Example applications
• TCP/CM
  • Like RFC 2140, TCP-INT, TCP sessions
• Congestion-controlled UDP
• Real-time streaming applications
  • Synchronous API, esp. for audio
• HTTP server
  • Uses TCP/CM for concurrent connections
  • cm_query() to pick content formats
Linux implementation
[Figure: app streams issue requests/updates and receive cmapp_*() callbacks through libcm.a, a user-level library implementing the API; it reaches the kernel via system calls (e.g., ioctl) and a control socket for callbacks; in the kernel, CM macroflows (congestion controller + scheduler) sit alongside TCP and UDP-CC, with cm_notify() invoked from ip_output() in IP]
Server performance
[Graph: CPU seconds for 200K packets vs. packet size (bytes), comparing cmapp_send(), buffered UDP-CC, and TCP vs. TCP/CM, each with and without delayed ACKs]
Security issues
• Incorrect reports of losses or congestion; absence of reports when there is congestion
• A malicious application can wreck other flows in its macroflow
• These are all examples of “NOT-well-behaved applications”
• RFC 2140 has a list
  • Will be incorporated in the next revision
• Also, draft-ietf-ipsec-ecn-02.txt has relevant material
Issues for discussion
• Prioritization to override the cwnd limitation
• cm_request(num_packets)
  • Request multiple transmissions in a single call
• Reporting variances
  • Should all CM-to-app reports include a variance?
• Reporting congestion state
  • Should we try to define “persistent” congestion?
• Sharing policy interface
  • Scheduler-dependent (many possibilities)
Overriding cwnd limitations
• Prioritization
  • Suppose a TCP loses a packet due to congestion
  • The sender calls cm_update()
  • This causes the CM to cut the window
  • Now, outstanding data exceeds cwnd
  • What happens to the retransmission?
• Solution(?)
  • Add a priority parameter to cm_request()
  • At most one high-priority packet per RTT?
A more complex cm_request()?
• Issue raised by Joe Touch
• cm_request(num_packets)
  • Potential advantage: higher performance due to fewer protection-boundary crossings
  • Disadvantage: makes the internals complicated
• Observe that:
  • Particular implementations MAY batch libcm-to-kernel calls, preserving the simple app API
  • The benefits may be small (see graph)
Reporting variances
• Some CM calls do not include variances, e.g., no rate variance is reported
• There are many ways to calculate variances
  • These are perhaps better done by each application (e.g., by a TCP)
• The CM does not need to maintain variances to do congestion control
  • In fact, our implementation of the CM doesn’t even maintain rttdev...
Semantics of congestion reports
• CM_PERSISTENT
  • Persistent congestion (e.g., TCP timeouts)
  • Causes the CM to go back into slow start
• CM_TRANSIENT: transient congestion, e.g., three duplicate ACKs
• CM_ECN: ECN echoed from the receiver
• Should we more precisely define when CM_PERSISTENT should be reported?
  • E.g., no feedback for an entire RTT (“window”)
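One way the proposed "no feedback for an entire RTT" rule could classify the lossmode bit-vector. The bit values and `classify_loss` are illustrative assumptions; the constant names are from the draft:

```c
/* Sketch of a lossmode classifier (rule and signature hypothetical). */
#define CM_PERSISTENT 0x1   /* e.g., TCP timeout: CM re-enters slow start */
#define CM_TRANSIENT  0x2   /* e.g., three duplicate ACKs */
#define CM_ECN        0x4   /* ECN echoed from the receiver */

int classify_loss(unsigned long now_us, unsigned long last_feedback_us,
                  unsigned long rtt_us, int dupacks, int ecn_echo) {
    int mode = 0;
    if (now_us - last_feedback_us > rtt_us)
        mode |= CM_PERSISTENT;   /* silent for a full RTT ("window") */
    else if (dupacks >= 3)
        mode |= CM_TRANSIENT;
    if (ecn_echo)
        mode |= CM_ECN;          /* ECN can accompany either case */
    return mode;
}
```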
Sharing policy
• Sender talking to a proxy receiver
  • See, e.g., MUL-TCP
• Client prioritization & differentiation
• These are scheduler issues
  • Particular schedulers may provide interfaces for these and more
• The scheduler interface specified here is intentionally simple and minimalist
• Vern will talk more about the scheduler
Future Evolution
• Support for non-well-behaved applications
  • Likely use of separate headers
• Policy interfaces for sharing
• Handling QoS-enabled paths
  • E.g., delay- and loss-based divisions
• Aging of congestion information for idle periods
• Expanded sharing of congestion information
  • Within a cluster and across macroflows