TCP. EE 122: Intro to Communication Networks Fall 2010 (MW 4-5:30 in 101 Barker) Scott Shenker TAs: Sameer Agarwal, Sara Alspaugh, Igor Ganichev, Prayag Narula http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson, and other colleagues at Princeton and UC Berkeley
Today’s Lecture • Review some basic concepts from routing lectures • Lots of details in previous lectures • Today just focus on a few key points • Basic overview of TCP • Service model and header structure • Segments and sequence numbers • Setting up and tearing down connections • Timers and retransmissions • Many details ignored • Congestion control next lecture
Review of BGP: Simplified Version • If domains A and B have an interdomain link: • Their border routers announce routes to each other • One route for each reachable prefix • Routes announced whenever changed or withdrawn • Route withdrawn when domain no longer offering path to prefix • It usually has a path itself, but is choosing to not export that path • Policies: • Import policy: which routes the domain will use • Chooses among routes advertised by neighbors • Export policy: which routes the domain lets others use • Purely a filtering policy • Domain can only advertise routes it imported
Review of DVMRP: Simplified Version • Starts by broadcasting along reverse path tree • Tree formed by paths from members to source • Why is this a tree? • Prune tree, to avoid sending wasted messages • Leaf networks start by issuing NMR • Non-Membership Report • If all of a router’s children send NMRs, and it has no local members, then it sends an NMR to its parent • This builds source-specific trees: • Packets from source S to group member m follow same path that packets from m take to reach S (in reverse)
Constructing Source-Specific Tree • Individual paths from members to source • Union of these paths forms tree • Data packets sent in opposite direction down tree • [Figure: tree connecting the source to members M1, M2, M3]
Review of CBT: Simplified Version • Picks core (root, center, whatever) for each group • Member sends join message towards core • This builds a shared spanning tree • Later joins are “grafted” onto existing tree • Packets are delivered over this tree using standard flooding over spanning tree • Send out on all but incoming interface
Building and Using Shared Tree • Group members send joins to core • Joins are grafted on to tree • M1 sends data to group • [Figure: core with members M1, M2, M3; legend: control (join) messages, data]
Review of Fair Sharing • Given a set of bandwidth demands r_i and a total bandwidth C, the max-min bandwidth allocations are: a_i = min(f, r_i) • where f is the unique value such that Sum(a_i) = C • Property: • If you don’t get full demand, no one gets more than you
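The max-min rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `max_min_allocation` is made up here, and the sketch assumes the interesting case where total demand exceeds C (otherwise every demand is simply met and f is not unique).

```python
def max_min_allocation(demands, capacity):
    """Return (f, allocations) where allocations[i] = min(f, demands[i]).

    Hypothetical helper for illustration. Demands are satisfied smallest
    first; whatever is left is split evenly among the still-unsatisfied
    demands, and that even share is the fair-share level f.
    """
    if sum(demands) <= capacity:
        # Everyone gets their full demand; f is not unique in this case,
        # so report the largest demand as one valid choice.
        return max(demands, default=0), list(demands)

    remaining = float(capacity)
    pending = sorted(demands)
    f = pending[-1]
    for k, r in enumerate(pending):
        share = remaining / (len(pending) - k)  # even split of what is left
        if r > share:
            f = share  # this demand (and all larger ones) is capped at f
            break
        remaining -= r  # small demand fully satisfied
    return f, [min(f, r) for r in demands]


if __name__ == "__main__":
    f, alloc = max_min_allocation([2, 8, 10, 4], 16)
    print(f, alloc)  # f = 5.0; the allocations sum to 16
```

In the example, the demands 2 and 4 are met in full, while 8 and 10 are both capped at f = 5, which matches the property on the slide: anyone who does not get their full demand gets at least as much as everyone else.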
TCP Service Model • Reliable, in-order, duplex byte-stream delivery • Hopefully with good performance • Challenges - the network can • drop packets • Even perhaps a large number • delay packets • Even perhaps for many seconds • deliver packets out-of-order • Follows from possibility of arbitrary delay • replicate packets • Weird, but it does sometimes happen • corrupt packets
TCP Support for Reliable Delivery • Checksum • Used to detect corrupted data at the receiver • …leading the receiver to drop the packet • Sequence numbers • Used to detect missing data • ... and for putting the data back in order • Retransmission • Sender retransmits lost or corrupted data • Timeout based on estimates of round-trip time • Fast retransmit algorithm for rapid retransmission
TCP Header • Fields, in wire order: Source port, Destination port, Sequence number, Acknowledgment, HdrLen, 0 (reserved), Flags, Advertised window, Checksum, Urgent pointer, Options (variable), Data
TCP Header: Source port and Destination port • These should be familiar
TCP Header: Sequence number • Starting sequence number (byte offset) of data carried in this segment
TCP Header: Acknowledgment • Gives seq # just beyond highest seq. received in order (“What’s Next”) • If sender sends N in-order bytes starting at seq S, then ack for it will be S+N
ACKing and Sequence Numbers • Sender sends packet • Data starts with sequence number X • Packet contains B bytes • X, X+1, X+2, …, X+B-1 • Upon receipt of packet, receiver sends an ACK • If all data prior to X already received: • ACK acknowledges X+B (because that is next expected byte) • If highest in-order byte already received is some smaller value Y: • ACK acknowledges Y+1 • Even if this has been ACKed before
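The receiver-side rule above ("always ACK the next expected byte") can be sketched as follows. This is a simplified illustration under stated assumptions, not real TCP code: the class name `CumulativeAckReceiver` is made up, segments are assumed never to overlap partially, and 32-bit sequence-number wraparound is ignored.

```python
class CumulativeAckReceiver:
    """Toy receiver that returns the cumulative ACK for each arriving segment."""

    def __init__(self, first_seq):
        self.next_expected = first_seq  # next in-order byte we are waiting for
        self.out_of_order = {}          # start seq -> length of buffered segments

    def receive(self, seq, length):
        """Process a segment covering bytes seq .. seq+length-1; return the ACK number."""
        if seq == self.next_expected:
            self.next_expected += length
            # Buffered segments may now be contiguous with the in-order data.
            while self.next_expected in self.out_of_order:
                self.next_expected += self.out_of_order.pop(self.next_expected)
        elif seq > self.next_expected:
            # A gap exists: hold the segment, keep ACKing the missing byte.
            self.out_of_order[seq] = length
        # seq < next_expected: duplicate data, nothing new to record.
        return self.next_expected


if __name__ == "__main__":
    rx = CumulativeAckReceiver(first_seq=1000)
    print(rx.receive(1000, 100))  # in order: ACK 1100
    print(rx.receive(1200, 100))  # gap at 1100: ACK 1100 again
    print(rx.receive(1100, 100))  # gap filled: ACK jumps to 1300
```

The second call shows the "even if this has been ACKed before" case: the receiver repeats ACK 1100 because byte 1100 is still missing.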
TCP Header: Advertised window • Buffer space available for receiving data • Used for TCP’s sliding window • Interpreted as offset beyond Acknowledgment field’s value
Flow Control • Advertised Window: W • Can send W bytes beyond the next expected byte • Receiver uses W to prevent sender from overflowing buffer
Sliding Window • Allow a given amount of data (the window) “in flight” • Sending process and its TCP track: last byte written, last byte ACKed, last byte can send • Receiving process and its TCP track: last byte read, next byte needed, last byte received
Advertised Window Limits Rate • If the window is W, then sender can send no faster than W/RTT bytes/sec • Receiver implicitly limits sender to rate that receiver can sustain • If sender is going too fast, window advertisements get smaller & smaller • In original TCP design, that was the sole protocol mechanism controlling sender’s rate • What’s missing? • Will cover that next time….
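A quick numeric check of the W/RTT bound may help. The sketch below is illustrative only: the function names are made up, and the 65,535-byte window is simply the largest value a 16-bit window field can carry when no window scaling option is in use.

```python
def window_limited_rate(window_bytes, rtt_seconds):
    """Upper bound on throughput (bytes/sec) when at most W bytes are in flight per RTT."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return window_bytes / rtt_seconds


def window_needed(rate_bytes_per_sec, rtt_seconds):
    """Window (bytes) required to sustain a given rate: rate x RTT."""
    return rate_bytes_per_sec * rtt_seconds


if __name__ == "__main__":
    # 16-bit window, 100 ms round trip: roughly 655 KB/s, about 5.2 Mbit/s.
    rate = window_limited_rate(65535, 0.1)
    print(f"{rate:.0f} bytes/sec, {rate * 8 / 1e6:.2f} Mbit/s")
    # To fill a 10 Mbit/s path (1.25 MB/s) at the same RTT, W must be about 125 KB.
    print(f"{window_needed(1_250_000, 0.1):.0f} bytes")
```

The second number is larger than the first window, which shows how a receiver's advertised window alone can hold a sender below the path's capacity.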
TCP Header: HdrLen • Number of 4-byte words in TCP header; 5 = no options
TCP Header: 0 (reserved) • “Must Be Zero” • 6 bits reserved
TCP Header: Flags • We will get to these shortly
TCP Header: Urgent pointer • Used with URG flag to indicate urgent data (not discussed further)
TCP “Stream of Bytes” Service • [Figure: Host A writes Byte 0, Byte 1, Byte 2, Byte 3, …, Byte 80; Host B reads the same bytes in the same order]
… Provided Using TCP “Segments” • Segment sent when: • Segment full (Max Segment Size), • Not full, but times out, or • “Pushed” by application • [Figure: Host A’s bytes 0 through 80 are carried to Host B inside TCP Data segments]
TCP Segment • IP packet • No bigger than Maximum Transmission Unit (MTU) • E.g., up to 1,500 bytes on an Ethernet • TCP packet • IP packet with a TCP header and data inside • TCP header at least 20 bytes long (20 bytes with no options) • TCP segment • No more than Maximum Segment Size (MSS) bytes • E.g., up to 1,460 consecutive bytes from the stream • MSS = MTU – (IP header) – (TCP header) • [Figure: IP packet = IP Hdr + IP Data; the IP Data holds TCP Hdr + TCP Data (segment)]
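The MSS arithmetic and the resulting segmentation of the byte stream can be sketched as below. The helper names are made up for illustration, and the defaults assume a 20-byte IPv4 header and a 20-byte TCP header, i.e. no options on either.

```python
def mss(mtu, ip_header_len=20, tcp_header_len=20):
    """MSS = MTU - (IP header) - (TCP header); defaults assume no options."""
    return mtu - ip_header_len - tcp_header_len


def segment_stream(data, first_seq, max_segment_size):
    """Split a byte stream into (sequence number, payload) pairs of at most MSS bytes."""
    return [
        (first_seq + offset, data[offset:offset + max_segment_size])
        for offset in range(0, len(data), max_segment_size)
    ]


if __name__ == "__main__":
    size = mss(1500)  # Ethernet MTU
    print(size)       # 1460
    for seq, payload in segment_stream(bytes(3000), first_seq=1, max_segment_size=size):
        print(seq, len(payload))  # 1 1460, then 1461 1460, then 2921 80
```

Each segment's sequence number is the stream offset of its first byte, so the receiver can reassemble the stream no matter how the bytes were cut into segments.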
Sequence Numbers • ISN (initial sequence number) • Sequence number = 1st byte of the data in the segment • ACK sequence number = next expected byte • [Figure: Host A sends TCP HDR + TCP Data to Host B; Host B replies with TCP HDR + TCP Data carrying the ACK]
Initial Sequence Number (ISN) • Sequence number for the very first byte • Why not just use ISN = 0? • Practical issue • IP addresses and port #s uniquely identify a connection • Eventually, though, these port #s do get used again • … small chance an old packet is still in flight • … and might be associated with new connection • TCP therefore requires changing ISN • Set from 32-bit clock that ticks every 4 microseconds • … only wraps around once every 4.55 hours • To establish a connection, hosts exchange ISNs
Establishing a TCP Connection • Three-way handshake to establish connection • Host A sends a SYN (open; “synchronize sequence numbers”) to host B • Host B returns a SYN acknowledgment (SYN ACK) • Host A sends an ACK to acknowledge the SYN ACK • Each host tells its ISN to the other host • [Figure: A sends SYN, B replies SYN ACK, A replies ACK, then Data flows in both directions]
TCP Header: Flags • SYN, ACK, FIN, RST, PSH, URG • See /usr/include/netinet/tcp.h on Unix systems
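To make the header layout concrete, here is a minimal sketch that unpacks the fixed 20-byte part of a TCP header with Python's standard `struct` module. The function and dictionary key names are made up for illustration; the flag bit values are the standard ones for the six flags listed above, and the checksum is read but not verified.

```python
import struct

# Bit positions of the six flags within the flags byte.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

# Network byte order: ports (16 bits each), seq and ack (32 bits each),
# HdrLen + reserved byte, flags byte, window, checksum, urgent pointer.
FIXED_HEADER = struct.Struct("!HHIIBBHHH")  # 20 bytes in total


def parse_tcp_header(segment):
    """Decode the fixed header fields of a raw TCP segment (bytes)."""
    (src, dst, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = FIXED_HEADER.unpack_from(segment)
    header_len = (offset_byte >> 4) * 4  # HdrLen counts 4-byte words
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": {name for name, bit in TCP_FLAGS.items() if flag_byte & bit},
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
        "options": segment[FIXED_HEADER.size:header_len],
        "data": segment[header_len:],
    }


if __name__ == "__main__":
    # Hand-built SYN: HdrLen = 5 words (no options), only the SYN bit set.
    syn = FIXED_HEADER.pack(12345, 80, 1000, 0, 5 << 4, TCP_FLAGS["SYN"], 65535, 0, 0)
    print(parse_tcp_header(syn))
```

With HdrLen = 5 the options slice is empty and the data starts at byte 20; a larger HdrLen shifts the start of the data accordingly.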
Step 1: A’s Initial SYN Packet • Source port: A’s port; Destination port: B’s port • Sequence number: A’s Initial Sequence Number • Acknowledgment: irrelevant since ACK not set • HdrLen: 5 (= 20 B); Flags: SYN set • A tells B it wants to open a connection…
Step 2: B’s SYN-ACK Packet • Source port: B’s port; Destination port: A’s port • Sequence number: B’s Initial Sequence Number • Acknowledgment: A’s ISN plus 1 • HdrLen: 20 B; Flags: SYN and ACK set • B tells A it accepts, and is ready to hear the next byte… • … upon receiving this packet, A can start sending data
Step 3: A’s ACK of the SYN-ACK • Source port: A’s port; Destination port: B’s port • Sequence number: A’s ISN plus 1 (the SYN consumed one sequence number) • Acknowledgment: B’s ISN plus 1 • HdrLen: 20 B; Flags: ACK set • A tells B it’s likewise okay to start sending • … upon receiving this packet, B can start sending data
Timing Diagram: 3-Way Handshaking • Server (passive open): listen(), later accept() • Client (initiator, active open): connect() • Client to server: SYN, SeqNum = x • Server to client: SYN + ACK, SeqNum = y, Ack = x + 1 • Client to server: ACK, Ack = y + 1
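The sequence and acknowledgment numbers in the handshake can be traced with a small simulation. This is only a sketch: the function name and dictionary layout are made up for illustration, no packets are sent, and the arithmetic is done modulo 2^32 because the fields are 32 bits wide.

```python
SEQ_SPACE = 2 ** 32  # sequence and acknowledgment fields are 32 bits


def three_way_handshake(x, y):
    """Return the three handshake segments for client ISN x and server ISN y."""
    syn = {"from": "client", "flags": {"SYN"}, "seq": x}
    syn_ack = {
        "from": "server",
        "flags": {"SYN", "ACK"},
        "seq": y,
        "ack": (x + 1) % SEQ_SPACE,  # the SYN occupies one sequence number
    }
    ack = {
        "from": "client",
        "flags": {"ACK"},
        "seq": (x + 1) % SEQ_SPACE,  # first sequence number after the client's SYN
        "ack": (y + 1) % SEQ_SPACE,  # likewise for the server's SYN
    }
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(x=1000, y=5000):
        print(segment)
```

After the third segment, the client's first data byte carries sequence number x + 1 and the server's carries y + 1.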
What if the SYN Packet Gets Lost? • Suppose the SYN packet gets lost • Packet is lost inside the network, or: • Server discards the packet (e.g., listen queue is full) • Eventually, no SYN-ACK arrives • Sender sets a timer and waits for the SYN-ACK • … and retransmits the SYN if needed • How should the TCP sender set the timer? • Sender has no idea how far away the receiver is • Hard to guess a reasonable length of time to wait • SHOULD (RFCs 1122 & 2988) use default of 3 seconds • Other implementations instead use 6 seconds
SYN Loss and Web Downloads • User clicks on a hypertext link • Browser creates a socket and does a “connect” • The “connect” triggers the OS to transmit a SYN • If the SYN is lost… • 3-6 seconds of delay: can be very long • User may become impatient • … and click the hyperlink again, or click “reload” • User triggers an “abort” of the “connect” • Browser creates a new socket and another “connect” • Essentially, forces a faster send of a new SYN packet! • Sometimes very effective, and the page comes quickly
5 Minute Break Questions Before We Proceed?
Announcements • Mini-lecture next Monday by Igor on: A Quick Review of Networking Libraries • Homework 3b is out
Normal Termination, One Side At A Time • Finish (FIN) to close and receive remaining bytes • FIN occupies one octet in the sequence space • Other host ack’s the octet to confirm • Closes A’s side of the connection, but not B’s • Until B likewise sends a FIN • Which A then acks • [Figure: timeline between A and B: SYN, SYN ACK, ACK, Data, then A’s FIN and B’s ACK (connection now half-closed), then B’s FIN and A’s ACK (connection now closed); B will retransmit FIN if ACK is lost; a final timeout avoids reincarnation]
Normal Termination, Both Together • Same as before, but B sets FIN with their ack of A’s FIN • [Figure: timeline between A and B: SYN, SYN ACK, ACK, Data, then A’s FIN, B’s FIN + ACK, and A’s ACK (connection now closed); B can retransmit FIN ACK if ACK lost; a final timeout avoids reincarnation]
Sending/Receiving the FIN Packet • Sending a FIN: close() • Process has finished sending data via the socket • Process calls “close()” to close the socket • Once TCP has sent all of the outstanding bytes… • … then TCP sends a FIN • Even if bytes not yet ack’d • Because FIN has seqno beyond all the bytes … • … and thus won’t be ack’d until all bytes are delivered • Receiving a FIN: EOF • Process is reading data from the socket • Eventually, the attempt to read returns an EOF • All bytes prior to sender calling close() have been delivered
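From the application's point of view, the FIN shows up as an end-of-file on the reading side. The sketch below demonstrates this with Python's standard `socket` module over the loopback interface; the function name is made up, and it assumes 127.0.0.1 is usable and that a single thread suffices because the kernel completes the handshake on the listening socket before accept() is called.

```python
import socket


def demo_fin_as_eof(payload=b"last bytes"):
    """Send payload, close the sender, and read on the other side until EOF."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()

    sender.sendall(payload)
    sender.close()  # TCP sends the FIN after the outstanding bytes

    chunks = []
    while True:
        chunk = receiver.recv(4096)
        if not chunk:  # empty read = EOF: the peer's FIN has arrived
            break
        chunks.append(chunk)

    receiver.close()
    listener.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(demo_fin_as_eof())  # every byte written before close() is delivered
```

The read loop only sees the empty result after all earlier bytes have been returned, which mirrors the slide's point that the FIN sits beyond all the data in the sequence space.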
Abrupt Termination • A sends a RESET (RST) to B • E.g., because app. process on A crashed • That’s it • B does not ack the RST • Thus, RST is not delivered reliably • And: any data in flight is lost • But: if B sends anything more, will elicit another RST • [Figure: timeline between A and B: SYN, SYN ACK, ACK, Data, then RST from A; further Data from B elicits another RST]
Reasons for Retransmission • Packet lost: no ACK arrives, timeout fires, sender retransmits • ACK lost: timeout fires, and the retransmission arrives as a DUPLICATE PACKET • Early timeout: timer fires before the ACK gets back, producing DUPLICATE PACKETS
How Long Should Sender Wait? • Sender sets a timeout to wait for an ACK • Too short: wasted retransmissions • Too long: excessive delays when packet lost • TCP sets retransmission timeout (RTO) as function of RTT • Expect ACK to arrive roughly an RTT after data sent • … plus slop to allow for variations (e.g., queuing, MAC) • But: how do we measure RTT? • And: what is a good estimate for RTT? • And: what’s a good estimate for “slop”?