Linux TCP/IP Stack
TCP / IP vs. OSI model

Stack layers (top to bottom): Process; Socket layer; Protocol Layer (TCP / IP); Interface Layer (Ethernet, etc.)

OSI layers: 7: Application; 6: Presentation; 5: Session; 4: Transport; 3: Network; 2: Data Link; 1: Physical Layer
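TCP/IP Stack Overview

Layers (top to bottom): Process; Socket Layer; Protocol Layer (TCP Layer); Protocol Layer (IP Layer); Interface Layer (Ethernet Device Driver); Physical Media

Send path (ends at the Output Queue):
1: sosend (…) in the Socket Layer
2: tcp_output (…) in the TCP Layer
3: ip_output (…) in the IP Layer
4: ethernet_output (…) in the Interface Layer

Receive path (starts at the Input Queue):
2: ethernet_input (…) in the Interface Layer
3: ip_input (…) in the IP Layer
4: tcp_input (…) in the TCP Layer
5: recvfrom (…) in the Socket Layer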
Process Layer to TCP Layer

Call chain, from the process down to the TCP layer:

Process:
send (int socket, const char *buf, int length, int flags)

Kernel:
sendto (int socket, const char *data_buffer, int length, int flags, struct sockaddr *destination, int destination_length)
sendit (struct proc *p, int socket, struct msghdr *mp, int flags, int *return_size) [uipc_syscalls.c]
sosend (struct socket *s, struct mbuf *addr, struct uio *uio, struct mbuf *top, struct mbuf *control, int flags) [uipc_socket.c]
tcp_usrreq (struct socket *s, int request, struct mbuf *m, struct mbuf *nam, struct mbuf *control) [tcp_usrreq.c]

TCP Layer:
tcp_output (struct tcpcb *tp) [tcp_output.c]
Socket Layer: MBUF Chain

sendto (int socket, const char *data_buffer, int length, int flags, struct sockaddr *destination, int destination_length)

The 150 bytes of data in data_buffer are copied into a chain of two 128-byte mbufs.

First mbuf (28 bytes of header, 100 bytes of data):
• m_next points to the second mbuf
• m_nextpkt = NULL
• m_len = 100
• m_data points to the start of the data
• m_type = MT_DATA
• m_flags = M_PKTHDR
• m_pkthdr.len = 150
• m_pkthdr.recvif = NULL

Second mbuf (20 bytes of header, 50 bytes of data, 58 bytes of unused space):
• m_next = NULL
• m_nextpkt = NULL
• m_len = 50
• m_data points to the start of the data
• m_type = MT_DATA
• m_flags = 0
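The chain on this slide can be sketched in a few lines of C. This is only an illustration: struct mbuf_sketch, M_PKTHDR_SKETCH and both helper functions are hypothetical names, and the structure keeps only the fields shown on the slide rather than the kernel's real layout.

```c
#include <stddef.h>

#define M_PKTHDR_SKETCH 0x0002 /* illustrative flag value, not taken from the slides */

/* Hypothetical, cut-down stand-in for the kernel's struct mbuf. */
struct mbuf_sketch {
    struct mbuf_sketch *m_next;    /* next mbuf in this chain */
    struct mbuf_sketch *m_nextpkt; /* next packet in a queue of packets */
    int m_len;                     /* bytes of data held in this mbuf */
    int m_flags;                   /* M_PKTHDR_SKETCH on the first mbuf of a packet */
    int m_pkthdr_len;              /* total packet length (first mbuf only) */
};

/* Walk the m_next links and add up the data held in each mbuf. */
static int chain_length(const struct mbuf_sketch *m)
{
    int total = 0;
    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}

/* Build the two-mbuf chain from the slide and return its total length. */
static int slide_example_length(void)
{
    struct mbuf_sketch second = { NULL, NULL, 50, 0, 0 };
    struct mbuf_sketch first = { &second, NULL, 100, M_PKTHDR_SKETCH, 150 };
    return chain_length(&first);
}
```

The sum of the m_len fields matches m_pkthdr.len in the first mbuf, which is why the packet header can report the full packet length without walking the chain.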
Socket Layer: sosend

sosend passes data and control information to the protocol layer.

sosend(struct socket *s, struct mbuf *addr, struct uio *uio, struct mbuf *data_buffer, struct mbuf *control, int flags)

1. Initialize a new memory buffer and variables to hold flags.
2. Is there enough space in the send buffer? sbspace(&s->so_snd)
• no: take the error path
• yes: copy data_buffer into an mbuf
3. int error = tcp_usrreq(s, flags, mbuf, addr, control)
4. If error is 0, check whether there are more buffers to send.
• yes: go back to step 2
• no: continue to step 6
5. If error is 1, free the memory buffers received.
6. Return the value of error to sendto( ).
TCP Layer: tcp_usrreq(struct socket *s, int request, struct mbuf *data_buffer, struct mbuf *nam, struct mbuf *control)

1. Initialize the internet protocol control block inp and the TCP control block tp, which store the information TCP needs.
2. Convert the socket to an internet protocol control block: inp = sotoinpcb(so)
3. Convert the internet protocol control block to a TCP control block: tp = intotcpcb(inp)
4. If request is PRU_SEND: int error = tcp_output(tp)
5. tcp_output returns its error value to tcp_usrreq( ).
TCP Layer (tcp_output.c): tcp_output(struct tcpcb *tp)

Called by tcp_usrreq for one of the following reasons:
• To send the initial SYN
• To send a FIN (finished sending)
• To send data
• To send a window update after data has been received

tcp_output( ) functionality:
1. Determine whether TCP can send a segment, depending on:
• flags in the data sent by the socket layer (to send an ACK, etc.)
• size of the window advertised by the receiver
• amount of data ready to send
• whether unacknowledged data already exists for the connection
2. Calculate the amount of data to be sent, depending on:
• size of the receiver's window
• number of bytes in the send buffer
3. Check for window shrink.
4. Send a segment:
• Allocate a buffer for the TCP and IP header from the header template.
• Copy the TCP and IP header template into the buffer to be sent.
• Fill in the fields of the TCP header.
• Decrement the number of buffers to be sent, so that the end can be checked.
• Set the sequence number and acknowledgement fields.
• Set three fields in the IP header: IP length, TTL and TOS.
• Pass the datagram to IP.
TCP Layer (tcp_output.c): tcp_output(struct tcpcb *tp)

struct socket *so = tp->t_inpcb->inp_socket
Initialize a TCP header, tcp_header.

idle is true if the maximum sequence number sent equals the oldest unacknowledged sequence number, that is, if no ACK is expected from the other end.
int idle = (tp->snd_max == tp->snd_una)

• idle is true: no acknowledgement is expected, so set the congestion window to one segment: tp->snd_cwnd = tp->t_maxseg;
• idle is false: check the ACK flag.
TCP Layer: tcp_output(struct tcpcb *tp)

If no acknowledgement is expected, set the congestion window to one segment:
tp->snd_cwnd = tp->t_maxseg;

off is the offset in bytes from the beginning of the send buffer to the first data byte to send. The first off bytes have already been sent and are awaiting acknowledgement.
int off = tp->snd_nxt - tp->snd_una

Determine the length of data that should be transmitted and the flags to be used. win is the minimum of the receiver's window and the congestion window. len is the smaller of the number of bytes in the send buffer and win, minus off.
len = min(so->so_snd.sb_cc, win) - off

Determine the flags, such as TH_ACK, TH_FIN, TH_RST, TH_SYN:
flags = tcp_outflags[tp->t_state]
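The idle, off, win and len calculations can be tried out with a small sketch. The field names follow the slides, but struct tcpcb_sketch and the helper functions are hypothetical, and sb_cc is passed in directly instead of being read from a socket buffer.

```c
/* Hypothetical cut-down control block; field names follow the slides. */
struct tcpcb_sketch {
    unsigned long snd_una;  /* oldest unacknowledged sequence number */
    unsigned long snd_nxt;  /* next sequence number to send */
    unsigned long snd_max;  /* highest sequence number sent so far */
    unsigned long snd_wnd;  /* window advertised by the receiver */
    unsigned long snd_cwnd; /* congestion window */
};

static long min_long(long a, long b)
{
    return a < b ? a : b;
}

/* Nothing is outstanding when everything sent has been acknowledged. */
static int is_idle(const struct tcpcb_sketch *tp)
{
    return tp->snd_max == tp->snd_una;
}

/* len = min(bytes in send buffer, win) - off, with win = min(snd_wnd, snd_cwnd). */
static long send_length(const struct tcpcb_sketch *tp, long sb_cc)
{
    long off = (long)(tp->snd_nxt - tp->snd_una);
    long win = min_long((long)tp->snd_wnd, (long)tp->snd_cwnd);
    return min_long(sb_cc, win) - off;
}
```

For example, with snd_una = 1000, snd_nxt = 1200, a receiver window of 4096, a congestion window of 1024 and 1500 bytes in the send buffer, off is 200, win is 1024, and len is 824.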
TCP Layer: tcp_output(struct tcpcb *tp)

Determine the flags, such as TH_ACK, TH_FIN, TH_RST, TH_SYN:
flags = tcp_outflags[tp->t_state]

Then test, in order:
1. tp->t_flags & TF_ACKNOW
• true: send an acknowledgement
• false: go to the next test
2. flags & (TH_SYN | TH_RST)
• true: send a sequence number (SYN) or a reset
• false: go to the next test
3. flags & TH_FIN
• true: finished sending
• false: continue
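The lookup flags = tcp_outflags[tp->t_state] is a table indexed by connection state. The sketch below assumes the usual BSD ordering of states and per-state flag values; those table contents are an assumption not shown on the slides, and the names tcp_state_sketch, outflags_sketch and flags_for_state are hypothetical.

```c
/* TCP header flag bits. */
enum { TH_FIN = 0x01, TH_SYN = 0x02, TH_RST = 0x04, TH_ACK = 0x10 };

/* Connection states, in the order assumed for this sketch. */
enum tcp_state_sketch {
    S_CLOSED, S_LISTEN, S_SYN_SENT, S_SYN_RECEIVED, S_ESTABLISHED,
    S_CLOSE_WAIT, S_FIN_WAIT_1, S_CLOSING, S_LAST_ACK, S_FIN_WAIT_2,
    S_TIME_WAIT, S_NSTATES
};

/* Assumed per-state output flags, one entry per state above. */
static const unsigned char outflags_sketch[S_NSTATES] = {
    TH_RST | TH_ACK, /* CLOSED */
    0,               /* LISTEN */
    TH_SYN,          /* SYN_SENT */
    TH_SYN | TH_ACK, /* SYN_RECEIVED */
    TH_ACK,          /* ESTABLISHED */
    TH_ACK,          /* CLOSE_WAIT */
    TH_FIN | TH_ACK, /* FIN_WAIT_1 */
    TH_FIN | TH_ACK, /* CLOSING */
    TH_FIN | TH_ACK, /* LAST_ACK */
    TH_ACK,          /* FIN_WAIT_2 */
    TH_ACK           /* TIME_WAIT */
};

/* Look up the flags a segment should carry in the given state. */
static unsigned char flags_for_state(enum tcp_state_sketch state)
{
    return outflags_sketch[state];
}
```

Because each entry is a bit mask, a test such as flags & (TH_SYN | TH_RST) is true when either bit is set, which is how a single table lookup can drive all of the checks on this slide.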
Check the flags to determine the type of message:
• window probe
• retransmission
• normal data transmission

Allocate an mbuf for the TCP and IP header and, if possible, the data:
MGETHDR (m, M_DONTWAIT, MT_HEADER)
M_DONTWAIT indicates that if no memory is available for the mbuf, the routine exits and returns an error.

Is the length of the data < 44 bytes (100 - 40 - 16)?
• yes: copy the data from the socket send buffer into the new packet header mbuf.
• no: create a new mbuf chain, copy the surplus data into it, and link it to the first mbuf.

ip_output(m, tp->t_inpcb->inp_options, &tp->t_inpcb->inp_route, so->so_options & SO_DONTROUTE, 0)
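The 44-byte threshold is plain arithmetic: the data area of a packet header mbuf (100 bytes), minus the TCP and IP headers (40 bytes), minus room reserved for a link-layer header (16 bytes). A minimal sketch, using hypothetical macro names and the strict comparison shown on the slide:

```c
#define MHLEN_SKETCH 100      /* data bytes available in a packet header mbuf */
#define TCPIP_HDR_SKETCH 40   /* 20-byte IP header plus 20-byte TCP header */
#define LINKHDR_SKETCH 16     /* space reserved for the link-layer header */

/* Returns 1 when the data can share the header mbuf, 0 when a separate chain is needed. */
static int fits_in_header_mbuf(int data_len)
{
    return data_len < MHLEN_SKETCH - TCPIP_HDR_SKETCH - LINKHDR_SKETCH;
}
```

With these numbers a 43-byte payload travels in the same mbuf as its headers, while the 150-byte example from the earlier mbuf slide needs a separate chain.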
ip_output.c

ip_output(struct mbuf *m, struct mbuf *opt, struct route *ro, int flags, struct ip_moptions *imo)

1. Header initialization
2. Route Selection
3. Source address selection and Fragmentation

1. Header initialization

Packets damaged? Check whether any errors occurred while the higher layers were adding headers.
• yes: ERROR
• no: continue

Most of the fields of the IP header are predefined by higher layer protocols.

• The value of “flags” decides what is to be done with the data:
• IP_FORWARDING: Forward packet
• IP_ROUTETOIF: Route directly to interface
• IP_ALLOWBROADCAST: Allow broadcasting of packet
• IP_RAWOUTPUT: Packet contains pre-constructed header

if (flags & (IP_FORWARDING | IP_RAWOUTPUT))
• yes: If the packet has to be forwarded to another host, i.e. if the machine is acting as a router, the IP header must not be modified by ip_output. Save the header length in hlen for the fragmentation algorithm.
• no: If the packet is not being forwarded and has to be sent to another host, construct and initialize the IP header: set ip_v = 4, clear ip_off, and assign a unique identifier to ip_id. Length, offset, TTL, protocol, TOS, etc. are set by higher layers.
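Because flags is a bit mask, several options can be set at once, so the header test has to use a bitwise AND rather than an equality comparison. The sketch below illustrates the difference; the SK_ constants and their numeric values are illustrative assumptions, not taken from the slides.

```c
/* Illustrative flag bits; the numeric values are assumptions for this sketch. */
enum {
    SK_IP_FORWARDING = 0x01,
    SK_IP_RAWOUTPUT = 0x02,
    SK_IP_ROUTETOIF = 0x10,
    SK_IP_ALLOWBROADCAST = 0x20
};

/* Returns 1 when ip_output should construct the IP header itself,
 * i.e. when neither the forwarding bit nor the raw-output bit is set. */
static int must_build_header(int flags)
{
    return (flags & (SK_IP_FORWARDING | SK_IP_RAWOUTPUT)) == 0;
}
```

A forwarded packet that also allows broadcast has flags equal to 0x21. An equality test against SK_IP_FORWARDING would miss it, while the masked test still recognizes it as forwarded.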
2. Route Selection

A cached route may be provided to ip_output as an argument. UDP and TCP maintain a route cache associated with each socket. If a route has not been provided, ip_output sets up a temporary route structure called iproute.

Verify the cached route for the destination address: check whether the cached route is for the correct destination.

if (cached_route == destination)
• yes: Find the interface on which the packet has to be placed. ifp points to the interface's ifnet structure.
• no: Locate a route. Call rtalloc(dst_ip) to locate a route to the destination, then find the interface on which the packet has to be placed. ifp points to the interface's ifnet structure. If rtalloc(dst_ip) fails to find a route, return a host unreachable error.

Notes:
• If the packet is being routed, rtalloc locates a route to the address specified by dst.
• If rtalloc fails, an EHOSTUNREACH error is generated. If ip_forward called ip_output, the error is converted to an ICMP error.
• If the route is found, ifp is made to point to the ifnet structure for the interface.
• If the next hop is not the packet's final destination, dst is changed to point to the next hop router.
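The outcome of route selection (an outgoing interface, a next hop, or a host unreachable error) can be illustrated with a toy lookup. This is not how rtalloc works internally: the table, struct rt_sketch and select_route are hypothetical, and the lookup is a simple linear scan that prefers the longest matching mask.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing entry: gateway == 0 means the network is directly connected. */
struct rt_sketch {
    uint32_t net;
    uint32_t mask;
    uint32_t gateway;
    int ifindex;
};

static const struct rt_sketch route_table[] = {
    { 0xC0A80100u, 0xFFFFFF00u, 0,           1 }, /* 192.168.1.0/24, direct, interface 1 */
    { 0x0A000000u, 0xFF000000u, 0xC0A80101u, 1 }, /* 10.0.0.0/8 via 192.168.1.1 */
};

/* On success stores the next hop and interface and returns 0;
 * returns EHOSTUNREACH when no entry matches the destination. */
static int select_route(uint32_t dst, uint32_t *next_hop, int *ifindex)
{
    const struct rt_sketch *best = NULL;
    for (size_t i = 0; i < sizeof route_table / sizeof route_table[0]; i++) {
        if ((dst & route_table[i].mask) != route_table[i].net)
            continue;
        if (best == NULL || route_table[i].mask > best->mask)
            best = &route_table[i];
    }
    if (best == NULL)
        return EHOSTUNREACH;
    /* If the route goes through a gateway, the next hop is the gateway, not dst. */
    *next_hop = best->gateway != 0 ? best->gateway : dst;
    *ifindex = best->ifindex;
    return 0;
}
```

A destination on the directly connected network keeps itself as the next hop, a destination behind the gateway gets the gateway's address instead, and an address with no matching entry produces EHOSTUNREACH.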
3. Source address selection and Fragmentation

The final section of ip_output ensures that the IP header has a valid source IP address. This could not have been done earlier because the route had not been selected yet.

Is a valid source address specified?
• no: Select the IP address of the outgoing interface as the source address.
• yes: continue

Does the packet have to be fragmented?
• yes: Packets that exceed the MTU must be fragmented before they can be sent. Fragment the packet if its size is greater than the MTU.
• no: continue

In either case (fragmented or not) the checksum is computed (in_cksum). If there are no errors, the data is sent to the if_output function of the selected output interface.
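Both steps on this slide come down to small calculations. The sketch below is not the kernel's in_cksum or its fragmentation code; it shows the standard Internet checksum (16-bit one's complement sum) and a fragment count that assumes every fragment but the last carries a payload that is a multiple of 8 bytes. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of fragments needed for a packet of total_len bytes (header included).
 * Assumes mtu is at least hlen + 8. */
static unsigned fragment_count(unsigned total_len, unsigned hlen, unsigned mtu)
{
    if (total_len <= mtu)
        return 1;
    unsigned per_fragment = (mtu - hlen) & ~7u; /* payload per fragment, multiple of 8 */
    unsigned payload = total_len - hlen;
    return (payload + per_fragment - 1) / per_fragment;
}

/* Internet checksum: one's complement of the one's complement sum of 16-bit words. */
static uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)
        sum += (uint32_t)data[i] << 8; /* pad a trailing odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16); /* fold carries back into the low 16 bits */
    return (uint16_t)~sum;
}
```

For example, a 4000-byte packet with a 20-byte header on a 1500-byte MTU link carries 3980 bytes of payload at up to 1480 bytes per fragment, so it needs three fragments.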
Interface Layer (if_ethersubr.c)

ether_output(struct ifnet *ifp, struct mbuf *mbuf, struct sockaddr *destination, struct rtentry *routing_entry)

1. Verification
2. Protocol-Specific Processing
3. Frame Construction
4. Interface Queuing

1. Verification

Ethernet port up and running? ifp->if_flags & (IFF_UP | IFF_RUNNING)
• no: senderr (ENETDOWN)
• yes: continue
Interface Layer (if_ethersubr.c): ether_output(struct ifnet *ifp, struct mbuf *mbuf, struct sockaddr *destination, struct rtentry *rt_entry)

Function: Takes the data portion of an Ethernet frame, encapsulates it with a 14-byte header, and places it on the interface send queue.

Phases: Verification, Protocol-Specific Processing, Frame Construction, Interface Queuing.

Arguments:
• ifp points to the outgoing interface's ifnet structure
• mbuf is the data to be sent
• destination is the destination address
• rt_entry points to the routing entry

Initialize the Ethernet header: struct eth_header *eh

Verification

Ethernet port up and running? ifp->if_flags & (IFF_UP | IFF_RUNNING)
• no: senderr (ENETDOWN)
• yes: continue
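The verification step only passes when the interface is both up and running, so both bits must be present in if_flags. A minimal sketch, with hypothetical SK_ constants whose numeric values are assumptions:

```c
#include <errno.h>

/* Illustrative flag bits; the numeric values are assumptions for this sketch. */
#define SK_IFF_UP 0x01
#define SK_IFF_RUNNING 0x40

/* Returns 0 when the interface can transmit, ENETDOWN otherwise. */
static int check_interface(int if_flags)
{
    const int required = SK_IFF_UP | SK_IFF_RUNNING;
    /* Comparing against the full mask rejects an interface with only one bit set. */
    return (if_flags & required) == required ? 0 : ENETDOWN;
}
```

An interface that is configured up but whose driver is not yet running fails the check and the caller gets ENETDOWN.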
Verification (continued)

Route valid? rt_entry = rtalloc1 (destination, 1)
• 0: senderr (EHOSTUNREACH)
• 1: continue

Next hop a gateway? If so, rt = rt->rt_gwroute

Destination responding to ARP requests? rt->rt_flags & RTF_REJECT
If it is not responding, do not send more packets, to avoid flooding.
• no (RTF_REJECT not set): continue to Protocol Specific Processing
Protocol Specific Processing

Functionality: Finds the Ethernet address corresponding to the IP address of the destination.

destination->sa_family is AF_INET:
• Send an ARP broadcast to find the Ethernet address corresponding to the destination IP address.
• Use m_copy( ) to keep the packet until an acknowledgement is received.

Next phase: Frame Preparation
Frame Preparation (follows Protocol Specific Processing)

Make sure there is room for the 14-byte Ethernet header:
M_PREPEND (m, sizeof(ethernet_header), M_DONTWAIT)

Form the Ethernet header from the Ethernet frame type, the destination Ethernet (MAC) address (e.g. that of the default gateway for a host), and the unicast Ethernet address associated with the output interface.
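The 14 bytes break down into a 6-byte destination address, a 6-byte source address and a 2-byte frame type. The sketch below builds that header in front of a payload in a flat buffer; it does not use mbufs or M_PREPEND, and build_frame and the _SK constants are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN_SK 6
#define ETHER_HDR_LEN_SK 14
#define ETHERTYPE_IP_SK 0x0800 /* frame type for IPv4 */

/* Writes header plus payload into frame; returns the frame length,
 * or 0 when the buffer is too small. */
static size_t build_frame(uint8_t *frame, size_t capacity,
                          const uint8_t *dst, const uint8_t *src,
                          uint16_t type, const uint8_t *payload, size_t payload_len)
{
    if (capacity < ETHER_HDR_LEN_SK + payload_len)
        return 0;
    memcpy(frame, dst, ETHER_ADDR_LEN_SK);                     /* destination address */
    memcpy(frame + ETHER_ADDR_LEN_SK, src, ETHER_ADDR_LEN_SK); /* source address */
    frame[12] = (uint8_t)(type >> 8);                          /* type, network byte order */
    frame[13] = (uint8_t)(type & 0xff);
    memcpy(frame + ETHER_HDR_LEN_SK, payload, payload_len);
    return ETHER_HDR_LEN_SK + payload_len;
}
```

The type field is written byte by byte so the result is in network byte order regardless of the host's endianness.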
Interface Queuing (follows Frame Preparation)

Is the output queue (if_snd) full?
• yes: Discard the frame, free the memory buffer, and senderr (ENOBUFS).
• no: Place the frame on the interface's send queue, then call lestart (ifp).
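The queuing decision can be sketched as a bounded counter. The field names are modeled on an interface queue, but struct ifqueue_sketch and enqueue_frame are hypothetical, and the sketch tracks only counts rather than the frames themselves.

```c
#include <errno.h>

/* Hypothetical bounded send queue that tracks only counts. */
struct ifqueue_sketch {
    int ifq_len;    /* frames currently queued */
    int ifq_maxlen; /* capacity of the queue */
    int ifq_drops;  /* frames discarded because the queue was full */
};

/* Returns 0 when the frame is queued, ENOBUFS when it has to be dropped. */
static int enqueue_frame(struct ifqueue_sketch *q)
{
    if (q->ifq_len >= q->ifq_maxlen) {
        q->ifq_drops++; /* queue full: discard the frame and count the drop */
        return ENOBUFS;
    }
    q->ifq_len++;
    return 0;
}
```

With a capacity of two, the first two frames are accepted and the third is dropped with ENOBUFS, leaving the drop counter at one.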
Interface Layer (if_le.c): lestart(struct ifnet *ifp)

Function: Dequeues frames from the interface output queue and arranges for them to be transmitted by the Ethernet card.

struct le_softc *le = &le_softc[ifp->if_unit]

le->sc_if.if_flags & IFF_RUNNING
• 0: return error
• 1: Copy the frame in the mbuf to the hardware buffer. Set IFF_OACTIVE to indicate that the device is busy transmitting.