CS 498 Lecture 14: The Internet Protocol V4
Jennifer Hou, Department of Computer Science, University of Illinois at Urbana-Champaign
Reading: Chapter 14, The Linux Networking Architecture: Design and Implementation of Network Protocols in the Linux Kernel
First Possible Path of an IP Packet
• Packets arrive on an interface and are stored in the input queue of the respective CPU.
• Once the layer-3 protocol has been determined in the LLC (e.g., ETH_P_IP), the packet is passed to the ip_rcv() function.
Recall the Packet Path in LLC
[Figure: packet path through the data link layer on a two-CPU host with interfaces eth0 and eth1. Receive side: net_interrupt (driver.c), dev_alloc_skb(), eth_type_trans(), netif_rx (dev.c), softnet_data[cpun].input_pkt_queue, net_rx_action (dev.c), then handle_bridge (br_input.c, CONFIG_BRIDGE) or a layer-3 handler such as ip_rcv, arp_rcv, or p8022_rcv (ETH_P_802_2). Transmit side: ip_queue_xmit or arp_send, dev_queue_xmit (dev.c), dev->qdisc->enqueue, qdisc_run, qdisc_restart, dev->qdisc->dequeue, dev->hard_start_xmit (driver.c). do_softirq and the scheduler trigger net_rx_action and net_tx_action.]
Internet Protocol Implementation in Linux
[Figure: the IP implementation between dev.c and the higher layers. Input (ip_input.c): net_rx_action, ip_rcv, hook IP_PRE_ROUTING, ip_rcv_finish, ip_route_input (ROUTING, Forwarding Information Base), then ip_local_deliver (hook IP_LOCAL_INPUT), ip_forward, or ip_mr_input (MULTICAST). Forwarding (ip_forward.c): ip_forward, hook IP_FORWARD, ip_forward_finish. Output (ip_output.c): ip_queue_xmit, hook IP_LOCAL_OUTPUT, ip_queue_xmit2, ip_output, ip_fragment, ip_finish_output, hook IP_POST_ROUTING, ip_finish_output2, neigh_resolve_output (ARP), dev_queue_xmit (dev.c).]
Second and Third Possible Paths of an IP Packet
• TCP/UDP packets are packed into an IP packet and passed down to IP via ip_queue_xmit().
• The IP layer generates IP packets itself, e.g., multicast packets, fragments of a large packet, or ICMP/IGMP packets.
ip_rcv(skb, dev, pkt_type)
• Packets that are not addressed to the host (pkt_type == PACKET_OTHERHOST, i.e., packets received in promiscuous mode) are rejected.
• A sanity check is performed:
  • Does the packet have at least the size of an IP header?
  • Is this IP version 4?
  • Is the checksum correct?
  • Does the packet have a wrong length?
• If the actual packet size (iph->tot_len) < skb->len, then skb_trim(skb, iph->tot_len) is invoked.
• The netfilter hook NF_IP_PRE_ROUTING is invoked.
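The sanity checks in ip_rcv() can be illustrated with a small user-space sketch. This is not kernel code: it works on raw bytes instead of an sk_buff, the names ip_checksum(), build_header(), and ip_rcv_check() are hypothetical helpers introduced only for this example, and the slice at the end merely stands in for skb_trim().

```python
import struct

def ip_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit header words, complemented."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_header(total_len: int, ttl: int = 64, proto: int = 17) -> bytes:
    """Hypothetical helper: a 20-byte IPv4 header with a valid checksum."""
    fields = [0x45, 0, total_len, 0, 0, ttl, proto, 0,
              bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])]
    raw = struct.pack("!BBHHHBBH4s4s", *fields)
    fields[7] = ip_checksum(raw)             # checksum field was zero in `raw`
    return struct.pack("!BBHHHBBH4s4s", *fields)

def ip_rcv_check(packet: bytes):
    """Return the (possibly trimmed) packet, or None if it must be dropped."""
    if len(packet) < 20:                     # at least the size of an IP header?
        return None
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    if version != 4 or ihl < 5 or len(packet) < ihl * 4:
        return None
    if ip_checksum(packet[:ihl * 4]) != 0:   # a valid header sums to zero
        return None
    tot_len = struct.unpack("!H", packet[2:4])[0]
    if tot_len < ihl * 4 or len(packet) < tot_len:
        return None                          # wrong length
    return packet[:tot_len]                  # stand-in for skb_trim()

# A 28-byte datagram that arrived with 4 bytes of link-layer padding.
padded = build_header(28) + b"A" * 8 + b"\x00" * 4
print(len(ip_rcv_check(padded)))
```

The checksum test relies on the fact that summing a header that already contains its correct checksum yields all ones, so the complemented result is zero.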
Packet Filtering Architecture in Linux
[Figure: netfilter hooks on the packet path. Incoming packets: device driver (input), CRC check, consistency check, NF_IP_PRE_ROUTING, routing. Locally destined packets: NF_IP_LOCAL_IN (iptables: INPUT), then higher layers and local processes. Forwarded packets: NF_IP_FORWARD (iptables: FORWARD). Outgoing packets from local processes: NF_IP_LOCAL_OUT (iptables: OUTPUT), routing. Forwarded and outgoing packets both pass NF_IP_POST_ROUTING before the device driver (output).]
ip_rcv_finish(skb)
• ip_route_input() is invoked to determine the route of a packet.
• skb->dst is set to an entry in the routing cache, which stores both the destination IP and a pointer to an entry in the hard header cache (the cache for layer-2 frame headers).
• If the IP packet header includes options, an ip_options structure is created.
• skb->dst->input() points to the function that should be used to handle the packet (deliver it locally or forward it further):
  • ip_local_deliver(), ip_forward(), ip_mr_input()
IP Forwarding
• To activate IP packet forwarding, run: echo 1 > /proc/sys/net/ipv4/ip_forward
ip_forward(skb)
• Step 1: Packets not marked with pkt_type == PACKET_HOST are dropped.
• Step 2: If TTL <= 1, the packet is dropped and an ICMP message of type ICMP_TIME_EXCEEDED is returned.
• Step 3: skb_cow(skb, headroom) is used to check whether there is still sufficient space for the MAC header of the output device. If not, skb_realloc_headroom() creates sufficient space.
Recall pkt_type in the sk_buff Structure
• pkt_type specifies the type of a packet:
  • PACKET_HOST: a packet sent to the local host
  • PACKET_BROADCAST: a broadcast packet
  • PACKET_MULTICAST: a multicast packet
  • PACKET_OTHERHOST: a packet not destined for the local host, but received in promiscuous mode
  • PACKET_OUTGOING: a packet leaving the host
  • PACKET_LOOPBACK: a packet sent by the local host to itself
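Recall the skb structure…
[Figure: an sk_buff_head links sk_buff structures through next and prev. Each sk_buff holds list, stamp, dev (pointing to a net_device), the header pointers h, nh, and mac, dst, len, and the pointers head, data, tail, and end into the packet data area, which here contains the IP header, the UDP header, and the UDP data, followed by datarefp: 1.]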
Recall How skb is Managed…
• skb_cow(skb, headroom) checks whether the passed socket buffer still has at least headroom bytes free at the front of the packet data space.
• skb_realloc_headroom(skb, newheadroom) creates a new socket buffer with a headroom of size newheadroom.
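A toy model may help to picture what "headroom" means. The sketch below is an assumption-laden simplification: SkBuff, cow(), and realloc_headroom() are hypothetical Python stand-ins that only mirror the behaviour described above, they ignore cloned buffers, and cow() returns a buffer for convenience, which is not necessarily how the kernel function reports its result.

```python
class SkBuff:
    """Toy socket buffer: `data` is the offset where packet data starts."""

    def __init__(self, headroom: int, payload: bytes):
        self.buf = bytearray(headroom) + bytearray(payload)
        self.data = headroom

    def headroom(self) -> int:
        """Free bytes in front of the packet data."""
        return self.data

    def payload(self) -> bytes:
        return bytes(self.buf[self.data:])

    def push(self, header: bytes) -> None:
        """Prepend a header into the headroom (what skb_push() is used for)."""
        if len(header) > self.headroom():
            raise ValueError("not enough headroom")
        self.data -= len(header)
        self.buf[self.data:self.data + len(header)] = header


def realloc_headroom(skb: SkBuff, new_headroom: int) -> SkBuff:
    """Create a new buffer with `new_headroom` free bytes and copy the data."""
    return SkBuff(new_headroom, skb.payload())


def cow(skb: SkBuff, headroom: int) -> SkBuff:
    """Return a buffer that has at least `headroom` free bytes in front."""
    if skb.headroom() >= headroom:
        return skb
    return realloc_headroom(skb, headroom)


skb = SkBuff(headroom=2, payload=b"ip-packet")
skb = cow(skb, 14)            # room for a 14-byte Ethernet header
skb.push(b"M" * 14)
print(skb.headroom(), len(skb.payload()))
```

The point of the check is that the forwarding path can prepend the MAC header of the output device without copying the packet in the common case where the headroom is already large enough.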
ip_forward(skb), continued
• Step 4: The TTL field of the IP packet is decremented by 1.
• Step 5: If the packet length (including the MAC header) is too large (skb->len > mtu) and no fragmentation is allowed (the Don't Fragment bit is set in the IP header), the packet is discarded and an ICMP message with ICMP_FRAG_NEEDED is sent back.
• Step 6: The netfilter hook NF_IP_FORWARD is invoked.
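The decisions taken in the forwarding steps can be summarised in a short sketch. It is a simplified user-space model, not the kernel function: the Packet class and the returned strings are assumptions made for illustration, the headroom check of Step 3 is left out, and the TTL test is written as "<= 1".

```python
PACKET_HOST = 0   # value assumed to match the kernel's packet-type constant


class Packet:
    """Hypothetical stand-in for the few sk_buff/IP header fields used here."""

    def __init__(self, pkt_type: int, ttl: int, length: int, dont_fragment: bool):
        self.pkt_type = pkt_type
        self.ttl = ttl
        self.length = length
        self.dont_fragment = dont_fragment


def ip_forward(pkt: Packet, mtu: int) -> str:
    # Step 1: only packets addressed to this host at layer 2 are forwarded.
    if pkt.pkt_type != PACKET_HOST:
        return "drop"
    # Step 2: the packet would expire on this hop.
    if pkt.ttl <= 1:
        return "drop, send ICMP_TIME_EXCEEDED"
    # Step 4: decrement the TTL.
    pkt.ttl -= 1
    # Step 5: too large for the output device and fragmentation is forbidden.
    if pkt.length > mtu and pkt.dont_fragment:
        return "drop, send ICMP_FRAG_NEEDED"
    # Step 6: hand the packet to the netfilter hook.
    return "NF_IP_FORWARD"


print(ip_forward(Packet(PACKET_HOST, ttl=64, length=2000, dont_fragment=True), mtu=1500))
```

A packet that is too large but does not have the Don't Fragment bit set still passes this function; it is fragmented later on the output path.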
ip_forward_finish(skb)
• If IP options exist, they are processed in ip_forward_options().
• ip_send() is invoked to check whether the packet has to be fragmented.
• Either ip_finish_output() or ip_fragment() is invoked.
ip_finish_output(skb)
• skb->dev is set to the output network device dev.
• The layer-2 packet type is set to ETH_P_IP.
• The netfilter hook NF_IP_POST_ROUTING is invoked.
ip_finish_output2(skb)
• If skb->dst already includes a pointer to the layer-2 header cache (dst->hh), the layer-2 header is copied directly into the packet data space of the skb.
• Otherwise, the neigh_resolve_output() function (which implements ARP) is invoked.
• dev_queue_xmit() is invoked to pass the packet down to the device.
ip_local_deliver(skb)
• The only task of ip_local_deliver(skb) itself is to reassemble fragmented packets by invoking ip_defrag().
• The netfilter hook NF_IP_LOCAL_IN is then invoked.
ip_local_deliver_finish(skb)
• The protocol ID in the IP header is used to calculate the hash value for the ipprot hash table.
• If the corresponding transport protocol can be found, its handler is invoked:
  • tcp_v4_rcv(): TCP
  • udp_rcv(): UDP
  • icmp_rcv(): ICMP
  • igmp_rcv(): IGMP
• If no protocol is found, the packet is passed to a RAW socket (if one exists) or dropped, and an ICMP Destination Unreachable message is returned.
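The protocol lookup can be pictured as a small hash table of handler lists. The sketch below is a user-space model under stated assumptions: the table size of 32, the masking hash, register_protocol(), and deliver() are illustrative choices rather than the kernel's definitions, and the lambdas merely stand in for udp_rcv() and icmp_rcv().

```python
MAX_INET_PROTOS = 32                       # assumed table size (a power of two)
IPPROTO_ICMP, IPPROTO_IGMP, IPPROTO_TCP, IPPROTO_UDP = 1, 2, 6, 17

# One bucket per hash value; each bucket is a list of (protocol, handler).
inet_protos = [[] for _ in range(MAX_INET_PROTOS)]


def bucket_of(protocol: int) -> list:
    """Hash the protocol ID by masking it to the table size."""
    return inet_protos[protocol & (MAX_INET_PROTOS - 1)]


def register_protocol(protocol: int, handler) -> None:
    """Hypothetical registration helper for this sketch."""
    bucket_of(protocol).append((protocol, handler))


def deliver(protocol: int, payload: bytes, raw_socket=None):
    """Model of the dispatch step: handler, else RAW socket, else drop."""
    for proto, handler in bucket_of(protocol):
        if proto == protocol:              # buckets may hold colliding IDs
            return handler(payload)
    if raw_socket is not None:
        return raw_socket(payload)
    return "drop, send ICMP Destination Unreachable"


register_protocol(IPPROTO_UDP, lambda payload: "udp_rcv")
register_protocol(IPPROTO_ICMP, lambda payload: "icmp_rcv")
print(deliver(IPPROTO_UDP, b""))
print(deliver(IPPROTO_TCP, b""))
```

Because several protocol IDs can map to the same bucket, the lookup still compares the stored protocol number before calling a handler.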
Hash Table ipprot
[Figure: the array inet_protos[MAX_INET_PROTOS], indexed 0 to MAX_INET_PROTOS, whose slots point to linked lists of inet_protocol structures with the fields handler, err_handler, next, protocol, copy, data, and name. Example entries: UDP with handler udp_rcv(), err_handler udp_err(), protocol IPPROTO_UDP, name "UDP"; IGMP with handler igmp_rcv(), err_handler NULL, protocol IPPROTO_IGMP, name "IGMP".]
ip_queue_xmit(skb)
• skb->sk->dst is checked to see whether it contains a pointer to an entry in the routing cache.
• All the packets of a socket are routed along the same path, so storing a pointer to a routing entry in sk->dst saves an expensive routing table lookup.
• If no route is present (e.g., for the first packet of a socket), ip_route_output() is invoked to determine a route.
• The fields of the IP header are filled in (version, header length, TOS, fragment offset, TTL, addresses, and protocol).
• If IP options exist, ip_options_build() is invoked.
• The netfilter hook NF_IP_LOCAL_OUT is invoked.
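The benefit of caching the route in the socket can be shown with a small model. Everything here is hypothetical scaffolding for illustration: RoutingTable, Socket, and this ip_queue_xmit() are Python stand-ins, and the "lookup" is a plain dictionary access rather than a real routing decision.

```python
class RoutingTable:
    """Toy routing table that counts how often it is consulted."""

    def __init__(self, routes: dict):
        self.routes = routes
        self.lookups = 0

    def route_output(self, daddr: str) -> str:
        """Stand-in for the expensive lookup done by ip_route_output()."""
        self.lookups += 1
        return self.routes.get(daddr, self.routes["default"])


class Socket:
    def __init__(self, daddr: str):
        self.daddr = daddr
        self.dst = None                    # cached routing entry, like sk->dst


def ip_queue_xmit(sk: Socket, payload: bytes, table: RoutingTable):
    if sk.dst is None:                     # first packet of this socket
        sk.dst = table.route_output(sk.daddr)
    return sk.dst, payload                 # later packets reuse the cached route


table = RoutingTable({"10.0.0.2": "eth1", "default": "eth0"})
sock = Socket("10.0.0.2")
for _ in range(3):
    device, _ = ip_queue_xmit(sock, b"data", table)
print(device, table.lookups)
```

Three packets are sent, but the table is consulted only once; the remaining packets follow the entry stored in the socket.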
ip_queue_xmit2(skb)
• Checks how much headroom is available in the socket buffer.
• The packet is checked for fragmentation and the checksum is computed (ip_send_check(iph)).
• skb->dst->output() causes the ip_output() function to be invoked. ip_output() invokes the netfilter hook NF_IP_POST_ROUTING.
IP Fragmentation
• If the packet size is larger than the MTU of the transmission medium, the packet has to be split into smaller packets.
[Figure: a 1500-byte packet (IP header, TCP header, data) is split into fragments of 1000 and 500 bytes, and then into fragments of 600, 500, and 400 bytes. Every fragment has its own IP header; only the first fragment carries the TCP header.]
ip_fragment(skb, output)
• The maximum packet size is computed.
• IP fragments are created in a while loop until the datagram has been divided into smaller packets.
• For each new IP fragment:
  • alloc_skb() is used to create a new socket buffer.
  • The IP packet header is copied to the fragment, with the MF bit and the offset field properly set.
  • The corresponding payload is copied to the fragment as well.
  • If IP options exist, ip_options_fragment() is invoked.
  • ip_send_check() is invoked to compute the IP checksum.
• The original packet is released with kfree_skb().
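The fragmentation loop can be sketched as follows. This is a simplified model, assuming a fixed 20-byte header without options; it returns tuples instead of building socket buffers, and the function signature is an illustrative choice, not the kernel's. The one detail it does take from the IP specification is that the offset field counts 8-byte units, so every fragment except the last must carry a multiple of 8 payload bytes.

```python
def ip_fragment(payload: bytes, mtu: int, header_len: int = 20):
    """Split `payload` into (offset_in_8_byte_units, more_fragments, data)."""
    # Largest payload per fragment, rounded down to a multiple of 8.
    max_data = (mtu - header_len) & ~7
    if max_data <= 0:
        raise ValueError("MTU too small to carry any payload")

    fragments = []
    offset = 0                    # byte offset into the original payload
    left = len(payload)           # bytes still to be sent
    while left > 0:
        size = min(left, max_data)
        more = left > size        # MF bit: set on all but the last fragment
        fragments.append((offset // 8, more, payload[offset:offset + size]))
        offset += size
        left -= size
    return fragments


# A 1500-byte packet (20-byte header + 1480 bytes of payload) over an MTU of 1000.
for frag_offset, mf, data in ip_fragment(bytes(1480), mtu=1000):
    print(frag_offset, mf, len(data))
```

With an MTU of 1000, each fragment can carry at most 976 payload bytes (980 rounded down to a multiple of 8), so the 1480-byte payload is split into pieces of 976 and 504 bytes.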
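Recall What the IP Header Looks Like
IP packet format (bits 0 to 31 per row):
• Row 1: Version (4 bits), IHL (4 bits), Codepoint (8 bits), Total length (16 bits)
• Row 2: Fragment-ID (16 bits), flags including DF and MF (3 bits), Fragment-Offset (13 bits)
• Row 3: Time to Live (8 bits), Protocol (8 bits), Checksum (16 bits)
• Row 4: Source address (32 bits)
• Row 5: Destination address (32 bits)
• Followed by options and payload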
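To make the layout concrete, the fixed 20-byte part of the header can be unpacked field by field. The sketch below is an illustration only: parse_ip_header() and the dictionary keys are names chosen for this example, and the sample bytes are made up (their checksum field is simply left at zero).

```python
import struct


def parse_ip_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (network byte order)."""
    (ver_ihl, codepoint, total_len, frag_id, flags_off,
     ttl, protocol, checksum, saddr, daddr) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,               # upper 4 bits of the first byte
        "ihl": ver_ihl & 0x0F,                 # header length in 32-bit words
        "codepoint": codepoint,
        "total_length": total_len,
        "fragment_id": frag_id,
        "df": bool(flags_off & 0x4000),        # Don't Fragment bit
        "mf": bool(flags_off & 0x2000),        # More Fragments bit
        "fragment_offset": flags_off & 0x1FFF, # in units of 8 bytes
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "saddr": ".".join(map(str, saddr)),
        "daddr": ".".join(map(str, daddr)),
    }


sample = bytes.fromhex("450005dc1234200040060000c0a80001c0a80002")
header = parse_ip_header(sample)
print(header["version"], header["total_length"], header["mf"], header["saddr"])
```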
Reassembling Packets
• Recall that ip_local_deliver() passes all the fragmented IP packets to ip_defrag().
• The fragments are stored in the fragment cache until either all the fragments of a datagram have arrived or the maximum wait time (ipfrag_time, ~30 seconds) has expired.
Fragment Cache
[Figure: the hash table ipq_hash[IPQ_HASHSZ], indexed 0 to IPQ_HASHSZ, whose slots point to linked lists of ipq structures; each ipq holds a list of sk_buff fragments. ipq fields: next, saddr, daddr, id, protocol, last_in, fragments, len, meat, lock, refcnt, timer, pprev, iif. Annotations: the hash value is calculated from saddr, daddr, id, and protocol; last_in is a flag that specifies whether all fragments have arrived; len is the length of the original packet; meat is the number of bytes already in the cache.]
APIs Used for Reassembling Fragments
• ipq_unlink(qp) removes the ipq entry from the fragment cache.
• ip_frag_destroy(qp) releases an ipq fragment list. First, frag_kfree_skb() releases all the socket buffers of the fragments, and then frag_free_queue() releases the ipq structure.
• ip_expire() is the handling routine for the timer.
  • When the timer expires and not all the fragments of the datagram have arrived, the entry in the fragment cache is deleted and an ICMP message of type ICMP_TIME_EXCEEDED is sent back.
APIs Used for Reassembling Fragments
• ip_frag_create(hash, iph) creates a new entry in the fragment cache and uses the IP packet header, iph, of the fragment that just arrived to initialize the entry.
• ip_find(iph) searches the fragment cache for the ipq entry of the IP datagram with the packet header iph.
  • The hash value is calculated from the sender/destination address, protocol, and fragment ID.
  • If no matching entry is found, a new ipq entry is created in the fragment cache (ip_frag_create()).
APIs Used for Reassembling Fragments
• ip_frag_queue(qp, skb) inserts a new fragment, skb, into the fragment list of a datagram (represented by the ipq structure pointed to by qp).
• ip_frag_reasm() reassembles all the fragments of a datagram when qp->len == qp->meat.
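How the fragment cache and the len/meat comparison work together can be sketched in a few lines. This is a deliberately reduced model and not the kernel code: the Python functions only borrow the names used above, the cache is a dictionary keyed by (saddr, daddr, id, protocol), offsets are given in bytes, and overlapping fragments and the expiry timer are not handled.

```python
class Ipq:
    """Toy fragment-cache entry for one datagram."""

    def __init__(self):
        self.fragments = {}   # byte offset -> fragment data
        self.len = None       # original payload length, known once the last fragment arrives
        self.meat = 0         # number of bytes already in the cache


fragment_cache = {}           # (saddr, daddr, id, protocol) -> Ipq


def ip_find(key) -> Ipq:
    """Look up the entry for a datagram; create it if it does not exist yet."""
    if key not in fragment_cache:
        fragment_cache[key] = Ipq()
    return fragment_cache[key]


def ip_frag_queue(qp: Ipq, offset: int, more_fragments: bool, data: bytes) -> None:
    """Insert one fragment into the entry's fragment list."""
    if offset in qp.fragments:            # ignore exact duplicates
        return
    qp.fragments[offset] = data
    qp.meat += len(data)
    if not more_fragments:                # the last fragment fixes the total length
        qp.len = offset + len(data)


def ip_defrag(key, offset: int, more_fragments: bool, data: bytes):
    """Return the reassembled payload once complete, otherwise None."""
    qp = ip_find(key)
    ip_frag_queue(qp, offset, more_fragments, data)
    if qp.len is not None and qp.len == qp.meat:
        del fragment_cache[key]           # entry leaves the cache
        return b"".join(qp.fragments[o] for o in sorted(qp.fragments))
    return None


key = ("10.0.0.1", "10.0.0.2", 7, 17)
print(ip_defrag(key, 8, False, b"IJ"))          # last fragment arrives first
print(ip_defrag(key, 0, True, b"ABCDEFGH"))     # datagram is now complete
```

The fragments may arrive in any order; reassembly is triggered only when the total length is known and the number of cached bytes matches it.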