What's Under Your Hood? Implementing a Network Monitoring System
jonschipp@gmail.com
Who am I?
• Jon Schipp
• Unix Admin
• Linux & Unix User Group
• Southern Indiana Computer Klub
and... I like computers a lot
What's Network Monitoring?
• Monitoring your network
• Collecting data, i.e. network traffic
• Interpreting the data
Why?
• Network issues
• Attack detection
• Record keeping
• Fun
Focus
• Small/medium size business
• Basement endeavors
• Cheap goods
• Working with what you have
where the magic happens
gimme the data
• hubs
• monitor/SPAN ports, port mirroring
• taps
• IP forwarding/relaying/tunneling, whatever
Forwarding/Relaying
• Wireshark remote capture feature
• NetworkMiner Professional: Pcap-over-IP
    tcpdump -nni eth0 -s0 -w - | nc 192.168.1.254 33246
  SSL/encryption: ssh, socat, ncat, cryptcat, stunnel
• Netfilter's iptables
    iptables -t mangle -A PREROUTING -p tcp -m multiport --dports 80,443,22,20,21 -i eth0 -j TEE --gateway 192.168.1.254
    iptables -t mangle -A POSTROUTING -p tcp -m multiport --dports 80,443,22,20,21 -o eth0 -j TEE --gateway 192.168.1.254
• OpenBSD's PF
    pass out on em0 dup-to (em1 192.168.1.254) proto tcp from any to any port { 80, 443, 22, 20, 21 }
    pass in on em0 dup-to (em1 192.168.1.254) proto tcp from any to any port { 80, 443, 22, 20, 21 }
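Whatever listens on the far end of the `tcpdump -w - | nc` pipe receives an ordinary pcap byte stream, which starts with a 24-byte global header. A minimal sketch of a sanity check on that header follows. The function name `parse_pcap_global_header` is made up for illustration, and only the classic microsecond-timestamp magic number is handled (an assumption; nanosecond pcap and pcapng streams are rejected).

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap file format, microsecond timestamps


def parse_pcap_global_header(data: bytes) -> dict:
    """Decode the 24-byte pcap global header at the start of a capture stream."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes for the pcap global header")
    # The writer uses its native byte order, so try both.
    for byte_order in ("<", ">"):
        magic, major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
            byte_order + "IHHiIII", data[:24]
        )
        if magic == PCAP_MAGIC:
            return {
                "byte_order": "little" if byte_order == "<" else "big",
                "version": (major, minor),
                "snaplen": snaplen,
                "linktype": linktype,  # 1 means Ethernet
            }
    raise ValueError("not a classic pcap stream")
```

A receiver could read the first 24 bytes from the socket, run this check, and only then start spooling the rest to disk.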
Architecture
High Speed Packet Capture
• High-end equipment is expensive
• DIY: tuning and compiling
• Hardware is pretty fast nowadays but...
• We are using software that isn't designed for efficient packet capture
NICs
• Get a quality card
• NAPI is good
• DMA is good
• Intel PRO/1000 MT Gigabit models are generally good, $30 on eBay
PCI buses
(bus speed in MHz) * (bus width in bits) / 8 = speed in megabytes/second
PCI           66 MHz * 32 bit / 8 = 264 MB/s
PCI-X         66 MHz * 64 bit / 8 = 528 MB/s (about 420 MB/s minus 20% overhead)
PCI-X        133 MHz * 64 bit / 8 = 1064 MB/s (about 850 MB/s minus 20% overhead)
PCI-X        266 MHz * 64 bit / 8 = 2128 MB/s (about 1700 MB/s minus 20% overhead)
PCI-X        533 MHz * 64 bit / 8 = 4264 MB/s (about 3400 MB/s minus 20% overhead)
PCIe v1 x1   2500 MHz * 1 1-bit lane / 8 = 312.5 MB/s (250 MB/s minus 20% overhead)
PCIe v2 x1   5000 MHz * 1 1-bit lane / 8 = 625 MB/s (500 MB/s minus 20% overhead)
PCIe v2 x2   5000 MHz * 2 1-bit lanes / 8 = 1250 MB/s (1000 MB/s minus 20% overhead)
PCIe v2 x4   5000 MHz * 4 1-bit lanes / 8 = 2500 MB/s (2000 MB/s minus 20% overhead)
PCIe v2 x8   5000 MHz * 8 1-bit lanes / 8 = 5000 MB/s (4000 MB/s minus 20% overhead)
PCIe v2 x16  5000 MHz * 16 1-bit lanes / 8 = 10000 MB/s (8000 MB/s minus 20% overhead)
PCIe v2 x32  5000 MHz * 32 1-bit lanes / 8 = 20000 MB/s (16000 MB/s minus 20% overhead)
PCIe v3 x32  8000 MHz * 32 1-bit lanes / 8 = 32000 MB/s (about 31500 MB/s minus 1.5% overhead)
For comparison, the network side:
Gigabit Ethernet     1000 / 8 = 125 megabytes/second
10 Gigabit Ethernet  10000 / 8 = 1250 megabytes/second
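The bus formula on this slide is easy to check by script. The sketch below is just that arithmetic; the function name `bus_bandwidth_mb_s` and its `overhead` parameter are made up for illustration, and the overhead fractions (20% and 1.5%) are the ones quoted on the slide.

```python
def bus_bandwidth_mb_s(speed_mhz: float, width_bits: int, overhead: float = 0.0) -> float:
    """(bus speed in MHz) * (bus width in bits) / 8, minus a fractional overhead."""
    raw = speed_mhz * width_bits / 8
    return raw * (1.0 - overhead)


def can_carry(link_mbit_s: float, bus_mb_s: float) -> bool:
    """True if the bus has at least as much bandwidth as the network link."""
    return bus_mb_s >= link_mbit_s / 8


if __name__ == "__main__":
    pci = bus_bandwidth_mb_s(66, 32)                     # plain 32-bit PCI
    pcie_v2_x4 = bus_bandwidth_mb_s(5000, 4, overhead=0.20)
    print("PCI 32-bit can carry 1 GigE:", can_carry(1000, pci))
    print("PCI 32-bit can carry 10 GigE:", can_carry(10000, pci))
    print("PCIe v2 x4 can carry 10 GigE:", can_carry(10000, pcie_v2_x4))
```

Keep in mind that a shared PCI bus also carries disk traffic, so the real headroom for capture is smaller than the raw figure suggests.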
Other things
• Decent commodity CPU, e.g. a Xeon in the capture box
• SMP is good
• If you plan on storing the data, writing to disk will be a bottleneck
• RAID striping and SATA, for sure
• SSD? Maybe, but nah
Typical Frame Processing
• Frame reaches the NIC
• Ethernet preamble is removed
• FCS is calculated; if it is bad, the frame is dropped
• If the interface is in promiscuous mode, every frame is captured
• Otherwise, a frame is processed only when its destination MAC is ours (unicast), broadcast, or multicast (if enabled)
• Frame moves from the NIC FIFO to the kernel ring buffer, by CPU or DMA
• NIC generates an interrupt and the interrupt handler is called
• Frame is passed to the host stack → ip_input module → tcp/udp module → userspace
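The accept/ignore decision in the middle of that list can be sketched in a few lines. This is a toy model, not driver code: the names `nic_accepts` and `is_group_address` are invented here, and real NICs use hardware filters and multicast hash tables rather than string comparison.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_group_address(mac: str) -> bool:
    """Multicast/broadcast MACs have the lowest bit of the first octet set."""
    return int(mac.split(":")[0], 16) & 1 == 1


def nic_accepts(dst_mac: str, own_mac: str,
                promiscuous: bool = False, multicast_on: bool = False) -> bool:
    """Decide whether a frame with a good FCS is handed to the kernel."""
    dst, own = dst_mac.lower(), own_mac.lower()
    if promiscuous:
        return True                # capture all
    if dst == own or dst == BROADCAST:
        return True                # unicast to me, or broadcast
    if is_group_address(dst):
        return multicast_on        # multicast only if enabled
    return False                   # somebody else's unicast frame
```

This is why a sniffer interface is put in promiscuous mode: without it, traffic mirrored from other hosts never gets past the last line.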
Frame Processing
Specimen
• FreeBSD 8.2-RELEASE
• Ubuntu Server 10.04
mbuf kernel structure
• FreeBSD - data and headers are stored in mbufs and mbuf clusters
    $ netstat -m | head -n 3
    82/653/735 mbufs in use (current/cache/total)
    0/648/648/25600 mbuf clusters in use (current/cache/total/max)
    0/256 mbuf+clusters out of packet secondary zone in use (current/cache)
• man mbuf: The total size of an mbuf, MSIZE, is a constant defined in <sys/param.h>.
    $ grep -H -n MSIZE /sys/sys/param.h
    sys/sys/param.h:145:#define MSIZE 256 /* size of an mbuf */
• sysctl kern.ipc.nmbclusters=25600 (default)
    $ vmstat -z | grep mbuf_cluster
    mbuf_cluster: 2048, 25600
                  ^size^ ^limit^
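The two numbers in the `vmstat -z` output (cluster size and cluster limit) multiply out to the most memory mbuf clusters can ever occupy. A quick sketch of that arithmetic, using an invented helper name `cluster_memory_ceiling`:

```python
def cluster_memory_ceiling(nmbclusters: int, cluster_size: int = 2048) -> int:
    """Upper bound, in bytes, on memory held by mbuf clusters."""
    return nmbclusters * cluster_size


if __name__ == "__main__":
    ceiling = cluster_memory_ceiling(25600)  # kern.ipc.nmbclusters default shown above
    print(f"{ceiling} bytes = {ceiling / 2**20:.0f} MiB")
```

Raising `kern.ipc.nmbclusters` raises this ceiling proportionally, so it is worth checking against available RAM before doubling it a few times.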
sk_buff kernel structure
• Linux - data and headers are stored in sk_buffs
    /usr/include/linux/skbuff.h
Problems
• Each packet generates an interrupt; this can lead to receive livelock (an interrupt storm)
• Context switches
• System calls
Solutions
• Device polling
• NAPI
• Shared memory, mmap(), and zero copy
• Bypassing the host stack
Solutions, less so
• Checksum offloading
• Large Receive Offload (LRO)
• Larger on-board memory size
• More data descriptors
Capture Mechanisms/Subsystems
• Berkeley Packet Filter (BPF): filters packets before they get to user space
• Linux Socket Filter (LSF): extended BPF (kinda)
• PF_RING (Linux)
• Others: CSPF, NDIS, xPF, MPF, DPF, Swift and so on...
libpcap
• C library for packet capture
• Runs on almost all the modern Unices; WinPcap for Windows
• When data reaches user space it is stored in the libpcap buffer, and applications read from it
• Provides link layer access to data available on the network through interfaces attached to the system
FreeBSD Frame Processing
FreeBSD Processing cont.
• 3 copies due to the double buffer
• Deals with smaller buffers compared to Linux
• Half of the double buffer is copied to user space
• Packet is passed to each BPF device, /dev/bpf[0-9] (which the application binds to via libpcap)
• App reads from the HOLD buffer; data is copied from the STORE buffer into the HOLD buffer
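The STORE/HOLD arrangement is easier to follow as a toy model. The class below is an illustration only, under loud assumptions: `DoubleBuffer` and its methods are invented names, capacity is counted in packets rather than bytes, and the hand-over from STORE to HOLD is modeled as a simple list swap standing in for whatever the kernel actually does.

```python
class DoubleBuffer:
    """Toy model of a BPF-style STORE/HOLD buffer pair."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # packets per half (assumption: count, not bytes)
        self.store = []           # kernel side fills this half
        self.hold = []            # application side reads this half
        self.dropped = 0

    def _rotate(self) -> None:
        # Hand the filled STORE half to the reader and start a fresh one.
        self.hold, self.store = self.store, []

    def receive(self, packet) -> None:
        """Kernel side: store a packet, or drop it if both halves are full."""
        if len(self.store) >= self.capacity:
            if self.hold:
                self.dropped += 1  # reader has not drained HOLD yet
                return
            self._rotate()
        self.store.append(packet)

    def read(self) -> list:
        """Application side: take the whole HOLD half in one call."""
        if not self.hold and self.store:
            self._rotate()
        batch, self.hold = self.hold, []
        return batch
```

The model shows where "dropped by kernel" comes from: when the reader is slow, both halves fill up and new packets have nowhere to go. A bigger buffer only buys time; it does not fix a reader that is permanently too slow.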
Linux Frame Processing
Linux Processing cont.
• 2 copies
• Deals with larger buffers compared to FreeBSD
• Smart queue, pointers
• Packets are copied individually, not as whole buffers full of packets
• If packets are available, user space (libpcap) is woken up to grab data from LSF
Tuning: Interrupt Livelock
• Interrupt usage high?
• Most modern Linux kernels are compiled with device polling
• FreeBSD does not have it on by default
    options DEVICE_POLLING
    options HZ=1000
    make buildkernel KERNCONF=NEWKERN
    make installkernel KERNCONF=NEWKERN
    ifconfig em0 polling
• Get a New API (NAPI) card
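To see why polling helps, compare one interrupt per packet with one poll per clock tick. The sketch below is back-of-the-envelope arithmetic only: the function names are invented, it assumes exactly `HZ` polls per second (ignoring idle-loop polling and any per-poll burst limit), and the load is the 6,000,000 packets in 60 seconds used in the tcpdump tests later in this deck.

```python
def packets_per_poll(total_packets: int, seconds: float, hz: int = 1000) -> float:
    """Average number of packets waiting each time the NIC is polled."""
    packets_per_second = total_packets / seconds
    return packets_per_second / hz


def fits_in_ring(per_poll: float, rx_descriptors: int) -> bool:
    """True if one poll interval's worth of packets fits in the receive ring."""
    return per_poll <= rx_descriptors


if __name__ == "__main__":
    per_poll = packets_per_poll(6_000_000, 60, hz=1000)
    print("packets per poll:", per_poll)
    print("fits in a 256-descriptor ring:", fits_in_ring(per_poll, 256))
```

Under these assumptions the CPU handles about a thousand polls per second instead of a hundred thousand interrupts, at the cost of needing enough receive descriptors to hold a tick's worth of traffic.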
Tuning: Buffers
• Kernel dropping lots of packets?
• Increase the size of your kernel buffers
• FreeBSD
    sysctl net.bpf.bufsize=4096
    sysctl net.bpf.maxbufsize=524288
• Linux
    sysctl net.core.rmem_default=114688
    sysctl net.core.rmem_max=131071
    sysctl net.core.netdev_max_backlog=1000
• Increase kernel virtual memory size
Tuning: Drivers
• Bad NIC performance?
• FreeBSD: man driver, e.g. man em:
    hw.em.rxd  Number of receive descriptors allocated by the driver. The default value is 256. The 82542 and 82543-based adapters can handle up to 256 descriptors, while others can have up to 4096.
    echo hw.em.rxd=4096 >> /boot/loader.conf
• Linux: ethtool, find the driver README file (/usr/src/linux/)
    ethtool -g eth0
    ethtool -G eth0 rx 4096
tcpdump tests, average loss
• 6,000,000 packets in 60 seconds using iperf
• OS defaults, hardware: Dell PowerEdge 2850, Xeon (Quad), 4GB RAM
    Command                                    FreeBSD   Linux
    tcpdump -nni em0 -w test96.pcap            0%        8%
    tcpdump -nni em0 -w /dev/null              0%        0%
    tcpdump -nni em0 -s0 -w test65535.pcap     1.6%      22%
    tcpdump -nni em0 -s0 -w /dev/null          0%        0.02%
libpcap buffers
• libpcap initializes its buffer to 32 KB if the BPF value is less than 32 KB
    if ((ioctl(fd, BIOCGBLEN, (caddr_t)&v) < 0) || v < 32768)
        v = 32768;
• Linux initializes its buffer size at 512 KB
• Increase the BPF buffer size globally, for all apps, remember? net.bpf.bufsize, net.bpf.maxbufsize
• libpcap will initialize its buffer to the size in net.bpf.bufsize
• To set the buffer for tcpdump only, use -B 512 (512 KB; -B takes its value in KiB)
FreeBSD, interface drop counts
• netstat
    $ netstat -dI em0
    Name Mtu  Network  Address           Ipkts   Ierrs Idrop Opkts   Oerrs Coll Drop
    em0  1500 <Link#2> 00:02:b3:9a:c2:03 2083316 0     0     1043607 0     0    0
    $ netstat -B
    Pid   Netif Flags   Recv    Drop Match   Sblen Hblen Command
    90460 em0   p--s--- 103     0    103     632   0     tcpdump
    43960 em0   p--s--- 3803363 0    3803363 712   0     ntop
• sysctl
    $ sysctl dev.em.0.dropped
    dev.em.0.dropped: 0
• Where the counter comes from
    $ grep -R -H -n if_iqdrops /usr/src/
    sys/dev/e1000/if_lem.c:3470: ifp->if_iqdrops++;
    usr.bin/netstat/if.c:289: idrops = ifnet.if_iqdrops
Linux, interface drop counts
    $ ifconfig -a | egrep -e "(^eth|drop)"
    $ ethtool -S eth0
    $ awk '{ print $1, $5 }' /proc/net/dev
    Inter-|
    face drop
    lo: 0
    br0: 3354
    eth0: 0
    eth1: 0
    eth2: 0
    eth3: 14
    eth4: 0
    eth5: 103395
• ifconfig reads the same fields from /proc/net/dev:
    static int get_dev_fields(char *bp, struct interface *ife)
    {
        switch (procnetdev_vsn) {
        case 3:
            sscanf(bp, "%llu %llu %lu %lu %lu %lu %lu",
                   &ife->stats.rx_bytes,
                   &ife->stats.rx_packets,
                   &ife->stats.rx_errors,
                   &ife->stats.rx_dropped,
                   ...
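The awk one-liner relies on whitespace after the interface name; on some kernels a large byte counter is printed right against the colon, which shifts the fields. A small sketch that splits on the colon instead is shown here. The function name `rx_drops` is invented, and the column order (bytes, packets, errs, drop) is taken from the `/proc/net/dev` header.

```python
def rx_drops(proc_net_dev_text: str) -> dict:
    """Map interface name -> receive drop counter from /proc/net/dev text."""
    drops = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines have no colon
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 4:
            continue  # malformed line, skip rather than guess
        # Receive columns: bytes packets errs drop ...
        drops[name.strip()] = int(fields[3])
    return drops


if __name__ == "__main__":
    with open("/proc/net/dev") as f:  # Linux only
        for iface, count in rx_drops(f.read()).items():
            print(iface, count)
```

Run it twice a few seconds apart and compare: the counters are cumulative since boot, so only the difference tells you whether drops are happening now.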
tcpdump/libpcap drops
• "Packets captured": packets processed by tcpdump
• "Received by filter": passed the filter (LSF, BPF)
• "Dropped by kernel": not enough space in the kernel buffer
• FreeBSD (kernel drops):
  • libpcap gets its drop count from the kernel (BPF)
  • ps_drop from pcap_stats() is bs_drop from BIOCGSTATS
• Linux (kernel drops):
  • libpcap gets its drop count from PF_PACKET's PACKET_STATISTICS
  • ps_drop from pcap_stats()
  • ps_ifdrop: Ubuntu addendum/patch (Linux, Tru64 Unix only), taken from /proc/net/dev
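Those three counters are what tcpdump prints when it exits, so a loss percentage can be scripted from its summary. The sketch below is one way to do it; the function names are invented, and dividing drops by "received by filter" is an assumption about how to express loss, since the exact meaning of that counter varies by operating system.

```python
import re

SUMMARY_PATTERNS = {
    "captured": r"(\d+) packets? captured",
    "received": r"(\d+) packets? received by filter",
    "dropped": r"(\d+) packets? dropped by kernel",
}


def parse_tcpdump_summary(text: str) -> dict:
    """Pull the three exit counters out of tcpdump's summary output."""
    stats = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = int(match.group(1))
    return stats


def kernel_drop_percent(stats: dict) -> float:
    """Kernel drops as a percentage of packets received by the filter."""
    received = stats.get("received", 0)
    if received == 0:
        return 0.0
    return 100.0 * stats.get("dropped", 0) / received
```

tcpdump writes the summary to stderr, so a wrapper script would capture that stream (for example with `2> summary.txt`) and feed its contents to `parse_tcpdump_summary`.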
PF_RING for Linux
• Creates a new socket type called PF_RING
• Works with existing PF_PACKET apps
• Shared memory
• Can bypass the host stack, for sniffing only
• PF_RING-aware drivers for faster capture: e1000, igb, ixgbe
PF_RING for Linux
• Compile PF_RING
• Compile PF_RING-aware libpcap and tcpdump
• Load the PF_RING kernel module
    modprobe pf_ring transparent_mode=2 enable_debug=0 enable_tx_capture=0 enable_ip_defrag=0 quick_mode=0
• Recompile all apps to use the new shared libraries, libpcap and PF_RING
    ./configure CPPFLAGS="-I/usr/local/include" LDFLAGS="-L/usr/local/lib -lpfring -lpcap" \
      && make && make install
PF_RING DNA
• Direct NIC Access, pure speed
• Maps NIC memory and registers to user land
• Packet copy from the NIC to the DMA ring is done by the NIC's NPU
• Only one application at a time can use the DMA ring
• Requires a DNA driver
PF_RING TNAPI
• Threaded NAPI
vPF_RING
• Virtual PF_RING
• Hypervisor bypass
• Zero-copy
netmap FreeBSD
• mmap() shared memory
• Uses fewer system calls
• Creates a new device, /dev/netmap
• A 1 GHz CPU can generate the 14.88 Mpps that saturates a 10GigE interface
• Supports ixgbe, e1000, re
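The packets-per-second figure for a saturated 10GigE link comes from minimum-size Ethernet frames: 64 bytes of frame plus 20 bytes of per-frame wire overhead (7 bytes preamble, 1 byte start-of-frame delimiter, 12 bytes inter-frame gap). A short sketch of that arithmetic, with an invented function name:

```python
ETHERNET_WIRE_OVERHEAD_BYTES = 20  # 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap


def max_packets_per_second(link_bits_per_second: float, frame_bytes: int = 64) -> float:
    """Line-rate packet count for back-to-back frames of the given size."""
    bits_on_wire = (frame_bytes + ETHERNET_WIRE_OVERHEAD_BYTES) * 8
    return link_bits_per_second / bits_on_wire


if __name__ == "__main__":
    for name, rate in (("1 GigE", 1e9), ("10 GigE", 1e10)):
        print(f"{name}: {max_packets_per_second(rate) / 1e6:.2f} Mpps at 64-byte frames")
```

With full-size 1518-byte frames the same link carries far fewer packets per second, which is why small-packet floods are the hard case for capture boxes.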
others to check out
• Ringmap - FreeBSD - code.google.com/p/ringmap/
• Zero-copy sockets - FreeBSD: man zero_copy
  Requires specific NICs
  Recompile kernel with "options ZERO_COPY_SOCKETS"
  The zero copy send and zero copy receive code can be individually turned off via the kern.ipc.zero_copy.send and kern.ipc.zero_copy.receive sysctl variables respectively.
• MMAP() libpcap - Linux - http://public.lanl.gov/cpw/
Interface Configuration
• Linux: /etc/network/interfaces
    auto eth0
    iface eth0 inet manual
        up ifconfig eth0 0.0.0.0 -arp up
        up ip link set eth0 promisc on
        up ip link set eth0 multicast on
        up ip link set eth0 mtu 1514
        down ip link set eth0 promisc off
        down ifconfig eth0 down
    auto eth1
    iface eth1 inet manual
        up ifconfig eth1 0.0.0.0 -arp up
        up ip link set eth1 promisc on
        up ip link set eth1 multicast on
        up ip link set eth1 mtu 1514
        down ip link set eth1 promisc off
        down ifconfig eth1 down
• FreeBSD: /etc/rc.conf
    ifconfig_em0="inet 0.0.0.0 -arp promisc multicast mtu 1514 polling"
    ifconfig_em1="inet 0.0.0.0 -arp promisc multicast mtu 1514 polling"
• Bridging two interfaces (Linux)
    brctl addbr br0
    brctl addif br0 eth0 eth1
    ifconfig br0 up
Useful Applications
• snort, ntop, tcpdump, iftop
• trafshow, wireshark, tshark, tcpick
• tcpflow, etherape, ngrep, tcptrack
• suricata, bro-ids, ttt
• xplico, ifstat
• iptraf, bmon, bwm-ng, slurm
• dsniff, p0f, tcptrace, tcpreplay
• ipsumdump, speedometer
ntop
    ntop -d -L -u ntop --access-log-file=/var/log/ntop/access.log -b -C \
      --output-packet-path=/var/log/ntop-suspicious.log \
      --local-subnets 192.168.1.0/24,192.168.2.0/24,192.168.3.0/24 \
      -o -M -p /etc/ntop/protocol.list -i br0,eth0,eth1,eth2,eth3,eth4,eth5 -o /var/log/ntop
netsniff-ng
• Linux, libpcap independent, zero-copy mechanism
• Kernel compiled with CONFIG_PACKET_MMAP
Daemonlogger
• Packet Logger & Soft Tap
• This is a libpcap-based program. It has two runtime modes:
  1) It sniffs packets and spools them straight to the disk and can daemonize itself for background packet logging.
  2) It sniffs packets and rewrites them to a second interface, essentially acting as a soft tap. It can also do this in daemon mode.
etherape