Evaluation of Header Field Entropy for Hash-Based Packet Selection

Christian Henke, Carsten Schmoll, Tanja Zseby
Fraunhofer Institute FOKUS, Berlin, Germany

Outline
• Introduction: Multipoint Sampling
• Problem Statement
• Approach
• Measurement Setup
• Measurement Results
• Conclusion

Introduction: Multipoint Sampling
Passive Multipoint Measurements
• at each observation point, a packet ID and a timestamp are exported for every packet
• a packet's trace through the network is observable from the points at which its packet ID occurs
• one-way delay from A to B = timestamp at B - timestamp at A, for packets with equal ID
[Figure: observation points A, B, and C export packet IDs and timestamps to a multipoint collector]
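A minimal sketch of the delay computation at the collector, assuming each observation point exports `(packet_id, timestamp)` pairs and that the clocks at A and B are synchronized. The function names and the rule of discarding IDs that appear more than once at a point (because their match would be ambiguous) are illustrative choices, not part of the original method.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[int, float]  # (packet ID, timestamp in seconds)


def _unique_index(records: Iterable[Record]) -> Dict[int, float]:
    """Map packet ID -> timestamp, dropping IDs seen more than once."""
    index: Dict[int, float] = {}
    duplicates = set()
    for packet_id, ts in records:
        if packet_id in index:
            duplicates.add(packet_id)
        index[packet_id] = ts
    for packet_id in duplicates:
        del index[packet_id]
    return index


def one_way_delays(at_a: Iterable[Record], at_b: Iterable[Record]) -> Dict[int, float]:
    """Delay t_B - t_A for every packet ID observed exactly once at A and at B."""
    a = _unique_index(at_a)
    b = _unique_index(at_b)
    return {pid: b[pid] - a[pid] for pid in a.keys() & b.keys()}


print(one_way_delays([(1, 0.0), (2, 1.0)], [(2, 1.5), (3, 2.0)]))  # {2: 0.5}
```

Only packet 2 is seen at both points, so it is the only one with a delay.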

Introduction: Multipoint Sampling
Challenges in Passive Multipoint Measurements
• immense amounts of measurement data
• high infrastructure costs for processing, storing, and exporting
Random Packet Selection and Estimation
• random sampling (n-out-of-N, probabilistic) is unsuitable: each observation point selects a different, inconsistent sample
• Duffield and Grossglauser, in “Trajectory Sampling for Direct Traffic Observation”, propose hash-based packet selection instead
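Why independent sampling is inconsistent: if each point selects every packet independently with probability p, a given packet is selected at both A and B with probability p², so only a fraction p of A's sample also appears in B's. The simulation sketch below illustrates this; the function name and parameters are hypothetical.

```python
import random


def overlap_with_independent_sampling(p: float, n: int, seed: int = 1) -> float:
    """Fraction of the packets sampled at A that B also samples, when A and B
    each select every packet independently with probability p."""
    rng = random.Random(seed)
    sampled_a = 0
    sampled_both = 0
    for _ in range(n):
        at_a = rng.random() < p
        at_b = rng.random() < p
        sampled_a += at_a
        sampled_both += at_a and at_b
    return sampled_both / sampled_a


# With p = 0.1, only about 10 % of A's sample can be traced to B.
print(round(overlap_with_independent_sampling(0.1, 200_000), 2))
```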

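Introduction: Multipoint Sampling
Hash-Based Packet Selection
[Figure: a hash input x, taken from the IP header, transport header, and payload, is fed to a hash function h; the packet is selected if h(x) falls into the selection range S, otherwise it is not selected]
• the selected subset is consistent across observation points if x, h, and S are equal at all of them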
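The selection rule can be sketched as follows, assuming S is a contiguous range of hash values and using BLAKE2b from Python's `hashlib` purely as a stand-in hash function h (it is not one of the functions evaluated in the original work). The names `h`, `selection_range`, and `is_selected` are hypothetical.

```python
import hashlib

HASH_BITS = 32
HASH_SPACE = 2 ** HASH_BITS


def h(x: bytes) -> int:
    """Stand-in hash function: a 32-bit BLAKE2b digest read as an integer."""
    return int.from_bytes(hashlib.blake2b(x, digest_size=4).digest(), "big")


def selection_range(fraction: float) -> range:
    """Selection range S covering roughly `fraction` of the hash space."""
    return range(0, int(fraction * HASH_SPACE))


def is_selected(x: bytes, s: range) -> bool:
    """Select the packet iff its hash value h(x) falls into S."""
    return h(x) in s


S = selection_range(0.1)
x = b"\x12\x34\xbe\xef\xcc\xdd\x33\x44"  # example hash input
# Every observation point using the same x, h and S reaches the same decision.
print(is_selected(x, S) == is_selected(x, S))  # True
```

Because the decision depends only on x, h, and S, no coordination between observation points is needed beyond agreeing on these three.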

Problem Statement
Which packet content should be used as hash input?
Requirements for header fields:
• static between network nodes (this excludes, e.g., IP TTL and header checksum, which change at every hop)
• variable among packets
Challenge:
• hash-based selection (HBS) is deterministic, but the goal is to emulate random selection
• the choice of hash input can introduce bias into the selection
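To illustrate the first requirement, the sketch below zeroes the two IPv4 fields named above before the header could be hashed, so that two observation points on either side of a router derive the same input. It handles only the fixed 20-byte header and only TTL and checksum; depending on the path, other fields may also be rewritten. The function name and the example checksum values are made up for illustration.

```python
def hop_invariant_ipv4_header(header: bytes) -> bytes:
    """Copy of a 20-byte IPv4 header with TTL (offset 8) and header checksum
    (offsets 10-11) zeroed, since routers rewrite both at every hop."""
    if len(header) < 20:
        raise ValueError("need at least a 20-byte IPv4 header")
    h = bytearray(header[:20])
    h[8] = 0
    h[10:12] = b"\x00\x00"
    return bytes(h)


# The same packet before and after a router: TTL 64 -> 63, checksum updated
# (made-up checksum values).
before = bytes([0x45, 0, 0, 40, 0x12, 0x34, 0, 0, 64, 6, 0xAA, 0xBB]) + bytes(8)
after = bytes([0x45, 0, 0, 40, 0x12, 0x34, 0, 0, 63, 6, 0xAB, 0xBB]) + bytes(8)
print(before == after)  # False
print(hop_invariant_ipv4_header(before) == hop_invariant_ipv4_header(after))  # True
```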

Problem Statement
How bias is introduced
• packets in a hash input collision have the same hash input
• their selection decisions are therefore not independent: either all of them are selected or none is
• the more packets in a collision, the more severe the bias
• using the whole packet as hash input is impractical, because hash calculation time increases with hash input length
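A short worked illustration of the bias, assuming each distinct hash input is selected with probability p: compare the number N of selected packets among k packets with independent decisions against k packets that share one hash input.

```latex
\begin{aligned}
\text{independent decisions:}\quad & N \sim \mathrm{Bin}(k, p), && \mathbb{E}[N] = kp, && \mathrm{Var}(N) = k\,p(1-p) \\
\text{one collision of } k \text{ packets:}\quad & N = k Z,\ Z \sim \mathrm{Bernoulli}(p), && \mathbb{E}[N] = kp, && \mathrm{Var}(N) = k^2\,p(1-p)
\end{aligned}
```

The expected count is the same in both cases, but the variance grows by a factor of k, so estimates drawn from the selected subset become less reliable as collisions grow.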

Approach
• packets differ more often in highly variable bytes
• entropy per byte is used to measure variability
Entropy of a byte B, where p_i is the probability that B takes value i:
$$H(B) = -\sum_{i=0}^{255} p_i \log_2 p_i$$
Information efficiency: entropy relative to its maximum of 8 bit, H(B)/8
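A sketch of the per-byte entropy measurement over a trace, assuming packets are available as raw byte strings and estimating p_i by relative frequencies. The information efficiency here assumes the normalization by the 8-bit maximum; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def byte_entropy(packets: Sequence[bytes], offset: int) -> float:
    """Empirical entropy H(B), in bit, of the byte at `offset`, counting only
    packets long enough to contain that byte."""
    counts = Counter(p[offset] for p in packets if len(p) > offset)
    total = sum(counts.values())
    # c/total * log2(total/c) == -p_i * log2(p_i), written without a sign flip
    return sum(c / total * math.log2(total / c) for c in counts.values())


def information_efficiency(packets: Sequence[bytes], offset: int) -> float:
    """Entropy relative to its 8-bit maximum."""
    return byte_entropy(packets, offset) / 8


constant = [bytes([0x45])] * 1000           # e.g. the IPv4 version/IHL byte
uniform = [bytes([v]) for v in range(256)]  # every byte value exactly once
print(byte_entropy(constant, 0), byte_entropy(uniform, 0))
```

A constant byte contributes no entropy, while a byte whose 256 values are equally frequent reaches the 8-bit maximum.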

Measurement Setup
Evaluation depends on the analyzed traces:
• 6 IPv4 trace groups, 1 IPv6 trace group
• geographical locations (NZ, AUT, FR, NED, 2 LEO)
• network location (university, peering point, large ISP)
• application mix

Measurement Results
Entropy IPv4

Measurement Results
High-Entropy Header Fields
• IPv4: Identification, Length LSB, Src/Dst Address 2 LSB
• TCP: Checksum, SeqNo, AckNo, Src/Dst Port 2 LSB
• UDP: Checksum, Length LSB, Src/Dst Port 2 LSB
• ICMP: Checksum, Bytes 12, 13, 18, 19
• IPv6: Length LSB
  – addresses were anonymized and no transport header was available, so only 8 bytes could be evaluated
  – more IPv6 traces are required for further evaluation
Recommended 8-Byte Configuration
IP ID field + 6 transport header bytes:
• TCP (Checksum, 2 LSB of SeqNo and AckNo)
• UDP (Checksum, Source Port, LSB Destination Port, LSB Length)
• ICMP (Checksum, Bytes 12, 13, 18, 19)
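The recommended configuration can be sketched as an extractor over a raw IPv4 packet. Assumptions of this sketch: LSB means the last bytes in network byte order, ICMP byte offsets are counted from the start of the ICMP header, the byte order inside the 8-byte result is arbitrary (it changes hash values, not which packets collide), and non-first fragments, which carry no transport header, are not handled. The function name is hypothetical.

```python
IPPROTO_ICMP, IPPROTO_TCP, IPPROTO_UDP = 1, 6, 17


def recommended_hash_input(packet: bytes) -> bytes:
    """8-byte hash input: IPv4 Identification + 6 transport header bytes."""
    ihl = (packet[0] & 0x0F) * 4          # IPv4 header length in bytes
    protocol = packet[9]
    ip_id = packet[4:6]                   # Identification field
    t = packet[ihl:]                      # start of the transport header
    if protocol == IPPROTO_TCP:
        # checksum, 2 LSB of sequence number, 2 LSB of acknowledgment number
        six = t[16:18] + t[6:8] + t[10:12]
    elif protocol == IPPROTO_UDP:
        # checksum, source port, LSB of destination port, LSB of length
        six = t[6:8] + t[0:2] + t[3:4] + t[5:6]
    elif protocol == IPPROTO_ICMP:
        # checksum, bytes 12, 13, 18, 19 (counted from the ICMP header start)
        six = t[2:4] + t[12:14] + t[18:20]
    else:
        raise ValueError(f"unsupported protocol {protocol}")
    key = ip_id + six
    if len(key) != 8:
        raise ValueError("packet too short for the recommended hash input")
    return key


ip = bytes([0x45, 0, 0, 40, 0x12, 0x34, 0, 0, 64, IPPROTO_TCP, 0, 0]) + bytes(8)
tcp = (bytes([0, 80, 0x1F, 0x90])          # ports 80 -> 8080
       + bytes([0xAA, 0xBB, 0xCC, 0xDD])    # sequence number
       + bytes([0x11, 0x22, 0x33, 0x44])    # acknowledgment number
       + bytes([0x50, 0x10, 0xFF, 0xFF])    # data offset/flags, window
       + bytes([0xBE, 0xEF]) + bytes(2))    # checksum, urgent pointer
print(recommended_hash_input(ip + tcp).hex())  # 1234beefccdd3344
```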

Measurement Results
Empirical Hash Input Collisions
Evaluation:
• 4 configurations used:
  – whole IP and transport header (minimum reachable collisions)
  – only IP header (bad configuration)
  – 8 high-entropy bytes
  – Molina's 16 bytes
• metric: sum of packets in the 20 largest collisions of each trace
• large collision: all-or-none decision for all packets that share the same attributes
• small collisions: packets are equal within one collision but differ between collisions
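The collision metric can be sketched as below, assuming a trace is an iterable of packets and `hash_input` is any function mapping a packet to its configured hash input (for example, one of the four configurations). Packets collide when their hash inputs are byte-identical; the names are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, TypeVar

P = TypeVar("P")


def packets_in_largest_collisions(
    packets: Iterable[P],
    hash_input: Callable[[P], Hashable],
    top: int = 20,
) -> int:
    """Sum of packets in the `top` largest hash input collisions, where a
    collision is a group of two or more packets with identical hash input."""
    group_sizes = Counter(hash_input(p) for p in packets).values()
    collisions = sorted((n for n in group_sizes if n > 1), reverse=True)
    return sum(collisions[:top])


trace = [b"a"] * 5 + [b"b"] * 3 + [b"c"]   # toy trace: hash inputs only
print(packets_in_largest_collisions(trace, lambda p: p))         # 8
print(packets_in_largest_collisions(trace, lambda p: p, top=1))  # 5
```

The lone packet `b"c"` is not part of any collision and is not counted.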

Measurement Results
Hash Input Collision Comparison
• the recommended 8 bytes perform better than Molina's 16 bytes
• LEO2 traces include a large VPN traffic flow with UDP checksum = 0 (checksum disabled); for such traffic, more high-entropy bytes should be used
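A pigeonhole sketch of why a flow with UDP checksum 0 hurts the 8-byte configuration: if the ports and length of the flow are constant as well, only the 16-bit IP ID varies, so at most 2^16 distinct hash inputs exist and longer flows must collide. The sequentially incrementing IP ID is an illustrative assumption, not a property reported for the LEO2 traces; the function name is hypothetical.

```python
from collections import Counter


def packets_in_collisions(n_packets: int) -> int:
    """Packets sharing a hash input in a hypothetical single UDP flow whose
    ports, length, and checksum (0) never change, so the IP ID is the only
    varying part of the 8-byte hash input. The IP ID is assumed to
    increment by one per packet and wrap at 2**16."""
    ip_ids = Counter(i % 2**16 for i in range(n_packets))
    return sum(n for n in ip_ids.values() if n > 1)


print(packets_in_collisions(65_536))   # 0
print(packets_in_collisions(100_000))  # 68928
```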

Conclusion
Outcome
• recommendation of 8 bytes for use as hash input for HBS
• the 8 recommended bytes are sufficient to obtain unique hash inputs
Henke, Schmoll, Zseby, “Empirical Evaluation of Hash Functions for Multipoint Measurements”:
• hash calculation time increases linearly with input length
• hash functions are able to select a representative subset based on 8 bytes

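Future Work
• correlation between bytes: correlation between address bytes; the entropy of combined bytes (per byte) is expected to be the average of the individual byte entropies, which holds only if the bytes are independent
• IPv6: entropy evaluation of IPv6 addresses and transport headers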
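A sketch of how byte correlation could be checked: the joint entropy of several bytes never exceeds the sum of their individual entropies and equals it only if the bytes are independent, so the per-byte joint entropy falls below the average byte entropy when bytes are correlated. The function names and the two toy traces are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def entropy(values: Iterable[Hashable]) -> float:
    """Empirical Shannon entropy in bit."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(c / total * math.log2(total / c) for c in counts.values())


def per_byte_joint_entropy(packets: Sequence[bytes], offsets: Sequence[int]) -> float:
    """Entropy of the combined bytes at `offsets`, divided by their number."""
    return entropy(tuple(p[o] for o in offsets) for p in packets) / len(offsets)


def average_byte_entropy(packets: Sequence[bytes], offsets: Sequence[int]) -> float:
    """Mean of the individual entropies of the bytes at `offsets`."""
    return sum(entropy(p[o] for p in packets) for o in offsets) / len(offsets)


independent = [bytes([i, j]) for i in range(16) for j in range(16)]
correlated = [bytes([i, i]) for i in range(256)]  # second byte copies the first
for name, pkts in (("independent", independent), ("correlated", correlated)):
    print(name, average_byte_entropy(pkts, [0, 1]), per_byte_joint_entropy(pkts, [0, 1]))
```

For the independent trace both measures agree; for the correlated trace the per-byte joint entropy is only half the average byte entropy.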
