Evaluation of Header Field Entropy for Hash-Based Packet Selection. Christian Henke, Carsten Schmoll, Tanja Zseby Fraunhofer Institute FOKUS, Berlin, Germany. Outline. Introduction Multipoint Sampling Problem Statement Approach Measurement Setup Measurement Results Conclusion.
Evaluation of Header Field Entropy for Hash-Based Packet Selection Christian Henke, Carsten Schmoll, Tanja Zseby Fraunhofer Institute FOKUS, Berlin, Germany
Outline • Introduction Multipoint Sampling • Problem Statement • Approach • Measurement Setup • Measurement Results • Conclusion
Introduction Multipoint Sampling Passive Multipoint Measurements • at each observation point a packet ID and a timestamp are exported for each packet • the path (trace) of a packet is observable from the occurrence of its packet ID at the observation points • delay = timestamp A – timestamp B of packets with equal ID [Figure: observation points A, B and C export to a multipoint collector]
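The delay rule on this slide can be sketched as a join on packet ID. This is a minimal illustration, not the authors' collector: the function name `delays_between` and the record layout `(packet_id, timestamp)` are assumptions made for the example.

```python
def delays_between(records_a, records_b):
    """Match packets by ID and return {packet_id: timestamp_A - timestamp_B}.

    records_a / records_b: iterables of (packet_id, timestamp) tuples as
    exported by observation points A and B (hypothetical record layout).
    Packets seen at only one point are ignored.
    """
    timestamps_b = dict(records_b)
    delays = {}
    for packet_id, ts_a in records_a:
        if packet_id in timestamps_b:
            # Slide definition: delay = timestamp A - timestamp B
            delays[packet_id] = ts_a - timestamps_b[packet_id]
    return delays


if __name__ == "__main__":
    point_b = [(1, 10.0), (2, 11.0)]
    point_a = [(1, 10.5), (3, 12.0)]
    print(delays_between(point_a, point_b))
```

Only packet 1 appears at both points in this usage example, so only its delay is reported.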
Introduction Multipoint Sampling Challenge in Passive Multipoint Measurements • immense amounts of measurement data • high infrastructure costs: processing, storing, exporting Random Packet Selection and Estimation • random sampling (n-out-of-N, probabilistic) is unsuitable: each observation point selects a different, inconsistent sample • Duffield and Grossglauser propose hash-based packet selection in “Trajectory Sampling for Direct Traffic Observation”
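Introduction Multipoint Sampling Hash-Based Packet Selection [Figure: bytes from the IP header, transport header and payload form the hash input x; the hash function h is applied to x; depending on whether the hash value falls into the selection range S, the packet is selected or not selected] • consistent selected subset if x, h and S are equal at all observation points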
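A minimal sketch of the selection rule, assuming the selection range S is "all hash values below a threshold" and using CRC32 from Python's standard library purely as a stand-in hash function. The slides do not prescribe either choice, and `is_selected` is a name invented for this example.

```python
import zlib

HASH_SPACE = 2 ** 32  # CRC32 yields values in [0, 2**32)


def is_selected(hash_input: bytes, sampling_fraction: float) -> bool:
    """Select the packet if h(x) falls into the selection range S.

    Assumption: S = [0, sampling_fraction * 2**32). CRC32 is only a
    placeholder for the hash function h.
    """
    threshold = int(sampling_fraction * HASH_SPACE)
    return zlib.crc32(hash_input) < threshold


if __name__ == "__main__":
    x = bytes([0x12, 0x34, 0xAB, 0xCD, 0x03, 0x04, 0x07, 0x08])
    # Two observation points with the same x, h and S reach the same decision.
    print(is_selected(x, 0.1), is_selected(x, 0.1))
```

Because the decision depends only on the hash input, the hash function and the range, every observation point configured identically selects the same packets.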
Problem Statement Which packet content to use as hash input? Requirements for header fields • static between network nodes (this excludes IP TTL and checksum, which change at each hop) • variable among packets Challenge: • HBS (hash-based selection) is deterministic, but the goal is to emulate random selection • the choice of hash input can introduce bias into the selection
Problem Statement How bias is introduced • packets in a hash input collision have the same hash input and therefore the same hash value • their selection decisions are not independent • the more packets in a collision, the more severe the bias • using the whole packet is unsuitable because hash value calculation time increases with hash input length
Approach Approach • packets differ more often in highly variable bytes • entropy per byte is used to measure variability Entropy: $H(B) = -\sum_{i} p_i \log_2 p_i$ Information efficiency: $H(B) / H_{\max}$, with $H_{\max} = 8$ bit for a single byte • $p_i$: probability that byte value $i$ occurs • $H(B)$: entropy of the discrete variable $B$ of byte values
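A sketch of how entropy per byte position could be computed over a set of packets. It uses the standard Shannon entropy in bits; normalizing by 8 bits to obtain an information efficiency between 0 and 1 is an assumption about the slide's definition, and all function names are illustrative.

```python
import math
from collections import Counter


def byte_entropy(values) -> float:
    """Shannon entropy in bits of the observed byte values (0..255)."""
    counts = Counter(values)
    total = sum(counts.values())
    return float(sum((c / total) * math.log2(total / c) for c in counts.values()))


def information_efficiency(values) -> float:
    """Entropy normalized by the 8-bit maximum of one byte (assumed definition)."""
    return byte_entropy(values) / 8.0


def entropy_per_position(packets, length):
    """Entropy of each of the first `length` byte positions across all packets."""
    return [byte_entropy([p[i] for p in packets]) for i in range(length)]


if __name__ == "__main__":
    packets = [bytes([0x45, n % 4]) for n in range(8)]
    # Position 0 never changes, position 1 takes four equally likely values.
    print(entropy_per_position(packets, 2))
```

A constant byte such as the IPv4 version/IHL field contributes no entropy, while a byte that takes every value equally often reaches the 8-bit maximum.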
Measurement Setup Evaluation dependent on analyzed traces • 6 IPv4 trace groups – 1 IPv6 • geographical locations (NZ, AUT, FR, NED – 2 LEO) • network location (university, peering point, large ISP) • application mix
Measurement Results Entropy IPv4
Measurement Results High Entropy Header Fields • IPv4: Identification, Length LSB, Src/Dst Address 2 LSB • TCP: Checksum, SeqNo, AckNo, Src/Dst Port 2 LSB • UDP: Checksum, Length LSB, Src/Dst Port 2 LSB • ICMP: Checksum, Bytes 12,13,18,19 • IPv6: Length LSB • more IPv6 traces are required for further evaluation • IPv6 addresses were anonymized and no transport header was available, so only 8 bytes could be evaluated Recommended 8-byte Configuration IP ID field + 6 Transport Header Bytes: • TCP (Checksum, 2 LSB of SeqNo and AckNo) • UDP (Checksum, Source Port, LSB Destination Port, LSB Length) • ICMP (Checksum, Bytes 12,13,18,19)
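The recommended 8-byte configuration can be sketched as a byte extraction from a raw IPv4 packet. Assumptions in this sketch: the buffer starts at the IPv4 header (no link-layer header), field offsets follow the standard IPv4, TCP and UDP header layouts, and the order in which the bytes are concatenated is arbitrary. ICMP is left out because the slides do not state what the byte numbers 12, 13, 18, 19 are counted from. The function name is hypothetical.

```python
def recommended_hash_input(packet: bytes):
    """Return the 8 recommended bytes for TCP/UDP over IPv4, else None.

    Assumes `packet` begins with the IPv4 header.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    header_len = (packet[0] & 0x0F) * 4   # IHL is counted in 32-bit words
    ip_id = packet[4:6]                   # IPv4 Identification field
    protocol = packet[9]
    transport = packet[header_len:]

    if protocol == 6 and len(transport) >= 20:    # TCP
        checksum = transport[16:18]
        seq_lsb = transport[6:8]      # 2 least significant bytes of SeqNo
        ack_lsb = transport[10:12]    # 2 least significant bytes of AckNo
        return ip_id + checksum + seq_lsb + ack_lsb

    if protocol == 17 and len(transport) >= 8:    # UDP
        checksum = transport[6:8]
        src_port = transport[0:2]
        dst_port_lsb = transport[3:4]
        length_lsb = transport[5:6]
        return ip_id + checksum + src_port + dst_port_lsb + length_lsb

    return None
```

The result is always exactly 8 bytes for TCP and UDP packets, which keeps the hash input short regardless of packet size.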
Measurement Results Empirical Hash Input Collisions Evaluation • 4 configurations used: whole IP and transport header (minimum reachable collisions); only IP header (bad configuration); 8 high entropy bytes; Molina’s 16 bytes • metric: sum of packets in the 20 largest collisions of each trace • Large collision: all-or-none decision for all packets that have the same attributes • Small collisions: packets equal in one collision but different between
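The collision metric on this slide (sum of packets in the 20 largest hash input collisions of a trace) can be sketched as follows. The function name and the representation of a trace as a list of hash input byte strings are assumptions for illustration.

```python
from collections import Counter


def packets_in_largest_collisions(hash_inputs, top_n=20):
    """Sum the packet counts of the `top_n` largest hash input collisions.

    A collision is a group of two or more packets with identical hash input.
    """
    counts = Counter(hash_inputs)
    collision_sizes = sorted((c for c in counts.values() if c > 1), reverse=True)
    return sum(collision_sizes[:top_n])


if __name__ == "__main__":
    trace = [b"a"] * 5 + [b"b"] * 3 + [b"c"]
    # b"c" is unique and therefore not part of any collision.
    print(packets_in_largest_collisions(trace))
```

A lower value for the same trace indicates that the chosen hash input bytes distinguish packets better.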
Measurement Results Hash Input Collision Comparison • the recommended 8 bytes perform better than Molina’s 16 bytes • LEO2 traces include a large VPN traffic flow with UDP Checksum == 0; for such traffic more high entropy bytes should be used
Conclusion Outcome • recommendation of 8 bytes for use as hash input for HBS • the 8 recommended bytes are sufficient to obtain unique hash inputs Henke, Schmoll, Zseby “Empirical Evaluation of Hash Functions for Multipoint Measurements” • hash calculation time increases linearly with input length • hash functions are able to select a representative subset based on 8 bytes
Future Work Correlation between Bytes • correlation between address bytes • entropy of combined bytes expected to be average of entropy IPv6 • entropy evaluation of IPv6 addresses and transport headers