This article discusses the problem of anonymizing packet traces before they are released, and explores a new tool called tcpmkpub that provides a general framework for anonymization.
The Devil and Packet Trace Anonymization Authors: Ruoming Pang, Mark Allman, Vern Paxson and Jason Lee Published: ACM SIGCOMM Computer Communication Review, Volume 36, Issue 1, January 2006 Presenter: Ping Wang
Overview • Problem • How to anonymize packet traces before they are released • Goal • Preserve as much information as possible
Background • Why share? • Verify previous results • Compare competing ideas on the same data • Provide a broader view • Who shares? • NLANR’s PMA packet traces • CAIDA’s skitter measurement • LBNL’s internal traffic
Background cont. • Available anonymization tools • tcpdpriv • ipsumdump • tcpurify • Not general enough; most of them focus only on header fields, primarily IP addresses
A New tool - tcpmkpub • Provides a general framework for anonymizing traces • It is based on explicit rules for each header field
An example specification • All fields must be specified with a name, a length, and an action (“KEEP”, “ZERO”, or a function)
An example specification cont. • Supports a case statement for header fields that can vary
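The rule-per-field idea on these two slides can be sketched roughly in Python. This is not tcpmkpub's actual policy syntax; only the “KEEP” and “ZERO” actions come from the slide. The three-field header layout, `flip_bits`, `FIELDS` and `anonymize_header` are hypothetical names made up for illustration.

```python
from typing import Callable, List, Tuple, Union

KEEP, ZERO = "KEEP", "ZERO"
Action = Union[str, Callable[[bytes], bytes]]


def flip_bits(value: bytes) -> bytes:
    """Placeholder for a real anonymization function (hypothetical)."""
    return bytes(b ^ 0xFF for b in value)


# (name, length in bytes, action) for a made-up header, not a real protocol.
FIELDS: List[Tuple[str, int, Action]] = [
    ("type", 1, KEEP),
    ("padding", 2, ZERO),
    ("address", 4, flip_bits),
]


def anonymize_header(header: bytes, fields=FIELDS) -> bytes:
    """Apply each field's action in order and return the rewritten header."""
    out = bytearray()
    offset = 0
    for name, length, action in fields:
        value = header[offset:offset + length]
        if len(value) != length:
            raise ValueError(f"header too short for field {name!r}")
        if action == KEEP:
            out += value              # copy the field unchanged
        elif action == ZERO:
            out += bytes(length)      # overwrite the field with zero bytes
        else:
            out += action(value)      # field-specific function
        offset += length
    return bytes(out)
```

Because every field must appear in the table, a field without a rule cannot slip through unnoticed, which seems to be the point of requiring explicit rules.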
Anonymization Policy • Checksums • Link layer • Network layer • Transport layer
Checksums • Replace the original checksum C0 with a recomputed checksum Cc • When the original checksum cannot be verified • The packet has been corrupted • Insert “1” • The original packet is truncated • Use Cc (noted in the meta-data) • Where the checksum is optional, as in UDP, use zero as the checksum
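A minimal Python sketch of the checksum policy on this slide. `internet_checksum` is the standard ones'-complement Internet checksum (RFC 1071). `choose_checksum` and its flags are hypothetical names. Treating Cc as the checksum recomputed over the anonymized bytes, and applying the zero rule only when the optional checksum is unused, are assumptions the slide does not spell out.

```python
from typing import Optional, Tuple


def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def choose_checksum(anonymized: bytes, original_valid: bool,
                    truncated: bool = False,
                    unused: bool = False) -> Tuple[int, Optional[str]]:
    """Return (checksum to write, optional meta-data note).

    `anonymized` must have its checksum field zeroed already.
    """
    if unused:                               # assumption: optional checksum not in use
        return 0, None
    if truncated:                            # cannot verify: use Cc and say so
        return internet_checksum(anonymized), "truncated, checksum unverified"
    if not original_valid:                   # corrupted packet: fixed marker value
        return 1, None
    return internet_checksum(anonymized), None
```

Writing the fixed value 1 for corrupted packets keeps the fact that the original checksum was bad visible to later users of the trace.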
Link layer • An Ethernet address is 6 bytes • The high 3 bytes identify the NIC vendor • Scrambling the entire 6-byte address is not good for research • Scrambling only the low 3 bytes is not good for the vendor • Remap the two parts separately
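One way to remap the two halves of an Ethernet address separately, sketched in Python. The slide does not say which mapping function tcpmkpub uses, so the keyed, truncated HMAC here is an assumption. `anonymize_mac`, `_remap3` and the key are hypothetical.

```python
import hashlib
import hmac


def _remap3(key: bytes, label: bytes, part: bytes) -> bytes:
    """Keyed mapping of a 3-byte value to another 3-byte value (assumed scheme)."""
    return hmac.new(key, label + part, hashlib.sha256).digest()[:3]


def anonymize_mac(mac: bytes, key: bytes) -> bytes:
    """Remap the vendor half and the device half of a MAC independently."""
    if len(mac) != 6:
        raise ValueError("an Ethernet address is 6 bytes")
    vendor, device = mac[:3], mac[3:]
    return _remap3(key, b"vendor", vendor) + _remap3(key, b"device", device)
```

Two cards from the same vendor still share their (scrambled) high 3 bytes, so a researcher can group hosts by vendor without learning which vendor it is. A truncated HMAC is not a true permutation, so rare collisions are possible; a production tool would need to handle them.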
Network layer (1) – focus on IP addresses • External addresses • Use the prefix-preserving address anonymization scheme proposed in another paper • Internal addresses • Do not use the prefix-preserving scheme • Use a prefix that is not used by external addresses in the anonymized trace • Subnet and host portions are mapped separately
Network layer (2) • Scanners • Many organizations run a scanner as part of their security operations • Scanners tend to hit addresses in order, like a.b.c.1, a.b.c.2, a.b.c.3, etc. • Keep the scanner’s IP address uniform across the trace and flag it in the meta-data; use a different mapping for the destinations of the scans. For example, X1 and X2 belong to one subnet Y • Traffic not involving the scanner: map to X’1, X’2 in subnet Y’ • Traffic involving the scanner: map to X’’1, X’’2 in subnets Z1 and Z2
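To make "prefix-preserving" concrete, here is a small Python sketch in the style of the Crypto-PAn construction: each output bit is the input bit XORed with a keyed function of the bits before it. The slide does not name the cited scheme, so this is only an illustration of the property. HMAC-SHA256 stands in for the block cipher normally used, and `anonymize_ipv4` and `common_prefix_len` are hypothetical names.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address so that shared prefixes stay shared."""
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        prefix = value >> (32 - i)                 # the i bits before position i
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)      # flip depends only on the prefix
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

Two addresses that share their first k bits before anonymization share exactly their first k bits afterwards, so subnet structure survives. That is useful for external addresses, and it is also a reason to treat internal addresses differently, as the slide describes.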
Network layer (3) • Multicast addresses • Preserved • Private addresses • Preserved • Invalid addresses • Remap them as if the subnet existed, but note this in the meta-data
Transport layer • Preserve both port numbers and sequence numbers • Rewrite timestamp options • Transform the timestamps into separate increasing counters • Reason: clock drift manifested in timestamp options can be leveraged to fingerprint a physical machine
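A rough Python sketch of turning TCP timestamp options into counters. Keeping one counter per host, numbering values by first appearance, and mapping the echoed value through the peer's table are assumptions made for illustration; the slide only says "separate increasing counters". `TimestampRewriter` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Tuple


class TimestampRewriter:
    """Replace TSval/TSecr with per-host counters that carry no clock information."""

    def __init__(self) -> None:
        # host -> {original timestamp: counter value}
        self._tables: Dict[str, Dict[int, int]] = defaultdict(dict)

    def _counter(self, host: str, ts: int) -> int:
        table = self._tables[host]
        if ts not in table:
            table[ts] = len(table) + 1     # next unused counter for this host
        return table[ts]

    def rewrite(self, src: str, dst: str, tsval: int, tsecr: int) -> Tuple[int, int]:
        """Return the anonymized (TSval, TSecr) for one packet from src to dst."""
        new_val = self._counter(src, tsval)
        # TSecr echoes the peer's TSval, so it must go through the peer's table.
        new_ecr = self._counter(dst, tsecr) if tsecr else 0
        return new_val, new_ecr
```

The counters preserve which timestamp a packet echoes but drop the tick rate, so clock drift can no longer be measured. The sketch assumes packets are processed in trace order and that each host's timestamps first appear in increasing order.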
Testing • Can the transformed traces really be used? • Use p0f to do OS fingerprinting • Use tcpsum to compare the number of packets and bytes in the original and transformed traces
Test cont. • Are the transformed traces really anonymous? • Check tcpmkpub’s own log file • Look for certain strings in the anonymized traces • e.g. “Document”, “Setting”, “ConfirmFIleOp” • Look for IP addresses • Look for string versions of IP addresses • MAC addresses • Check timestamps
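The string and address checks on this slide could be automated along these lines. This is a sketch, not the authors' procedure: `find_leaks` is a hypothetical helper that scans raw packet bytes for known strings and for original addresses in both binary and dotted-decimal form.

```python
import ipaddress
from typing import Iterable, List, Tuple


def find_leaks(payload: bytes, strings: Iterable[str],
               addresses: Iterable[str]) -> List[Tuple[str, str]]:
    """Report sensitive strings and original IPv4 addresses left in the bytes."""
    hits: List[Tuple[str, str]] = []
    for s in strings:
        if s.encode() in payload:
            hits.append(("string", s))
    for addr in addresses:
        packed = ipaddress.IPv4Address(addr).packed   # 4-byte binary form
        if packed in payload:
            hits.append(("binary-ip", addr))
        if addr.encode() in payload:                  # dotted-decimal text form
            hits.append(("text-ip", addr))
    return hits
```

A plain substring search can report false positives (for example "10.1.2.3" inside "110.1.2.33"), so hits are candidates to inspect by hand rather than proof of a leak.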
Paper contributions • Develop a tool, tcpmkpub, for implementing arbitrary anonymization policies • Use meta-data to help researchers deal with lost information • Invalid checksum, scanner IP • Beyond IP address obfuscation, explore many other dangerous details • Timestamps, Ethernet addresses, etc.
Paper weaknesses • Only two experiments are given to show that the anonymized traces are useful • Could have given some anonymization results to make the policy clearer • For example, in the scanner case, what addresses a.b.c.1, a.b.c.2, a.b.c.3 would look like when they are involved in scanning traffic and when they are not
Future work • Keep more consistency between the original and anonymized traces • Study online anonymization • Provide a tool that makes it easy to validate the anonymized traces • Provide a tool for creating an anonymization policy for tcpmkpub