FiG: Automatic Fingerprint Generation
Shobha Venkataraman
Joint work with Juan Caballero, Pongsin Poosankam, Min Gyung Kang, Dawn Song & Avrim Blum
Carnegie Mellon University
Fingerprinting
Used to identify:
• versions of software on hosts
• operating systems of hosts
• hosts running versions with vulnerabilities
[Figure: a network administrator fingerprinting hosts running Linux, Solaris, Windows XP SP1, and Windows XP SP2]
The Fingerprinting Process
Fingerprint:
• set of queries sent to host +
• classification function analyzing queries & responses
Well-known fingerprinting tools: nmap, fpdns
[Figure: the fingerprinting tool sends queries to a host, receives responses, and outputs an answer to "what OS?" (e.g. Linux)]
Finding Fingerprints
• How do fingerprinting tools get fingerprints?
• Existing approach:
  • Manual identification
  • Incomplete, time-consuming
  • Difficult to keep up-to-date
Need automatic, accurate fingerprint generation!
[Figure: a fingerprinting tool with two open questions: what queries? what classification function?]
Our Contribution: FiG
Demonstrate automatic fingerprint generation is possible
In particular:
• Use machine learning to automatically generate fingerprints
• Automatically generate accurate fingerprints:
  • Distinguishing OS
  • Distinguishing implementations of DNS servers
• Finding new fingerprints
Outline
• Fingerprint Generation Problem
• Overview of Approach
• Automatic Fingerprint Generation
• Experimental Results
• Conclusion
Fingerprint Generation Problem
Goal: find fingerprints, i.e.
• Useful queries
• Classification function that distinguishes implementations
[Figure: hosts running Solaris, Windows XP, and Linux feed a fingerprint generator, which supplies fingerprints to the fingerprinting tool]
FiG: Overview of Approach
• Query exploration: Generate candidate queries
• Learning: Automatically find fingerprints
[Figure: FiG pipeline: query exploration produces candidate queries, learning turns them into fingerprints, and the fingerprints go to the fingerprinting tool]
Query Exploration
• Goal: generate candidate queries
  • query: specially crafted packet sent to host
• Infeasible to generate all possible queries
  • All queries = all possible byte combinations of packet header
  • e.g., 40 bytes of TCP & IP header => 2^320 queries!
• Instead, use protocol semantics to design queries
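The 2^320 figure follows from counting bits: 40 header bytes are 320 bits, and every bit pattern is a distinct query. The short calculation below is only a sanity check of that arithmetic. The 20 + 20 byte split assumes headers without options, and the probing rate is a hypothetical number chosen for illustration, not something stated in the talk.

```python
# Minimal IPv4 and TCP headers without options are 20 bytes each.
IP_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20
BITS_PER_BYTE = 8

header_bits = (IP_HEADER_BYTES + TCP_HEADER_BYTES) * BITS_PER_BYTE
total_queries = 2 ** header_bits  # every bit pattern is one candidate query

# Hypothetical probing rate, for illustration only.
QUERIES_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
years_needed = total_queries // (QUERIES_PER_SECOND * SECONDS_PER_YEAR)

print(f"header bits: {header_bits}")
print(f"years to enumerate at the assumed rate: more than 10^{len(str(years_needed)) - 1}")
```

Even at the assumed rate, exhaustive enumeration is far out of reach, which is why the queries are designed from protocol semantics instead.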
Query Exploration
• Queries: packets with unusual values in fields of header
• Explore unusual values for fields independently
• Explore fields with rich semantics exhaustively
  • i.e., all possible values
  • e.g., TCP flags
• Explore other fields selectively
  • i.e., some valid and some invalid values
  • e.g., TCP checksum, TCP source port
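A minimal sketch of this exploration strategy follows, assuming each candidate query changes exactly one header field and leaves the rest at default values. The field names, the dictionary representation, and the specific sample values for the selectively explored fields are all hypothetical; the talk names the fields but not the values used.

```python
def candidate_queries():
    """Build candidate queries, each varying a single header field."""
    queries = []

    # Exhaustive exploration: treat the TCP flags as one 8-bit field
    # (CWR, ECE, URG, ACK, PSH, RST, SYN, FIN), so there are 256 values.
    for flags in range(256):
        queries.append({"field": "tcp_flags", "value": flags})

    # Selective exploration: a few valid and invalid values per field.
    # These sample values are hypothetical, chosen only for illustration.
    selective = {
        "tcp_checksum": [0x0000, 0xFFFF],
        "tcp_src_port": [0, 1, 1023, 1024, 65535],
        "tcp_data_offset": [0, 4, 5, 15],  # values below 5 are invalid
    }
    for field, values in selective.items():
        for value in values:
            queries.append({"field": field, "value": value})

    return queries


queries = candidate_queries()
print(len(queries), "candidate queries")
```

Varying fields independently keeps the candidate set additive in the number of values per field rather than multiplicative.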
FiG: Overview of Approach
Learning:
• Data Collection
• Training Phase: learn potential fingerprints
• Testing Phase: test accuracy of fingerprints
[Figure: FiG pipeline with the learning stage expanded into data collection, training phase, and testing phase between candidate queries and fingerprints]
Data Collection
1. Send candidate queries to hosts
2. Collect responses from hosts
3. Split into training & testing data
[Figure: candidate queries and responses pass through data collection and are split into training data and testing data]
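Step 3 can be sketched as below. The talk does not say how the split is made, so splitting by host (all responses from one host land on the same side) is an assumption, as are the record layout, the `test_fraction` default, and the function name.

```python
import random


def split_by_host(records, test_fraction=0.3, seed=0):
    """Split query/response records into training and testing sets.

    Assumption: each record is a dict with a "host" key, and a host's
    records are never divided between the two sets.
    """
    hosts = sorted({r["host"] for r in records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(hosts)

    n_test = max(1, int(len(hosts) * test_fraction))
    test_hosts = set(hosts[:n_test])

    train = [r for r in records if r["host"] not in test_hosts]
    test = [r for r in records if r["host"] in test_hosts]
    return train, test


# Usage with placeholder records: 10 hosts, 3 queries each.
records = [
    {"host": f"host{i}", "query": q, "response": b""}
    for i in range(10)
    for q in range(3)
]
train, test = split_by_host(records)
print(len(train), "training records,", len(test), "testing records")
```

Keeping each host on one side means the testing phase measures how a fingerprint behaves on hosts it has never seen.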
Training Phase
Goal: learn potential fingerprints from data
Intuition: different implementations differ in bytes of responses
Learn which bytes of responses distinguish between implementations!
What we're learning
1. Extract features
2. Combine features to distinguish implementations
Outline:
• Features
• Classification functions
• Combining into fingerprints
[Figure: training data made of <queries, responses> pairs collected from Solaris, Linux, and Windows hosts]
Features
• Analyze only bytes of response
• Use both value & position of individual bytes in response
• Capture this idea with position-substring
[Figure: a response byte sequence at positions 0 through 10, with some example position-substrings marked]
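One way to read "position-substring" is as a pair of a starting offset and the bytes found there. The sketch below enumerates such pairs for a response. The `max_len` cap and the tuple representation are assumptions made to keep the example small; the talk does not specify either.

```python
def position_substrings(response: bytes, max_len: int = 4):
    """Return the set of (start_position, substring) pairs of a response.

    max_len caps the substring length; it is a hypothetical parameter.
    """
    features = set()
    for start in range(len(response)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(response):
                break  # substring would run past the end of the response
            features.add((start, response[start:end]))
    return features


# The same bytes at a different offset give a different feature.
feats = position_substrings(b"abcab", max_len=2)
print((0, b"ab") in feats, (3, b"ab") in feats, (1, b"ab") in feats)
```

Because the position is part of the feature, two responses share a feature only when the same bytes occur at the same offset.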
Classification Functions
Analyze each query & each implementation separately
e.g. for query q, for Linux implementation
Two classes of functions:
• Conjunctions
• Decision lists
[Figure: the classification function takes the position-substrings of the response to query q and answers YES (comes from Linux) or NO (does not come from Linux)]
Conjunctions
• Capture identical behaviour across all hosts
• Require position-substrings distinctive to Linux to appear in responses from ALL Linux hosts
if (response[4-5] == 0x0000 && response[34-35] == 0x16d0) then Linux else NotLinux
[Figure: two example responses, labelled Linux and NotLinux, showing bytes 00 00 / 16 d0 and 00 04 / 16 d0 at positions 4-5 / 34-35]
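A conjunction of this kind can be learned with the textbook elimination approach: keep only the position-substrings present in every response from the target implementation, then check that the result rejects all other responses. This is a sketch of that standard technique, not necessarily the learner used in FiG. The byte values echo the slide's example, and the extra feature at position 8 is hypothetical.

```python
def learn_conjunction(positives, negatives):
    """Learn a conjunction of features, or return None if none separates.

    positives / negatives: lists of feature sets, one set per response.
    """
    if not positives:
        return None
    # Features shared by ALL responses from the target implementation.
    common = set.intersection(*(set(p) for p in positives))
    if not common:
        return None
    # The conjunction must reject every response from other implementations.
    for feats in negatives:
        if common <= set(feats):
            return None
    return common


def matches(conjunction, feats):
    """True if every feature of the conjunction appears in the response."""
    return conjunction <= set(feats)


# Illustrative feature sets: (position, bytes) pairs.
linux = [
    {(4, b"\x00\x00"), (34, b"\x16\xd0"), (8, b"\x40")},
    {(4, b"\x00\x00"), (34, b"\x16\xd0"), (8, b"\x80")},
]
not_linux = [{(4, b"\x00\x04"), (34, b"\x16\xd0"), (8, b"\x40")}]

conj = learn_conjunction(linux, not_linux)
print(sorted(conj))
```

The feature at position 8 drops out because the two Linux responses disagree on it, leaving the two position-substrings from the slide's rule.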
Decision Lists
• Need more expressivity than conjunctions
• Capture multiple types of behaviour within implementation
• Allow many sets of position-substrings, each distinctive to implementation (e.g. Windows)
if (response[34-35] == 0xffff) then Windows else if (response[34-35] == 0x40e8) then Windows else NotWindows
[Figure: two example Windows responses, with bytes ff ff and 40 e8 at positions 34-35]
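The sketch below shows a simplified greedy way to build such a list: repeatedly pick a position-substring that never occurs in other implementations' responses and covers the most remaining target responses. It is restricted to single-feature rules that all answer "yes", which is an assumption made for brevity and not a description of the learner used in FiG. The byte values echo the slide's example.

```python
def learn_decision_list(positives, negatives):
    """Greedily build an ordered list of single-feature 'yes' rules.

    Returns None if some positive response has no distinctive feature.
    """
    negative_features = set().union(*negatives)
    remaining = [set(p) for p in positives]
    rules = []
    while remaining:
        candidates = set().union(*remaining) - negative_features
        if not candidates:
            return None
        # Most-covering feature first; sorting makes ties deterministic.
        best = max(sorted(candidates),
                   key=lambda f: sum(f in r for r in remaining))
        rules.append(best)
        remaining = [r for r in remaining if best not in r]
    return rules


def classify(rules, feats):
    """Walk the list in order; the first matching rule answers yes."""
    for rule in rules:
        if rule in feats:
            return True
    return False  # default case: NotWindows


windows = [{(34, b"\xff\xff")}, {(34, b"\x40\xe8")}, {(34, b"\xff\xff")}]
others = [{(34, b"\x16\xd0")}]

rules = learn_decision_list(windows, others)
print(rules)
```

Unlike a conjunction, the list accepts a response when any one of its rules fires, so it can cover hosts of the same implementation that behave differently.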
Binary-fingerprints
• Binary-fingerprint for implementation (e.g., Linux) is:
  • single query +
  • classification function: e.g., conjunction or decision list
  • = boolean: e.g. Linux, or Not Linux?
• Binary-fingerprint separates ONE implementation
• Learning (so far) finds binary-fingerprints
  • Conjunctions/decision lists of position-substrings (e.g. Linux or Not Linux? Windows or Not Windows?)
Multi-class Fingerprint
• Combine binary-fingerprints for multiple implementations
• Multi-class fingerprint is:
  • single query +
  • classification functions, e.g. conjunctions, decision lists
  • = implementation, e.g. Linux, Windows, Solaris, unknown?
[Figure: the binary-fingerprints for query q (Linux or Not Linux? Windows or Not Windows? Solaris or Not Solaris?) combine into a multi-class fingerprint for query q (Linux? Windows? Solaris? unknown?)]
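Combining binary-fingerprints for one query can be sketched as follows. The rule used here (answer with an implementation only when exactly one binary-fingerprint says yes, otherwise "unknown") is an assumption; the talk does not state how conflicting answers are resolved. The two classifiers reuse the byte values from the conjunction and decision-list slides, reading `response[4-5]` as the two bytes at offsets 4 and 5.

```python
def multi_class(binary_fingerprints, response):
    """Return the single implementation whose binary-fingerprint matches.

    binary_fingerprints: dict mapping implementation name -> predicate.
    Falls back to "unknown" when zero or several predicates match.
    """
    hits = [name for name, is_match in binary_fingerprints.items()
            if is_match(response)]
    return hits[0] if len(hits) == 1 else "unknown"


def is_linux(response):
    # Conjunction: both position-substrings must be present.
    return (bytes(response[4:6]) == b"\x00\x00"
            and bytes(response[34:36]) == b"\x16\xd0")


def is_windows(response):
    # Decision list: either position-substring is enough.
    return bytes(response[34:36]) in (b"\xff\xff", b"\x40\xe8")


fingerprints = {"Linux": is_linux, "Windows": is_windows}

response = bytearray(40)         # placeholder 40-byte response, all zeros
response[34:36] = b"\xff\xff"
print(multi_class(fingerprints, response))
```

A response that matches none of the binary-fingerprints, such as the all-zero placeholder before bytes 34-35 are set, is reported as "unknown".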
Training Phase Summary
• Analyze responses to all queries, one at a time
• Use position-substrings of bytes in response
• Generate binary-fingerprints & multi-class fingerprints
• Send these to testing phase
Testing Phase
Which fingerprints are accurate?
[Figure: binary & multi-class fingerprints and the testing data enter the testing phase; the resulting fingerprints go to the fingerprinting tool]
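The testing phase can be pictured as scoring each learned fingerprint on held-out responses and keeping those that score well. In the sketch below, the accuracy measure and the `threshold` default of perfect accuracy are assumptions, since the talk does not give the acceptance criterion. The responses are shortened to two bytes for readability.

```python
def binary_accuracy(classifier, test_data, target):
    """Fraction of test responses a binary-fingerprint labels correctly.

    test_data: list of (response, true_implementation) pairs.
    """
    correct = 0
    for response, true_label in test_data:
        predicted_yes = classifier(response)
        if predicted_yes == (true_label == target):
            correct += 1
    return correct / len(test_data)


def keep_accurate(fingerprints, test_data, target, threshold=1.0):
    """Keep the fingerprints whose test accuracy reaches the threshold."""
    return {
        name: clf
        for name, clf in fingerprints.items()
        if binary_accuracy(clf, test_data, target) >= threshold
    }


# Illustrative test data: (bytes seen at positions 34-35, implementation).
test_data = [
    (b"\xff\xff", "Windows"),
    (b"\x40\xe8", "Windows"),
    (b"\x16\xd0", "Linux"),
]
candidates = {
    "decision_list": lambda r: r in (b"\xff\xff", b"\x40\xe8"),
    "single_rule": lambda r: r == b"\xff\xff",
}
kept = keep_accurate(candidates, test_data, target="Windows")
print(sorted(kept))
```

Here the single-rule candidate misses the second Windows response, so only the decision list survives the assumed threshold.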
Outline
• Fingerprint Generation Problem
• Overview of Approach
• Automatic Fingerprint Generation
  • Query Exploration Phase
  • Learning Phase
• Experimental Results
  • Experimental Setup & Data
  • Fingerprinting Results: Binary & Multi-class Fingerprints
  • Examples of New Fingerprints
• Conclusion
Experiment Setup & Data
• OS fingerprint generation:
  • 3 OS: 77 Windows, 29 Linux, 22 Solaris hosts
  • 305 different queries
• DNS fingerprint generation:
  • 5 DNS server implementations: 10 BIND8, 12 BIND9, 11 Windows Server 2003, 10 MyDNS, 11 TinyDNS hosts
  • 96 different queries
Multi-class Fingerprints
One-query fingerprint distinguishing ALL implementations simultaneously
• OS: 66 queries with multi-class fingerprints
• DNS: 19 queries with multi-class fingerprints
• All these are decision lists!
  • No multi-class fingerprints with conjunctions found
  • Decision list has greater discriminatory power
All Fingerprints: OS
Binary-fingerprints: one-query fingerprint distinguishing ONE implementation from rest
• Lots more binary-fingerprints!
• Find conjunctions & decision lists in binary-fingerprints
• Again, more fingerprints with more expressive decision lists
• Similar results for DNS
Examples of New Fingerprints
• Invalid value in data offset field:
  • Windows & Solaris hosts respond when value < 5
  • Linux hosts do not respond
• RST+ACK packets in responses:
  • Linux & Solaris hosts set TCP Ack # to 0
  • Windows hosts set TCP Ack # to Ack # of query
Examples of New Fingerprints
• Behaviour on ECN & CWR bits:
  • Linux & Windows hosts ignore ECN & CWR bits in queries
  • Solaris hosts sometimes do not ignore them
• Behaviour of QdCount field on invalid queries (DNS fingerprinting):
  • Some servers copy the field value, others don't
Conclusion
• Automatic fingerprint generation is possible
• Use machine learning to identify fingerprints
• Generate fingerprints automatically for 2 applications:
  • Distinguish OS
  • Distinguish implementations of DNS servers
• Find multi-class fingerprints using decision lists
• Discover new fingerprints for fingerprinting tools
Thank You! Questions? shobha@cs.cmu.edu
Binary-fingerprints: DNS
One-query fingerprint distinguishing ONE implementation from rest
• Similar results for DNS binary-fingerprints
• More fingerprints with more expressive decision lists
• No binary-fingerprints with conjunctions for BIND8 & BIND9
Related Work
• Active fingerprinting:
  • Comer & Lin '94: probing to find differences in TCP
  • Padhye & Floyd '01: compliance testing & protocol violations
• Passive fingerprinting:
  • Paxson '97: TCP implementation with traffic traces
  • Beverly '04, Lippman et al '03: classify OS
  • Franklin et al '06: wireless device driver fingerprinting
• Tools:
  • OS fingerprinting: Nmap, queso, Xprobe, Snacktime
  • Passive fingerprinting: p0f, siphon
• Defeating OS fingerprinting:
  • Smart et al '00: TCP fingerprint scrubber
  • Tools: Morph, IPPersonality