Spam and Botnets: Characterization and Mitigation

Spam and Botnets: Characterization and Mitigation. Nick Feamster Anirudh Ramachandran David Dagon Georgia Tech. Talk Overview. Network-level behavior of spammers Ultimate goal: Construct spam filters based on network-level properties, rather than content


Presentation Transcript


  1. Spam and Botnets: Characterization and Mitigation • Nick Feamster, Anirudh Ramachandran, David Dagon • Georgia Tech

  2. Talk Overview • Network-level behavior of spammers • Ultimate goal: Construct spam filters based on network-level properties, rather than content • Content-based properties are malleable • Low cost of evasion: Spammers can easily alter content • High admin cost: Filters must be continually updated • Content-based filters are applied at the destination • Too little, too late: Wasted network bandwidth, storage, etc. • Study of DNS-based blacklists • “Discovery”: One of the most telling network-level properties is botnet membership • DNSBL Counter-Intelligence • Network monitoring

  3. Network-Level Behavior of Spammers: Major Findings • Where does spam come from? • Most received from few regions of IP address space • Insight about spammier prefixes could improve filters • Do spammers hijack routes? • A small set of spammers continually advertise short-lived routes • Traceability is not guaranteed • How is spam sent? • Most coming from Windows hosts (likely, bots) • Identification of spamming groups (e.g., botnets) could help

  4. Data Collection • Two domains instrumented with MailAvenger (both on same network) • Sinkhole domain #1 • Continuous spam collection since Aug 2004 • No real email addresses---sink everything • 10 million+ pieces of spam • Legitimate mail corpus from a large email provider (40 million inboxes) • Monitoring BGP route advertisements from same network

  5. Mail Collection: MailAvenger • Highly configurable SMTP server that collects many useful statistics

  6. BGP Data Collection [Figure labels: MX 1, MX 2]

  7. Spam Study: Major Findings • Where does spam come from? • Most received from few regions of IP address space • Insight about spammier prefixes could improve filters • Do spammers hijack routes? • A small set of spammers continually advertise short-lived routes • Traceability is not guaranteed • How is spam sent? • Most coming from Windows hosts (likely, bots) • Identification of spamming groups (e.g., botnets) could help

  8. What IP ranges does spam come from? • Spam comes from a few concentrated regions of IP address space [Plot axes: Fraction, /24 prefix]
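
The per-prefix view on this slide can be approximated from a sender log. The following is a minimal sketch, not the authors' analysis code: the function name `spam_fraction_by_prefix` and the sample addresses (documentation ranges) are hypothetical, and it assumes one IPv4 sender address per received message.

```python
import ipaddress
from collections import Counter


def spam_fraction_by_prefix(sender_ips, prefix_len=24):
    """Map each IPv4 prefix to the fraction of messages sent from it."""
    counts = Counter(
        # strict=False masks the host bits, so 192.0.2.77/24 -> 192.0.2.0/24
        str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
        for ip in sender_ips
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders prefixes from most to least spam
    return {prefix: n / total for prefix, n in counts.most_common()}


# Hypothetical log: one sender address per received spam message.
sample_log = ["192.0.2.10", "192.0.2.77", "192.0.2.200", "198.51.100.5"]
fractions = spam_fraction_by_prefix(sample_log)
```

Grouping by `/24` mirrors the axis label on the slide; passing a different `prefix_len` gives coarser or finer aggregation.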

  9. Distribution across ASes [Tables: Top 10 ASes by Spam Count; Top 10 ASes by Legit Email Count] • Points to note • Top two spamming ASes: 10% of received spam • ASes in the US: most spam • Top ASes for legitimate email are different

  10. Spam Study: Major Findings • Where does spam come from? • Most received from few regions of IP address space • Insight about spammier prefixes could improve filters • Do spammers hijack routes? • A small set of spammers continually advertise short-lived routes • Traceability is not guaranteed • How is spam sent? • Most coming from Windows hosts (likely, bots) • Identification of spamming groups (e.g., botnets) could help

  11. BGP Spectrum Agility • Log IP addresses of SMTP relays • Join with BGP route advertisements seen at network where spam trap is co-located • A small club of persistent players appears to be using this technique • Common short-lived prefixes and ASes: 61.0.0.0/8 (AS 4678), 66.0.0.0/8 (AS 21562), 82.0.0.0/8 (AS 8717) [Figure annotation: ~10 minutes] • Somewhere between 1-10% of all spam (some clearly intentional, others might be flapping)
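
The "log, then join" step on this slide can be sketched as below. This is an illustration under stated assumptions rather than the authors' tooling: the `Announcement` record, the function `short_lived_matches`, the 600-second cutoff (chosen to echo the "~10 minutes" annotation), and the sample data (documentation prefixes and private-use AS numbers) are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Announcement:
    prefix: str       # announced prefix, e.g. "192.0.2.0/24"
    origin_as: int    # AS that originated the route
    start: float      # time the route was announced (seconds)
    end: float        # time the route was withdrawn (seconds)


def short_lived_matches(spam_events, announcements, max_lifetime=600.0):
    """Yield (ip, timestamp, announcement) for spam that arrived while a
    short-lived route covering the sender's address was announced."""
    short = [
        (ipaddress.ip_network(a.prefix), a)
        for a in announcements
        if a.end - a.start <= max_lifetime  # assumed cutoff for "short-lived"
    ]
    for ip, ts in spam_events:
        addr = ipaddress.ip_address(ip)
        for net, ann in short:
            if addr in net and ann.start <= ts <= ann.end:
                yield ip, ts, ann


# Hypothetical data: one short-lived route and one long-lived route.
anns = [
    Announcement("192.0.2.0/24", 64500, 1000.0, 1500.0),
    Announcement("198.51.100.0/24", 64501, 0.0, 86400.0),
]
events = [("192.0.2.9", 1200.0), ("192.0.2.9", 2000.0), ("198.51.100.7", 1200.0)]
hits = list(short_lived_matches(events, anns))
```

Only the first event matches: the second arrives after the route is withdrawn, and the third is covered by a route that is not short-lived. A match alone does not separate intentional hijacks from route flapping, which the slide also notes.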

  12. Why Such Big Prefixes? • Flexibility: Client IPs can be scattered throughout dark space within a large /8 • Same sender usually returns with different IP addresses • Visibility: Route typically won’t be filtered (nice and short)

  13. Spam Study: Major Findings • Where does spam come from? • Most received from few regions of IP address space • Insight about spammier prefixes could improve filters • Do spammers hijack routes? • A small set of spammers continually advertise short-lived routes • Traceability is not guaranteed • How is spam sent? • Most coming from Windows hosts (likely, bots) • Identification of spamming groups (e.g., botnets) could help

  14. Characteristics of spamming bots • Distribution across IP space for bots • Similar to IP space distribution for all spam • Lower bot activity in ranges where spam also comes from hijacked routes • Operating Systems of Spamming Hosts • ~95% run Windows • The 4% of hosts that are Unix-based send up to 8% of spam

  15. Most Bots Send Low Volumes of Spam • Most bot IP addresses send very little spam, regardless of how long they have been spamming [Plot: amount of spam vs. lifetime (seconds); annotation: 99% of bots]

  16. Most Bot IP Addresses Are Quiet • 65% of bots send mail to a domain only once over 18 months • Blacklists may want to target IP ranges, rather than individual IPs [Plot: percentage of bots vs. lifetime (seconds)]

  17. Take-Away Lessons • Network-Level Spam Filtering • Network-level properties are less malleable, and are observable closer to the source of spam • Aggregate properties (e.g., IP prefix, ASN, route used, etc.) may be more effective • Some network-level properties can be incorporated into spam filters • Could be used as a first-pass filter • Redefining End-Host Identifiers • Spam filtering requires a better notion of end-host identity • Securing the Internet routing infrastructure is key to traceability

  18. What about DNS-Based Blacklists? • DNS-based Blacklists (DNSBLs): The most prevalent network-level spam filtering mechanism today • Various criteria: open relays/proxies, virus senders, bad/unused address spaces, etc. • Hundreds of DNSBLs of all sizes • How to measure the effectiveness of DNSBLs? • Completeness • Responsiveness
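
For readers unfamiliar with how a DNSBL is consulted: a client reverses the octets of the IPv4 address, appends the blacklist's zone, and issues an ordinary DNS A-record lookup; an answer means "listed" and a name-not-found error means "not listed". The sketch below illustrates that convention. The zone `dnsbl.example` and the helper names are hypothetical, and the `resolve` parameter exists only so the lookup can be swapped out for testing.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to ask `zone` about an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the lookup resolves, False if resolution fails.

    Note: this simple version treats every resolution failure as
    "not listed", including transient DNS errors.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


name = dnsbl_query_name("192.0.2.99", "dnsbl.example")
```

Because every check is a DNS query that names the address being checked, the blacklist operator can observe who is looking up whom, which is what the later counter-intelligence slides rely on.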

  19. Questions about DNSBLs • What is the completeness of the DNSBL? • What is the responsiveness of the DNSBL? • How many distinct domains are targeted by a spamming host before it is blacklisted? • Does frequency of spam from a host change after it is blacklisted?

  20. Blacklisting: Completeness • ~95% of bots listed in one or more blacklists • ~80% listed on average • Only about half of the IPs spamming from short-lived BGP routes are listed in any blacklist • Spam from IP-agile senders tends to be listed in fewer blacklists [Plot: fraction of all spam received vs. number of DNSBLs listing this spammer]

  21. Are IP-Based Blacklists Enough? • MailAvenger is very aggressive • Eight different blacklists • Cloaking techniques complicate detection • For example, what if a bot could change IP addresses and remain reachable? • LAN agility • BGP agility

  22. A Model of Responsiveness • Response Time • Difficult to calculate without “ground truth” • Can still estimate lower bound [Figure: lifecycle of a spamming host along a time axis, marking Infection, Possible Detection Opportunity, S-Day, and RBL Listing, with Response Time indicated]

  23. Measuring Responsiveness • Data • 1.5 days’ worth of packet captures of DNSBL queries from a mirror of Spamhaus • 46 days of pcaps from a hijacked C&C for a Bobax botnet; overlaps with DNSBL queries • Method • Monitor DNSBL for lookups for known Bobax hosts • Look for first query • Look for the first time a query response had a ‘listed’ status
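
The method on this slide reduces to two timestamps per known bot: the first observed lookup and the first lookup answered as "listed". The following is a minimal sketch of that bookkeeping, assuming the packet captures have already been parsed into `(timestamp, queried_ip, listed)` records; the record layout, the function name `listing_delays`, and the sample data are hypothetical.

```python
def listing_delays(query_log, known_bots):
    """For each known bot seen in the log, return the time from its first
    observed DNSBL query to the first 'listed' response (None if it was
    never seen listed)."""
    first_query = {}
    first_listed = {}
    # Process records in time order so setdefault keeps the earliest time.
    for ts, ip, listed in sorted(query_log, key=lambda rec: rec[0]):
        if ip not in known_bots:
            continue
        first_query.setdefault(ip, ts)
        if listed:
            first_listed.setdefault(ip, ts)
    return {
        ip: (first_listed[ip] - t0 if ip in first_listed else None)
        for ip, t0 in first_query.items()
    }


# Hypothetical parsed log: (timestamp in seconds, queried IP, listed?)
log = [
    (50.0, "192.0.2.1", False),
    (10.0, "192.0.2.1", False),
    (90.0, "192.0.2.1", True),
    (20.0, "192.0.2.2", False),
    (30.0, "203.0.113.9", True),   # not a known bot, ignored
]
delays = listing_delays(log, known_bots={"192.0.2.1", "192.0.2.2"})
```

The delay is measured only from the first query visible in the trace, so it is an estimate bounded by the observation window rather than the true time since infection.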

  24. Responsiveness: Preliminary Results • Observed 81,950 DNSBL queries for 4,295 (out of over 2 million) Bobax IPs • Only 255 (6%) Bobax IPs were blacklisted through the end of the Bobax trace (46 days) • 88 IPs became listed during the 1.5 day DNSBL trace • 34 of these were listed after a single detection opportunity • Both responsiveness and completeness appear to be low. Much room for improvement.

  25. Domains Performing Lookups • Over 60% are queried by just one IP/AS • Hypothesis: Decreased chances of being reported

  26. So… What Can Be Done? • Network-level behavior of spammers • Ultimate goal: Construct spam filters based on network-level properties, rather than content • Content-based properties are malleable • Low cost of evasion: Spammers can easily alter content • High admin cost: Filters must be continually updated • Content-based filters are applied at the destination • Too little, too late: Wasted network bandwidth, storage, etc. • Study of DNS-based blacklists • “Discovery”: One of the most telling network-level properties is botnet membership • DNSBL Counter-Intelligence • Network monitoring

  27. Mitigation #1: Counter-Intelligence • Botmasters advertise spamming bots that are not listed in any blacklist • Insight: Someone must be looking up the bots! • Can we fish out these DNSBL “reconnaissance” queries and identify subjects/targets as suspect?

  28. Legit Queries vs. Reconnaissance • Legitimate queriers are also the targets of queries • Reconnaissance queriers are usually not queried themselves [Figure: Legit Mail Server A (mx.a.com) and Legit Mail Server B (mx.b.com) send email to each other, and each looks the other up (lookup mx.a.com, lookup mx.b.com) in a DNS-based blacklist; a reconnaissance host is shown alongside]

  29. Measurement Approach • Log Spamhaus queries • Construct querier/queried graph • Prune graph: only nodes in the Bobax trace • Examine nodes with high out-degree • Hypothesis: targets of nodes with high out-degree likely bots
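
The steps above can be sketched as a small graph computation. This is one possible reading, not the authors' implementation: the slide does not say exactly how the graph is pruned, so the sketch assumes an edge is kept when either endpoint is in the known (seed) set, and that targets are then read from the unpruned log. The function name, the `min_out_degree` threshold, and the sample addresses are hypothetical.

```python
from collections import defaultdict


def find_recon_and_suspects(lookups, seed_ips, min_out_degree=2):
    """Return (reconnaissance queriers, newly suspected targets).

    lookups: iterable of (querier_ip, queried_ip) pairs from the DNSBL log.
    seed_ips: addresses already known to be bots (e.g. from a botnet trace).
    """
    seed_ips = set(seed_ips)
    pruned_targets = defaultdict(set)  # edges touching the seed set
    all_targets = defaultdict(set)     # every edge in the log
    for querier, queried in lookups:
        all_targets[querier].add(queried)
        if querier in seed_ips or queried in seed_ips:  # assumed pruning rule
            pruned_targets[querier].add(queried)

    # High out-degree in the pruned graph marks a likely reconnaissance host.
    recon = {q for q, t in pruned_targets.items() if len(t) >= min_out_degree}

    # Everything such a host looked up is suspect; drop already-known bots.
    suspects = set()
    for q in recon:
        suspects |= all_targets[q]
    return recon, suspects - seed_ips


# Hypothetical log: one host probing three addresses, two mail servers
# looking each other up, and one stray lookup of a known bot.
lookups = [
    ("203.0.113.1", "192.0.2.1"),
    ("203.0.113.1", "192.0.2.2"),
    ("203.0.113.1", "192.0.2.3"),
    ("198.51.100.1", "198.51.100.2"),
    ("198.51.100.2", "198.51.100.1"),
    ("198.51.100.1", "192.0.2.1"),
]
recon, suspects = find_recon_and_suspects(lookups, {"192.0.2.1", "192.0.2.2"})
```

Here `203.0.113.1` is flagged because it looked up two known bots, and `192.0.2.3` becomes a new suspect because the same host also looked it up. Pruning before counting keeps the working graph small, at the cost of depending on the quality of the seed set.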

  30. Who’s Doing the Lookups? • The botmaster, on behalf of the bots • The bots, on behalf of themselves • The bots, on behalf of each other • Implication: Use a “seed” set to bootstrap? [Figure labels: Known Bobax drone!; Spam Sinkhole]

  31. Some Problems with Counter-Intel • Constructing the query graph is intensive • Computationally • Storage-wise • Initially pruning the graph with IP addresses of known suspects (e.g., spammers) could help

  32. Mitigation: Network Monitoring • In-network filtering • Requires the ability to detect botnets • Question: Can we detect botnets by observing communication structure among hosts? • Example: Migration between command and control hosts • New type of problem: essentially coupon collection • How good are current traffic sampling techniques at exposing these patterns?
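
To give a sense of scale for the coupon-collection framing: in the classic problem, seeing all of n distinct items when each observation reveals one item uniformly at random takes n(1 + 1/2 + ... + 1/n) observations in expectation. The sketch below computes that value exactly. Mapping "items" to distinct host-to-host conversations, and assuming uniform sampling, are simplifications introduced here for illustration; the slide does not specify a model, and real traffic is heavily skewed.

```python
from fractions import Fraction


def expected_observations(n):
    """Expected number of uniform random observations needed to see
    all n distinct items at least once (classic coupon collector)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return n * harmonic


three_items = expected_observations(3)  # 3 * (1 + 1/2 + 1/3)
```

The expected count grows roughly like n ln n, so capturing every rarely seen conversation requires many more samples than capturing the common ones.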

  33. Experimental Setup

  34. (Preliminary) Results • Conventional sampling techniques are not well-suited to collecting conversations [Plot: feasible sampling rates]

  35. Summary Lessons • Network-level spam filtering holds promise • Potentially a useful complement to content-based filters • Today’s DNSBLs aren’t doing the trick • Two critical pieces • Monitoring techniques (which might be used together) • In-network (e.g., with better traffic monitoring techniques) • At the edge (e.g., DNSBL reconnaissance) • Routing security • “Clean-slate” wish list • Better notions of identity • More agile monitoring/sampling techniques
