Understanding the Network-Level Behavior of Spammers

By Anirudh Ramachandran and Nick Feamster • Defense Team: Mike Delahunty, Bryan Lutz, Kimberly Peng, Kevin Kazmierski, John Thykattil

Agenda • Introduction • Background and Related Work • Data Collection • Network-level Characteristics of Spammers • Spam from Botnets • Spam from Transient BGP Announcements • Lessons for Better Spam Mitigation • Conclusion

Introduction • Spam • Multiple emails sent to many recipients • Unsolicited commercial messages • Study based on network level behavior of spammers • IP address ranges • Spamming modes (route hijacking, bots, etc.) • Temporal persistence of spamming hosts • Characteristics of spamming botnets • Much attention has been paid to studying the content of spam

Introduction Cont. • Study posits that network-level properties must be investigated to find creative ways to mitigate spam • Paper analyzes network properties of spam observed at a large spam “sinkhole”, correlated with: • BGP route advertisements • Traces of command-and-control messages of a Bobax botnet • Legitimate emails • Surprising Conclusions • Most spam comes from a small portion of IP address space (but so does legitimate email) • Most spam comes from Microsoft Windows hosts – bots • A small set of spammers uses short-lived route announcements to remain untraceable

Background • Methods and Mitigation • Spamming Methods • Direct Spamming – via spam-friendly ISPs or dial-up IPs • Open Relays and Proxies – mail servers that allow unauthenticated hosts to relay email • Botnets – hijacked machines acting under the control of a centralized ‘botmaster’ • BGP Spectrum Agility – short-lived route announcements for the IP addresses from which spammers send spam; hampers traceability • Mitigation Techniques • Filtering: Content-based and IP Blacklists

Related Work • Previous Studies • Packet traces to determine bandwidth bottlenecks from spam sources • Project Honeypot • Acts as a sink for email traffic and hands out trap email addresses to determine harvesting behavior and identity of spammers • Time elapsed from harvesting to receipt of first spam message • Countries where harvesting infrastructure is located • Persistence of spam harvesters

Related Work Cont. • Mitigation • SpamAssassin Project – reverse engineering via mail content analysis • DNS blacklist – 80% of IPs sending spam were in the blacklist • Unusual Route Announcements • Bogus Well-Known addresses • Suggestions of short lived route announcements

Data Collection • Reserve a “sinkhole” • Registered domain with no legitimate email addresses • Establish a DNS Mail Exchange (MX) record for it • All emails received by the server are spam • Run metrics on incoming emails • IP address of the relay; also run a traceroute • TCP fingerprint to get the source OS • Results of DNS blacklist lookups from 8 different blacklist servers
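The blacklist lookup step can be sketched as follows. This is a minimal illustration of the common DNSBL convention (reverse the IPv4 octets, append the blacklist zone, and query the resulting name), not the tooling used in the study. The zone `dnsbl.example.org` and both function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNS blacklist."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blacklist zone resolves a record for this address.

    Assumption: any resolution failure (including timeouts) is treated as
    "not listed", which is a simplification.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Hypothetical zone name, for illustration only.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Repeating `is_listed` over several zones gives a per-relay count of blacklists that list the address.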

Data Collection Cont. Spam received per day at sinkhole (Aug. 2004 – Dec. 2005)

Data Collection Cont. • “Hijack” the DNS server for the domain running a botnet • Have botnet commands go to a known machine instead • Monitor BGP updates from the network where the spam is received • Collect logs from a large email provider (40 million mailboxes) • Allows analysis of network characteristics for spam and non-spam

Data Analysis • Study focuses on network-level characteristics • Distribution of spam across IP address space is similar, though not identical, to that of legitimate email • Spam is not uniformly distributed over the IP address range • 12% of all received spam comes from two Autonomous Systems (ASes) • 37% comes from the top 20 ASes • Offers insight into spam prevention • Classifying spam by country: China, Korea, & US dominate • Defense suggestion • Correlate originating country with IP range to estimate probability of spam
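The "top ASes" figures above are shares of total spam volume. A minimal sketch of that calculation, assuming each received message has already been mapped to its origin AS number (the function name and the sample AS numbers are made up for illustration):

```python
from collections import Counter


def top_as_share(spam_asns, k):
    """Fraction of messages that originate from the k most active ASes."""
    counts = Counter(spam_asns)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Toy input: one AS number per received spam message (illustrative values).
sample = [64500] * 6 + [64501] * 3 + [64502] * 1
print(top_as_share(sample, 1), top_as_share(sample, 2))
```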

Cumulative Distribution Function (CDF) of Spam and Legitimate Email • Figure annotations: • Greater probability of legitimate emails • Big increase in probability of received spam
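An empirical CDF of this kind can be sketched by treating each sender's IPv4 address as an integer and asking what fraction of messages came from addresses at or below a given point. This is an illustrative sketch rather than the plotting code behind the figure, and `ip_cdf` is a hypothetical name.

```python
import ipaddress
from bisect import bisect_right


def ip_cdf(sender_ips):
    """Return a function giving the fraction of messages sent from
    addresses numerically less than or equal to a given IPv4 address."""
    keys = sorted(int(ipaddress.IPv4Address(ip)) for ip in sender_ips)
    n = len(keys)

    def cdf(ip):
        if n == 0:
            return 0.0
        return bisect_right(keys, int(ipaddress.IPv4Address(ip))) / n

    return cdf


# One entry per received message (toy data).
cdf = ip_cdf(["10.0.0.1", "60.0.0.1", "60.0.0.2", "200.0.0.1"])
print(cdf("59.255.255.255"), cdf("61.0.0.0"))
```

A steep rise of the curve between two addresses indicates that a large share of messages came from that range. Building one CDF from spam senders and another from legitimate senders allows the two to be compared.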

Spam Persistence • 85% of unique spammers send 10 emails or fewer • If this is true for all, what’s the value in filtering by a specific IP address?
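A persistence figure like this can be computed from a log of sender addresses. The sketch below assumes one log entry per received message and counts the fraction of distinct senders at or below a volume threshold; the function name and toy data are illustrative.

```python
from collections import Counter


def low_volume_fraction(sender_ips, threshold=10):
    """Fraction of distinct senders that sent `threshold` messages or fewer."""
    per_ip = Counter(sender_ips)
    if not per_ip:
        return 0.0
    low = sum(1 for n in per_ip.values() if n <= threshold)
    return low / len(per_ip)


# Toy log: 11, 10, 1, and 2 messages from four distinct senders.
log = ["192.0.2.1"] * 11 + ["192.0.2.2"] * 10 + ["192.0.2.3"] + ["192.0.2.4"] * 2
print(low_volume_fraction(log))
```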

Effectiveness of Blacklists • About 80% of spam came from senders listed in at least one major blacklist

Effectiveness of Blacklists Cont. • Most spam bots are detected by at least one DNSRBL • Only 50% of spammers using transient BGP announcements are detected by at least one DNSRBL

Spam from Botnets • Circumstantial evidence suggests that most spam originates from bots • Spamming hosts and Bobax drones have very similar distributions across IP address space • Suggests that much spam received may be due to botnets such as Bobax

More on Bots • Most individual bots send only a low volume of spam

Operating Systems Used by Spammers • Used OS fingerprinting tool “p0f” in Mail Avenger • Able to identify OS of 75% of hosts that sent spam • Of this 75% identifiable segment, 95% run Windows • Consistent with percentage of hosts on Internet that run Windows • Only about 4% run other OS, but are responsible for 8% of received spam. • This goes against common perception that most spam originates from Windows botnet drones
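The percentages on this slide can be combined with a little arithmetic. The sketch below assumes the 4% host share and the 8% spam share are measured against comparable totals, which the slide does not state explicitly.

```python
from fractions import Fraction

identified = Fraction(75, 100)                # hosts whose OS could be identified
windows_among_identified = Fraction(95, 100)  # Windows share of identified hosts

# Share of ALL spamming hosts positively identified as Windows.
windows_of_all = identified * windows_among_identified

# Assumption: the 4% of hosts and the 8% of spam use comparable bases.
other_os_hosts = Fraction(4, 100)
other_os_spam = Fraction(8, 100)
overrepresentation = other_os_spam / other_os_hosts

print(float(windows_of_all), overrepresentation)
```

Under that assumption, roughly 71% of all spamming hosts are confirmed Windows machines, and non-Windows hosts send about twice the spam their host count alone would suggest.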

Spam from Transient BGP Announcements • Some spammers briefly hijack large portions of IP address space (that does not belong to them), send spam, and withdraw the routes immediately after spamming • Not much is known about this technique, and it is not well defended against • Very difficult to trace • Allows spammer to evade DNSRBLs • Used 10% or less of the time, as a complementary spamming tactic
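One way to look for this pattern is to join spam arrival times against route announcement lifetimes. The sketch below is an illustration under stated assumptions, not the study's method: the `Announcement` record, the function name, and the `max_lifetime` cutoff (in seconds) are all hypothetical, and timestamps are assumed to be in the same units.

```python
import ipaddress
from typing import NamedTuple


class Announcement(NamedTuple):
    prefix: str          # e.g. "192.0.2.0/24"
    announced_at: float  # time the route was announced
    withdrawn_at: float  # time the route was withdrawn


def spam_during_short_announcements(announcements, spam_events, max_lifetime=86400.0):
    """Return (ip, time, prefix) for spam received from inside a prefix
    whose announcement lasted at most `max_lifetime` and was active
    at the moment the spam arrived."""
    short = [
        (ipaddress.ip_network(a.prefix), a)
        for a in announcements
        if a.withdrawn_at - a.announced_at <= max_lifetime
    ]
    hits = []
    for ip, ts in spam_events:
        addr = ipaddress.ip_address(ip)
        for net, a in short:
            if addr in net and a.announced_at <= ts <= a.withdrawn_at:
                hits.append((ip, ts, a.prefix))
                break
    return hits


routes = [
    Announcement("192.0.2.0/24", 1000, 1600),       # short-lived route
    Announcement("198.51.100.0/24", 0, 10_000_000),  # long-lived route
]
spam = [("192.0.2.5", 1200), ("198.51.100.7", 1200), ("192.0.2.9", 5000)]
print(spam_during_short_announcements(routes, spam, max_lifetime=3600))
```

Only the first message matches: the second comes from a long-lived route, and the third arrives after the short-lived route was withdrawn.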

Lessons on Spam Mitigation • Why should we use network-level information? • Information is less malleable • More constant than spam email contents, which content-based filters monitor • Information is observable in the middle of the network • Closer to the source of the spam than other techniques • Will result in more effective spam filters • When combined with other techniques • Has potential to stop spam that other techniques miss

More Lessons • Improves knowledge of host identity • Bases detection techniques on aggregate behavior • Protects against route hijacking • “BGP spectrum agility” • Other techniques do not • Uses network-level properties to detect and filter

Conclusion • Studying the network-level behavior of spammers • Designing better spam filters with network-level filters • Network-level behavior filters vs. content-based filters • Should not replace content-based filters, but complement them

Questions?
