Understanding the Network-Level Behavior of Spammers. Authors: Anirudh Ramachandran, Nick Feamster. SIGCOMM ’06, September 11-16, 2006, Pisa, Italy. Presenter: Tao Li.
Questions • What IP ranges send the most spam? • What are the common spamming modes? How much spam comes from botnets versus other techniques (open relays, short-lived route announcements)? • How persistent is each spamming host across time? • What are the characteristics of spamming botnets?
Motivation • 17-month trace of over 10 million spam messages collected at a “spam sinkhole” • Joint analysis with IP-based blacklist lookups, passive TCP fingerprinting information, routing information, and botnet “C&C” traces • Goal: identify network-level properties that can inform the design of more robust network-level spam filters
Outline • Background Information • Data Collection • Data Analysis • Network-level Characteristics of Spammers • Spam from Botnets • Spam from Transient BGP Announcements • Discussion
Spamming Methods • Direct spamming • Buy connectivity from “spam-friendly” ISPs • Open relays and proxies • Allow unauthenticated hosts to relay email • Botnets • Infected hosts act as mail relays • BGP Spectrum Agility • Hijack a route, send spam, then withdraw the route
Mitigation techniques • Content filters • Filtering rules must be updated continually • Require large corpora for training • Spammers can easily change content • Blacklist lookups • Spammers can send from stolen IP addresses • Many bot IP addresses are short-lived
Outline • Background • Data Collection • Data Analysis • Network-level Characteristics of Spammers • Spam from Botnets • Spam from Transient BGP Announcements • Discussion
Spam Email Traces • “Sinkhole” corpus domain, 8/5/2004 to 1/6/2006 • No legitimate email addresses • DNS Mail Exchange (MX) record • Runs Mail Avenger, an SMTP server • Recorded for each message: • IP address of the relay • A traceroute to that IP address • A passive “p0f” TCP fingerprint (to infer the OS) • Result of DNS blacklist (DNSBL) lookups
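The DNSBL lookup step can be illustrated with a minimal sketch of how such a query name is conventionally formed: the IPv4 octets are reversed and prepended to the blacklist's zone. The zone `dnsbl.example.org` and the function name are placeholders for illustration, not lists or tools named in the slides, and the sketch only builds the name without sending a DNS query.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a client would look up to test `ip` against `zone`."""
    # Validate the input; malformed addresses raise ValueError.
    addr = ipaddress.IPv4Address(ip)
    # DNSBLs are queried with the octets in reverse order, e.g. d.c.b.a.<zone>.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# Hypothetical zone name and a documentation-range address.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

By convention, an A-record answer for that name means the address is listed, and a name-not-found response means it is not.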
Spam Email Traces • The number of spam messages and the number of distinct sending IP addresses both rise over time
Data Collection • Legitimate Email Traces • 700,000 legitimate emails from a large email provider • Botnet Command and Control Data • A trace of hosts infected by “Bobax” • The authoritative DNS server for the botnet's C&C domain was hijacked and redirected to a honeypot • BGP Routing Measurements • A BGP monitor colocated in the same network as the “sinkhole”
Outline • Background • Data Collection • Data Analysis • Network-level Characteristics of Spammers • Spam from Botnets • Spam from Transient BGP Announcements • Discussion
Network-level Characteristics of Spammers • Distribution Across Networks • Distribution across IP address space • Distribution across ASes • Distribution by country • The Effectiveness of Blacklists
Distribution Across Networks • Distribution across IP address space • The majority of spam comes from a relatively small fraction of the IP address space, and the distribution is persistent.
Distribution Across Networks • About 85% of client IP addresses sent fewer than 10 emails to the sinkhole. • Important for spam filter design.
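A statistic such as "the fraction of client IP addresses that sent fewer than 10 emails" could be computed from a list of sender addresses roughly as follows. This is a sketch only: the function name and the sample addresses are invented for illustration and are not data from the study.

```python
from collections import Counter


def fraction_low_volume(sender_ips, threshold=10):
    """Fraction of distinct sender IPs that sent fewer than `threshold` messages."""
    counts = Counter(sender_ips)  # one entry per distinct IP
    if not counts:
        return 0.0
    low = sum(1 for n in counts.values() if n < threshold)
    return low / len(counts)


# Toy input: one entry per received message (documentation-range addresses).
sample = ["198.51.100.7"] * 12 + ["203.0.113.5"] * 3 + ["192.0.2.1"]
print(fraction_low_volume(sample))
```

In the toy input, two of the three distinct senders fall below the threshold, so the result is two thirds.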
Distribution Across Networks • Distribution across ASes • Over 10% of spam came from just 2 ASes; 36% came from 20 ASes
Distribution Across Networks • Distribution by country • Although the top 2 ASes from which spam was received were in Asia, 11 of the top 20 were in the USA, comprising 40% of all the spam received from the top 20. • Assigning a higher level of suspicion according to an email’s country of origin may be effective in filtering.
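A share of the form "X% of spam from the top k ASes" can be sketched as below, assuming each message has already been mapped to an origin AS number. That mapping is not shown here, and the AS numbers in the sample come from the range reserved for documentation rather than from the study.

```python
from collections import Counter


def top_k_share(as_per_message, k):
    """Share of all messages that originated from the k most active ASes."""
    counts = Counter(as_per_message)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Toy input: one origin AS number per received message.
sample_asns = [64496] * 5 + [64497] * 3 + [64498] * 2
print(top_k_share(sample_asns, 2))
```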
The Effectiveness of Blacklists • Nearly 80% of relays appear in at least one of the 8 blacklists
The Effectiveness of Blacklists • Spamcop lists only 50% of the spam received • Blacklists have a high false-positive rate • Ineffective when senders use more sophisticated IP address cloaking techniques
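Coverage figures like these could be derived from per-relay lookup results along the following lines. This is a rough sketch: the input layout (a mapping from relay IP to the set of lists that returned a hit) and the list names `list_a` and `list_b` are assumptions made for illustration.

```python
def blacklist_coverage(listings):
    """Return (share of relays on at least one list, per-list share of relays).

    `listings` maps each relay IP to the set of blacklist names that list it.
    """
    total = len(listings)
    if total == 0:
        return 0.0, {}
    any_listed = sum(1 for names in listings.values() if names) / total
    per_list = {}
    for names in listings.values():
        for name in names:
            per_list[name] = per_list.get(name, 0) + 1
    return any_listed, {name: n / total for name, n in per_list.items()}


# Toy lookup results for four relays and two hypothetical lists.
sample_listings = {
    "192.0.2.1": {"list_a", "list_b"},
    "192.0.2.2": {"list_a"},
    "192.0.2.3": set(),
    "192.0.2.4": {"list_b"},
}
overall, by_list = blacklist_coverage(sample_listings)
print(overall, by_list)
```

Reporting both numbers shows why a single list can look weak while the union of several lists covers most relays.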
Outline • Background • Data Collection • Data Analysis • Network-level Characteristics of Spammers • Spam from Botnets • Spam from Transient BGP Announcements • Discussion
Spam from Botnets • Bobax Topology • Spamming hosts and Bobax drones have a similar distribution across the IP address space, so much of the spam may be due to botnets
Spam from Botnets • Operating Systems of Spamming Hosts • 4% of hosts were not running Windows, but they sent 8% of the spam
Spam from Botnets • Spamming Bot Activity Profile • Over 65% of bots are “single shot” (they sent spam only once); 75% of these were active for less than 2 minutes
Spam from Botnets • Spamming Bot Activity Profile • Regardless of persistence, 99% of bots sent fewer than 100 pieces of spam
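An activity profile of this kind could be built from (IP, timestamp) pairs as sketched below. Measuring persistence as the gap between a host's first and last message, and the 120-second cutoff, are assumptions made for illustration; the study's exact definitions may differ.

```python
from collections import defaultdict


def activity_profile(events):
    """Map each IP to (message_count, seconds between first and last message)."""
    seen = defaultdict(list)
    for ip, ts in events:
        seen[ip].append(ts)
    return {ip: (len(ts), max(ts) - min(ts)) for ip, ts in seen.items()}


def short_lived_fraction(profile, max_lifetime=120):
    """Fraction of hosts whose observed lifetime is below `max_lifetime` seconds."""
    if not profile:
        return 0.0
    short = sum(1 for _, lifetime in profile.values() if lifetime < max_lifetime)
    return short / len(profile)


# Toy events: (sender IP, Unix timestamp in seconds).
sample_events = [
    ("192.0.2.1", 1000), ("192.0.2.1", 1030), ("192.0.2.1", 1090),
    ("192.0.2.2", 2000),
    ("192.0.2.3", 3000), ("192.0.2.3", 90000),
]
profile = activity_profile(sample_events)
print(profile["192.0.2.1"], short_lived_fraction(profile))
```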
Outline • Background • Data Collection • Data Analysis • Network-level Characteristics of Spammers • Spam from Botnets • Spam from Transient BGP Announcements • Discussion
Spam from Transient BGP Announcements • BGP Spectrum Agility • A small but persistent group of spammers appears to send spam by • Advertising (hijacking) large blocks of IP address space (e.g., /8s) • Sending spam from IP addresses scattered throughout that space • Withdrawing the route for the IP address space shortly after the spam is sent
Spam from Transient BGP Announcements • Announcement, withdrawal and spam from 61.0.0.0/8 and 82.0.0.0/8
Spam from Transient BGP Announcements • Prevalence of BGP Spectrum Agility • 1% of spam comes from short-lived routes, but the share sometimes reaches 10%
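One way to estimate the share of spam arriving over short-lived routes is to pair each announcement with the next withdrawal for a prefix and count the spam that falls inside the short intervals. The sketch below assumes a time-ordered update stream for a single prefix, spam timestamps already matched to that prefix, and a one-day cutoff for "short-lived"; all three are illustrative assumptions rather than details given in the slides.

```python
def route_intervals(updates):
    """Pair announcements ('A') with the next withdrawal ('W') for one prefix.

    `updates` is a time-ordered list of (timestamp, kind) tuples.
    """
    intervals, start = [], None
    for ts, kind in updates:
        if kind == "A" and start is None:
            start = ts
        elif kind == "W" and start is not None:
            intervals.append((start, ts))
            start = None
    return intervals


def spam_share_from_short_routes(spam_times, intervals, max_lifetime=86400):
    """Share of spam received while a route lasting <= max_lifetime was up."""
    if not spam_times:
        return 0.0
    short = [(a, w) for a, w in intervals if w - a <= max_lifetime]
    hits = sum(1 for t in spam_times if any(a <= t <= w for a, w in short))
    return hits / len(spam_times)


# Toy data: one 10-minute route and one route that stays up for days.
sample_updates = [(0, "A"), (600, "W"), (10000, "A"), (500000, "W")]
sample_spam = [100, 300, 20000, 30000]
sample_intervals = route_intervals(sample_updates)
print(sample_intervals, spam_share_from_short_routes(sample_spam, sample_intervals))
```

In the toy data only the 10-minute route counts as short-lived, and two of the four messages arrive while it is announced.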
Outline • Background • Data Collection • Data Analysis • Network-level Characteristics of Spammers • Spam from Botnets • Spam from Transient BGP Announcements • Discussion
Contribution • Suggests using network-level properties of spammers as an addition to existing spam mitigation techniques • Quantifies and documents spammers' use of BGP route announcements for the first time • Presents the first study examining the interplay between spam, botnets, and the Internet routing infrastructure • Many useful findings about the network-level properties of spam
Weakness • Uses only a small sample, so it cannot support general conclusions about Internet-wide characteristics • Only studied spam sent by Bobax drones • The Botnet Command and Control data collection assumes that hosts were not patched and did not use dynamic addressing during the trace.
How to improve • Design a better notion of host identity • Detection techniques based on aggregate behavior • Securing the Internet routing infrastructure • Incorporating some network-level properties of spam into spam filters