Spamscatter: Characterizing Internet Scam Hosting Infrastructure

David S. Anderson, Chris Fleizach, Stefan Savage, and Geoffrey M. Voelker
University of California, San Diego


Presentation Transcript


  1. Introduction: Spamscatter: Characterizing Internet Scam Hosting Infrastructure
     • David S. Anderson, Chris Fleizach, Stefan Savage, and Geoffrey M. Voelker
     • University of California, San Diego

  2. Introduction: Motivation
     • 70 billion spam messages are sent every day for a simple reason: advertising websites
     • A scam, then, is any website marketed using spam
     • This online resource is directly implicated in the spam profit cycle, meaning it is rarer and more valuable
     • Characterizing the scam infrastructure helps:
       • Reveal the dynamics and business pressures exerted on spammers
       • Identify means to reduce unwanted sites and spam

  3. Introduction: Spamscatter Approach
     • Mine a large quantity of spam
     • Extract URLs
     • Probe machines hosting the scams
     • This works because URLs must be correct
     • Follow the scent of money…
     • All we need is a reliably large source of spam
     • We have access to a four-letter top-level domain producing 150K spam messages per day
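The "extract URLs" step can be sketched as follows. This is a minimal illustration, assuming message bodies are available as plain text; the regular expression, the function names `extract_urls` and `hosts`, and the sample message are all hypothetical and are not taken from the authors' tooling.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: an http/https URL runs until whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def extract_urls(body):
    """Return the distinct URLs found in one message body, in first-seen order."""
    seen = []
    for match in URL_RE.findall(body):
        url = match.rstrip(".,;)")  # drop trailing punctuation picked up from prose
        if url not in seen:
            seen.append(url)
    return seen

def hosts(urls):
    """Return the set of lower-cased hostnames that would be probed."""
    return {urlsplit(u).hostname for u in urls if urlsplit(u).hostname}

# Made-up spam body using reserved example domains.
sample = 'Buy now at http://Pills.example.com/order?id=7. Or <a href="http://cheap.example.net/">here</a>'
print(extract_urls(sample))
print(hosts(extract_urls(sample)))
```

Real spam is heavily obfuscated, so a single regular expression like this would miss many links; it only shows the shape of the step.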

  4. Introduction: Understanding scams
     • Are scams distributed across different servers?
     • Do different scams share the same server?
     • How long do scams stay active?
     • How reliable is their hosting?
     • Where are scam servers located?
     • Why is it useful to study these characteristics?

  5. Methodology: Spamscatter and the Scam

  6. Methodology
     • Data collection
       • Extract links from large spam feed
       • Probe links every 3 hours for 7 days
       • Record browser redirection
       • Save screenshots
     • Analysis
       • Identify scams across servers and domains
       • Report on distributed and shared infrastructure, lifetime, stability, and location
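The probing cadence on this slide (every 3 hours for 7 days) implies a fixed number of probes per link. A small sketch of such a schedule, assuming the first probe happens when the link is first seen and the window is half-open; the function name `probe_times` and the start date are arbitrary illustrations.

```python
from datetime import datetime, timedelta

def probe_times(first_seen, interval_hours=3, days=7):
    """Yield probe timestamps starting at first_seen, every interval_hours, for `days` days."""
    step = timedelta(hours=interval_hours)
    end = first_seen + timedelta(days=days)
    t = first_seen
    while t < end:  # half-open window: no probe exactly at the 7-day mark (an assumption)
        yield t
        t += step

# Arbitrary start date, chosen only for illustration.
schedule = list(probe_times(datetime(2000, 1, 1)))
print(len(schedule), "probes per link")
print("first:", schedule[0], "last:", schedule[-1])
```

Under these assumptions each link is probed 7 × 24 / 3 = 56 times.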

  7. Methodology: Identifying Scams
     • Goal: Identify multiple hosts in the same scam, since many scams are spread across different IPs and domain names
     • Naïve approaches:
       • Correlate independent spam emails
       • Use HTML content returned from the web server
     • Limitations:
       • Spam has too much chaff and obfuscation
       • HTML is uninteresting and mostly composed of images
       • Web crawlers fail with frames, iframes, and JavaScript

  8. Methodology: Image Shingling
     • Solution: Use rendered screenshots of web pages for correlation
     • How to compare upwards of 10,000 images?
     • Image shingling, based on the text shingling idea [BRO97]
       • Fragment images into blocks and hash the blocks
       • Two images are similar if T% of the hashed blocks are the same (T = 70-80%)
     • Shingling allows us to essentially compare all images in O(N lg N)
     • Resilient to small variations among images
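The block-and-hash comparison described on this slide can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: images are modeled as 2-D lists of 0-255 grayscale values, blocks are fixed 4×4 tiles, SHA-256 stands in for whatever hash was actually used, and similarity is measured against the larger of the two hash sets. The names `block_hashes`, `similarity`, and `same_scam` are hypothetical. The sketch compares a single pair only; it does not show how all images are compared in O(N lg N).

```python
import hashlib

def block_hashes(pixels, block=4):
    """Cut a 2-D grid of 0-255 values into block x block tiles and return
    the set of tile hashes (tile position is deliberately ignored)."""
    rows, cols = len(pixels), len(pixels[0])
    hashes = set()
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            tile = bytes(pixels[r + i][c + j]
                         for i in range(block) for j in range(block))
            hashes.add(hashlib.sha256(tile).hexdigest())
    return hashes

def similarity(a, b, block=4):
    """Fraction of hashed blocks the two images share, relative to the larger set."""
    ha, hb = block_hashes(a, block), block_hashes(b, block)
    if not ha or not hb:  # image smaller than one block
        return 0.0
    return len(ha & hb) / max(len(ha), len(hb))

def same_scam(a, b, threshold=0.70, block=4):
    """Apply the slide's threshold T (70% by default) to the block similarity."""
    return similarity(a, b, block) >= threshold

# Toy 8x8 "screenshots": four 4x4 tiles filled with the values 0, 1, 2, 3.
page_a = [[(r // 4) * 2 + (c // 4) for c in range(8)] for r in range(8)]
# Same page with the bottom-right tile changed, e.g. rotated content.
page_b = [row[:] for row in page_a]
for r in range(4, 8):
    for c in range(4, 8):
        page_b[r][c] = 9

print(similarity(page_a, page_b))  # 3 of the 4 tiles match
print(same_scam(page_a, page_b))
```

Because one changed tile affects only one hash, a small visual variation lowers the score slightly instead of making the two pages look unrelated, which is the resilience the slide refers to.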

  9. An Example Scam: “Downloadable Software”
     Scam Perspective
     • 99 observed virtual hosts
     • 3 IP addresses
     • Operated for months
     • 85 senders
     • No forwarding used
     • 5535 probes (97% successful)

  10. An Example Scam: Clustering with Image Shingling
      • Images differ slightly
      • Some pages rotate content

  11. An Example Scam: Location
      • Map legend: Blue – web servers hosting “Downloadable Software”; Red – spam relays (hosts that sent us spam)
      • 2 web servers in China; 1 web server in Russia
      • 85 senders from 30 countries (28 from US)

  12. An Example Scam: Shared Infrastructure
      • One of the IPs (221.4.246.3) hosting “Downloadable Software” was also hosting “Toronto Pharmacy”
      • Server located in Guangzhou, China

  13. Results: Summary Statistics
      • 1 week of spam collection: Nov. 28th – Dec. 4th
      • 2 weeks of probing: Nov. 28th – Dec. 11th

      Stage                                      Count       Share of previous stage
      Spam messages                              1,087,711
      Messages that contain links                319,700     30%
      Distinct links                             36,390      11.3%
      Links resolving to unique IP addresses     7,029       19.3%
      Distinct scams                             2,334       33.2%
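The percentages on this slide read as each stage's share of the stage before it, which can be checked directly from the counts. A short sketch; the stage labels and the function name `stage_shares` are illustrative wording, while the counts are the slide's.

```python
# Counts taken from the slide, in funnel order.
stages = [
    ("spam messages", 1_087_711),
    ("messages containing links", 319_700),
    ("distinct links", 36_390),
    ("unique IP addresses", 7_029),
    ("distinct scams", 2_334),
]

def stage_shares(stages):
    """Return (name, count, share_of_previous_stage) for every stage after the first."""
    return [(name, count, count / prev_count)
            for (name, count), (_, prev_count) in zip(stages[1:], stages)]

for name, count, share in stage_shares(stages):
    print(f"{name}: {count:,} ({share:.1%} of previous stage)")
```

The recomputed shares agree with the slide to within about a percentage point; the small differences are consistent with the slide's figures having been rounded or truncated.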

  14. Results - Infrastructure: Distributed Infrastructure
      To what extent is the infrastructure distributed for scams?
      • Most scams are not distributed:
        • 94% used one IP
      • Top three distributed scams were extensive:
        • 22, 30, and 45 IPs
      • Top three virtual-hosted scams:
        • 110, 695, and 3029 domain names

  15. Results - Infrastructure: Shared Infrastructure
      To what extent do multiple scams share infrastructure?
      • 38% of scams hosted on a machine with at least one other scam
      • 10 IPs hosted 10 or more scams
      • Top three shared IPs:
        • 15, 18, and 22 scams
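Both infrastructure questions (how many scams use a single IP, and how many share an IP with another scam) reduce to grouping a scam-to-IP mapping in two directions. A minimal sketch with a toy mapping: the shared address 221.4.246.3 and the two scam names come from the example slides, while every other address, the third scam, and the assumption that “Toronto Pharmacy” used only that one IP are made up for illustration.

```python
from collections import defaultdict

def infrastructure_stats(scam_ips):
    """scam_ips maps scam name -> set of IP addresses observed hosting it."""
    scams_per_ip = defaultdict(set)
    for scam, ips in scam_ips.items():
        for ip in ips:
            scams_per_ip[ip].add(scam)

    n = len(scam_ips)
    # "Not distributed": the scam was seen on exactly one IP.
    single_ip = sum(1 for ips in scam_ips.values() if len(ips) == 1)
    # "Shared": at least one of the scam's IPs also hosts another scam.
    shared = sum(1 for ips in scam_ips.values()
                 if any(len(scams_per_ip[ip]) > 1 for ip in ips))
    return {
        "single_ip_fraction": single_ip / n,
        "shared_fraction": shared / n,
        "scams_per_ip": dict(scams_per_ip),
    }

# Toy data; 192.0.2.x and 198.51.100.x are reserved documentation addresses.
sample = {
    "Downloadable Software": {"221.4.246.3", "192.0.2.10", "192.0.2.11"},
    "Toronto Pharmacy": {"221.4.246.3"},
    "Hypothetical Scam C": {"198.51.100.7"},
}
stats = infrastructure_stats(sample)
print(stats["single_ip_fraction"], stats["shared_fraction"])
print(sorted(stats["scams_per_ip"]["221.4.246.3"]))
```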

  16. Results - Lifetime: Scam Lifetime & Stability
      How long are scams active, and how reliable are the hosts?
      • Scam web hosts seem to be taken down shortly after scams disappear
      • Overall scam lifetime approached two weeks
      • Reliability is high: usually > 97%
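One way to turn periodic probe results into the two quantities on this slide is sketched below. The definitions are assumptions made for illustration, since the slides do not spell them out: lifetime is taken as the span between the first and last successful probe, and reliability as the fraction of probes inside that span that succeeded. The function name and the probe data are hypothetical.

```python
from datetime import datetime, timedelta

def lifetime_and_reliability(probes):
    """probes: list of (timestamp, succeeded) tuples for one scam, in any order."""
    ok_times = sorted(t for t, ok in probes if ok)
    if not ok_times:
        return timedelta(0), 0.0
    first, last = ok_times[0], ok_times[-1]
    # Only probes between the first and last success count toward reliability.
    in_window = [ok for t, ok in probes if first <= t <= last]
    return last - first, sum(in_window) / len(in_window)

# Made-up probe history: one probe every 3 hours, with one failure mid-life
# and two failures after the site disappears.
start = datetime(2000, 1, 1)
pattern = [True, True, False, True, True, True, True, True, False, False]
probes = [(start + timedelta(hours=3 * i), ok) for i, ok in enumerate(pattern)]

lifetime, reliability = lifetime_and_reliability(probes)
print(lifetime, reliability)
```

Failures after the last success are treated as the scam having ended, so they shorten the lifetime without lowering reliability.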

  17. Results - Lifetime: Spam campaign lifetime
      How long do spam campaigns last for a scam?
      • 137 spam messages per scam (average)
      • Most spam campaigns are relatively short: 88% last 20 hours or less
      • Only 8% last more than 2 days
      • Scam lifetimes are considerably longer: on average one week
      (Figure markers: < 20 hours, < 2 days)

  18. Results - Location
      Where are scam hosting servers located?
      • Map legend: Blue – web servers; Red – spam relays

  19. Results - Location

      Web servers
      Rank  Country  Count   Percent
      1     usa      5884    57.40%
      2     chn      741     7.23%
      3     can      379     3.70%
      4     gbr      315     3.07%
      5     fra      314     3.06%
      6     deu      258     2.52%
      7     rus      185     1.80%
      8     kor      181     1.77%

      Spam relays
      Rank  Country  Count   Percent
      1     usa      54159   14.50%
      2     fra      26371   7.06%
      3     esp      25196   6.75%
      4     chn      24833   6.65%
      5     pol      21199   5.68%
      6     ind      20235   5.42%
      7     deu      18678   5.00%
      8     kor      17446   4.67%

  20. Results - Categorization: Scam Categorization

      Scam category                   % of scams
      Uncategorized                   29.57%
      Information Technology          16.67%
      Dynamic Content                 11.52%
      Business and Economy            6.23%
      Shopping                        4.30%
      Financial Data and Services     3.61%
      Illegal or Questionable         2.15%
      Adult                           1.80%
      Message Boards and Clubs        1.80%
      Web Hosting                     1.63%

  21. Results - Categorization: Lifetime of scams with Categorization
      • More than 40% of malicious scams disappear before 120 hours
      • The same is true for less than 15% of all scams

  22. Conclusion: Summary
      • Started with over 1 million spam messages and coalesced them to fewer than 2,500 scams
      • Image shingling allowed us to scalably determine if two sites were part of the same scam
      • Most scams use one web server (vulnerable to blacklisting)
      • Scams may use many virtual domains that point to one IP
      • Most scams are not malicious per se
      • Scam infrastructure is more stable, longer lived, and more concentrated in the US than spam senders

  23. Conclusion: Spammers beware; these boffins are on the prowl
      Questions and Answers

  24. Supplementary Information: Spamscope Visibility
      • Collected spam from news.admin.net-abuse.sightings, a newsgroup for contributing spam
      • For a 3-day period, we saw:
        • 6,977 spam from the newsgroup → 205 scams
        • 113,216 spam from our feed → 1,687 scams
      • 12% of the newsgroup scams were in ours
      • The “largest” scams (most emails and most domains/IP) were seen in both feeds

  25. Results - Blacklisting: Blacklists

      Host type    Classification   % of hosts
      Spam relay   Open proxy       72.27%
      Spam relay   Spam host        5.86%
      Scam host    Open proxy       2.06%
      Scam host    Spam host        14.86%

      • 9.7% of the scam hosts also sent us spam

  26. Supplementary Information: Web Server OS

      Rank  Operating system            Percent
      1     Linux recent 2.4 (1)        11.97%
      2     Windows 2000 (SP1+)         11.05%
      3     Akamai ???                  10.86%
      4     Windows 2000 SP4            8.25%
      5     Linux recent 2.4 (2)        7.84%
      6     FreeBSD 4.6-4.8             7.72%
      7     Slashdot or BusinessWeek    7.04%
      8     FreeBSD 5.0                 6.49%
      9     Windows XP SP1              5.90%
      10    Linux older 2.4             5.56%

  27. Supplementary Information: URL Classification

      WISP Dynamic Content                           17.931%
      WISP Uncategorized                             13.965%
      WISP Illegal or Questionable                   10.306%
      WISP Information Technology                    9.051%
      WISP Shopping                                  4.872%
      WISP Business and Economy                      4.733%
      WISP Financial Data and Services               4.626%
      WISP Personals and Dating                      1.867%
      WISP Advertisements                            1.249%
      WISP Educational Institutions                  1.247%
      WISP Pay-to-Surf                               1.022%
      WISP Search Engines and Portals                0.884%
      WISP Supplements and Unregulated Compounds     0.865%
      WISP Sex                                       0.862%

  28. Supplementary Information: Image Clustering
      • 1 week of spam collection: Nov. 28th – Dec. 4th
      • 2 weeks of probing: Nov. 28th – Dec. 11th

      Stage                                      Count       Share of previous stage
      Total probes                               2,541,486
      Probes that result in a captured image     250,864     9.8%
      'First' screenshots for a scam             9,572       3.8%
      Clusters detected by image shingling       2,334

  29. Supplementary Information: Image Shingling
      • For a typical day of screenshots, we tested various thresholds
      • A 70% threshold provided a good balance between flexibility and accuracy

  30. Supplementary Information: Overlap of pairs of scams on the same server
      For scams running on the same server, how much time do they overlap?
      • 96% of all scam pairs overlapped with each other when they remained active
      • Only 10% of scams fully overlapped each other
      (Figure marker: one week)

  31. Supplementary Information: IP ranges
      What are the network locations of scams and spam relays?
      • The cumulative distribution of IP addresses is highly non-uniform
      • Majority of spam relays (60%) fall between 58.* and 91.*
      • Most scams (50%) fall between 64.* and 72.*
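The range statements on this slide amount to asking what fraction of addresses have a first octet inside a given interval. A small sketch of that calculation; the function name is hypothetical, and apart from 221.4.246.3 (from the example scam) the addresses are made up and do not come from the study's data.

```python
def fraction_in_octet_range(ips, lo, hi):
    """Fraction of dotted-quad IPv4 addresses whose first octet lies in [lo, hi]."""
    first_octets = [int(ip.split(".")[0]) for ip in ips]
    return sum(lo <= octet <= hi for octet in first_octets) / len(first_octets)

# Illustrative addresses only.
scam_ips = ["64.10.0.1", "70.1.2.3", "72.5.5.5", "221.4.246.3"]
print(fraction_in_octet_range(scam_ips, 64, 72))
```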
