Report on “Spamming Botnets: Signatures and Characteristics”

Heyong Wang, Department of Computer Science, Iowa State University

Outline • Background • Related work and its limitations • Proposed solution • Experimental results and evaluation • Discussion

New World, New War! • The Internet has greatly shaped our society • Increasing challenge: Internet security!

Myth of Internet: Attack vs. Defense

Introduction: botnets • What is a botnet? A group of compromised hosts controlled by a small number of commander hosts, referred to as Command and Control (C&C) servers. Botnets have been widely used for large-scale network attacks and spam sending.

Botnets: one of the top threats • Stealing data • Hosting fraudulent Web sites • Participating in DoS (denial-of-service) attacks • Sending spam email …

Is the Botnet Battle Already Lost? • According to statistics released by Symantec, an average of 57,000 active bots was observed per day over the first six months of 2006 [1] • "Bots are at the center of the undernet economy," says Jeremy Linden, until recently a researcher at Arbor Networks • Networks of bots distribute as much as 90 percent of all junk email, says David Dagon, a doctoral student at Georgia Tech who wrote his thesis on the topic

Is the Botnet Battle Already Lost? • According to SecureWorks, 20.6 million attacks originated from U.S. computers and 7.7 million from Chinese computers [2] • World: 6.23 million bot-infected computers on the Internet in 2007 [3] • China: 3.62 million in China’s address space in 2007 [3]

The goals of this paper • Perform a large-scale analysis of spamming botnet characteristics • Identify spamming botnet activity trends • Study future botnet detection and defense mechanisms

How it works: an example • IRC: Internet Relay Chat

Existing related work • Botnet infection and the associated control process have been studied and analyzed in [4, 5, 6] • Ramachandran et al. [7] studied the network-level behavior of large-scale spammers and provided strong evidence that botnets are commonly used as platforms for sending spam • Ramachandran et al. proposed ways to infer botnet membership and identify spammers by monitoring DNSBL queries and by clustering sending hosts based on the email domains they target [8]

Existing related work • Zhuang et al. showed that the similarity of email texts can help identify botnet-based spam campaigns [9] • Li and Hsieh found that spam emails belonging to the same campaign are highly clusterable and are often sent in bursts [10] • Spam URL signature generation is in many ways similar to content-based worm signature generation, which has been extensively studied [11, 12, 13, 15]

However • How to correctly group spam emails by campaign has not yet been discussed or studied • Two remaining challenges prevent directly adopting existing solutions for botnet spam signature generation: • Spammers add legitimate URLs to increase the perceived legitimacy of emails • Spammers extensively use URL obfuscation techniques to evade detection
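The second challenge is easiest to see with a small, hypothetical example. In the sketch below, the URLs, the `pills.example.com` domain, and the regular expression are invented for illustration and do not come from the paper. Three URLs from one campaign differ only in randomized tokens. An exact-URL blacklist built from the first URL misses the other two. A single regular expression that generalizes the random parts catches all three.

```python
import re

# Hypothetical polymorphic spam URLs from one campaign: each email carries a
# randomized subdomain label and path token to defeat exact matching.
urls = [
    "http://x7k2q.pills.example.com/buy/a81f",
    "http://p0m3z.pills.example.com/buy/9c2d",
    "http://qq4r8.pills.example.com/buy/e0b7",
]

# An exact-URL signature only knows the single URL it was built from.
exact_signatures = {urls[0]}

# A hypothetical regular expression signature that generalizes the random parts.
regex_signature = re.compile(r"http://[a-z0-9]{5}\.pills\.example\.com/buy/[0-9a-f]{4}")

exact_hits = [u for u in urls if u in exact_signatures]
regex_hits = [u for u in urls if regex_signature.fullmatch(u)]

print(len(exact_hits), len(regex_hits))  # 1 3
```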

Proposed solution: AutoRE • AutoRE: a signature generation framework that • Detects botnet-based spam emails and botnet membership • Does not require pre-classified training data • Outputs high-quality regular expression signatures • AutoRE contains three components: • A URL preprocessor • A group selector • A RegEx (regular expression) generator

AutoRE working mechanism • Figure: AutoRE modules and processing flow chart • Figure: algorithmic overview of generating polymorphic URL signatures

URL Pre-Processing • Extracts information from the given emails: • URL strings • Source IP address • Email sending time • Assigns a unique ID to each extracted email • Partitions URLs into groups by domain, because spam advertising the same product or service tends to use the same domain
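The sketch below shows the pre-processing step under several assumptions. The input is a list of `(body, source_ip, sent_at)` tuples, and URLs are found with a deliberately simple regex. The "domain" is the last two hostname labels. The names `EmailRecord`, `registered_domain`, and `preprocess` are hypothetical, and this is not AutoRE's actual code. Each email receives a sequential ID, and each domain maps to the set of email IDs that mention it.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlsplit

# Simplistic URL matcher; real spam needs more robust extraction (assumption).
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


@dataclass
class EmailRecord:  # hypothetical record type
    email_id: int       # unique ID assigned during preprocessing
    urls: list[str]     # URL strings found in the body
    source_ip: str      # sender IP address
    sent_at: datetime   # email sending time


def registered_domain(host: str) -> str:
    # Naive: keep the last two labels. This mishandles suffixes such as
    # "co.uk"; a public-suffix list would be needed in practice.
    return ".".join(host.lower().split(".")[-2:])


def preprocess(raw_emails):
    """raw_emails: iterable of (body, source_ip, sent_at) tuples (hypothetical format).

    Returns the extracted records and a mapping domain -> set of email IDs.
    """
    records = []
    groups = defaultdict(set)
    for email_id, (body, source_ip, sent_at) in enumerate(raw_emails):
        urls = URL_RE.findall(body)
        records.append(EmailRecord(email_id, urls, source_ip, sent_at))
        for url in urls:
            host = urlsplit(url).hostname
            if host:
                groups[registered_domain(host)].add(email_id)
    return records, dict(groups)


raw = [
    ("Cheap meds: http://a1.pills.example.com/x", "192.0.2.10", datetime(2007, 6, 1, 10, 0)),
    ("Buy now http://b2.pills.example.com/y or see http://news.example.org/",
     "198.51.100.7", datetime(2007, 6, 1, 10, 5)),
]
records, groups = preprocess(raw)
print(groups)  # {'example.com': {0, 1}, 'example.org': {1}}
```

Email 1 falls into two groups because it carries URLs from two domains. The group selection step exists to resolve exactly this case.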

URL Group Selection • An email may be associated with multiple groups when it contains multiple URLs pointing to different domains • The group selector selects a URL group if it is: • “bursty”: exhibits the strongest temporal correlation • “distributed”: sent from a large set of distributed senders
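A greedy selection loop can make the "bursty" and "distributed" criteria concrete. The sketch below is a hypothetical illustration, not the paper's algorithm. Its score is an assumption: distinct senders per hour of activity. On each round it takes the highest-scoring group, then removes that group's emails from every other group, so each email ends up in exactly one selected group.

```python
from datetime import datetime


def select_groups(groups, senders, times):
    """Greedy URL-group selection sketch (hypothetical scoring).

    groups:  dict domain -> set of email IDs (output of preprocessing)
    senders: dict email ID -> source IP
    times:   dict email ID -> sending time (datetime)
    Returns (domain, email_ids) pairs in the order they were selected.
    """

    def score(ids):
        # Hypothetical score: distinct senders per hour of activity, so that
        # many senders ("distributed") in a short window ("bursty") rank first.
        distinct_senders = len({senders[i] for i in ids})
        ts = [times[i] for i in ids]
        span_hours = (max(ts) - min(ts)).total_seconds() / 3600
        return distinct_senders / (span_hours + 1)

    remaining = {d: set(ids) for d, ids in groups.items() if ids}
    selected = []
    while remaining:
        best = max(remaining, key=lambda d: score(remaining[d]))
        chosen = remaining.pop(best)
        selected.append((best, chosen))
        # Assign each email to one group only: drop it from the other groups.
        for d in list(remaining):
            remaining[d] -= chosen
            if not remaining[d]:
                del remaining[d]
    return selected


groups = {"spam.example": {0, 1, 2}, "legit.example": {1, 3}}
senders = {0: "192.0.2.1", 1: "192.0.2.2", 2: "192.0.2.3", 3: "192.0.2.4"}
times = {
    0: datetime(2007, 6, 1, 10, 0),
    1: datetime(2007, 6, 1, 10, 10),
    2: datetime(2007, 6, 1, 10, 20),
    3: datetime(2007, 6, 2, 10, 0),
}
print(select_groups(groups, senders, times))
# [('spam.example', {0, 1, 2}), ('legit.example', {3})]
```

Email 1 carries both a spam URL and a legitimate-looking URL. It is claimed by the bursty, distributed group, which illustrates how this step can counter spammers who pad emails with legitimate URLs.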

Signature Generation and Botnet Identification • The RegEx generator produces two types of signatures: • Complete URL based signatures: detect spam containing an identical URL • Regular expression signatures: detect spam containing polymorphic URLs • A signature is attributed to a botnet only if it is: • “distributed”: quantified by the total number of Autonomous Systems (ASes) spanned by the senders • “bursty”: quantified using the inferred duration of the botnet spam campaign • “specific”: quantified using an information entropy metric
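Two of the three botnet criteria reduce to simple checks on the emails that a signature matches. The sketch below covers only "distributed" and "bursty" and leaves out the entropy-based "specific" test. It assumes an `ip_to_asn` mapping has already been computed, for example from routing data. The function name and both thresholds are hypothetical parameters chosen by the caller; they are not the values used in the paper.

```python
from datetime import datetime, timedelta


def is_botnet_campaign(emails, ip_to_asn, min_ases, max_duration):
    """Sketch of the "distributed" and "bursty" tests for one signature.

    emails:       list of (source_ip, sent_at) pairs matched by the signature
    ip_to_asn:    dict source IP -> AS number (assumed to be precomputed)
    min_ases:     hypothetical threshold on the number of distinct ASes
    max_duration: hypothetical upper bound on the inferred campaign duration
    """
    if not emails:
        return False
    ases = {ip_to_asn[ip] for ip, _ in emails if ip in ip_to_asn}
    times = [sent_at for _, sent_at in emails]
    duration = max(times) - min(times)     # inferred campaign duration
    distributed = len(ases) >= min_ases    # spread over many networks
    bursty = duration <= max_duration      # sent within a short window
    return distributed and bursty


# Documentation-range IPs and AS numbers, used purely as placeholders.
ip_to_asn = {"192.0.2.1": 64500, "198.51.100.1": 64501, "203.0.113.1": 64502}
start = datetime(2007, 7, 1)
emails = [(ip, start + timedelta(hours=i)) for i, ip in enumerate(ip_to_asn)]
print(is_botnet_campaign(emails, ip_to_asn, min_ases=3, max_duration=timedelta(days=1)))  # True
```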

Automatic URL Regular Expression Generation • Keyword-based signature tree construction • Candidate regular expression generation: • Detailing: returns a domain-specific regular expression using a keyword-based signature as input • Generalization: returns a more general, domain-agnostic regular expression by merging very similar domain-specific regular expressions • Ensure the generated expressions are specific enough: • Measure the quality of each signature • Discard signatures that are too general
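The detailing idea can be approximated in a simplified way. This is not AutoRE's algorithm, which starts from a keyword-based signature tree. The sketch assumes the input URLs already belong to one group and split into the same number of delimiter-separated tokens. Tokens that are identical across all URLs stay literal. Tokens that vary become a character class with a length range. A "free bits" count serves as a stand-in for a quality measure: roughly how much randomness the signature tolerates, which is an entropy-flavored proxy and not the paper's exact metric. The delimiter set, `detail`, `char_class`, and `specific_enough` are all hypothetical.

```python
import math
import re
import string

# Split on common URL delimiters, keeping the delimiters as tokens (assumption).
DELIMS = re.compile(r"([/.?=&:-])")
RANGES = [("a-z", set(string.ascii_lowercase)),
          ("A-Z", set(string.ascii_uppercase)),
          ("0-9", set(string.digits))]


def char_class(tokens):
    """Return (class_regex, class_size) covering every character in tokens."""
    chars = set("".join(tokens))
    parts, size = [], 0
    for label, members in RANGES:
        if chars & members:
            parts.append(label)
            size += len(members)
    extras = sorted(chars - set().union(*(m for _, m in RANGES)))
    parts.extend(re.escape(c) for c in extras)
    return "[" + "".join(parts) + "]", size + len(extras)


def detail(urls):
    """Build one regex from URLs that share a token layout.

    Returns (regex, free_bits): free_bits estimates how many bits of
    randomness the signature tolerates (a simple proxy for generality).
    """
    tokenized = [[t for t in DELIMS.split(u) if t] for u in urls]
    if len({len(toks) for toks in tokenized}) != 1:
        raise ValueError("URLs do not share a common token layout")
    pieces, free_bits = [], 0.0
    for column in zip(*tokenized):
        if len(set(column)) == 1:
            pieces.append(re.escape(column[0]))  # invariant token stays literal
            continue
        cls, size = char_class(column)
        lo, hi = min(map(len, column)), max(map(len, column))
        pieces.append(cls + (f"{{{lo}}}" if lo == hi else f"{{{lo},{hi}}}"))
        free_bits += hi * math.log2(size)
    return "^" + "".join(pieces) + "$", free_bits


def specific_enough(free_bits, max_bits):
    # Hypothetical filter: discard signatures that tolerate too much randomness.
    return free_bits <= max_bits


urls = ["http://ab12.pills.example.com/buy/7f3a", "http://zq90.pills.example.com/buy/09bc"]
sig, bits = detail(urls)
print(sig)             # ^http://[a-z0-9]{4}\.pills\.example\.com/buy/[a-z0-9]{4}$
print(round(bits, 2))  # 41.36
```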

Figure: example input URLs and the keyword-based signature tree constructed by AutoRE • Figure (generalization): merging domain-specific regular expressions into a domain-agnostic regular expression
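Generalization can be sketched by treating "very similar" as "identical except for the host part". That reading is an assumption; the paper's merging criterion may be broader. Signatures are represented as hypothetical `(host_regex, path_regex)` pairs. When enough distinct hosts share one path regex, they are merged into a single domain-agnostic regex with a catch-all host pattern. `GENERIC_HOST` and `min_domains` are illustrative choices.

```python
from collections import defaultdict

# Hypothetical catch-all for the host part of a domain-agnostic signature.
GENERIC_HOST = r"[^/]+"


def generalize(signatures, min_domains=2):
    """Merge domain-specific signatures that differ only in the host part.

    signatures:  list of (host_regex, path_regex) pairs
    min_domains: how many distinct hosts must share a path regex before merging
    """
    hosts_by_path = defaultdict(set)
    for host_re, path_re in signatures:
        hosts_by_path[path_re].add(host_re)
    merged = []
    for path_re, hosts in hosts_by_path.items():
        if len(hosts) >= min_domains:
            merged.append(f"^http://{GENERIC_HOST}{path_re}$")  # domain-agnostic
        else:
            merged.extend(f"^http://{h}{path_re}$" for h in sorted(hosts))
    return merged


sigs = [
    (r"pills\.example\.com", r"/buy/[a-z0-9]{4}"),
    (r"meds\.example\.net", r"/buy/[a-z0-9]{4}"),
    (r"news\.example\.org", r"/story/[0-9]{3}"),
]
for s in generalize(sigs):
    print(s)
# ^http://[^/]+/buy/[a-z0-9]{4}$
# ^http://news\.example\.org/story/[0-9]{3}$
```

A domain-agnostic signature can match the same campaign on domains it has not yet seen, which is consistent with the higher detection reported for DA signatures. The trade-off is that it needs a stronger specificity check than a domain-specific one.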

Results and evaluation • Dataset: • Randomly sampled Hotmail emails from Nov 2006, Jun 2007, and Jul 2007 • Senders’ IPs were not blacklisted • Number of emails: 5,382,460 (sampling rate 1:25,000)

Results and evaluation (cont.) • CU: complete URL based signatures • RE: regular expression signatures

Results and evaluation (cont.) • False positive rate (FPR): • CU: 0.0001 to 0.0006; RE: 0.0011 to 0.0014 • Ability to detect future spam: • Complete URL (CU) signatures detected 16% to 18% of spam • RE signatures are much more robust for future detection • Regular expression vs. keyword conjunction: • RE signatures reduce the FPR by a factor of 10 to 30 • Domain-specific vs. domain-agnostic signatures: • Domain-agnostic (DA) signatures detect 9.9% to 20.6% more spam
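The FPR figures and the "factor of 10 to 30" comparison become concrete with a short calculation. This slide does not state the paper's exact FPR definition, so the sketch uses the textbook one: false positives divided by all legitimate emails. All counts are hypothetical and are not results from the paper.

```python
def false_positive_rate(false_positives, true_negatives):
    """Textbook FPR: share of legitimate emails that a signature set flags as spam."""
    legitimate = false_positives + true_negatives
    return false_positives / legitimate if legitimate else 0.0


# Hypothetical counts over the same 100,000 legitimate emails (not from the paper).
fpr_keywords = false_positive_rate(false_positives=120, true_negatives=99_880)
fpr_regex = false_positive_rate(false_positives=6, true_negatives=99_994)

# A "reduction by a factor of N" is the ratio of the two rates.
reduction = fpr_keywords / fpr_regex
print(f"{fpr_keywords:.5f} {fpr_regex:.5f} {reduction:.1f}")  # 0.00120 0.00006 20.0
```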

Discussion: limitations and some thoughts on the proposed solution • The sampling rate (1:25,000) is insufficient for real-time experiments • The dataset came only from Hotmail, so the results may not apply to other email services • AutoRE may not work well if spammers use URL redirection techniques • Spammers may craft emails to evade AutoRE's URL group selection process

Thank you 谢谢

References:
[1] http://www.eweek.com/c/a/Security/Is-the-Botnet-Battle-Already-Lost/
[2] http://www.gcn.com/online/vol1_no1/47200-1.html
[3] http://www.securityfocus.com/brief/827
[4] K. Chiang and L. Lloyd. A case study of the Rustock rootkit and spam bot. In The First Workshop in Understanding Botnets, 2007.
[5] M. A. Rajab, J. Zarfoss, F. Monrose, and A. Terzis. A multifaceted approach to understanding the botnet phenomenon. In IMC ’06: Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, 2006.
[6] N. Daswani, M. Stoppelman, and the Google click quality and security teams. The anatomy of Clickbot.A. In The First Workshop in Understanding Botnets, 2007.
[7] A. Ramachandran and N. Feamster. Understanding the network-level behavior of spammers. In Proceedings of ACM SIGCOMM, 2006.
[8] A. Ramachandran, N. Feamster, and S. Vempala. Filtering spam with behavioral blacklisting. In Proceedings of the 14th ACM Conference on Computer and Communications Security, 2007.

[9] L. Zhuang, J. Dunagan, D. R. Simon, H. J. Wang, I. Osipkov, G. Hulten, and J. Tygar. Characterizing botnets from email spam records. In LEET ’08: First USENIX Workshop on Large-Scale Exploits and Emergent Threats, 2008.
[10] F. Li and M.-H. Hsieh. An empirical study of clustering behavior of spammers and group-based anti-spam strategies. In CEAS 2006: Proceedings of the 3rd Conference on Email and Anti-Spam, 2006.
[11] S. Singh, C. Estan, G. Varghese, and S. Savage. Automated worm fingerprinting. In OSDI, 2004.
[12] H.-A. Kim and B. Karp. Autograph: Toward automated, distributed worm signature detection. In the 13th Conference on USENIX Security Symposium, 2004.
[13] J. Newsome, B. Karp, and D. Song. Polygraph: Automatically generating signatures for polymorphic worms. In Proceedings of the 2005 IEEE Symposium on Security and Privacy, 2005.
[15] C. Kreibich and J. Crowcroft. Honeycomb: Creating intrusion detection signatures using honeypots. In 2nd Workshop on Hot Topics in Networks (HotNets-II), 2003.
