Populated IP Addresses — Classification and Applications • Chi-Yao Hong, UIUC • Fang Yu, MSR Silicon Valley • Yinglian Xie, MSR Silicon Valley • ACM CCS (October 2012)
Outline • Introduction • System Design • Implementation • Evaluation • Application A Seminar at Advanced Defense Lab
Introduction • While online services such as web-based email have become everyday essentials for billions of users, they are also heavily abused by attackers. • Online service providers often rely on IP addresses to perform blacklisting and service throttling. • IP addresses associated with a large number of user requests must be treated differently, because blacklisting or throttling one of them can affect many legitimate users.
Populated IP Addresses • We define IP addresses that are associated with a large number of user requests as Populated IP (PIP) addresses. • PIPs are not equivalent to the traditional concept of proxies, NATs, gateways, or other middleboxes.
Goal • In this paper, we introduce PIPMiner, a fully automated method to extract and classify PIPs.
System Design • We take a data-driven approach using service logs that are readily available to all service providers. • We then train a non-linear support vector machine (SVM) classifier that is highly tolerant of noise in the input data.
System Flow • PIP Selection • Phase 1: select IP addresses with at least r_L requests (r_L = 1,000). • Phase 2: select IP addresses that have been used by at least u_M accounts, together accounting for at least r_M requests (u_M = 10, r_M = 300).
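A minimal sketch of the two-phase PIP selection, using the thresholds from the slide. The slide does not say how the two phases combine; because r_M is smaller than r_L, this sketch assumes Phase 2 is a second pass over IPs that Phase 1 rejected. The (ip, account) record format and the function name `select_pip_candidates` are hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable

# Thresholds quoted on the System Flow slide.
R_L = 1_000  # Phase 1: minimum number of requests from an IP
U_M = 10     # Phase 2: minimum number of distinct accounts on an IP
R_M = 300    # Phase 2: minimum number of requests from those accounts


def select_pip_candidates(records: Iterable[tuple[str, str]],
                          r_l: int = R_L, u_m: int = U_M, r_m: int = R_M) -> set[str]:
    """Return candidate PIPs from (ip, account) pairs, one pair per request.

    ASSUMPTION: Phase 2 is applied to IPs that Phase 1 did not select;
    the slide does not specify how the phases combine.
    """
    per_ip: dict[str, Counter] = defaultdict(Counter)
    for ip, account in records:
        per_ip[ip][account] += 1

    selected = set()
    for ip, accounts in per_ip.items():
        total = sum(accounts.values())
        if total >= r_l:
            # Phase 1: heavily used IPs are kept outright.
            selected.add(ip)
        elif len(accounts) >= u_m and total >= r_m:
            # Phase 2: lighter IPs are kept if enough distinct accounts use them.
            selected.add(ip)
    return selected
```

With these defaults, an IP with 30 requests from each of 10 accounts (300 in total) passes Phase 2, while an IP with 50 requests from each of 9 accounts does not.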
Features • Population Features capture aggregated user characteristics. • Time Series Features model the detailed request patterns. • IP Block Level Features aggregate IP block level activities and help recognize proxy farms.
Population Features
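The population features on this slide were shown as a figure that did not survive extraction. As an illustrative sketch only, the function below computes a few hypothetical per-IP aggregates of user characteristics (account count, requests per account, traffic concentration). The feature names are assumptions, not the paper's feature list.

```python
from collections import Counter


def population_features(account_requests: Counter) -> dict[str, float]:
    """Illustrative per-IP population features (hypothetical, not the paper's list).

    account_requests maps an account id to the number of requests that
    account made from one IP address.
    """
    counts = list(account_requests.values())
    total = sum(counts)
    if total == 0:
        raise ValueError("IP has no associated requests")
    return {
        "num_accounts": len(counts),
        "total_requests": total,
        "mean_requests_per_account": total / len(counts),
        # Share of traffic from the single busiest account on this IP.
        "top_account_share": max(counts) / total,
        # Fraction of accounts seen only once on this IP.
        "one_time_account_frac": sum(1 for c in counts if c == 1) / len(counts),
    }
```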
Time Series Features
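The time series features were likewise shown only as a figure. The sketch below, under stated assumptions, shows one way to turn a PIP's requests into an hourly count series and summarize it; the Unix-second timestamps, the one-hour bucket size, and both summary features are hypothetical choices, not PIPMiner's definitions.

```python
def hourly_series(timestamps: list[int], start: int, num_hours: int) -> list[int]:
    """Bucket request timestamps (Unix seconds) into per-hour counts from start."""
    series = [0] * num_hours
    for t in timestamps:
        hour = (t - start) // 3600
        if 0 <= hour < num_hours:  # ignore requests outside the window
            series[hour] += 1
    return series


def time_series_features(series: list[int]) -> dict[str, float]:
    """Illustrative summaries of an hourly request series (hypothetical features)."""
    if not series:
        raise ValueError("empty series")
    total = sum(series)
    mean = total / len(series)
    return {
        # How continuously the IP is used across the window.
        "active_hour_frac": sum(1 for c in series if c > 0) / len(series),
        # How bursty the traffic is; 0.0 for an IP with no requests.
        "peak_to_mean": max(series) / mean if total else 0.0,
    }
```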
IP Block Level Features • Large proxy farms often redirect traffic to different outgoing network interfaces for load-balancing purposes. • Determining neighboring IP addresses: • Neighboring IPs must be announced by the same AS. • Neighboring IPs are contiguous in the IP address space, and each neighboring IP must itself be a PIP.
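The two neighbor rules above can be sketched as a single pass over the PIPs sorted by address. This assumes IPv4 addresses and a hypothetical input mapping each PIP to the AS number that announces it; "contiguous" is read strictly as consecutive addresses with no gaps.

```python
import ipaddress


def group_pip_blocks(pip_asn: dict[str, int]) -> list[list[str]]:
    """Group PIPs into blocks of neighboring addresses.

    pip_asn (hypothetical input) maps each PIP, as an IPv4 string, to the AS
    number announcing it. Two PIPs are neighbors if their addresses are
    consecutive and the same AS announces both. Every key is a PIP, so the
    "each neighbor is itself a PIP" rule holds by construction.
    """
    ordered = sorted(pip_asn, key=lambda ip: int(ipaddress.IPv4Address(ip)))
    blocks: list[list[str]] = []
    prev_addr = prev_asn = None
    for ip in ordered:
        addr = int(ipaddress.IPv4Address(ip))
        asn = pip_asn[ip]
        if blocks and addr == prev_addr + 1 and asn == prev_asn:
            blocks[-1].append(ip)  # extend the current contiguous block
        else:
            blocks.append([ip])    # gap in the address space or AS change
        prev_addr, prev_asn = addr, asn
    return blocks
```

Block-level features could then be computed by aggregating the per-IP features over each returned block.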
Example: Block-Level Time Series
Training and Classification • Non-linear SVM
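For reference, this is the standard decision rule of a soft-margin kernel SVM (the C-SVC formulation that LIBSVM implements), written with the kernel notation k(x_i, x). Here x_i are the training feature vectors with labels y_i, the α_i are learned dual coefficients (non-zero only for support vectors), b is the bias, and C is the regularization parameter. The slide does not give PIPMiner's training settings, so this shows only the general form.

```latex
f(\mathbf{x}) = \operatorname{sgn}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, k(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad y_i \in \{-1, +1\}, \quad 0 \le \alpha_i \le C
```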
Kernel Function k(xi, x)
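The kernel formula on this slide did not survive extraction. As an assumption, the sketch uses the Gaussian (RBF) kernel k(x_i, x) = exp(-γ‖x_i − x‖²), which is LIBSVM's default kernel type; the slide does not confirm which kernel or which γ PIPMiner uses. `decision_value` evaluates Σ α_i y_i k(x_i, x) + b over the support vectors, before taking the sign.

```python
import math
from collections.abc import Sequence


def rbf_kernel(xi: Sequence[float], x: Sequence[float], gamma: float) -> float:
    """Gaussian kernel k(xi, x) = exp(-gamma * ||xi - x||^2)."""
    if len(xi) != len(x):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * sq_dist)


def decision_value(support_vectors: Sequence[Sequence[float]],
                   dual_coefs: Sequence[float], bias: float,
                   x: Sequence[float], gamma: float) -> float:
    """Sum of alpha_i * y_i * k(x_i, x) plus b; dual_coefs holds the products alpha_i * y_i."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias
```

The RBF kernel equals 1 for identical vectors and decays toward 0 as they move apart, so γ controls how local each support vector's influence is.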
Implementation • Data Parsing and Feature Extraction (Stage 1) • We implement PIPMiner on top of DryadLINQ, a distributed programming model for large-scale computing. • Runs on a 240-machine cluster • Training and Testing (Stage 2) • A quad-core CPU with 8 GB RAM • LIBSVM and LIBLINEAR toolkits
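DryadLINQ programs are written as LINQ queries in .NET, so the sketch below does not use its API. It only imitates, on one machine, the hash-partition-then-aggregate pattern that a distributed Stage 1 implies: records are routed by IP so each partition can be aggregated independently. The "timestamp,ip,account" log format and the partition count are hypothetical.

```python
import zlib
from collections import Counter, defaultdict

Record = tuple[int, str, str]  # (timestamp, ip, account)


def parse_record(line: str) -> Record:
    """Parse a hypothetical 'timestamp,ip,account' login-log line."""
    ts, ip, account = line.strip().split(",")
    return int(ts), ip, account


def partition_by_ip(lines: list[str], num_partitions: int) -> list[list[Record]]:
    """Hash-partition records by IP so each IP's requests land in one partition."""
    parts: list[list[Record]] = [[] for _ in range(num_partitions)]
    for line in lines:
        record = parse_record(line)
        # crc32 is deterministic across runs, unlike Python's salted str hash.
        parts[zlib.crc32(record[1].encode()) % num_partitions].append(record)
    return parts


def aggregate(records: list[Record]) -> dict[str, Counter]:
    """Per-IP request counts by account; in a cluster, one worker per partition."""
    per_ip: dict[str, Counter] = defaultdict(Counter)
    for _, ip, account in records:
        per_ip[ip][account] += 1
    return per_ip


def stage1(lines: list[str], num_partitions: int = 4) -> dict[str, Counter]:
    result: dict[str, Counter] = {}
    for part in partition_by_ip(lines, num_partitions):
        result.update(aggregate(part))  # partitions never share an IP, so no merge is needed
    return result
```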
Evaluation • We apply PIPMiner to a month-long Hotmail login log from August 2010 and identify 1.7 million PIP addresses (200 MB). • They are 0.5% of the observed IP addresses • They are the source of more than 20.1% of the total requests • They are associated with 13.7% of the total accounts in our dataset • At Stage 1, PIPMiner processes a 296 GB dataset in only 1.5 hours.
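A quick back-of-the-envelope check of what these figures imply. The inputs are the numbers quoted on the slides; decimal gigabytes and all 240 machines being busy for the whole Stage 1 run are assumptions, and because 0.5% and 1.7 million are rounded, the derived values are approximate.

```python
# Figures quoted on the Evaluation and Implementation slides.
NUM_PIPS = 1.7e6
PIP_SHARE_OF_IPS = 0.005        # 0.5% of observed IPs (rounded on the slide)
PIP_SHARE_OF_REQUESTS = 0.201   # "more than 20.1%", so a lower bound
STAGE1_BYTES = 296e9            # 296 GB, assuming decimal gigabytes
STAGE1_SECONDS = 1.5 * 3600
CLUSTER_MACHINES = 240

# Implied number of distinct IPs in the log.
observed_ips = NUM_PIPS / PIP_SHARE_OF_IPS

# On average, a PIP carries at least this many times the requests of an average IP.
request_concentration = PIP_SHARE_OF_REQUESTS / PIP_SHARE_OF_IPS

# Aggregate and per-machine Stage 1 throughput in MB/s (assumes all machines busy).
throughput_mb_s = STAGE1_BYTES / STAGE1_SECONDS / 1e6
per_machine_mb_s = throughput_mb_s / CLUSTER_MACHINES

print(f"observed IPs ~ {observed_ips:,.0f}")
print(f"request concentration >= {request_concentration:.1f}x")
print(f"Stage 1: {throughput_mb_s:.1f} MB/s total, {per_machine_mb_s:.2f} MB/s per machine")
```

This works out to roughly 340 million observed IPs, PIPs carrying about 40 times the average per-IP request volume, and about 55 MB/s of aggregate Stage 1 throughput (about 0.23 MB/s per machine).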
PIP Score Distribution
PIP Address Distribution (figure annotations: Dynamic IP)
Accuracy Evaluation • Of the 1.7 million PIP addresses, 973K can be labeled using the account reputation data.
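Since only 973K of the 1.7 million PIPs (about 57%) have labels, accuracy can only be measured on that labeled subset, and coverage is worth reporting alongside it. The sketch below assumes hypothetical string labels such as "good"/"bad", with None or a missing key marking a PIP the reputation data cannot label; the slide does not state the labeling rule.

```python
from typing import Optional


def coverage_and_accuracy(predictions: dict[str, str],
                          labels: dict[str, Optional[str]]) -> tuple[float, float]:
    """Coverage = share of predicted PIPs that have a label; accuracy is on labeled PIPs only.

    Label values are hypothetical; None or a missing key means "unlabeled".
    """
    if not predictions:
        raise ValueError("no predictions")
    labeled = [ip for ip in predictions if labels.get(ip) is not None]
    coverage = len(labeled) / len(predictions)
    if not labeled:
        return coverage, float("nan")
    correct = sum(1 for ip in labeled if predictions[ip] == labels[ip])
    return coverage, correct / len(labeled)
```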
Accuracy of Individual Components
Accuracy against Data Length
Validation of Unlabeled Cases • Future reputation: the reputation scores from July 2011, 11 months after the August 2010 data
Application • Windows Live ID Sign-up Abuse Problem • We focus on sign-ups related to Hotmail and use the Hotmail reputation trace from July 2011 (11 months later) to determine whether a particular sign-up account was malicious. • We study sign-up behavior on two sets of IP addresses: • The first is the 1.7 million derived PIPs. • The second is the set of IP addresses that have more than 20 sign-ups from the Windows Live ID system but are not among the 1.7 million PIPs.
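The second IP set and the per-set malicious sign-up rate can be sketched as follows. Sign-ups are assumed to be hypothetical (ip, account) pairs, and `malicious_accounts` stands in for the accounts that the July 2011 reputation trace marks as malicious; neither format comes from the slides.

```python
from collections import Counter
from collections.abc import Iterable


def heavy_non_pip_signup_ips(signups: Iterable[tuple[str, str]], pips: set[str],
                             min_signups: int = 20) -> set[str]:
    """IPs with MORE than min_signups sign-ups that are not in the PIP set."""
    counts = Counter(ip for ip, _ in signups)
    return {ip for ip, n in counts.items() if n > min_signups and ip not in pips}


def malicious_fraction(signups: Iterable[tuple[str, str]], ip_set: set[str],
                       malicious_accounts: set[str]) -> float:
    """Fraction of sign-ups from ip_set whose account was later judged malicious."""
    accounts = [acct for ip, acct in signups if ip in ip_set]
    if not accounts:
        return 0.0
    return sum(1 for a in accounts if a in malicious_accounts) / len(accounts)
```

Note the strict comparison: an IP with exactly 20 sign-ups is excluded, matching "more than 20".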
Using PIPs to Predict User Reputation • Precision = 97%
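The slide reports only the precision value and does not say which class counts as positive. For reference, the standard definition is TP / (TP + FP): a precision of 97% means about 97 of every 100 positive predictions were correct. The boolean encoding below is an assumption for illustration.

```python
from collections.abc import Sequence


def precision(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """TP / (TP + FP): among items predicted positive, the share that truly are."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if tp + fp == 0:
        raise ValueError("no positive predictions; precision is undefined")
    return tp / (tp + fp)
```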
Q & A Thank you for listening