
Populated IP Addresses — Classification and Applications

Chi-Yao Hong (UIUC), Fang Yu (MSR Silicon Valley), Yinglian Xie (MSR Silicon Valley). ACM CCS (October 2012). Outline: Introduction, System Design, Implementation, Evaluation, Application.


Presentation Transcript


  1. Populated IP Addresses: Classification and Applications. Chi-Yao Hong, UIUC; Fang Yu, MSR Silicon Valley; Yinglian Xie, MSR Silicon Valley. ACM CCS (October 2012)

  2. Outline • Introduction • System Design • Implementation • Evaluation • Application A Seminar at Advanced Defense Lab

  3. Introduction • Online services have become everyday essentials for billions of users, but they are also heavily abused by attackers. • Example: web-based email • Online service providers often rely on IP addresses for blacklisting and service throttling. • IP addresses associated with a large number of user requests must be treated differently.

  4. Populated IP Addresses • We define IP addresses that are associated with a large number of user requests as Populated IP (PIP) addresses. • PIPs are not equivalent to the traditional concept of proxies, NATs, gateways, or other middleboxes.

  5. Goal • In this paper, we introduce PIPMiner, a fully automated method to extract and classify PIPs.

  6. System Design • We take a data-driven approach using service logs that are readily available to all service providers. • We train a non-linear support vector machine (SVM) classifier that is highly tolerant of noise in the input data.

  7. System Flow • PIP Selection • Phase 1: keep IP addresses that reach a request threshold rL, with rL = 1,000. • Phase 2: keep IP addresses that have been used by at least uM accounts, together accounting for at least rM requests, with uM = 10 and rM = 300.
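
The two selection phases above can be sketched as a simple filter over a request log. This is only an illustration under stated assumptions, not PIPMiner's actual code: the log format (one `(ip, account)` pair per request, with `account` set to `None` when no account is attached), the function name `select_pips`, the inclusive thresholds, and the choice to apply the phases as successive filters are all hypothetical. Only the threshold values come from the slide.

```python
from collections import Counter, defaultdict

R_L = 1000  # Phase 1 request threshold (value from the slide)
U_M = 10    # Phase 2 minimum number of accounts (value from the slide)
R_M = 300   # Phase 2 minimum requests made by those accounts (value from the slide)


def select_pips(records, r_l=R_L, u_m=U_M, r_m=R_M):
    """Return the set of IPs that pass both selection phases.

    records: iterable of (ip, account) pairs, one per request.
    This log format is a hypothetical stand-in for a real service log.
    """
    total = Counter()                    # requests per IP
    per_account = defaultdict(Counter)   # requests per account, per IP
    for ip, account in records:
        total[ip] += 1
        if account is not None:
            per_account[ip][account] += 1

    selected = set()
    for ip, n_requests in total.items():
        # Phase 1 (assumed inclusive): enough requests overall.
        if n_requests < r_l:
            continue
        # Phase 2: enough distinct accounts, with enough requests between them.
        accounts = per_account[ip]
        if len(accounts) >= u_m and sum(accounts.values()) >= r_m:
            selected.add(ip)
    return selected
```

Keeping the thresholds as keyword arguments makes it easy to try other values of rL, uM, and rM on a different log.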

  8. Features • Population Features capture aggregated user characteristics. • Time Series Features model the detailed request patterns. • IP Block Level Features aggregate IP block level activities and help recognize proxy farms.

  9. Population Features

  10. Time Series Features

  11. IP Block Level Features • Large proxy farms often redirect traffic to different outgoing network interfaces for load balancing purposes. • Determine neighboring IP addresses: • Neighboring IPs must be announced by the same AS. • Neighboring IPs are contiguous in the IP address space, and each neighboring IP is itself a PIP.
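
The two neighbor rules above can be sketched as a grouping pass over the sorted PIP list. This is a minimal illustration, not the authors' implementation: the function name `neighbor_blocks` and the `asn_of` dictionary (a stand-in for whatever IP-to-AS lookup is available) are hypothetical, and only IPv4 is handled.

```python
import ipaddress


def neighbor_blocks(pips, asn_of):
    """Group PIPs into blocks of neighboring addresses.

    pips:   iterable of IPv4 address strings, each already known to be a PIP.
    asn_of: dict mapping each IP string to its AS number (hypothetical lookup).
    Two PIPs land in the same block when they are adjacent in the address
    space and announced by the same AS.
    """
    ordered = sorted(set(pips), key=lambda ip: int(ipaddress.IPv4Address(ip)))
    blocks = []
    for ip in ordered:
        value = int(ipaddress.IPv4Address(ip))
        if blocks:
            last = blocks[-1][-1]
            adjacent = int(ipaddress.IPv4Address(last)) + 1 == value
            if adjacent and asn_of[last] == asn_of[ip]:
                blocks[-1].append(ip)
                continue
        # Start a new block when there is a gap or the AS changes.
        blocks.append([ip])
    return blocks
```

Block-level features, such as a block-level time series, could then be computed by aggregating the requests of every address in one block.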

  12. Example: Block Level Time Series

  13. Training and Classification • Non-linear SVM

  14. Kernel Function k(xi, x)
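
For reference, the kernel k(xi, x) enters a non-linear SVM through the standard decision function below. The first line is the textbook form, with training points x_i, labels y_i in {-1, +1}, learned weights alpha_i, and bias b. The second line shows the RBF kernel purely as an assumed example: it is LIBSVM's default kernel, but the transcript does not say which kernel was used.

```latex
\[
f(x) = \operatorname{sign}\!\Big( \sum_{i=1}^{n} \alpha_i \, y_i \, k(x_i, x) + b \Big)
\]
\[
k(x_i, x) = \exp\!\big( -\gamma \, \lVert x_i - x \rVert^{2} \big)
\qquad \text{(RBF kernel, shown only as an assumed example)}
\]
```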

  15. Implementation • Data Parsing and Feature Extraction (Stage 1) • We implement PIPMiner on top of DryadLINQ, a distributed programming model for large-scale computing. • Runs on a 240-machine cluster • Training and Testing (Stage 2) • Quad-core CPU with 8 GB RAM • LIBSVM and LIBLINEAR toolkits

  16. Evaluation • We apply PIPMiner to a month-long Hotmail login log from August 2010 and identify 1.7 million PIP addresses (200 MB). • The PIPs are 0.5% of the observed IP addresses. • They are the source of more than 20.1% of the total requests. • They are associated with 13.7% of the total accounts in our dataset. • At Stage 1, PIPMiner processes a 296 GB dataset in only 1.5 hours.

  17. PIP Score Distribution

  18. PIP Address Distribution (figure label: Dynamic IP)

  19. Accuracy Evaluation • Of the 1.7 million PIP addresses, 973K can be labeled based on the account reputation data.

  20. Accuracy of Individual Components

  21. Accuracy against Data Length

  22. Validation of Unlabeled Cases • Future Reputation • Uses the reputation score from July 2011 (11 months later)

  23. Application • Windows Live ID Sign-up Abuse Problem • We focus on the sign-ups related to Hotmail and use the Hotmail reputation trace from July 2011 (11 months later) to determine whether a particular sign-up account was malicious. • We study the sign-up behavior on two sets of IP addresses. • The first is the 1.7 million derived PIPs. • The second is the set of IP addresses that have more than 20 sign-ups in the Windows Live ID system but are not among the 1.7 million PIPs.

  24. Using PIPs to Predict User Reputation • Precision = 97%
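
As a reminder of what the precision figure above measures, the standard definition is given below, where TP is the number of correct positive predictions and FP the number of incorrect ones. The transcript does not state which class (for example, malicious accounts) counts as positive, so that mapping is left open here.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}
\]
```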

  25. Q & A • Thank you for listening
