
Detecting malware with graph-based methods: traffic classification, botnets, and Facebook scams

Michalis Faloutsos, U. New Mexico (moved from U.C. Riverside!). http://mypagekeeper.org


Presentation Transcript


  1. Detecting malware with graph-based methods: traffic classification, botnets, and Facebook scams • Michalis Faloutsos, U. New Mexico (moved from U.C. Riverside!) • http://mypagekeeper.org

  2. Key thesis of this talk: Graph-mining and Network Science enable revolutionary security techniques • We develop methods for network malware detection • We use them to detect malware on social networks • New frontier: many new problems have emerged

  3. This talk • Part I: Graph-based techniques for network security • Part II: Detecting malware in social networks • Part III: Some new projects • http://mypagekeeper.org

  4. Who is using/attacking my network? Profiling traffic is not a solved problem • What we want: ideally, a method that can • profile ALL the traffic (recall) • profile it with high accuracy (precision) • What we get with existing tools: [Figure: traffic profiling results using deep packet inspection, with a share of the traffic labeled "Unknown!"; data from a peering link between two ISPs in the US]

  5. GraphWare: A graph-based approach to network monitoring • We monitor traffic as a network-wide phenomenon • Going beyond packet-level, flow-statistics, and host-level profiling • Based on Marios Iliofotou's PhD work at UC Riverside • Papers: IMC’07, CoNEXT’09, GI’09, INFOCOM’10, CoNEXT’10 • Collaborators: Prashanth Pappu, Sumeet Singh (Cisco), M. Mitzenmacher (Harvard), G. Varghese (UCSD), T. Eliassi-Rad (LLNL/Rutgers), B. Gallagher (LLNL) [Figure: Traffic Dispersion Graphs]

  6. Capturing network context: Traffic Dispersion Graphs • Traffic Dispersion Graphs (TDGs): who talks to whom • Deceptively simple definition • Defining what constitutes an edge allows for focused “slices” • Enables powerful visualization and novel algorithms [Figure: a TDG with virus traffic in blue]
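A minimal sketch of how a TDG "slice" can be built from flow records, assuming hypothetical record fields (`src`, `dst`, `proto`, `dport`, `packets`) that are not from the talk. The edge definition is simply a predicate: a flow becomes a directed edge from initiator to responder only if the predicate accepts it. This illustrates the idea and is not the GraphWare implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass(frozen=True)
class Flow:
    # Hypothetical flow record; field names are illustrative assumptions.
    src: str      # initiator of the flow
    dst: str      # responder
    proto: str    # e.g. "TCP" or "UDP"
    dport: int    # destination port
    packets: int  # packets in the flow


def build_tdg(flows: Iterable[Flow],
              is_edge: Callable[[Flow], bool]) -> Dict[str, Set[str]]:
    """Directed TDG: an edge src -> dst for every flow accepted by is_edge."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for f in flows:
        if is_edge(f):                      # the edge definition picks the "slice"
            graph[f.src].add(f.dst)
            graph.setdefault(f.dst, set())  # make sure responders appear as nodes
    return dict(graph)


flows = [
    Flow("10.0.0.1", "10.0.0.2", "UDP", 4000, 12),
    Flow("10.0.0.2", "10.0.0.3", "UDP", 4000, 3),
    Flow("10.0.0.1", "10.0.0.9", "TCP", 80, 40),
]
udp_slice = build_tdg(flows, lambda f: f.proto == "UDP")
web_slice = build_tdg(flows, lambda f: f.proto == "TCP" and f.dport == 80)
```

Other edge definitions are just other predicates, for example `lambda f: f.packets > 10`, so the same traffic can be viewed through many focused slices.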

  7. Roadmap of this part • Previous work and background • Overview of our graph-based solutions • Developing graph-based methods • Traffic classification: Profiling By Association • Botnet detection

  8. I. Why do previous methods fail? Packet and flow level • Packet- and flow-level methods are not the answer • What existing profilers use → how apps evade: • Port numbers (simplest approach) → use random ports or legacy ports • Payload signatures (packet level) → encryption • Flow statistics (flow level) → flow padding

  9. Deploying our graph-based approach (IMC’07) • Step 1: Determine the question of interest • Step 2: Define the appropriate graphs (TDGs), i.e., define what to monitor, e.g., track all UDP flows, all flows on ports 10-300, or flows with >10 packets • Step 3: Use the right metrics to capture the right properties • Step 4: Visualize results or take action [Figure: GraphWare]
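Step 3 can be made concrete with a small metrics pass over one TDG slice. This is a sketch under assumptions: the graph is a plain list of (initiator, responder) pairs, and the three metrics (average degree, the fraction of nodes that both initiate and receive flows, and the size of the largest weakly connected component) are illustrative choices, not a prescribed set from the talk.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (initiator, responder)


def tdg_metrics(edges: List[Edge]) -> Dict[str, float]:
    """Compute a few illustrative metrics for a directed TDG slice."""
    edge_set = {(s, d) for s, d in edges if s != d}  # drop duplicates and self-loops
    out_nodes = {s for s, _ in edge_set}
    in_nodes = {d for _, d in edge_set}
    nodes = out_nodes | in_nodes
    if not nodes:
        return {"nodes": 0, "avg_degree": 0.0, "ino_fraction": 0.0, "largest_wcc": 0}

    # Undirected neighbor sets, used for degree and weak connectivity.
    nbrs: Dict[str, Set[str]] = {n: set() for n in nodes}
    for s, d in edge_set:
        nbrs[s].add(d)
        nbrs[d].add(s)

    avg_degree = sum(len(v) for v in nbrs.values()) / len(nodes)
    # Nodes that both initiate and receive flows (common in P2P-like traffic).
    ino_fraction = len(in_nodes & out_nodes) / len(nodes)

    # Largest weakly connected component via iterative depth-first search.
    seen: Set[str] = set()
    largest = 0
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in nbrs[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        largest = max(largest, size)

    return {"nodes": len(nodes), "avg_degree": avg_degree,
            "ino_fraction": ino_fraction, "largest_wcc": largest}


metrics = tdg_metrics([("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")])
```

In this toy slice the triangle a, b, c behaves like peers (every node both sends and receives), while d, e looks like a plain client/server pair, which is the kind of contrast such metrics are meant to expose.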

  10. Using graphs enables many novel solutions! We addressed many different problems (GI’09, CoNEXT’09, INFOCOM’10, CoNEXT’10, COMNET, Networking’13, ASIACCS’14): • Detect P2P traffic at the Internet core • Extract information from graph evolution • Classify obfuscated traffic • Exploit community structure using clustering • Profiling By Association (PBA) • Detect botnets: ENTELECHEIA • Detect malicious website scanners

  11. Profiling By Association: The key insights • Insight 1: Some traffic is easy to profile • Insight 2: Traffic of apps exhibits homophily: hosts running the same application tend to connect to each other [Figure: clusters of P2P, online game, and SMTP (email) hosts]

  12. The Profiling-By-Association (PBA) framework • Network traffic → generate graph • Phase A: Seeding, with initial seed information • Phase B: Inference, “profile by association” using only connectivity • Output: profiled network traffic

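  13. Approach 1: Profiling By Association: The Neighboring Link Classifier (NLC) • Uses the local structure of the graph • Classifies an edge based on its neighboring edges [Figure: an edge u between hosts V1 and V2; the labels of the neighboring edges (e.g., web) at each endpoint are weighted by 0.5 and added]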
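A minimal sketch of neighbor-based edge labeling in the spirit of NLC. It assumes, from the 0.5 weights in the slide figure, that each endpoint's neighborhood contributes equally: the label distribution of the labeled edges at V1 and at V2 is normalized, weighted by 0.5, and summed, and the highest-scoring label wins. The function names are hypothetical, and the published classifier may weight or break ties differently.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Edge = Tuple[str, str]


def _neighbor_distribution(node: str, edge: Edge,
                           labels: Dict[Edge, str]) -> Counter:
    """Normalized label distribution of labeled edges touching `node`, excluding `edge`."""
    counts = Counter(lbl for e, lbl in labels.items() if e != edge and node in e)
    total = sum(counts.values())
    return Counter({lbl: c / total for lbl, c in counts.items()}) if total else Counter()


def nlc_classify(edge: Edge, labels: Dict[Edge, str]) -> Optional[str]:
    """Label `edge` from its two endpoints' neighborhoods, weighted 0.5 each (assumption)."""
    score: Counter = Counter()
    for node in edge:
        for lbl, p in _neighbor_distribution(node, edge, labels).items():
            score[lbl] += 0.5 * p
    return score.most_common(1)[0][0] if score else None  # None: no labeled neighbors


labels = {("a", "x"): "web", ("a", "y"): "web", ("b", "z"): "p2p", ("b", "w"): "web"}
print(nlc_classify(("a", "b"), labels))  # web (0.75) beats p2p (0.25)
```

The scan over all labeled edges keeps the sketch short; a real implementation would index edges by endpoint so each lookup touches only local structure.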

  14. Approach 2: Profiling By Association (HYP): The HYP algorithm, using clusters • Uses the global structure of the graph • Two main steps: • Identify clusters • Exploit seeds to profile clusters • [HYP comes from hyper-graph] • Clustering: the Louvain method by Blondel et al. outperformed other methods [Figure: P2P, SMTP (email), and online game clusters, seeded with known P2P hosts, known email servers, and known gamers]
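The second HYP step (exploit seeds to profile clusters) can be sketched as a vote of seed labels inside each cluster. This is a simplified sketch under assumptions: the clusters are taken as input (in the talk they come from the Louvain method; networkx, for instance, provides `louvain_communities`), majority voting is our illustrative rule rather than necessarily the published one, and seeded hosts keep their seed label.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def label_clusters(clusters: Iterable[Set[str]],
                   seeds: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Give each host its cluster's majority seed label (None if the cluster has no seeds)."""
    host_labels: Dict[str, Optional[str]] = {}
    for cluster in clusters:
        votes = Counter(seeds[h] for h in cluster if h in seeds)
        # Ties are broken arbitrarily in this sketch.
        label = votes.most_common(1)[0][0] if votes else None
        for host in cluster:
            host_labels[host] = seeds.get(host, label)  # seeds keep their own label
    return host_labels


clusters = [{"m1", "m2", "m3"}, {"p1", "p2"}, {"g1"}]
seeds = {"m1": "smtp", "p2": "p2p"}
profiled = label_clusters(clusters, seeds)
```

Because labels spread through whole clusters, a small number of seeds can profile many hosts, which is why this step leans on global rather than local structure.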

  15. Evaluation on four backbone traces • Seeding configurations: • Randomly selected X% of IPs • Intentionally introduced errors • Seeding using existing profilers: BLINC, CoralReef (in the paper) • Evaluation: • Using 3-, 5-, and 10-minute intervals • Averaged over 20 runs • Small standard deviation

  16. Accuracy with 1% of hosts as seeds • Both our algorithms do pretty well! • HYP is more robust to the specifics of a trace • Accuracy = correctly labelled / all labelled; here we label all flows [Figure annotation: one trace has more hosts with multiple applications (high NAT usage)]
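The accuracy definition on this slide can be written as a small function. A sketch, assuming predictions and ground truth are dictionaries keyed by a hypothetical flow identifier, with `None` marking an unlabelled flow; the example values are made up to show the arithmetic.

```python
from typing import Dict, Optional


def accuracy(predicted: Dict[str, Optional[str]], truth: Dict[str, str]) -> float:
    """Accuracy = correctly labelled / all labelled (None means unlabelled)."""
    labelled = [k for k, lbl in predicted.items() if lbl is not None]
    if not labelled:
        return 0.0
    correct = sum(1 for k in labelled if predicted[k] == truth.get(k))
    return correct / len(labelled)


predicted = {"f1": "web", "f2": "p2p", "f3": "web", "f4": None}
truth = {"f1": "web", "f2": "web", "f3": "web", "f4": "dns"}
acc = accuracy(predicted, truth)  # 2 correct out of 3 labelled
```

When every flow is labelled, as in the evaluation above, the denominator is simply the total number of flows.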

  17. How much seeding info do we need? • 0.1% seeding info is workable (~85% accuracy) • 1% seeding info is sufficient (>90% accuracy) • 10% is great! (>95% accuracy) • Baseline: we know only the seeding information

  18. What if we fake the connectivity? • Imagine: a hacker adds fake edges by a factor of k [Figure: P2P, SMTP (email), and online game clusters]

  19. HYP is robust to edge obfuscation (20-200x) • BRAZ trace • Add links from P2P hosts towards hosts of other apps • Fake = k × Existing [Figure: results as a function of k, with 20x and 200x marked]
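The obfuscation setup (links from P2P hosts towards other apps, Fake = k × Existing) can be sketched as follows. Assumptions, all hypothetical: "Existing" is taken to be the number of edges already in the graph, fake endpoints are drawn uniformly at random, the application label `"p2p"` and the function name are illustrative, and a fixed random seed keeps the sketch reproducible. The exact procedure used on the BRAZ trace is described in the paper.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def add_fake_edges(edges: List[Edge], host_app: Dict[str, str], k: int,
                   seed: int = 0) -> List[Edge]:
    """Return the edges plus k * len(edges) fake edges from P2P hosts to other apps' hosts."""
    rng = random.Random(seed)  # fixed seed: reproducible sketch
    p2p = sorted(h for h, app in host_app.items() if app == "p2p")
    others = sorted(h for h, app in host_app.items() if app != "p2p")
    if not p2p or not others:
        return list(edges)
    # Duplicate fake edges are allowed; they collapse if the graph is a simple graph.
    fake = [(rng.choice(p2p), rng.choice(others)) for _ in range(k * len(edges))]
    return list(edges) + fake


edges = [("p1", "p2"), ("s1", "s2")]
host_app = {"p1": "p2p", "p2": "p2p", "s1": "smtp", "s2": "smtp", "g1": "game"}
obfuscated = add_fake_edges(edges, host_app, k=20)
```

Running a profiler on `obfuscated` instead of `edges` for growing k is one way to reproduce the shape of the robustness experiment.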

  20. Specific Application: Can you find botnets? Botnets: groups of compromised end-user machines that communicate with each other and launch attacks together (DDoS, Email Spam)

  21. Detecting bots using graphs • Problem: detect bots within an enterprise • Challenging requirements: detect them 1. in their waiting stage (dormant, more difficult), 2. in the absence of payload signatures, 3. when they are peer-to-peer: decentralized, without a botmaster • Previous efforts fail at least one of these requirements • With Huy Hang (UCR) and T. Eliassi-Rad (Rutgers)

  22. ENTELECHEIA: detecting bots in a network • ENTELECHEIA: ENtrap Treacherous ELEments through Clustering Hosts Exhibiting Irregular Activities; entelecheia (Aristotle): the state of a thing when its essence is fully realized • Insight: botnet flows should be long-lived and low-intensity • Question: Is this enough to detect them? Answer: Not as is. • Solution: we need to redefine flows (SuperFlows) [Figure: volume vs. duration of flows for regular traffic and for the Storm and Nugache botnets; botnet flows are different]

  23. Key novelty: introduce SuperFlows instead of flows • A SuperFlow groups related flows: • any packets between the same pair of nodes, • irrespective of port number or protocol, • from flows that are close in time
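A minimal SuperFlow sketch, assuming flow records with hypothetical fields (`a`, `b`, `start`, `end`, `nbytes`) and a hypothetical `max_gap` threshold for "close in time". Flows between the same unordered pair of hosts are merged regardless of port or protocol whenever the next flow starts within `max_gap` seconds of the current SuperFlow's end. Each SuperFlow keeps the volume and duration that the volume-vs-duration insight relies on; the actual ENTELECHEIA grouping rules and thresholds are in the Networking’13 paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class Flow:
    a: str          # one endpoint (direction is ignored)
    b: str          # the other endpoint
    start: float    # seconds
    end: float      # seconds
    nbytes: int     # bytes transferred


@dataclass
class SuperFlow:
    hosts: FrozenSet[str]
    start: float
    end: float
    volume: int

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_superflows(flows: List[Flow], max_gap: float) -> List[SuperFlow]:
    """Merge flows between the same host pair (any port/protocol) that are close in time."""
    by_pair: Dict[FrozenSet[str], List[Flow]] = defaultdict(list)
    for f in flows:
        by_pair[frozenset((f.a, f.b))].append(f)  # unordered pair; ports are ignored

    superflows: List[SuperFlow] = []
    for pair, group in by_pair.items():
        group.sort(key=lambda f: f.start)
        current = SuperFlow(pair, group[0].start, group[0].end, group[0].nbytes)
        for f in group[1:]:
            if f.start - current.end <= max_gap:  # close in time: extend
                current.end = max(current.end, f.end)
                current.volume += f.nbytes
            else:                                  # gap too large: start a new SuperFlow
                superflows.append(current)
                current = SuperFlow(pair, f.start, f.end, f.nbytes)
        superflows.append(current)
    return superflows


flows = [
    Flow("h1", "h2", 0, 10, 100),
    Flow("h2", "h1", 15, 30, 50),    # reverse direction, still the same pair
    Flow("h1", "h2", 500, 510, 20),  # too far in time: new SuperFlow
    Flow("h1", "h3", 0, 5, 1000),
]
sfs = build_superflows(flows, max_gap=60.0)
```

Thresholding on the resulting (volume, duration) pairs, rather than on raw 5-tuple flows, is what lets long-lived, low-intensity bot chatter stand out.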

  24. Initial results are very promising: >96% F1-score • ENTELECHEIA: F1-score higher than 96% • Real traces injected with real Storm and Nugache traffic • Just using thresholds for volume and duration fails, whether on 5-tuple or 2-tuple flows • Reference solutions: Flows (5-tuple), Thresh (2-tuple) [Figure: F1-scores of Flows, Thresh, and ENTELECHEIA]
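For reference, the F1-score quoted above is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). A short sketch; the counts in the example are made up to show the arithmetic and are not from the experiments.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), with precision P = tp/(tp+fp) and recall R = tp/(tp+fn)."""
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 97 bots found, 2 false alarms, 4 bots missed.
example_f1 = f1_score(97, 2, 4)  # 194 / 200 = 0.97
```

F1 penalizes both false alarms and missed bots, so a high value requires a detector that is precise and complete at the same time.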

  25. We released GraphWare v1.0: www.cs.ucr.edu/~hangh/graphware.html • Based on Python and the GUESS framework • Supports graph metrics, comparisons, and clustering

  26. Related publications • “Network monitoring using Traffic Dispersion Graphs (TDGs).” In ACM IMC, 2007. (AR 21%) • “Graph-based P2P traffic classification at the Internet backbone.” In IEEE Global Internet, 2009. (AR 34%) • “Exploiting dynamicity in graph-based traffic analysis: Techniques and applications.” In ACM CoNEXT, 2009. (AR 17%) • “Homophily in application-layer and its usage in traffic classification.” Brian Gallagher, M. Iliofotou, T. Eliassi-Rad, M. Faloutsos. IEEE INFOCOM Mini-Conference, 2010. (AR 24%) • “Profiling-by-association: A resilient traffic profiling solution for the Internet backbone.” To appear in ACM CoNEXT, 2010. (AR 19%) • “Graption: A Graph-based P2P traffic classification framework for the Internet backbone.” Computer Networks (Elsevier), 2011. • “Entelecheia: Detecting P2P Botnets in their Waiting Stage.” Huy Hang, Xuetao Wei, Michalis Faloutsos, Tina Eliassi-Rad. IFIP Networking, May 2013. • “Scanner Hunter: Understanding HTTP scanning traffic.” Guowu Xie, Huy Hang, Michalis Faloutsos. Accepted to ASIACCS 2014, Kyoto, Japan.

  27. Part II: Detecting Malware on Facebook (USENIX Security’12, CoNEXT’12, WWW’13) • Collaborators: Sazzadur Rahman, Ting-Kai Huang, Harsha V. Madhyastha (UC Riverside), Bruno Ribeiro (UMass/CMU)

  28. Introducing socware • Socware = SOCial malWARE: malicious, annoying, parasitic activities • Example: the “Get a free subway” scam on my wall

  29. The dark side of Facebook • We need new solutions for socware • Our prediction: it is going to get worse [Figure: examples of a malicious post and malicious apps]

  30. Should we care? Yes. “Facebook is the new web” [Figure: bmw.com vs. facebook.com/bmw]

  31. MyPageKeeper: Our Facebook app (apps.facebook.com/mypagekeeper)

  32. MyPageKeeper does the job (apps.facebook.com/mypagekeeper) • MyPageKeeper: 20K installs, monitors 3M walls • An efficient, scalable socware detection method • 0.005% false positives, 3% false negatives • Monitors walls every 2 hours from the cloud • Key observation: 49% of users were exposed to a malicious post within 4 months • We are big in Japan

  33. Existing malware solutions: not enough • URL blacklists detect only 3.5% of bad posts • The remaining ~96% are caught by our ML-based logic • 26% of malicious URLs point to facebook.com [Figure: performance comparison with a blacklist]

  34. The rise of the AppNet • Socware is enabled by Facebook apps! • 44% of campaigns are enabled by Facebook applications

  35. Apps cross-promote directly • A post by App1 points to App2 • Facebook’s terms forbid this!

  36. Apps cross-promote indirectly: highly sophisticated “fast-flux” • We identified 103 URLs doing redirections! [Figure: a post by App1 leads to an external website with redirector JavaScript, which leads to App2, App3, and App4]

  37. Our solution: FRAppE, Facebook’s Rigorous App Evaluator • FRAppE Lite (user-side): • Uses features crawled on demand, e.g., the number of permissions required by an app and the domain reputation of its redirect URI • Uses Support Vector Machines • FRAppE (OSN-centric): • Adds aggregation-based features, e.g., similarity of app names and whether posted links are external • FRAppE has 99% detection accuracy [Figure: App ID → FRAppE → malicious / benign]
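The feature families above can be sketched as a feature-extraction step whose output would then be fed to an SVM (for example scikit-learn's `sklearn.svm.SVC`, omitted here to keep the sketch dependency-free). Everything concrete is an assumption for illustration: the `AppRecord` fields, the reputation scores in `domain_reputation`, the 0.0 default for unknown domains, and the use of `difflib` string similarity for "similarity of app names". FRAppE's actual feature definitions are in the CoNEXT’12 paper.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, List
from urllib.parse import urlparse


@dataclass
class AppRecord:
    # Hypothetical fields standing in for crawled app data.
    app_id: str
    name: str
    permissions: List[str]
    redirect_uri: str
    posted_links: List[str] = field(default_factory=list)


def max_name_similarity(app: AppRecord, others: List[AppRecord]) -> float:
    """Aggregation feature: highest name similarity (0..1) to any other app."""
    scores = [SequenceMatcher(None, app.name.lower(), o.name.lower()).ratio()
              for o in others if o.app_id != app.app_id]
    return max(scores, default=0.0)


def external_link_fraction(app: AppRecord) -> float:
    """Aggregation feature: fraction of posted links that leave facebook.com."""
    if not app.posted_links:
        return 0.0
    external = 0
    for link in app.posted_links:
        host = urlparse(link).hostname or ""
        if not (host == "facebook.com" or host.endswith(".facebook.com")):
            external += 1
    return external / len(app.posted_links)


def feature_vector(app: AppRecord, others: List[AppRecord],
                   domain_reputation: Dict[str, float]) -> List[float]:
    """Lite features (permissions, redirect-URI reputation) plus aggregation features."""
    domain = urlparse(app.redirect_uri).hostname or ""
    return [
        float(len(app.permissions)),
        domain_reputation.get(domain, 0.0),  # unknown domain: neutral 0.0 (assumption)
        max_name_similarity(app, others),
        external_link_fraction(app),
    ]


apps = [
    AppRecord("1", "Free Subway Card", ["email", "publish_stream"],
              "http://bad.example/r",
              ["http://bad.example/x", "https://www.facebook.com/p"]),
    AppRecord("2", "Free Subway Cards", ["publish_stream"],
              "https://apps.facebook.com/x"),
]
vec = feature_vector(apps[0], apps, {"bad.example": -1.0})
```

The first two features need only the app itself, which matches the user-side Lite setting; the last two need a view across many apps, which is why they belong to the OSN-centric version.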

  38. Some scary, interesting results • 13% of apps in our dataset of 111K distinct apps are malicious • 60% of malicious apps endanger more than 100K users each (measured by clicks on their links) • 40% of malicious apps have over 1,000 monthly active users each • We found 800 malicious apps that Facebook missed!

  39. AppNets: large collaborative groups • App collaboration graph: 44 connected components; the largest connected component has 3,484 apps • High connectivity: 70% of apps collude with more than 10 other apps • High density: 25% of apps have a local clustering coefficient above 0.74 [Figure: real snapshot of 770 highly collaborating apps, with promoter and promotee apps]
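The AppNet statistics above (connected components, number of collaborators per app, local clustering coefficient) are standard graph measures. Below is a dependency-free sketch, assuming the collaboration graph is given as hypothetical (promoter, promotee) pairs and treating it as undirected for these measures; networkx provides equivalent functions (`connected_components`, `clustering`).

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def undirected(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collapse promoter -> promotee edges into an undirected adjacency map."""
    adj: Dict[str, Set[str]] = {}
    for promoter, promotee in edges:
        if promoter == promotee:
            continue
        adj.setdefault(promoter, set()).add(promotee)
        adj.setdefault(promotee, set()).add(promoter)
    return adj


def connected_components(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Connected components via iterative depth-first search."""
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in adj:
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps


def local_clustering(adj: Dict[str, Set[str]], node: str) -> float:
    """Fraction of pairs of the node's neighbors that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


adj = undirected([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("E", "F")])
comps = connected_components(adj)
collaborators = {app: len(nbrs) for app, nbrs in adj.items()}  # e.g. count apps with > 10
```

In the toy graph, app A has clustering 1/3 (only B and C of its three neighbors are linked), while B and C sit in a closed triangle with clustering 1.0; AppNets show many such densely interlinked groups.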

  40. Our anti-socware work • “An Analysis of Socware Cascades in Online Social Networks.” Ting-Kai Huang, Md Sazzadur Rahman, Harsha Madhyastha, Michalis Faloutsos. World Wide Web Conference (WWW’13), 2013. • “FRAppE: Detecting Malicious Facebook Applications.” Md Sazzadur Rahman, Ting-Kai Huang, Harsha Madhyastha, Michalis Faloutsos. ACM CoNEXT’12, Nice, France, December 2012. • “Efficient and Scalable Socware Detection in Online Social Networks.” Md Sazzadur Rahman, Ting-Kai Huang, Harsha Madhyastha, Michalis Faloutsos. USENIX Security, 2012.

  41. Conclusion • We need analysis of large, evolving graphs • Network security needs to see the big picture • Graph-mining needed: fast, accurate, customizable • We need new solutions to detect socware • Existing blacklists and anti-spam won’t work • Malicious apps form colluding networks (AppNets)

  42. Key research areas at UNM CS • Human-Centric Security • Adaptive Biological Systems • Data Science and Visualization • Computing in the Large

  43. My research directions • Social media analytics and security • Securing smartphones and embedded devices • Web-based malware

  44. Human-Centric Security Center “Technologies for securing people”™ • Mission: • Secure personal information • Protect privacy and digital freedom • Empower people through awareness, control, and choice • Existing projects: • Internet censorship: choke points (Prof. Crandall) • Social network tools to protect people • Privacy awareness and tools (Prof. Kelley) • Provably robust and privacy-preserving distributed systems (Prof. Saia)

  45. New projects • PeerApp: Early warning for risky behavior • Behavior: depression, suicide, addiction, bullying • Using OSN data to detect it • Privacy-Panel: know what you share • Inform users what they share and with whom • PrivateNet: private and anonymized OSN • No one can reverse-engineer contacts or content • CommentDigest: info from user comments • Securing embedded devices

  46. PeerApp: Detecting risk behavior • Problem: detect risk behavior early • Solution: Leverage modern technology • Harness: social net information • Collect smartphone information: passive/active • Predict and prevent problems: • Notify appropriate person

  47. PeerApp: An ambitious agenda • Collect and mine social data • Collect and mine bio data • Develop theories of risk behavior • Provide an open platform for social studies

  48. PeerApp: Key Novelties • Completeness: • From genes to brain to behavior to policy • Privacy sensitive: • Provide warnings, not incriminating evidence • In-vivo pseudo-real-time information: • Collect and act Just-In-Time • Provide blueprint for Social Studies: • Methods and tools to harness new tech

  49. PeerApp: The team • Multidisciplinary team • Psychiatry: S. Feldstein-Ewing (UNM), T. Chung (U. Pitt) • Bioinformatics: V. Calhoun (MIND-UNM) • Data mining: C. Faloutsos (CMU), M. Abdullah (UNM) • Network Science: K. Pelechrinis (U. Pitt), M. Faloutsos (UNM) • Proposal and papers in progress

  50. PrivateNet: Free Digital Speech • Goal: fully anonymous and private communications • Key: no one can reverse-engineer the content or the pattern of communication • How: a combination of end-to-end encryption and obfuscation • Collaborators: S. Krishnamurthy, H. Madhyastha (UCR)
