Behavior Prediction and Anomaly Detection in Large-Scale Social Networks


  1. Behavior Prediction and Anomaly Detection in Large-Scale Social Networks Meng Jiang 蒋 朦 www.meng-jiang.com

  2. Social Networks • Large-scale • 117M nodes, 3.33B edges in Jan. 2011 • 355M nodes in Nov. 2011 • Millions of tweets per day • Relational • Directed/undirected/bipartite/hypergraph: link prediction • Heterogeneous • User-user link (social relation) • User-item link (tweet, post, social label, video, interest group…) • Complex behavior intentions • Normal/abnormal use: for information/money

  3. Research Problems: Intentions and Links User-item link User-user link Normal use Anomalous use

  4. Research Problems: Intentions and Links User-item link User-user link Topic-level influence [Liu et al. CIKM 2010] Normal use Anomalous use

  5. Research Problems: Intentions and Links User-item link User-user link User diversity [Liu et al. Multimed Tools Appl 2012] Normal use Anomalous use

  6. Research Problems: Intentions and Links User-item link User-user link Normal use Anomalous use

  7. Behavior Prediction & Social Recommendation • Problem • Too many messages are generated and received every minute. • How to recommend posts/rank feeds in social networks? • Can we predict what the users will click/retweet/share next? • Problem definition • Given a large amount of data, predict missing user-item links (missing values in the user-item matrix).
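
The problem definition on slide 7 (fill in missing values of a user-item matrix) can be illustrated with a generic matrix-factorization sketch. This is not the model from the cited papers: the toy data, the function names (`factorize`, `rmse`) and all hyperparameters are assumptions chosen only for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rmse(U, V, triples):
    """Root-mean-square error of the factor model on (user, item, value) triples."""
    se = sum((r - dot(U[u], V[i])) ** 2 for u, i, r in triples)
    return (se / len(triples)) ** 0.5

def factorize(triples, n_users, n_items, rank=2, lr=0.05, reg=0.01,
              epochs=300, seed=0):
    """Fit low-rank user and item factors to the observed cells with plain SGD."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in triples:
            err = r - dot(U[u], V[i])
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V

# Hypothetical toy data: 1 = user retweeted the item, 0 = user saw it and did not.
observed = [
    (0, 0, 1), (0, 1, 1), (0, 2, 0), (1, 0, 1), (1, 3, 0),
    (2, 2, 1), (2, 3, 1), (2, 0, 0), (3, 2, 1), (3, 1, 0),
]

U0, V0 = factorize(observed, 4, 4, epochs=0)   # untrained baseline
U, V = factorize(observed, 4, 4)               # trained factors
rmse_before = rmse(U0, V0, observed)
rmse_after = rmse(U, V, observed)
print(rmse_before, rmse_after)
```

An unobserved cell such as (user 1, item 1) can then be scored with `dot(U[1], V[1])`; ranking a user's unobserved items by that score is one simple way to turn the completed matrix into a recommendation list.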

  8. Challenges and Possible Solution • Too big! • High sparsity (Figure labels: user, tweet, #user, #tweet, retweet/share)

  9. Challenges and Possible Solution • Can we use other existing links to help? How? • Too big! • High sparsity (Figure labels: user, tweet, #user, #tweet, retweet/share, interaction frequency, semantic similarity, social relation)

  10. Challenges and Possible Solution • Yes! Understand user intentions! • Receive the message: What is the content? Who is the sender? Share or not share… • Preference + Influence (Figure labels: user, tweet, retweet/share, interaction frequency, semantic similarity, social relation)

  11. Research Problems: Intentions and Links User-item link User-user link Social contextual recommendation [Jiang et al. CIKM 2012] Normal use Anomalous use

  12. Scalable Social Recommendation • Problem • What about newly arriving users and newly arriving tweets? • Can we give the answer quickly using previous results? • Problem definition (newly arriving users) (Figure labels: user+Δuser, Δuser, tweet, retweet/share, interaction frequency, semantic similarity, social relation, ?)

  13. Scalable Social Recommendation • Problem • What about newly arriving users and newly arriving tweets? • Can we give the answer quickly using previous results? • Problem definition (newly arriving tweets) (Figure labels: tweet+Δtweet, Δtweet, user, retweet/share, semantic similarity, social relation, interaction frequency, ?)
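
One common way to answer quickly for a newly arriving user without retraining is "fold-in": keep the already-learned item factors fixed and fit only the new user's vector. The sketch below shows that generic idea under stated assumptions; it is not the incremental update from the cited TKDE 2014 paper, and the item factors `V`, the function `fold_in_user` and the toy interactions are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fold_in_user(V, interactions, lr=0.1, reg=0.01, steps=200):
    """Fit one new user's factor vector while the item factors V stay fixed.

    interactions: list of (item_index, observed_value) pairs for the new user.
    """
    rank = len(V[0])
    u = [0.0] * rank
    for _ in range(steps):
        for i, r in interactions:
            err = r - dot(u, V[i])
            for k in range(rank):
                u[k] += lr * (err * V[i][k] - reg * u[k])
    return u

# Hypothetical item factors from an earlier training run (4 tweets, rank 2).
V = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

# The new user retweeted tweet 0 and ignored tweet 2.
new_user = fold_in_user(V, [(0, 1.0), (2, 0.0)])
score_tweet_1 = dot(new_user, V[1])   # tweet similar to the retweeted one
score_tweet_3 = dot(new_user, V[3])   # tweet similar to the ignored one
print(score_tweet_1, score_tweet_3)
```

Because only one small vector is optimized, the cost depends on the new user's few interactions rather than on the size of the whole user-item matrix. The symmetric case (a new tweet, fixed user factors) works the same way with the roles swapped.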

  14. Research Problems: Intentions and Links User-item link User-user link Scalable recommendation with social context [Jiang et al. TKDE 2014] Normal use Anomalous use

  15. Cold-start Problem • Problem • We have solved (new users, old items) and (old users, new items). What about (new users, new items)? • Sorry… (Table: user × tweet √; user × Δtweet √; Δuser × tweet √; Δuser × Δtweet ?)

  16. Social Recommendation • We have auxiliary knowledge in other domains. • User label domain: choose < 10 from 200+ labels like ‘iPhone fan’ • Example profile: Peng Cui, Haidian, Beijing, Company: Tsinghua, User labels (5): Tsinghua, Ph.D., World Wide Web, Social Network, Social Media • Example profile: Meng Jiang, Haidian, Beijing, University: Tsinghua, User labels (9): Chinese food, World Wide Web, Social Network, Data Mining, Liverpool Football Club, NBA, Humors, Sports, Ph.D. Candidates

  17. Social Recommendation • We have auxiliary knowledge in other domains. • Interest group domain Interest Groups (3) Interest Groups (2) Tsinghua University Tsinghua University Social Media & Reputation World Wide Web Team I love sing!

  18. Social Recommendation • How to construct a social network with multiple domains? • We have user-post, user-label and user-group links. • No relations between item domains: there are no natural post-label links. • Stronger social relations can help collaboration on user-item links. • More collaboration on user-item links can strengthen the social relations. (Figure labels: web posts, users, user labels, ?)

  19. Research Problems: Intentions and Links User-item link User-user link Cross-domain social recommendation [Jiang et al. CIKM 2012] Normal use Anomalous use

  20. Good to Transfer? More Insights! • If we do transfer (from user-label domain), we need only ~30% to reach the same performance. • Build more features to let new users provide more info! 0 user-tweet 100% user-label 35% user-tweet 60% user-tweet 18% user-tweet 100% user-label

  21. Human Behavior Pattern • Problem • Two basic characteristics of human behavior • Multi-faceted (Example: write a paper, with facets author/researcher, affiliation/institute/university, keyword/topic)

  22. Human Behavior Pattern • Problem • Two basic characteristics of human behavior • Multi-faceted (Example: post a WeChat message such as “Happy birthday!”, with facets WeChat user, phone, text, photo, location)

  23. Human Behavior Pattern • Problem • Two basic characteristics of human behavior • Multi-faceted • Evolutionary (Example: write a paper, along a time axis)

  24. Human Behavior Pattern • Problem • Two basic characteristics of human behavior • Multi-faceted • Evolutionary (Example: post a WeChat message)

  25. Human Behavior Pattern • Problem • Two basic characteristics of human behavior • Multi-faceted • Evolutionary • How to model human behavior? • Tensor sequence. • How to do pattern discovery and prediction? • Tensor decomposition and completion. (Figure labels: user, item, time, t1, t2, t3)

  26. Human Behavior Pattern • Challenges in high-order tensor decomposition • High sparsity • Write a paper: #author × #affiliation × #keyword • High complexity • Long sequence of large tensors • Slow: decomposition at each time step (Figure labels: user, item, time, t1, t2, t3)
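
The "tensor sequence" model and the sparsity challenge can be made concrete with a small sketch that groups behavior records into one sparse three-way tensor per time step and measures how full each tensor is. The record layout, the identifiers (`a1`, `inst_A`, and so on) and the helper names are assumptions made up for this example.

```python
from collections import defaultdict

def build_tensor_sequence(events):
    """Group (author, affiliation, keyword, time) records into sparse tensors.

    Returns {time: {(author, affiliation, keyword): count}}, one sparse
    three-way tensor per time step, ordered by time.
    """
    seq = defaultdict(lambda: defaultdict(int))
    for author, affiliation, keyword, t in events:
        seq[t][(author, affiliation, keyword)] += 1
    return {t: dict(cells) for t, cells in sorted(seq.items())}

def density(cells, n_authors, n_affiliations, n_keywords):
    """Fraction of tensor cells that are non-zero."""
    return len(cells) / (n_authors * n_affiliations * n_keywords)

# Hypothetical "write a paper" records.
events = [
    ("a1", "inst_A", "social network", 2012),
    ("a1", "inst_A", "social network", 2012),
    ("a2", "inst_B", "graph mining", 2012),
    ("a1", "inst_A", "recommendation", 2013),
    ("a2", "inst_B", "graph mining", 2014),
]

tensors = build_tensor_sequence(events)
density_2012 = density(tensors[2012], n_authors=2, n_affiliations=2, n_keywords=3)
print(tensors[2012], density_2012)
```

Even in this toy case only 2 of 12 cells are filled in 2012. Since the number of cells is the product #author × #affiliation × #keyword, the filled fraction shrinks quickly as the mode sizes grow, which is the sparsity challenge named on slide 26.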

  27. Human Behavior Pattern • High sparsity: auxiliary knowledge as regularizers • Author - affiliation - keyword + co-authorship (author-author) • High complexity: update decomposition results (projection matrix) with each newly arriving piece of data (Figure labels: user, item, time, t1, t2, t3)

  28. FEMA: Flexible Evolutionary Multi-faceted Analysis (Figure labels: tensor X over 0~t, update ΔX over Δt, X + ΔX over 0~(t+Δt); matricizing into X(1), X(2); decompose; projection matrices A(1), A(2); core tensor over user cluster × item cluster; update λ; Tensor Perturbation Theory; regularize with L(1), L(2); user; item)
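
Slide 28 names perturbation theory as the tool for updating decomposition results when new data ΔX arrives. The sketch below shows only the standard first-order eigenvalue perturbation for a symmetric matrix, λ' ≈ λ + vᵀ ΔC v, and checks it against a full recomputation. It is an assumption-laden illustration of the underlying idea, not the FEMA update itself; the matrices `C` and `dC` are invented.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_eigenpair(M, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric positive-definite matrix."""
    v = [1.0] * len(M)
    for _ in range(iters):
        w = mat_vec(M, v)
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return dot(v, mat_vec(M, v)), v

# Hypothetical covariance-like matrix and a small symmetric update.
C = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
dC = [[0.01, 0.02, 0.0], [0.02, 0.0, 0.01], [0.0, 0.01, 0.02]]

lam, v = top_eigenpair(C)

# First-order update: no new decomposition, only one quadratic form.
lam_approx = lam + dot(v, mat_vec(dC, v))

# Reference value: decompose the updated matrix from scratch.
C_new = [[c + d for c, d in zip(rc, rd)] for rc, rd in zip(C, dC)]
lam_exact, _ = top_eigenpair(C_new)
print(lam, lam_approx, lam_exact)
```

The approximation costs a single matrix-vector product per eigenvalue, while the reference value requires a fresh decomposition. Its error grows with the size of the update relative to the gap between eigenvalues, so it is best suited to small increments.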

  29. Research Problems: Intentions and Links User-item link User-user link Behavior modeling and pattern discovery [Jiang et al. KDD 2014] Normal use Anomalous use

  30. Research Problems: Intentions and Links User-item link User-user link Normal use Anomalous use

  31. Out-degree Distribution • Power-law distribution (directed graph) http://konect.uni-koblenz.de/networks/

  32. Out-degree Distribution • Power-law distribution (directed graph - social network) http://konect.uni-koblenz.de/networks/
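
The out-degree distribution plotted on these slides can be computed directly from a directed edge list. The snippet below is a minimal sketch with a made-up edge list; the function name `out_degree_distribution` is an assumption.

```python
from collections import Counter

def out_degree_distribution(edges):
    """Map each out-degree value to the number of nodes that have it.

    edges: iterable of (follower, followee) pairs of a directed graph.
    Nodes that follow nobody never appear as a source and are not counted.
    """
    out_degree = Counter(src for src, _ in edges)
    return Counter(out_degree.values())

# Hypothetical follower -> followee edges.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a"), ("d", "b")]
distribution = out_degree_distribution(edges)
for degree in sorted(distribution):
    print(degree, distribution[degree])
```

Plotting the counts against the degrees on log-log axes gives the kind of chart shown on the slides. Under a power law the points fall roughly on a straight line, so a spike at one degree value shows up as a point well above that line.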

  33. What We Have…

  34. Zombie Follower Detection • Challenges • Scalability: How to catch zombie followers from large graphs of millions of nodes and billions of edges? Can we explain the spikes on out-degree distributions?

  35. Zombie Follower Detection • Challenges • Scalability: How to catch zombie followers from large graphs of millions of nodes and billions of edges? Can we explain the spikes on out-degree distributions? • Camouflage: fake profile, no or little content, extra performance

  38. Zombie Follower Detection • Challenges • Scalability: How to catch zombie followers from large graphs of millions of nodes and billions of edges? Can we explain the spikes on out-degree distributions? • Camouflage: fake profile, no or little content, extra performance • Previous approaches: classifier on graph-based features and content-based features (0, 0, 0: sorry…)

  39. Compare Zombie Follower and Normal User • X = @Buy_AB22: a zombie follower with 20 followees • Y = a random user with 20 followees • Suspicious behavior: the followees are similar to each other and different from normal. (Figure panels: X’s followees, Y’s followees)
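
The two signals on slide 39 (followees similar to each other, and different from normal) can be turned into two scores. The sketch below is a simplified version loosely in the spirit of the CatchSync reference: it assumes each followee has already been mapped to a discrete feature cell, and the cell ids, the function names and the toy data are all hypothetical.

```python
from collections import Counter

def cell_fractions(cells):
    """Fraction of nodes that fall into each feature cell."""
    counts = Counter(cells)
    total = len(cells)
    return {cell: count / total for cell, count in counts.items()}

def synchronicity(followee_cells):
    """Chance that two followees drawn at random (with replacement) share a cell."""
    p = cell_fractions(followee_cells)
    return sum(f * f for f in p.values())

def normality(followee_cells, all_cells):
    """Chance that a random followee shares a cell with a random node of the graph."""
    p = cell_fractions(followee_cells)
    q = cell_fractions(all_cells)
    return sum(f * q.get(cell, 0.0) for cell, f in p.items())

# Hypothetical cells: X's 20 followees all sit in one rare cell,
# Y's 20 followees are spread evenly over four common cells.
x_followees = ["c7"] * 20
y_followees = ["c0", "c1", "c2", "c3"] * 5
all_nodes = ["c0"] * 24 + ["c1"] * 24 + ["c2"] * 24 + ["c3"] * 24 + ["c7"] * 4

sync_x, sync_y = synchronicity(x_followees), synchronicity(y_followees)
norm_x, norm_y = normality(x_followees, all_nodes), normality(y_followees, all_nodes)
print(sync_x, sync_y, norm_x, norm_y)
```

In this toy setup X gets high synchronicity and low normality, which is the suspicious combination described on the slide, while Y gets the opposite. Both scores use only the graph structure, so they do not depend on profile or content features that a zombie account can fake.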

  40. Do we catch the anomalies? • Twitter

  41. Do we catch the anomalies? • Tencent Weibo

  42. Research Problems: Intentions and Links User-item link User-user link Zombie follower detection [Jiang et al. WWW 2014 Poster, KDD 2014] Normal use Anomalous use

  43. Research Problems: Intentions and Links User-item link User-user link Dense bipartite core detection [Jiang et al. PAKDD 2014] Normal use Anomalous use

  44. Research Problems: Intentions and Links User-item link User-user link Normal use Anomalous use

  45. Summary • Behavior Prediction • Social contextual recommendation [CIKM’12+TKDE’14] • Cross-domain social recommendation [CIKM’12] • Behavior discovery and prediction [KDD’14] • Anomaly Detection • Zombie follower detection [KDD’14] • Dense bipartite core detection [PAKDD’14] Good user-item links Bad user-user links

  46. Summary User-item link User-user link Normal use Anomalous use

  47. References • Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang. CatchSync: Catching Synchronized Behavior in Large Directed Graphs. The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2014. • Meng Jiang, Peng Cui, Fei Wang, Xinran Xu, Wenwu Zhu and Shiqiang Yang. FEMA: Flexible Evolutionary Multi-faceted Analysis for Dynamic Behavioral Pattern Discovery. The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2014. • Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang. Inferring Strange Behavior from Connectivity Pattern in Social Networks. The 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2014. • Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang. Detecting Suspicious Following Behavior in Multimillion-Node Social Networks. The 23rd International Conference on World Wide Web Companion (WWW), 2014. (Poster) • Meng Jiang, Peng Cui, Fei Wang, Wenwu Zhu and Shiqiang Yang. Scalable Recommendation with Social Contextual Information. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2014. • Meng Jiang, Peng Cui, Rui Liu, Qiang Yang, Fei Wang, Wenwu Zhu and Shiqiang Yang. Social Contextual Recommendation. The 21st ACM International Conference on Information and Knowledge Management (CIKM), 2012. • Meng Jiang, Peng Cui, Fei Wang, Qiang Yang, Wenwu Zhu and Shiqiang Yang. Social Recommendation across Multiple Relational Domains. The 21st ACM International Conference on Information and Knowledge Management (CIKM), 2012. • Lu Liu, Feida Zhu, Meng Jiang, Jiawei Han, Lifeng Sun and Shiqiang Yang. Mining Diversity on Social Media Networks. Multimedia Tools and Applications, 2012. • Lu Liu, Jie Tang, Jiawei Han, Meng Jiang and Shiqiang Yang. Mining Topic-Level Influence in Heterogeneous Networks. The 19th ACM International Conference on Information and Knowledge Management (CIKM), 2010.

  48. Acknowledgements • Tsinghua University: Shiqiang Yang, Wenwu Zhu, Peng Cui, Lu Liu • Carnegie Mellon University: Christos Faloutsos, Alex Beutel • IBM T. J. Watson Research Center: Fei Wang

  49. Thank you! You are welcome to visit my homepage: http://www.meng-jiang.com ❤ New friends ❤ Discussions ❤ Collaborations
