
Leveraging Propagation for Data Mining Models, Algorithms & Applications

Leveraging Propagation for Data Mining Models, Algorithms & Applications. B. Aditya Prakash Naren Ramakrishnan. August 10, Tutorial, SIGKDD 2016, San Francisco. About us. B. Aditya Prakash Asst. Professor CS, Virginia Tech. PhD. CMU, 2012. Data Mining, Applied ML


Presentation Transcript


  1. Leveraging Propagation for Data Mining Models, Algorithms & Applications • B. Aditya Prakash • Naren Ramakrishnan • August 10, Tutorial, SIGKDD 2016, San Francisco

  2. About us • B. Aditya Prakash • Asst. Professor • CS, Virginia Tech. • PhD. CMU, 2012. • Data Mining, Applied ML • Graph and Time-series mining • Applications to Social Media, Epidemiology/Public Health, Cyber Security • Homepage: http://www.cs.vt.edu/~badityap/ Prakash and Ramakrishnan 2016

  3. About us • Naren Ramakrishnan • Thomas L. Phillips Prof. • CS, Virginia Tech. • PhD. Purdue, 1997. • Data mining for intelligence analysis, forecasting, sustainability, and health informatics • Homepage: http://people.cs.vt.edu/naren/

  4. Tutorial webpage • http://people.cs.vt.edu/~badityap/TALKS/16-kdd-tutorial/ • All Slides will be posted there. • Talk video as well (later).

  5. Networks are everywhere! • Facebook Network [2010] • Gene Regulatory Network [Decourty 2008] • Human Disease Network [Barabasi 2007] • The Internet [2005]

  6. Dynamical Processes over networks are also everywhere!

  7. Why do we care? • Social collaboration • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • ...

  8. Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Diseases over contact networks • CDC data: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts [AJPH 2007]

  9. Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] • Problem: Given k units of disinfectant, whom to immunize?

  10. Why do we care? (1: Epidemiology) • CURRENT PRACTICE vs. OUR METHOD: ~6x fewer! [US-MEDICARE NETWORK 2005] • Hospital-acquired infections took 99K+ lives and cost $5B+ (both per year)

  11. Why do we care? (2: Online Diffusion) • > 800m users, ~$1B revenue [WSJ 2010] • ~100m active users • > 50m users

  12. Why do we care? (2: Online Diffusion) • Dynamical Processes over networks • Social Media Marketing • Celebrity to followers: “Buy Versace™!”

  13. Why do we care? (3: To change the world?) • Dynamical Processes over networks • Social networks and Collaborative Action

  14. High Impact – Multiple Settings • epidemic out-breaks • products/viruses • transmit s/w patches • Q. How to squash rumors faster? • Q. How do opinions spread? • Q. How to market better?

  15. Research Theme • DATA: Large real-world networks & processes • ANALYSIS: Understanding • POLICY/ACTION: Managing

  16. Research Theme – Public Health • DATA: Modeling # patient transfers • ANALYSIS: Will an epidemic happen? • POLICY/ACTION: How to control out-breaks?

  17. Research Theme – Social Media • DATA: Modeling Tweets spreading • ANALYSIS: # cascades in future? • POLICY/ACTION: How to market better?

  18. In this tutorial • Fundamental Models (Understanding) • Given propagation models, on arbitrary networks: • Q1: What is the epidemic threshold? • Q2: How do viruses compete? • With extensions to dynamic networks, multi-profile networks etc.

  19. In this tutorial • Algorithms (Managing/Manipulating) • Q3: How to estimate and learn influence and networks? • Q4: How to immunize and control out-breaks better? • Q5: How to reverse-engineer epidemics? • Q6: How to leverage viral marketing? • Q7: How to pick sensors for graphs?

  20. In this tutorial • Applications (Large real-world networks & processes) • How to use propagation for _________ • Q8: Memes, Tweets, Blogs • Q9: Disease Surveillance • Q10: Protest Trends • Q11: Malware Attacks • Q12: General Graph Mining

  21. Plan • Three breaks! • 2-2:05pm • 3-3:30pm (conference coffee break) • 4:15-4:20pm • Part 2: Algorithms starts at roughly 1:50pm • Part 3: Applications at 3:30pm (after the coffee break) • Please interrupt anytime for questions

  22. Outline • Motivation • Part 1: Understanding Epidemics (Theory) • Part 2: Policy and Action (Algorithms) • Part 3: Applications (Data-Driven) • Conclusion

  23. Part 1: Theory • Q1: What is the epidemic threshold? • Q2: How do viruses compete?

  24. A fundamental question • Strong Virus • Epidemic?

  25. Example (static graph) • Weak Virus • Epidemic?

  26. Problem Statement • Plot: # Infected vs. time, with two regimes: above (epidemic) and below (extinction) • Separate the regimes? • Find a condition under which • virus will die out exponentially quickly • regardless of initial infection condition

  27. Threshold (static version) Problem Statement • Given: • Graph G, and • Virus specs (attack prob. etc.) • Find: • A condition for virus extinction/invasion

  28. Threshold: Why important? • Accelerating simulations • Forecasting (‘What-if’ scenarios) • Design of contagion and/or topology • A great handle to manipulate the spreading • Immunization • Maximize collaboration • ...

  29. Part 1: Theory • Q1: What is the epidemic threshold? • Background • Result and Intuition (Static Graphs) • Proof Ideas (Static Graphs) • Bonus: Dynamic Graphs • Q2: How do viruses compete?

  30. Background • “SIR” model: lifetime immunity (mumps) • Each node in the graph is in one of three states • Susceptible (i.e. healthy) • Infected • Removed (i.e. can’t get infected again) • Figure labels: Prob. β, Prob. δ; time steps t = 1, t = 2, t = 3
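
As a hedged illustration of the SIR dynamics on this slide, the sketch below simulates a discrete-time SIR process on a small graph. It assumes β is the per-contact attack probability and δ the per-step removal probability; the function name `simulate_sir`, the dictionary-based graph, and the ring example are illustrative choices, not taken from the tutorial.

```python
import random


def simulate_sir(adj, beta, delta, seeds, steps, rng):
    """Discrete-time SIR on a graph given as {node: [neighbors]}.

    Assumed semantics: beta = attack probability per infected-susceptible
    contact per step, delta = removal probability per infected node per step.
    """
    state = {v: "S" for v in adj}
    for v in seeds:
        state[v] = "I"
    history = [sum(1 for s in state.values() if s == "I")]
    for _ in range(steps):
        new_state = dict(state)
        for v, s in state.items():
            if s != "I":
                continue
            # Each infected node attacks every susceptible neighbor.
            for u in adj[v]:
                if state[u] == "S" and rng.random() < beta:
                    new_state[u] = "I"
            # The infected node is removed (immune for life) with prob. delta.
            if rng.random() < delta:
                new_state[v] = "R"
        state = new_state
        history.append(sum(1 for s in state.values() if s == "I"))
    return state, history


# Hypothetical example: a ring of 20 nodes with one initial infection.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
final_state, infected_over_time = simulate_sir(
    ring, beta=0.3, delta=0.2, seeds=[0], steps=30, rng=random.Random(42)
)
print(infected_over_time)
```

Passing the random generator in explicitly keeps runs reproducible, which matters when comparing "epidemic" and "extinction" regimes across parameter settings.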

  31. Background Terminology: continued • Other virus propagation models (“VPM”) • SIS : susceptible-infected-susceptible, flu-like • SIRS : temporary immunity, like pertussis • SEIR : mumps-like, with virus incubation (E = Exposed) • ... • Underlying contact-network – ‘who-can-infect-whom’

  32. Background Related Work • All are about either: • Structured topologies (cliques, block-diagonals, hierarchies, random) • Specific virus propagation models • Static graphs • R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press, 1991. • A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, 2010. • F. M. Bass. A new product growth for model consumer durables. Management Science, 15(5):215–227, 1969. • D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks. ACM TISSEC, 10(4), 2008. • D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010. • A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of epidemics. IEEE INFOCOM, 2005. • Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free networks and the effective immunization. arXiv:cond-mat/0305549 v2, Aug. 6 2003. • H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42, 2000. • H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer Lecture Notes in Biomathematics, 46, 1984. • J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer viruses. IEEE Computer Society Symposium on Research in Security and Privacy, 1991. • J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE Computer Society Symposium on Research in Security and Privacy, 1993. • R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters 86, 14, 2001. • ...

  33. Part 1: Theory • Q1: What is the epidemic threshold? • Background • Result and Intuition (Static Graphs) • Proof Ideas (Static Graphs) • Bonus: Dynamic Graphs • Q2: How do viruses compete?

  34. What should the answer look like? • Answer should depend on: • Graph • Virus Propagation Model (VPM) • But how?? • Graph – average degree? max. degree? diameter? • VPM – which parameters? • How to combine – linear? quadratic? exponential?

  35. Static Graphs: Our Main Result • Informally, for • any arbitrary topology (adjacency matrix A) • any virus propagation model (VPM) in the standard literature • the epidemic threshold depends only on • λ, the first eigenvalue of A, and • some constant C_VPM, determined by the virus propagation model • No epidemic if λ · C_VPM < 1 • In Prakash+ ICDM 2011

  36. Our thresholds for some models • s = effective strength • s < 1 : below threshold
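
The per-model threshold table from this slide is not in the transcript. As a rough sketch of how such a check might look, the code below assumes the SIS/SIR-style form s = λ · β/δ (an assumption here, with β an attack probability and δ a recovery or removal probability) and estimates λ by power iteration. The names `largest_eigenvalue` and `effective_strength` are hypothetical helpers.

```python
def largest_eigenvalue(A, iters=2000):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Runs power iteration on A + I: the shift avoids the oscillation that
    plain power iteration shows on bipartite graphs. Assumes the graph is
    connected and undirected.
    """
    n = len(A)
    x = [1.0] * n
    estimate = 1.0
    for _ in range(iters):
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        estimate = max(abs(v) for v in y)
        x = [v / estimate for v in y]
    return estimate - 1.0  # undo the +I shift


def effective_strength(A, beta, delta):
    """Assumed form s = lambda * beta / delta (not confirmed by the slide)."""
    return largest_eigenvalue(A) * beta / delta


# Hypothetical example: a 5-node clique, whose largest eigenvalue is N - 1 = 4.
n = 5
clique = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
s = effective_strength(clique, beta=0.1, delta=0.5)
print("below threshold" if s < 1 else "above threshold")
```

Under this assumed form, changing the topology (λ) or the virus parameters (β, δ) moves s across 1, which is the handle the slides describe for immunization and design.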

  37. Our result: Intuition for λ • “Official” definition: Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)]. • Doesn’t give much intuition! • “Un-official” intuition: λ ~ # paths in the graph • A^k ≈ λ^k · u uᵀ (u: the corresponding eigenvector), where A^k(i, j) = # of paths i → j of length k
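
To make the path-counting intuition concrete, here is a small sketch that counts length-k paths (strictly speaking walks, since nodes may repeat) via powers of the adjacency matrix and watches the total grow by a factor that approaches λ. The 4-node example graph and the helper names are illustrative assumptions.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, k):
    """Return A^k; entry (i, j) counts walks of length k from i to j."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result


def total_walks(A, k):
    """Total number of length-k walks over all start/end pairs."""
    return sum(sum(row) for row in mat_pow(A, k))


# Hypothetical graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A = [[0] * 4 for _ in range(4)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

A2 = mat_pow(A, 2)
print(A2[0][3])  # walks of length 2 from node 0 to node 3

# The growth factor of the walk count settles near the largest eigenvalue.
ratio = total_walks(A, 41) / total_walks(A, 40)
print(ratio)
```

Exact integer arithmetic is used for the walk counts, so the only floating-point step is the final ratio.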

  38. Largest Eigenvalue (λ) • better connectivity → higher λ • Example graphs with N nodes: λ ≈ 2; λ = √N; λ = N-1 • With N = 1000: λ ≈ 2; λ = 31.67; λ = 999
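
The figures for this slide are lost, so which graphs were drawn is a guess. The sketch below assumes three classic families (chain, star, clique) and checks their well-known largest eigenvalues by verifying A·u = λ·u for a positive vector u. For a connected graph, a positive eigenvector belongs to the largest eigenvalue. All function names are illustrative.

```python
import math


def residual(A, lam, u):
    """Largest entry of |A u - lam u|; near zero means (lam, u) is an eigenpair."""
    n = len(A)
    return max(abs(sum(A[i][j] * u[j] for j in range(n)) - lam * u[i])
               for i in range(n))


def chain(n):
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def star(n):
    # Node 0 is the hub; an edge exists iff exactly one endpoint is the hub.
    return [[1 if (i == 0) != (j == 0) else 0 for j in range(n)] for i in range(n)]


def clique(n):
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


n = 100

# Chain: lambda = 2 cos(pi / (n + 1)), which approaches 2 as n grows.
lam_chain = 2 * math.cos(math.pi / (n + 1))
u_chain = [math.sin((i + 1) * math.pi / (n + 1)) for i in range(n)]
r_chain = residual(chain(n), lam_chain, u_chain)

# Star: lambda = sqrt(n - 1).
lam_star = math.sqrt(n - 1)
u_star = [lam_star] + [1.0] * (n - 1)
r_star = residual(star(n), lam_star, u_star)

# Clique: lambda = n - 1.
lam_clique = n - 1
r_clique = residual(clique(n), lam_clique, [1.0] * n)

print(lam_chain, lam_star, lam_clique)
```

The ordering chain < star < clique matches the slide's point that better connectivity gives a higher λ.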

  39. Examples: Simulations – SIR (mumps) • PORTLAND graph: 31 million links, 6 million nodes • (a) Infection profile: Fraction of Infections vs. Time ticks • (b) “Take-off” plot: Footprint vs. Effective Strength

  40. Examples: Simulations – SIRS (pertussis) • PORTLAND graph: 31 million links, 6 million nodes • (a) Infection profile: Fraction of Infections vs. Time ticks • (b) “Take-off” plot: Footprint vs. Effective Strength

  41. Part 1: Theory • Q1: What is the epidemic threshold? • Background • Result and Intuition (Static Graphs) • Proof Ideas (Static Graphs) • Bonus: Dynamic Graphs • Q2: How do viruses compete?

  42. Proof Sketch • General VPM structure • Topology and stability • λ · C_VPM < 1 (λ: graph-based; C_VPM: model-based)

  43. Models and more models

  44. Ingredient 1: Our generalized model • States: Susceptible, Infected, Vigilant • Endogenous Transitions • Exogenous Transitions

  45. Special case: SIR • Susceptible • Infected • Vigilant

  46. Special case: H.I.V. • Multiple Infectious, Vigilant states • “Non-terminal” • “Terminal”

  47. Details Ingredient 2: NLDS + Stability • View as a NLDS: a discrete-time non-linear dynamical system • Probability vector: specifies the state of the system at time t • size mN x 1, with blocks S, I, V, ... each of size N (number of nodes in the graph)

  48. Details Ingredient 2: NLDS + Stability • View as a NLDS: a discrete-time non-linear dynamical system • Non-linear function: explicitly gives the evolution of the system • size mN x 1

  49. Ingredient 2: NLDS + Stability • View as a NLDS: a discrete-time non-linear dynamical system • Threshold ↔ Stability of NLDS
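
The link between threshold and stability can be sketched with the standard linearization argument for discrete-time systems. The block below is a worked illustration under stated assumptions, not the tutorial's own proof: it assumes an SIS-type infected block with attack probability β and recovery probability δ (0 < δ ≤ 1), a symmetric adjacency matrix A with largest eigenvalue λ, and writes g, x*, J, and p_I for the update function, the fixed point, the Jacobian, and the vector of infection probabilities.

```latex
\begin{align*}
  x_{t+1} &= g(x_t), \qquad x^{*} = g(x^{*})
    && \text{(fixed point, e.g. no infection)} \\
  J &= \left.\frac{\partial g}{\partial x}\right|_{x = x^{*}}, \qquad
    \rho(J) < 1 \;\Rightarrow\; x^{*} \text{ is locally asymptotically stable} \\
  p_{I}(t+1) &\approx \bigl[(1-\delta)\,\mathbf{I}_N + \beta A\bigr]\, p_{I}(t)
    && \text{(linearized near } p_I = 0\text{)} \\
  \rho\bigl((1-\delta)\,\mathbf{I}_N + \beta A\bigr) &= 1 - \delta + \beta\lambda < 1
    \;\iff\; \lambda \cdot \frac{\beta}{\delta} < 1
\end{align*}
```

Under these assumptions the stability condition has the same shape as the slide's threshold, with λ carrying the graph and β/δ carrying the model.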

  50. Details Special case: SIR • NLDS state vector: blocks S, I, R; size 3N x 1 • Key quantity: the probability that node i is not attacked by any of its infectious neighbors
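
The slide's equations are not in the transcript, so the following is only a sketch of what an SIR system of this kind could look like. It assumes the common mean-field form in which `zeta`, the probability that node i is not attacked by any infectious neighbor, is the product of (1 − β · I_j) over neighbors j, with β and δ as attack and removal probabilities. The function name `sir_nlds_step` and the clique example are hypothetical.

```python
def sir_nlds_step(A, beta, delta, S, I, R):
    """One step of an assumed mean-field SIR system on adjacency matrix A.

    S, I, R are per-node probability lists; together they form the
    3N x 1 state vector of the non-linear dynamical system.
    """
    n = len(A)
    S_next, I_next, R_next = [], [], []
    for i in range(n):
        # zeta: probability that node i escapes every infectious neighbor.
        zeta = 1.0
        for j in range(n):
            if A[i][j]:
                zeta *= 1.0 - beta * I[j]
        S_next.append(S[i] * zeta)
        I_next.append(S[i] * (1.0 - zeta) + (1.0 - delta) * I[i])
        R_next.append(R[i] + delta * I[i])
    return S_next, I_next, R_next


# Hypothetical example: 5-node clique (lambda = 4) with lambda * beta / delta = 0.4.
n = 5
A = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
S, I, R = [0.9] * n, [0.1] * n, [0.0] * n
for _ in range(50):
    S, I, R = sir_nlds_step(A, beta=0.05, delta=0.5, S=S, I=I, R=R)
print(sum(I))
```

Each node's three probabilities keep summing to one under this update, and in this below-threshold setting the infection mass shrinks geometrically.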
