CS 1944: Sophomore Seminar Big Data and Machine Learning

Explore the world of data mining, discovering patterns, models, and insights from large datasets. Understand the significance of data manipulation, analysis, and communication in extracting value. Dive into various data mining tasks and applications in diverse fields like biology, physics, social science, and economics. Stay updated on the latest research themes in public health, social media, and online diffusion dynamics.

Presentation Transcript


  1. CS 1944: Sophomore Seminar: Big Data and Machine Learning. B. Aditya Prakash, Assistant Professor, Nov 3, 2015

  2. About me • Assistant Professor, CS • Member, Discovery Analytics Center • Previously • Ph.D. in Computer Science, Carnegie Mellon University • B.Tech in Computer Science and Engineering, Indian Institute of Technology (IIT) Bombay • Internships at Sprint, Yahoo, Microsoft Research

  3. Data contains value and knowledge

  4. Data and Business Source: A. Machanavajjhala

  5. Data and Science

  6. Data and Government Source: A. Machanavajjhala

  7. Data and Culture Source: A. Machanavajjhala

  8. Good news: Demand for Data Mining

  9. How to extract value from data? • Manipulate Data • CS, Domain expertise • Analyze Data • Math, CS, Stat… • Communicate your results • CS, Domain Expertise

  10. Communication is important!

  11. What is Data Mining? • Given lots of data • Discover patterns and models that are: • Valid: hold on new data with some certainty • Useful: should be possible to act on the item • Unexpected: non-obvious to the system • Understandable: humans should be able to interpret the pattern

  12. Data Mining Tasks • Descriptive methods • Find human-interpretable patterns that describe the data • Example: Clustering • Predictive methods • Use some variables to predict unknown or future values of other variables • Example: Recommender systems
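
Clustering, the descriptive example above, can be illustrated with Lloyd's k-means algorithm. This is a minimal sketch in plain Python, assuming the data are numeric points given as tuples; the `kmeans` helper and the toy points are hypothetical, and k-means is our choice of clustering method, not one prescribed by the talk.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on numeric points (tuples); returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise with k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The two well-separated groups of three points end up with two different labels; which group gets label 0 depends on the random initialisation.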

  13. Biology • Physics • Theory & Algo. • Big data • Comp. Systems • Social Science • ML & Stats. • Econ.

  14. Data at CS, VT • Knowledge, Information and Data • http://www.cs.vt.edu/undergraduate/tracks/kid • People: Fox, Harrison, Huang, Lu (in NVA), Ramakrishnan (in NVA), Rozovskaya, Prakash

  15. Courses • Background in some areas: • CS3414 (Numerical Methods); also prob/stat • 4000 level • 4244 Internet Software Development • 4604 Database Management Systems • 4624 Capstone (Multimedia, Information Access) • 4634 Design of Information (Capstone) • 4804 AI • 4984 Computational Linguistics (Capstone)

  16. Discovery Analytics Center

  17. My Research

  18. Networks are everywhere! Facebook Network [2010] Gene Regulatory Network [Decourty 2008] Human Disease Network [Barabasi 2007] The Internet [2005]

  19. What else do they have in common?

  20. High School Dating Network. Bearman et al., Am. Jnl. of Sociology, 2004. Image: Mark Newman. Blue: Male, Pink: Female. Interesting observations?

  21. The Internet Skewed Degrees Robustness
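
One way to see a skewed degree distribution is to count how many nodes have each degree: most nodes have few links while a handful of hubs have many. A minimal sketch, assuming the network is given as an undirected edge list; the `degree_distribution` helper and the toy graph are hypothetical and are not Internet data.

```python
from collections import Counter


def degree_distribution(edges):
    """Return a Counter mapping degree -> number of nodes with that degree."""
    degree = Counter()
    for u, v in edges:
        # Each undirected edge adds one to the degree of both endpoints.
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Hypothetical toy network: one hub linked to six nodes, plus one extra edge.
edges = [("hub", n) for n in range(1, 7)] + [(1, 2)]
dist = degree_distribution(edges)
for d in sorted(dist):
    print(f"degree {d}: {dist[d]} node(s)")
```

Here four nodes have degree 1, two have degree 2, and a single hub has degree 6.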

  22. Karate Club Network

  23. Dynamical Processes over networks are also everywhere!

  24. Why do we care? • Social collaboration • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology ........

  25. Why do we care? (1: Epidemiology) • Dynamical Processes over networks [AJPH 2007] CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts Diseases over contact networks

  26. Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] Problem: Given k units of disinfectant, whom to immunize?
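
To make the question concrete, here is a deliberately simple baseline: immunize the k best-connected hospitals and measure the size of the largest connected group that remains, as a rough proxy for how far an infection could travel. This is an assumption of ours for illustration only, not the method compared against current practice in the Medicare results; the helper names and the toy network are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def immunize_top_degree(adj, k):
    """Baseline heuristic: choose the k nodes with the most contacts."""
    ranked = sorted(adj, key=lambda node: len(adj[node]), reverse=True)
    return set(ranked[:k])


def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over nodes that have not been removed.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


# Hypothetical toy transfer network: two large hospitals A and B plus small ones.
edges = (
    [("A", f"a{i}") for i in range(1, 5)]
    + [("B", f"b{i}") for i in range(1, 4)]
    + [("A", "B")]
)
adj = build_graph(edges)
for k in range(3):
    chosen = immunize_top_degree(adj, k)
    print(k, sorted(chosen), largest_component(adj, chosen))
```

Immunizing A alone shrinks the largest group from 9 hospitals to 4; immunizing both A and B leaves only isolated hospitals. Degree ranking ignores how the network is wired overall, which is why better-informed choices can do much better on real networks.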

  27. Why do we care? (1: Epidemiology) [US-MEDICARE NETWORK 2005] Current practice vs. our method: ~6x fewer! Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)

  28. Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010] ~100m active users > 50m users

  29. Why do we care? (2: Online Diffusion) • Dynamical Processes over networks Buy Versace™! Followers Celebrity Social Media Marketing

  30. Social  Biological Contagion Automatically learn models Prakash 2014

  31. Why do we care? (3: To change the world?) • Dynamical Processes over networks Social networks and Collaborative Action

  32. High Impact – Multiple Settings epidemic out-breaks Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better? products/viruses transmit s/w patches

  33. Dynamical Processes = (a lot of) Networks + (some) Time-Series
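
A dynamical process on a network yields a time series: node states change step by step, and a summary such as "how many are infected" is recorded at each step. As a minimal sketch, assuming a simple susceptible-infected (SI) model in which each infected node independently infects each susceptible neighbor with probability `beta` per step, the helper below returns the infected count after each step. The model choice, the `simulate_si` name and the ring network are ours, not taken from the talk.

```python
import random


def simulate_si(adj, seeds, beta, steps, seed=0):
    """Discrete-time SI cascade; returns the infected count after each step."""
    rng = random.Random(seed)
    infected = set(seeds)
    series = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        # Sorted iteration keeps runs reproducible for a fixed seed.
        for node in sorted(infected):
            for nb in sorted(adj[node]):
                if nb not in infected and rng.random() < beta:
                    newly_infected.add(nb)
        infected |= newly_infected
        series.append(len(infected))
    return series


# Hypothetical ring of 10 people, each in contact with two neighbors.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
print(simulate_si(ring, seeds=[0], beta=1.0, steps=6))  # [1, 3, 5, 7, 9, 10, 10]
print(simulate_si(ring, seeds=[0], beta=0.3, steps=6))
```

With `beta=1.0` the infection spreads one hop in each direction per step; with a smaller `beta` the same network produces a slower, random curve.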

  34. Research Theme ANALYSIS Understanding POLICY/ ACTION Managing DATA Large real-world networks & processes

  35. Research Theme – Public Health ANALYSIS Will an epidemic happen? POLICY/ ACTION How to control out-breaks? DATA Modeling # patient transfers

  36. Research Theme – Social Media ANALYSIS # cascades in future? POLICY/ ACTION How to market better? DATA Modeling Tweets spreading

  37. A Question • How many of you think your friends have more friends than you? • A recent Facebook study • Examined all of FB’s users: 721 million people with 69 billion friendships • About 10 percent of the world’s population! • Found that a user’s friend count was less than the average friend count of his or her friends 93 percent of the time • Users had an average of 190 friends, while their friends averaged 635 friends of their own.
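
A quick consistency check on these figures, assuming each of the 69 billion friendships is mutual and counted once, so that it adds one to the friend count of two users:

```latex
\text{average friend count}
  \approx \frac{2 \times 69 \times 10^{9}}{721 \times 10^{6}}
  \approx 191
```

This agrees with the reported average of 190 friends per user; the much larger figure of 635 for friends' friend counts is the effect examined in the slides that follow.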

  38. Possible Reasons? • You are a loner? • Your friends are extroverts? • There are more extroverts than introverts in the world?

  39. Example Average number of friends? Source: S. Strogatz, NYT 2012

  40. Example Average number of friends = ( 1 + 3 + 2 + 2 ) / 4 = 2 Source: S. Strogatz, NYT 2012

  41. Example Average number of friends = ( 1 + 3 + 2 + 2 ) / 4 = 2 Average number of friends of friends Source: S. Strogatz, NYT 2012

  42. Example Average number of friends = ( 1 + 3 + 2 + 2 ) / 4 = 2 Average number of friends of friends = (3 + 1 + 2 + 2 + 3 + 2 + 3 + 2)/8 = ((1x1) + (3x3) + (2x2) + (2x2))/8 Source: S. Strogatz, NYT 2012

  43. Example Average number of friends = ( 1 + 3 + 2 + 2 ) / 4 = 2 Average number of friends of friends = (3 + 1 + 2 + 2 + 3 + 2 + 3 + 2)/8 = ((1x1) + (3x3) + (2x2) + (2x2))/8 = 18/8 = 2.25! Source: S. Strogatz, NYT 2012
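
The arithmetic in this example can be checked directly. The four people get hypothetical names A to D; with friend counts 1, 3, 2, 2 and four friendships, the person with 3 friends must know everyone, and the remaining friendship links the two people with 2 friends, so the pattern below is forced. Each friendship is seen from both sides, which is where the eight terms in the numerator come from.

```python
# Hypothetical names A-D; this is the only friendship pattern (up to relabelling)
# that gives the friend counts 1, 3, 2, 2 used in the example.
friends = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "D"],
    "D": ["B", "C"],
}

# Average number of friends: (1 + 3 + 2 + 2) / 4.
avg_friends = sum(len(f) for f in friends.values()) / len(friends)

# For every (person, friend) pair, record how many friends the friend has.
# Each friendship is seen from both sides, giving 8 entries.
fof_counts = [len(friends[f]) for person in friends for f in friends[person]]
avg_fof = sum(fof_counts) / len(fof_counts)

print(fof_counts)            # [3, 1, 2, 2, 3, 2, 3, 2]
print(avg_friends, avg_fof)  # 2.0 2.25
```

The list of counts reproduces the terms of the slide's sum, and the person with 3 friends appears three times in it, once per friendship, which pulls the average up.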

  48. Actually it is (almost) always true! Essentially, it holds whenever there is any spread in the number of friends (non-zero variance)! • Proof?
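
A short derivation, assuming n people with friend counts d_1, ..., d_n whose mean μ is positive, and counting friends of friends the same way as the example (each friendship contributes once from each side). Person i then appears in the friends-of-friends list once for each of their d_i friends, contributing d_i each time:

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} d_i^2 - \mu^2

\text{avg. friends of friends}
  = \frac{\sum_{i} d_i \cdot d_i}{\sum_{i} d_i}
  = \frac{n(\sigma^2 + \mu^2)}{n\mu}
  = \mu + \frac{\sigma^2}{\mu} \;\ge\; \mu
```

Equality holds only when σ² = 0, i.e. when everyone has the same number of friends. For the four-person example, μ = 2 and σ² = (1 + 1 + 0 + 0)/4 = 0.5, giving 2 + 0.5/2 = 2.25.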
