CS 1944: Sophomore Seminar • Big Data and Machine Learning • B. Aditya Prakash, Assistant Professor • Nov 3, 2015
About me • Assistant Professor, CS • Member, Discovery Analytics Center • Previously • Ph.D. in Computer Science, Carnegie Mellon University • B.Tech in Computer Science and Engg., Indian Institute of Technology (IIT) – Bombay • Internships at Sprint, Yahoo, Microsoft Research
Data and Business Source: A. Machanavajjhala
Data and Government Source: A. Machanavajjhala
Data and Culture Source: A. Machanavajjhala
How to extract value from data? • Manipulate Data • CS, Domain expertise • Analyze Data • Math, CS, Stat… • Communicate your results • CS, Domain Expertise
What is Data Mining? • Given lots of data • Discover patterns and models that are: • Valid: hold on new data with some certainty • Useful: should be possible to act on the item • Unexpected: non-obvious to the system • Understandable: humans should be able to interpret the pattern
Data Mining Tasks • Descriptive methods • Find human-interpretable patterns that describe the data • Example: Clustering • Predictive methods • Use some variables to predict unknown or future values of other variables • Example: Recommender systems
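As a minimal sketch of a descriptive method, the snippet below runs k-means, one common clustering algorithm (the slide names clustering but no specific algorithm), on one-dimensional toy data. The function name `kmeans_1d`, its parameters, and the data points are illustrative assumptions, not part of the talk.

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Lloyd's k-means on 1-D data; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = sum(members) / len(members)
    return centers, labels

points = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
centers, labels = kmeans_1d(points, k=2)
print(sorted(round(c, 2) for c in centers))  # [1.0, 10.0]
```

k-means needs k chosen up front and can settle in a poor local optimum depending on the starting centers; the fixed seed only makes this toy run repeatable.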
Figure: Big data and related fields (Biology, Physics, Theory & Algo., Comp. Systems, Social Science, ML & Stats., Econ.)
Data at CS, VT • Knowledge, Information and Data • http://www.cs.vt.edu/undergraduate/tracks/kid • People: Fox, Harrison, Huang, Lu (in NVA), Ramakrishnan (in NVA), Rozovskaya, Prakash
Courses • Background in some areas: • CS3414 (Numerical Methods); also prob/stat • 4000 level • 4244 Internet Software Development • 4604 Database Management Systems • 4624 Capstone (Multimedia, Information Access) • 4634 Design of Information (Capstone) • 4804 AI • 4984 Computational Linguistics (Capstone)
Networks are everywhere! Facebook Network [2010] Gene Regulatory Network [Decourty 2008] Human Disease Network [Barabasi 2007] The Internet [2005]
High School Dating Network • Bearman et al., American Journal of Sociology, 2004. Image: Mark Newman • Blue: Male, Pink: Female • Interesting observations?
The Internet • Skewed degrees • Robustness
Why do we care? • Social collaboration • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • …
Why do we care? (1: Epidemiology) • Dynamical Processes over networks [AJPH 2007] • CDC data: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts • Diseases over contact networks
Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] Problem: Given k units of disinfectant, whom to immunize?
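To make the immunization problem concrete, here is a minimal sketch of one simple baseline: spend the k units on the k hospitals with the most patient-transfer links. This degree heuristic is an assumption chosen for illustration; it is not the method the talk evaluates, and the helper name `immunize_top_degree` and the toy transfer list are hypothetical.

```python
def immunize_top_degree(edges, k):
    """Baseline heuristic: return the k nodes with the most contacts."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Highest degree first; ties broken by node name so the result is deterministic.
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    return ranked[:k]

# Hypothetical hospitals and patient-transfer links.
transfers = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(immunize_top_degree(transfers, k=2))  # ['B', 'D']
```

Degree ranking looks only at each hospital's own links, so it is a starting point for comparison rather than an optimized strategy.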
Why do we care? (1: Epidemiology) • Current practice vs. our method: ~6x fewer! [US-MEDICARE NETWORK 2005] • Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)
Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010] ~100m active users > 50m users
Why do we care? (2: Online Diffusion) • Dynamical Processes over networks Buy Versace™! Followers Celebrity Social Media Marketing
Social Biological Contagion Automatically learn models Prakash 2014
Why do we care? (3: To change the world?) • Dynamical Processes over networks Social networks and Collaborative Action
High Impact – Multiple Settings • Epidemic outbreaks • Products/viruses • Transmitting software patches • Q. How to squash rumors faster? • Q. How do opinions spread? • Q. How to market better?
Dynamical Processes = (a lot of) Networks + (some) Time-Series
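To make "networks + time-series" concrete, a minimal sketch: run a discrete-time SIS (susceptible-infected-susceptible) process over a contact network and record how many nodes are infected at each step, which yields a time series generated by the network. The SIS model, the parameters `beta` (per-contact infection probability) and `delta` (recovery probability), and the helper name `simulate_sis` are assumptions for illustration; the talk does not specify a model here.

```python
import random

def simulate_sis(edges, beta, delta, seeds, steps, rng_seed=0):
    """Discrete-time SIS on an undirected graph; returns # infected per step."""
    rng = random.Random(rng_seed)
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    infected = set(seeds)  # assumes every seed is a node of the graph
    series = [len(infected)]
    for _ in range(steps):
        nxt = set()  # synchronous update: all nodes read the previous step's state
        for node in nbrs:
            if node in infected:
                # An infected node stays infected with probability 1 - delta.
                if rng.random() >= delta:
                    nxt.add(node)
            else:
                # Each of the k infected neighbors transmits independently with
                # probability beta, so infection happens with prob. 1 - (1 - beta)**k.
                k = sum(1 for m in nbrs[node] if m in infected)
                if k and rng.random() < 1 - (1 - beta) ** k:
                    nxt.add(node)
        infected = nxt
        series.append(len(infected))
    return series

ring = [(f"n{i}", f"n{(i + 1) % 10}") for i in range(10)]  # hypothetical 10-node ring
print(simulate_sis(ring, beta=0.3, delta=0.2, seeds=["n0"], steps=20))
```

With `beta=1.0, delta=0.0` the infection spreads deterministically one hop per step, which is a handy sanity check on a ring: 1, 3, 5, 7, 9, 10 infected.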
Research Theme • DATA: large real-world networks & processes • ANALYSIS: understanding • POLICY/ACTION: managing
Research Theme – Public Health • DATA: # patient transfers • Modeling • ANALYSIS: Will an epidemic happen? • POLICY/ACTION: How to control outbreaks?
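One well-known way to approach "Will an epidemic happen?" for the SIS (susceptible-infected-susceptible) model on an arbitrary contact network is an eigenvalue threshold: the infection tends to die out when β/δ < 1/λ₁, where β is the per-contact infection rate, δ the recovery rate, and λ₁ the largest eigenvalue of the network's adjacency matrix. The sketch below estimates λ₁ by power iteration and applies that test. The SIS assumption, the helper names `largest_eigenvalue` and `above_epidemic_threshold`, and the toy star network are illustrative, not the data or code behind this slide.

```python
import math

def largest_eigenvalue(edges, iters=500):
    """Estimate lambda_1 of an undirected simple graph (needs >= 1 edge)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    x = {n: 1.0 for n in nbrs}  # positive starting vector
    for _ in range(iters):
        # Power iteration on (A + I): the +I shift avoids oscillation on bipartite graphs.
        y = {n: x[n] + sum(x[m] for m in nbrs[n]) for n in nbrs}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {n: val / norm for n, val in y.items()}
    # Rayleigh quotient x^T A x for the unit vector x.
    return sum(x[n] * sum(x[m] for m in nbrs[n]) for n in nbrs)

def above_epidemic_threshold(edges, beta, delta):
    """SIS threshold test (assumed model): True if beta / delta exceeds 1 / lambda_1."""
    return beta / delta > 1.0 / largest_eigenvalue(edges)

star = [("hub", leaf) for leaf in ["a", "b", "c", "d"]]  # lambda_1 = sqrt(4) = 2
print(above_epidemic_threshold(star, beta=0.1, delta=0.5))  # False (0.2 < 0.5)
print(above_epidemic_threshold(star, beta=0.4, delta=0.5))  # True  (0.8 > 0.5)
```

The test depends on the network only through λ₁, which hints at why well-connected hubs matter so much for whether an outbreak can persist.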
Research Theme – Social Media • DATA: tweets spreading • Modeling • ANALYSIS: # cascades in future? • POLICY/ACTION: How to market better?
A Question • How many of you think your friends have more friends than you? • A recent Facebook study • Examined all of FB’s users: 721 million people with 69 billion friendships • About 10 percent of the world’s population! • Found that a user’s friend count was less than the average friend count of his or her friends 93 percent of the time • Users had an average of 190 friends, while their friends averaged 635 friends of their own
Possible Reasons? • You are a loner? • Your friends are extroverts? • There are more extroverts than introverts in the world?
Example • Average number of friends = (1 + 3 + 2 + 2) / 4 = 2 • Average number of friends of friends = (3 + 1 + 2 + 2 + 3 + 2 + 3 + 2) / 8 = ((1×1) + (3×3) + (2×2) + (2×2)) / 8 = 18 / 8 = 2.25! Source: S. Strogatz, NYT 2012
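The arithmetic above can be reproduced with a short sketch. It assumes the four people are connected as A-B, B-C, B-D, C-D, which is the only four-person network with friend counts 1, 3, 2, 2; the names A to D and the helper `friendship_averages` are hypothetical labels.

```python
from collections import defaultdict

def friendship_averages(edges):
    """Return (average # of friends, average # of friends of friends)."""
    friends = defaultdict(set)
    for u, v in edges:
        friends[u].add(v)
        friends[v].add(u)
    degree = {p: len(fs) for p, fs in friends.items()}
    avg_friends = sum(degree.values()) / len(degree)
    # Walk every (person, friend) pair and record that friend's friend count.
    fof = [degree[f] for p in friends for f in friends[p]]
    avg_fof = sum(fof) / len(fof)
    return avg_friends, avg_fof

example = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]  # friend counts 1, 3, 2, 2
print(friendship_averages(example))  # (2.0, 2.25)
```

Averaging over (person, friend) pairs is what the slide's denominator of 8 counts: each of the 4 friendships is seen once from each side.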
Actually, it is (almost) always true! Essentially, it holds whenever there is any spread in the number of friends (non-zero variance). • Proof?
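One way to sketch the proof, assuming n people where person i has d_i friends, N(i) is the set of i's friends, the average μ is positive, and "average friends of friends" is taken over all (person, friend) pairs as in the four-person example (this notation is introduced here, not taken from the slides):

```latex
\begin{aligned}
\mu &= \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} d_i^2 - \mu^2 \\
\text{avg. friends of friends}
  &= \frac{\sum_{i}\sum_{j \in N(i)} d_j}{\sum_{i} d_i}
   = \frac{\sum_{j} d_j \cdot d_j}{\sum_{i} d_i}
   = \frac{\frac{1}{n}\sum_{j} d_j^2}{\mu} \\
  &= \frac{\mu^2 + \sigma^2}{\mu}
   = \mu + \frac{\sigma^2}{\mu} \;\ge\; \mu
\end{aligned}
```

The second equality holds because friendship is mutual: person j appears in exactly d_j friend lists. The inequality is strict exactly when σ² > 0, i.e., when not everyone has the same number of friends. For the four-person example, μ = 2 and σ² = 18/4 − 4 = 0.5, giving 2 + 0.5/2 = 2.25.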