Large-Scale Network Dynamics: A New Frontier

Large-Scale Network Dynamics: A New Frontier. Jie Wang Dept of Computer Science University of Massachusetts Lowell. Presented at Dept. of Computer Science, Boston University, Nov. 6, 2009 At Dept. of Computer Science, University of Texas at Dallas, Oct. 30, 2009

Presentation Transcript


  1. Large-Scale Network Dynamics: A New Frontier Jie Wang Dept of Computer Science University of Massachusetts Lowell Presented at Dept. of Computer Science, Boston University, Nov. 6, 2009 At Dept. of Computer Science, University of Texas at Dallas, Oct. 30, 2009 At Dept. of Electrical and Computer Engineering, Michigan State Univ., Sept. 24, 2009

  2. “The earth to be spann’d, connected by network, The races, neighbors, to marry and be given in marriage, The oceans to be cross’d, the distant brought near, The lands to be welded together” Walt Whitman (1819 - 1892), Passage to India “The network is the computer” John Gage (1942 - ), Sun Microsystems “The network is the information and the storage” Weibo Gong, UMass Amherst

  3. Small-World Phenomenon What is your Erdős number? Six degrees of separation Two persons are linked if they are coauthors of an article. The Erdős number is the collaboration distance with mathematician Paul Erdős.
     Erdős number  0  ---      1 person
     Erdős number  1  ---    504 people
     Erdős number  2  ---   6593 people
     Erdős number  3  ---  33605 people
     Erdős number  4  ---  83642 people
     Erdős number  5  ---  87760 people
     Erdős number  6  ---  40014 people
     Erdős number  7  ---  11591 people
     Erdős number  8  ---   3146 people
     Erdős number  9  ---    819 people
     Erdős number 10  ---    244 people
     Erdős number 11  ---     68 people
     Erdős number 12  ---     23 people
     Erdős number 13  ---      5 people
     The median Erdős number is 5; the mean is 4.65, and the standard deviation is 1.21.
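The summary statistics on slide 3 can be recomputed from the table itself. The sketch below assumes the listed counts are the complete population and uses the population (not sample) standard deviation; the function name `summarize` is illustrative only.

```python
import math

# Number of people at each Erdős number, copied from the table on slide 3.
counts = {0: 1, 1: 504, 2: 6593, 3: 33605, 4: 83642, 5: 87760, 6: 40014,
          7: 11591, 8: 3146, 9: 819, 10: 244, 11: 68, 12: 23, 13: 5}


def summarize(counts):
    """Return (mean, median, population std) of a value -> frequency table."""
    total = sum(counts.values())
    mean = sum(k * n for k, n in counts.items()) / total
    var = sum(n * (k - mean) ** 2 for k, n in counts.items()) / total
    # Median: smallest value whose cumulative count reaches half the total.
    running = 0
    median = None
    for k in sorted(counts):
        running += counts[k]
        if 2 * running >= total:
            median = k
            break
    return mean, median, math.sqrt(var)


mean, median, std = summarize(counts)
print(f"mean={mean:.2f} median={median} std={std:.2f}")
```

Rounded to two decimals, the result agrees with the figures quoted on the slide (mean 4.65, median 5, standard deviation 1.21).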

  4. Small-World Networks The Watts-Strogatz model: between order and randomness - Short mean path (short characteristic path length) - Large clustering coefficient
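The two properties named on slide 4 can be measured on a small Watts-Strogatz-style graph. What follows is a minimal sketch, not the slide's own code: it starts from a ring lattice where each of `n` nodes links to its `k` nearest neighbours, then rewires each lattice edge with probability `p`. The parameter names and the exact rewiring rule are assumptions made for illustration.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            w = (v + j) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def watts_strogatz(n, k, p, rng):
    """Rewire each lattice edge (v, v+j) to a random new endpoint with prob. p."""
    adj = ring_lattice(n, k)
    for v in range(n):
        for j in range(1, k // 2 + 1):
            w = (v + j) % n
            if w in adj[v] and rng.random() < p:
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    u = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(u)
                    adj[u].add(v)
    return adj


def clustering(adj):
    """Average fraction of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Mean BFS distance over all ordered pairs that are connected."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(1)
for p in (0.0, 0.05, 1.0):
    g = watts_strogatz(200, 6, p, rng)
    print(p, round(clustering(g), 3), round(mean_path_length(g), 3))
```

The small-world regime is the intermediate range of `p`, where a few rewired shortcuts typically shorten paths sharply while most of the lattice's clustering remains.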

  5. What Are Big-World Networks? Acquaintance Networks over Generations From “Mathematics Genealogy Project” Gottfried Leibniz (1646-1716) Jacob Bernoulli (1654-1705) Johann Bernoulli (1667-1748) Leonhard Euler (1707-1783) Joseph Lagrange (1736-1813) Simeon Poisson (1781-1840) Michel Chasles (1793-1880) H. A. Newton (1830-1896) E. H. Moore (1862-1932) Oswald Veblen (1880-1960) Gerald Sacks (1933 -) 343 academic descendants John B. Rosser (1907-1989) Alonzo Church (1903-1995) Stephen Homer Jie Wang

  6. Scale-Free Phenomenon Power law distribution: f(x) ~ x^(-α) Log-log scale: log f(x) ~ -α log x Scale-free networks are small-world Small-world networks may not be scale-free Subnets of scale-free networks may not be scale-free
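The log-log relation on slide 6 suggests a simple diagnostic: if f(x) ~ x^(-α), then log f(x) is linear in log x with slope -α. The sketch below fits that slope by ordinary least squares on synthetic, noise-free data. The function name and the test values (constant 3.0, exponent 2.5) are assumptions for illustration, and on real degree data this regression is only a rough check.

```python
import math


def fit_power_law_exponent(xs, fs):
    """Least-squares slope of log f against log x; returns alpha = -slope."""
    lx = [math.log(x) for x in xs]
    lf = [math.log(f) for f in fs]
    n = len(lx)
    mx = sum(lx) / n
    mf = sum(lf) / n
    slope = (sum((a - mx) * (b - mf) for a, b in zip(lx, lf))
             / sum((a - mx) ** 2 for a in lx))
    return -slope


# Synthetic data that follows f(x) = 3 * x^(-2.5) exactly.
xs = list(range(1, 101))
fs = [3.0 * x ** -2.5 for x in xs]
alpha = fit_power_law_exponent(xs, fs)
print(alpha)
```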

  7. Brain Networks “A mental state M is nothing other than brain state B. The mental state ‘desire for a cup of coffee’ would thus be nothing more than the ‘firing of certain neurons in certain brain regions.’” -- E. G. Boring (1886-1968)

  8. Are Brain Networks Small-World? There are 100 billion (10^11) neurons in the human brain, and 100 trillion (10^14) connections (synapses) Brain networks are highly dynamic Can process 100 trillion instructions per second Some believe brain networks are small-world Mathematical challenge: Work out a mathematical model consistent with brain functionalities

  9. Connecting the Dots Networks are connected dots “You can't connect the dots looking forward; you can only connect them looking backwards.” Steven Jobs (1955 -)

  10. Infectious Disease Spreading: How Were Dots Connected? Weekly snapshots: Sept 05 – Sept 12, 2009; Sept 12 – Sept 19, 2009; Sept 19 – Sept 26, 2009; Sept 26 – Oct 03, 2009; Oct 03 – Oct 10, 2009; Oct 10 – Oct 17, 2009

  11. How Will the Dots Be Connected? Dynamic connections are not deterministic, nor random. But they have patterns and trends. Statistical analysis is like connecting the dots backward, while predicting disease spread is like connecting the dots forward …

  12. A Simple Relational Model: The SIR Dynamics An 8-acquaintance node under SIR • Structure-biased k-acquaintance model • Homophily: the tendency to associate with people like yourself • Symmetry: undirected links • Triad closure: the tendency of one’s acquaintances to also be acquainted with each other

  13. Structure-Biased Spread

  14. A Mathematical Model of Spread Prediction

  15. Mathematical Epidemiology • Most mathematical methods study differential equations based on simplified assumptions of uniform mixing or ad hoc contact processes • Example:
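The slide's own example is not preserved in this transcript. One standard uniform-mixing model, which may or may not be the one shown, is the Kermack-McKendrick SIR system: dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI, with S, I, R as population fractions. The sketch below integrates it with a plain forward-Euler step; the parameter values and the function name `simulate_sir` are assumptions chosen for illustration.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.01):
    """Forward-Euler integration of the uniform-mixing SIR equations.

    beta  : transmission rate (assumed value, per day)
    gamma : recovery rate (assumed value, per day)
    s0, i0: initial susceptible and infectious fractions
    Returns a list of (t, S, I, R) tuples.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for n in range(1, steps + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((n * dt, s, i, r))
    return history


history = simulate_sir(beta=0.5, gamma=0.1, s0=0.99, i0=0.01, days=160)
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"peak infectious fraction {peak_infected:.3f} at day {peak_time:.1f}")
```

Because every individual is assumed equally likely to meet every other, the model has no notion of the contact structure that the later slides focus on.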

  16. Percolation and Outbreak • Large-scale graphs based on scale-free and small-world models are common platforms to study epidemics • Individuals (sites) are connected by social contacts (bonds) • Each site is susceptible with probability p and each bond is open with probability q, indicating infectiousness • A percolation threshold exists for phase transition of disease spread • When both p and q are high, a cluster of infectious sites connected by open bonds will permeate the entire population, resulting in an outbreak • Otherwise, infectious clusters will be small and isolated

  17. Percolation Threshold Demo 65 x 65 grid; panels: q = 0.2, q = 0.51, q = 0.578
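The site-bond percolation picture from slides 16 and 17 can be sketched on a square grid. Each site is susceptible with probability `p`, each bond between adjacent susceptible sites is open with probability `q`, and the size of the largest connected cluster indicates whether an outbreak spans the population. The slides do not say which `p` the demo used; the run below assumes `p = 1` (pure bond percolation, whose threshold on the square lattice is q = 0.5), and the function name is illustrative.

```python
import random


def largest_cluster_fraction(n, p, q, rng):
    """Fraction of an n x n grid covered by the largest open cluster."""
    susceptible = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    parent = list(range(n * n))  # union-find forest over grid cells

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for r in range(n):
        for c in range(n):
            if not susceptible[r][c]:
                continue
            # Only look right and down so each bond is sampled once.
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if (r2 < n and c2 < n and susceptible[r2][c2]
                        and rng.random() < q):
                    a, b = find(r * n + c), find(r2 * n + c2)
                    if a != b:
                        parent[a] = b

    sizes = {}
    for r in range(n):
        for c in range(n):
            if susceptible[r][c]:
                root = find(r * n + c)
                sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0) / (n * n)


rng = random.Random(2009)
for q in (0.2, 0.51, 0.578):  # bond probabilities shown on slide 17
    print(q, round(largest_cluster_fraction(65, 1.0, q, rng), 3))
```

On a finite 65 x 65 grid the transition is smoothed out, so individual runs near the threshold vary noticeably from seed to seed.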

  18. Modeling Challenges • Population and demographics • urban, suburban, rural, mobility • income, age, gender, education, religion, culture, ethnic background, household size • Social contact pattern • household, work, study, shopping, entertainment, travel, medical activities, … • dense and frequent local contacts; sparse and occasional long-distance contacts • Infection process • disease characteristics: infectious speed & recovery levels • people's general health level and vaccination history • frequency and duration of contacts It seems difficult to address these challenges using mathematical methods alone B. Liu and J. Wang et al

  19. Computational Methods • Simulations with contingent parameters • Modeling disease outbreaks in realistic urban social networks (S. Eubank et al. Nature, 2004) • Understanding the spreading patterns of mobile phone viruses (P. Wang et al., Science, 2009) BT susceptible phones within the range of an infected BT phone will all be infected. An MMS virus can infect all susceptible phones whose numbers are in the phonebook of an infected phone

  20. Mobile Networks and OSes Location, mobility, and communication pattern dynamics

  21. Online Social Networks (OSNs) • Topological dynamics • temporal attribute of node and edge arrivals and departures • explain why the mean degree and characteristic path length tend to be stable over time, while density and scale do not • Communication dynamics • friendships vs. activities • Mobility dynamics • GPS-enabled smartphones • location-based applications G. Chen, B. Liu, J. Wang et al

  22. The Rise of OSNs • 1997: SixDegrees allowed users to create profiles, list friends, and surf friend lists • 1997-2001: a number of community tools supported profiles and friend lists: AsianAvenue, BlackPlanet, MiGente, LiveJournal • 2001 - present: business and professional social networks emerged: Ryze, LinkedIn • 2003: MySpace attracted teens and bands, among others, and grew into the largest OSN • 2004: Facebook was designed for college networking (Harvard), then expanded to other colleges, high schools, and other individuals

  23. Common OSNs

  24. OSNs Go Mobile • Location aware • GPS-enabled phones, sharing current location, availability, attaching location to user-generated content • Outlook • anticipated $3.3 billion revenue by 2013 • Dodgeball, Loopt, Brightkite, Whrrl, Google Latitude, Foursquare

  25. PageRank for Measuring Page Popularity Just walk at random? Biased Random Walks
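The random-walk view on slide 25 can be made concrete with a power-iteration sketch of PageRank. A surfer follows a random outgoing link with probability `damping` and otherwise jumps to a page drawn from a `teleport` distribution. A uniform `teleport` gives the usual PageRank; a non-uniform one biases the walk. How the talk's own biased walk is defined is not stated in the transcript, so the `teleport` parameter here is an assumption made for illustration.

```python
def pagerank(links, damping=0.85, iters=100, teleport=None):
    """Power iteration for PageRank on a dict {page: [pages it links to]}.

    teleport: optional dict {page: probability} used for random jumps
              (hypothetical knob for biasing the walk; uniform if omitted).
    """
    nodes = sorted(set(links) | {w for outs in links.values() for w in outs})
    n = len(nodes)
    if teleport is None:
        teleport = {v: 1.0 / n for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) * teleport.get(v, 0.0) for v in nodes}
        for v in nodes:
            outs = links.get(v, [])
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:
                # Dangling page: redistribute its rank via the jump distribution.
                for w in nodes:
                    new[w] += damping * rank[v] * teleport.get(w, 0.0)
        rank = new
    return rank


web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
uniform = pagerank(web)
biased = pagerank(web, teleport={"a": 0.0, "b": 0.0, "c": 0.0, "d": 1.0})
print(uniform)
print(biased)
```

In the biased run every random jump lands on page `d`, which raises its rank relative to the uniform case even though no page links to it.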

  26. Association Rank for Friendship Prediction G. Chen and J. Wang et al

  27. Brightkite: Startup in 2005, Denver, CO; opened to public: 2008 • User activities • Check in, status update, photo upload • All attached with current location • Updates through SMS, Email, Web, iPhone … • Social graph with mutual connection • See your friends’ or local activity streams

  28. Data Trace Brightkite Web APIs 12/9/08-1/9/09: 18,951 active users Back traced to 3/21/08: 1,505,874 updates Profile: age, gender, tags, friends list Social graph: 41,014 nodes and 46,172 links Testing data: next 45 days had 5,098 new links added G. Chen and N. Li

  29. Snapshots taken from 12/09/08 to 01/09/09

  30. Three Attributes to Measure Community Rank Tags Social Distance Location

  31. Probability Measure

  32. Tag Graph Metric

  33. Social Distance

  34. Location Metric

  35. Community Rank Value: Indicating the likelihood of friendship

  36. ROC Curve

  37. MySpace • Launched in Santa Monica, CA, in 2003 • Grew rapidly and attracted Friendster’s users, bands, … • Teenagers began joining en masse in 2004 • Three distinct populations began to form: • musicians/artists • teenagers • post-college urban social crowd • Purchased by News Corporation for $580M in 2005 • Arguably the largest online social network site

  38. MySpace Profile and Activities • Each profile: age, gender, location, last login time, etc.; identified by a unique ID • Some profiles claim neutral gender, e.g., bands • Profiles can be set to private (default is public) • What can users do? • search and add friends to their friend lists • post messages to friends’ blog spaces • Only friends have access to a private profile’s friend list and blog space • Other functions: IM/Call, Block/Rank User, Add to Group favorite

  39. Measurement: SnailCrawler • Generate random IDs uniformly between 1 and max (1,500,000,000) • Many IDs are not occupied (invalid) • Retrieve profile information from MySpace (HTTP) • name, ID, gender, age, location, public/private/custom • other information for public profiles: company, religion, marriage, children, smoke/drink, orientation, zodiac, education, ethnicity, occupation, hometown, body-type, mood, last login, … W. Gauvin, B. Liu, X. Fu, J. Wang et al
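The sampling step described on slide 39 can be sketched as follows: draw IDs uniformly from 1 to the maximum, skip IDs that are unoccupied, and keep the profiles that exist. This is not the SnailCrawler code. The `fetch_profile` callback is a hypothetical stand-in for the HTTP retrieval, and `fake_fetch` simply pretends that every third ID is occupied so the example runs without network access.

```python
import random

MAX_ID = 1_500_000_000  # upper bound on IDs quoted on slide 39


def sample_profiles(fetch_profile, n_valid, rng, max_id=MAX_ID, max_tries=None):
    """Draw IDs uniformly from [1, max_id] until n_valid occupied IDs are found.

    fetch_profile: hypothetical callback returning a profile dict,
                   or None when the ID is not occupied.
    Returns (profiles keyed by ID, number of IDs drawn).
    """
    found = {}
    tries = 0
    while len(found) < n_valid and (max_tries is None or tries < max_tries):
        tries += 1
        pid = rng.randint(1, max_id)  # inclusive on both ends
        if pid in found:
            continue  # already sampled this profile
        profile = fetch_profile(pid)
        if profile is not None:
            found[pid] = profile
    return found, tries


def fake_fetch(pid):
    """Stand-in for the HTTP request: pretend every third ID is occupied."""
    return {"id": pid} if pid % 3 == 0 else None


profiles, tries = sample_profiles(fake_fetch, 50, random.Random(0))
print(len(profiles), "profiles from", tries, "draws")
```

Because IDs are drawn uniformly, the ratio of profiles found to IDs drawn gives a rough estimate of the fraction of the ID space that is occupied.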

  40. Data Trace • People 16 years old or younger are protected by law • Teenagers and people in their twenties post the most blogs • False ages at 98-100 years old • Among teenagers 16-19, females publish more than males • After 20, no significant differences; often males publish more than females • Scanned: 3,090,016 • Blogs: 67,045

  41. Blog publish time (on special days) [Plot; axis ticks: Feb, Sept, Dec; marked dates: Valentine’s Day, Christmas] • females publish more than males, and males more than neutral-gender profiles • spikes on holidays, e.g., Valentine’s Day, Christmas

  42. Blog publish time (month & week) [Plots; axis ticks: Jan, Dec (month); Sun, Mon, Sat (week)] • females publish more than males • more blogs posted May to Oct • slightly more blogs posted during weekdays

  43. Blog publish time (within a day) • big jump at 1 pm • people tend to publish from the afternoon well into midnight • peak around 10 pm, bottom around 5 am
