Social and Information Networks: Theory and Practice. Anirban Dasgupta, Isabelle Stanton
Topics • The Structure of Networks • Small world networks • Generative models • The Long Tail • Community Detection • Cascades and Viral Processes • Computation on Large Graphs • Sampling and Surveying • Crowdsourcing • ….?
Coursework • 1 (group) Project • 2 Reaction Papers • 2-3 Experimental Assignments • Scribing
Office Hours • Isabelle – Soda 645, time TBD • Anirban – By appointment http://cs294socialnetworks.org
Available Data Sets • Yahoo! Webscope data • Will be available in a few weeks • Social Network Crawls • LiveJournal, Twitter, Orkut, Flickr, YouTube, Facebook • SNAP archive • Citation Networks • HEP-th, dblp (over time), theory… • Physical Systems • Power grid, autonomous systems… • Web graphs • Notre Dame, Berkeley/Stanford, Wikipedia…
Complex Systems Around Us • We are surrounded by complex systems • Society is the interaction of 7 billion individuals • Communication systems (e.g., the Internet) are formed by linking devices • Our cells function through the interaction of proteins • Thoughts in our brain are formed by interactions of neurons • What common properties do these systems share? How can we study them?
Why study networks? Behind each of these complex systems there is an underlying wiring diagram: the network. We will never understand a complex system without understanding the network behind it. Nodes: elements; Links: interactions; System: graph/network
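In code, the nodes-and-links view is often captured with an adjacency structure. A minimal sketch, assuming an undirected, unweighted network stored as a Python dict that maps each node to the set of its neighbors (a representation chosen here for illustration, not one the course prescribes); the node names are hypothetical:

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph {node: set_of_neighbors} from (u, v) link pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        # Each link records one interaction between two elements, stored in both directions.
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


# Hypothetical toy network: a, b, c form a triangle; d links only to a.
friends = build_graph([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")])
print(sorted(friends["a"]))  # ['b', 'c', 'd']
```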
Network: Online Social Networks Nodes: members Links: “friend”
Network: Internet Nodes: routers Links: connections
Network: US power grid Nodes: power stations Links: power lines
Network: Economy Nodes: companies (investment, pharma, research labs, public) Links: collaborations (financial, R&D) http://ecclectic.ss.uci.edu/~drwhite/Movie
Network: Human Disease Nodes: Disease class Links: share gene
Network: Yeast Proteins Nodes: Proteins Links: chemical interaction
Network: Brain The human brain has between 10 and 100 billion neurons. Nodes: neurons Links: connections
Without studying networks, we cannot … • stop cascading outages in power grids • forecast how disease spreads in a society • design search engines like Google • understand how interactions of genomes create life • …
What do we study in networks? • Structure and evolution • What does a network look like? • How did it come to be like that? • Process and dynamics • Networks provide skeletons for information, for disease spreading, and for other dynamic processes
How would we study a network? • Empirical: Study network data to find out a particular principle • Data analysis, experiments, sociology surveys, … • Analyze: Is this principle surprising? How universal is this principle? • Statistics, probability, domain knowledge,… • Hypothesize: Build models that would explain the observed principle • Algorithms, graph theory, statistics, probability, domain knowledge…
Why now? • Data availability • Storage and computation are only getting cheaper • Massive amounts of data about human interaction • Universality • Networks arising from different fields of science and technology have surprisingly common properties • Shared Vocabulary • Statisticians, Cognitive Scientists, Physicists, Biologists, Computer Scientists,..
The story of “six degrees of separation” or the “small world phenomenon”
Before there was the Internet • There were still social networks • How can we measure anything about them? • What do social networks look like? • How connected are we?
Milgram’s Experiment (1967) • Wanted to learn about the global friendship network • If information spreads through friends, how soon will it reach one particular person? • He could not obtain the entire friendship network, so he designed an experiment to measure this quantity Stanley Milgram
Milgram’s Experiment (1967) [Map: Nebraska (NE) to Massachusetts (MA)] • 300 people in the Midwest were each given a letter • Target: a stockbroker in Boston • You can only forward the letter to someone you know! • Goal: reach the target
Milgram’s Experiment: Results • 300 people in the Midwest were each given a letter • Target: a stockbroker in Boston • You can only forward to someone you know! • Number of completed chains: 64 total • Average number of steps: 6.5, hence “six degrees of separation”
Six degrees of separation: for almost all random pairs among 6 billion individuals, there is a path of at most 6 steps.
Experimental Problems • Selection bias • Starting points weren’t random: they were people who responded to an ad for ‘well-connected people’ • Highly disconnected groups aren’t sampled • Dropped chains • 232 of 296 never reached the target • 136 of 160 never reached the target • 16 of the 24 went through the same last hop
Was this a fluke? • Replicated by researchers using emails, Facebook • Similar property (short paths between pairs of nodes) also seen in other networks • protein-protein network, gene network • economic networks • language networks…
Six degrees of separation: Is this surprising? • The average number of steps in a chain was 6 • Why should there be 6 steps? • Hint: suppose everyone has 100 friends; then what? • But your friends are friends among themselves! [Figure: Harry, Ron and Hermione, all friends with one another]
Small World Networks Alone, these two properties aren’t very surprising. Together, they are. High ‘clustering’: friends of my friends are likely to be my friends. Small diameter: I have ~100 friends, who each have ~100 friends, and so on… So I can reach everyone in $s$ steps, where $100^s = n$, i.e. $s = \log_{100} n = O(\log n)$
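The arithmetic behind the 100-friends argument can be checked directly. A small sketch, assuming (as the hint does) that every step reaches 100 genuinely new people with no overlap between friend sets; the helper name is hypothetical:

```python
import math


def steps_to_reach(n, friends_per_person):
    """Smallest integer s with friends_per_person**s >= n, assuming no overlapping friends."""
    s, reachable = 0, 1
    while reachable < n:
        reachable *= friends_per_person  # each step multiplies the reachable set
        s += 1
    return s


n = 6_000_000_000                 # world population figure used on the slides
print(math.log(n, 100))           # real-valued solution of 100**s = n, roughly 4.89
print(steps_to_reach(n, 100))     # 5
```

Five steps is already below six. The catch is the no-overlap assumption: with high clustering, many of your friends' friends are people you already know, so the reachable set grows more slowly than $100^s$.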
Six degrees of separation • People do have moderately large (~100–1000) sets of friends • But these friends typically occur in clusters • Everyone in a school, workplace, town… • Given these properties, six degrees of separation is not obvious • Surprisingly, people can actually find the short paths…
In simple terms, the Small World concept describes the fact that, despite their often large size, most networks have a relatively short path between any two nodes.
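"Relatively short path" is usually quantified as the average shortest-path (hop) length. A minimal sketch using breadth-first search over an adjacency-dict graph (representation and function names assumed for illustration); running BFS from every node costs $O(n \cdot m)$, so this is only suitable for small graphs:

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every node reachable from it (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(graph):
    """Mean hop distance over all ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for s in graph:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Path graph 0-1-2-3: pair distances 1, 2, 3, 1, 2, 1 sum to 10 over 6 pairs.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(average_path_length(path))  # 10/6, about 1.667
```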
Why study the small-world property? • Purely scientific: • Why is something this universal? • Many very concrete applications: • Designing peer-to-peer systems (Napster, Gnutella), building computer networks • How to spread information with a limited budget, say about an upcoming movie • How to stop the spread of viral infections?
How can we explain this? • What if we could hypothesize how networks are formed? • Basic intuition: models have to contain elements of structured relations as well as random elements • For example, in social networks: • structured friendships: college classmates • different interests: people have different groups of friends • random friendships: met on a train ride • Still an ongoing area of research…
The Structure of Social Networks • Small diameter • Strongly connected (many short paths) • There exist highly connected people • High clustering coefficient • There are ‘short range’ and ‘long range’ edges • Local routing algorithms are successful What other types of networks have this property?
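The clustering coefficient listed above measures how often two friends of a node are themselves friends. A minimal sketch of the local clustering coefficient and its average over nodes, assuming an adjacency-dict graph with set-valued neighbor lists and the common convention that nodes with fewer than two neighbors score 0; names are illustrative:

```python
from itertools import combinations


def local_clustering(graph, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: no neighbor pairs, so clustering is 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (k * (k - 1) / 2)


def average_clustering(graph):
    return sum(local_clustering(graph, v) for v in graph) / len(graph)


# Triangle a-b-c plus a pendant node d attached to a.
g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(local_clustering(g, "a"))  # 1 of a's 3 neighbor pairs is linked
print(average_clustering(g))     # (1/3 + 1 + 1 + 0) / 4
```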
Erdős–Rényi Graphs • Classic random graph model • $G(n,p)$: for $n$ nodes, add every possible edge independently with probability $p$
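A direct sampler for $G(n,p)$ simply flips one biased coin per pair of nodes. A minimal sketch (function name and adjacency-dict output are illustrative); it takes $O(n^2)$ time, which is fine for small $n$:

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Sample G(n, p): each of the n(n-1)/2 possible edges appears independently with prob. p."""
    rng = random.Random(seed)
    graph = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            graph[u].add(v)
            graph[v].add(u)
    return graph


g = erdos_renyi(1000, 0.01, seed=1)
mean_degree = sum(len(nbrs) for nbrs in g.values()) / len(g)
print(mean_degree)  # close to the expected degree (n - 1) * p = 9.99
```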
Erdős–Rényi Properties • Not connected (with high probability) unless $p \gtrsim \ln n / n$ • No real clustering (clustering coefficient ≈ $p$) • Every vertex has the same expected degree, $(n-1)p$ • Doesn’t really have any underlying structure Not a good model of a social network
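The connectivity threshold can be observed empirically by sampling $G(n,p)$ just below and just above $\ln n / n$. A sketch under illustrative choices ($n = 300$, 20 samples, factors 0.5 and 2 of the threshold, fixed seed); the function names are hypothetical:

```python
import math
import random
from collections import deque
from itertools import combinations


def erdos_renyi(n, p, rng):
    """Sample G(n, p) as {node: set_of_neighbors}."""
    graph = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def is_connected(graph):
    """BFS from one node; the graph is connected iff every node is reached."""
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(graph)


n, trials = 300, 20
threshold = math.log(n) / n
rng = random.Random(0)
results = {}
for factor in (0.5, 2.0):
    p = factor * threshold
    results[factor] = sum(is_connected(erdos_renyi(n, p, rng)) for _ in range(trials))
    print(f"p = {factor} * ln(n)/n: {results[factor]}/{trials} samples connected")
```

Below the threshold a typical sample still contains isolated vertices (about $n e^{-np}$ of them in expectation), which is what keeps the graph disconnected.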
Watts–Strogatz Model • Parameters: $n$, $k$, $p$ • Construct a ring with $n$ vertices. Connect each to its $k$ nearest neighbors. • Rewire each edge with probability $p$
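A minimal sketch of the construction, assuming $k$ is even (so each vertex has $k/2$ neighbors on each side) and that rewiring moves one endpoint of a lattice edge to a uniformly random vertex while forbidding self-loops and duplicate edges, following Watts and Strogatz's original description. Function and parameter names are illustrative:

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes joined to their k nearest neighbours, then rewired with prob. p."""
    if k % 2 != 0 or not 0 < k < n:
        raise ValueError("k must be even with 0 < k < n")
    rng = random.Random(seed)
    graph = {v: set() for v in range(n)}
    # Step 1: ring lattice with k/2 neighbours on each side.
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            graph[u].add(v)
            graph[v].add(u)
    # Step 2: visit each lattice edge (u, u + j) once; with probability p, move its
    # far endpoint to a random node, avoiding self-loops and duplicate edges so the
    # total number of edges stays n * k / 2.
    for j in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + j) % n
            if rng.random() < p:
                candidates = [w for w in range(n) if w != u and w not in graph[u]]
                if candidates:
                    w = rng.choice(candidates)
                    graph[u].discard(v)
                    graph[v].discard(u)
                    graph[u].add(w)
                    graph[w].add(u)
    return graph


small_world = watts_strogatz(1000, 10, 0.1, seed=42)
print(sum(len(nbrs) for nbrs in small_world.values()) // 2)  # 5000 edges, as in the lattice
```

With $p = 0$ the ring lattice is returned unchanged; with $p = 1$ every lattice edge is rewired.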
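Watts–Strogatz Properties • Has local and long-range edges • Path lengths • quickly approach those of a random graph, ≈ $\ln n / \ln k$, as $p$ increases • Clustering coefficient • starts near ¾ (exactly $3(k-2)/(4(k-1))$ for the ring lattice) and decreases toward ≈ $k/n$ • Degree distribution • similar to $G(n,p)$: concentrated around the mean degree $k$ Key feature of the model: rewiring allows ‘weak ties’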
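The starting value of the clustering coefficient can be checked on the unrewired ring lattice, where every node has the same local clustering $3(k-2)/(4(k-1))$, which tends to ¾ as $k$ grows. A sketch with illustrative names, assuming $n$ is large relative to $k$ so the ring does not wrap around on itself:

```python
from itertools import combinations


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k/2 per side)."""
    graph = {v: set() for v in range(n)}
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            graph[u].add(v)
            graph[v].add(u)
    return graph


def local_clustering(graph, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (k * (k - 1) / 2)


for k in (4, 10, 50):
    measured = local_clustering(ring_lattice(1000, k), 0)  # all nodes are symmetric
    formula = 3 * (k - 2) / (4 * (k - 1))
    print(k, round(measured, 4), round(formula, 4))
```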
What can we say about when short paths can be found with local information?
Kleinberg’s Small World Networks • How does the network structure affect our ability to find short paths using only local information? • Start with an $n \times n$ grid. • Add a long-range edge $(u,v)$ with probability proportional to $d(u,v)^{-r}$ • As $r$ changes, what happens?
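The long-range edges in this model can be sampled by brute force. A sketch assuming each node gets exactly one long-range contact (the general model allows a fixed number of them) and that $d$ is lattice (Manhattan) distance; names are illustrative, and the sampler costs $O(n^2)$ per node:

```python
import random


def lattice_distance(a, b):
    """Manhattan distance between grid points a = (x1, y1) and b = (x2, y2)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def long_range_contact(u, n, r, rng):
    """Sample one contact v != u on an n x n grid with P(v) proportional to d(u, v)**(-r)."""
    nodes, weights = [], []
    for x in range(n):
        for y in range(n):
            d = lattice_distance(u, (x, y))
            if d > 0:
                nodes.append((x, y))
                weights.append(d ** (-r))
    return rng.choices(nodes, weights=weights, k=1)[0]


rng = random.Random(0)
center, n = (10, 10), 21
mean_len = {}
for r in (0, 2):
    contacts = [long_range_contact(center, n, r, rng) for _ in range(500)]
    mean_len[r] = sum(lattice_distance(center, v) for v in contacts) / len(contacts)
    print(f"r = {r}: mean long-range link length {mean_len[r]:.2f}")
```

At $r = 0$ every other node is equally likely; increasing $r$ biases contacts toward nearby nodes.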
Decentralized Routing • $u$ is given a message to send to $v$ • $u$ knows where $v$ is on the grid • Try to get the message to $v$ as fast as possible • The current holder of the message can only see its own links Without the random edges, any message can be routed in $O(n)$ time.
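The standard decentralized strategy analyzed in this setting is greedy routing: forward the message to whichever known neighbor is closest to the target on the grid. A sketch assuming one long-range contact per node, Manhattan distance, and illustrative grid size, seed and pair count; all names are hypothetical:

```python
import random


def lattice_distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def long_range_contact(u, n, r, rng):
    """One contact per node, with P(v) proportional to d(u, v)**(-r)."""
    nodes, weights = [], []
    for x in range(n):
        for y in range(n):
            d = lattice_distance(u, (x, y))
            if d > 0:
                nodes.append((x, y))
                weights.append(d ** (-r))
    return rng.choices(nodes, weights=weights, k=1)[0]


def greedy_route(source, target, n, contacts):
    """Hop count when each node forwards to its known neighbour closest to target.

    Only local information is used: the current node's grid neighbours,
    its own long-range contact, and the target's grid position.
    """
    steps, cur = 0, source
    while cur != target:
        x, y = cur
        options = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                   if 0 <= x + dx < n and 0 <= y + dy < n]
        options.append(contacts[cur])
        # Some grid neighbour always reduces the distance by 1, so this terminates.
        cur = min(options, key=lambda v: lattice_distance(v, target))
        steps += 1
    return steps


n = 30
rng = random.Random(1)
avg_steps = {}
for r in (0, 2):
    contacts = {(x, y): long_range_contact((x, y), n, r, rng) for x in range(n) for y in range(n)}
    pairs = [((rng.randrange(n), rng.randrange(n)), (rng.randrange(n), rng.randrange(n)))
             for _ in range(200)]
    avg_steps[r] = sum(greedy_route(s, t, n, contacts) for s, t in pairs) / len(pairs)
    print(f"r = {r}: average greedy delivery time {avg_steps[r]:.1f} steps")
```

Kleinberg's bounds describe how delivery time grows with $n$, so a single small grid may show only a modest gap between $r = 0$ and $r = 2$.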
Kleinberg’s Results • $r = 0$: all long-range links equally likely • Short paths exist (whp) • They can’t be found with local information Thm: When $r = 0$, the expected delivery time of any decentralized algorithm is at least $\Omega(n^{2/3})$.
Algorithmically • When $r = 2$, decentralized (greedy) routing delivers messages in an expected $O(\log^2 n)$ steps • Every other value of $r$ requires time polynomial in $n$ Why $r = 2$?
Geometry of the Network [Figure: nodes u and v, with long-range links labeled x] The expected length of a long-range link x depends on r
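One way to see how the exponent shapes the geometry is to compute the expected length of a long-range link exactly. A sketch assuming one contact per node with $P(v) \propto d(u,v)^{-r}$, Manhattan distance, and the source $u$ at the centre of an $n \times n$ grid; the function name is illustrative:

```python
def expected_link_length(n, r):
    """Exact E[d(u, v)] for the long-range contact of u at the centre of an n x n grid."""
    c = n // 2
    num = den = 0.0
    for x in range(n):
        for y in range(n):
            d = abs(x - c) + abs(y - c)
            if d > 0:
                w = d ** (-r)   # unnormalised probability of linking to (x, y)
                num += d * w
                den += w
    return num / den


for r in (0, 1, 2, 3):
    print(r, round(expected_link_length(101, r), 2))
```

Larger $r$ concentrates links near $u$, so the expected length falls as $r$ grows; $r = 0$ spreads links uniformly over the whole grid.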