Social Networking Algorithms • related sections to read in Networked Life: 2.1, 2.3, 3.1, 4.1, 5.1, 6.1-6.2, 8.1, 9.1
The Network Effect • Metcalfe's law - the value of a telecommunications network is proportional to the square of the number of connected users of the system (n²) • Facebook friends • Twitter followers • collective opinions on news/products/movies... • videos or products or memes going “viral” • if you tell two friends, and they each tell 2 friends... it scales up exponentially to thousands of people in just a few steps
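The two growth claims on this slide can be checked with a few lines of arithmetic. This is only a sketch: it assumes "value" can be stood in for by the number of distinct user pairs, n(n-1)/2, which grows like n², and that nobody is told twice in the tell-two-friends chain. The function names are illustrative, not from the slides.

```python
def metcalfe_value(n: int) -> int:
    """Number of distinct user pairs among n users; grows like n^2."""
    return n * (n - 1) // 2


def people_reached(steps: int, fanout: int = 2) -> int:
    """People newly told at a given step if each person tells `fanout`
    friends and nobody is told twice (an idealizing assumption)."""
    return fanout ** steps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n} users -> {metcalfe_value(n)} possible connections")
    for k in (1, 5, 10):
        print(f"step {k}: {people_reached(k)} people told")
```

Multiplying the number of users by 10 multiplies the pair count by roughly 100, and ten steps of telling two friends already reaches about a thousand people.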
Small Worlds phenomenon • social networks are not the same as physical networks (because your friends can be remote) • also a scale-free topology (power law/long-tail distribution) • 6 degrees of separation (Milgram’s experiment) • community structure
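"Degrees of separation" is the shortest-path length between two people in the friendship graph. A minimal sketch using breadth-first search follows; the `friends` graph and its names are made up for illustration.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Return the fewest friendship hops from start to goal, or None if
    the two people are not connected at all."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == goal:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


# Hypothetical toy network (undirected: each friendship is listed both ways).
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
    "zed": [],
}

if __name__ == "__main__":
    print(degrees_of_separation(friends, "ann", "eve"))
    print(degrees_of_separation(friends, "ann", "zed"))
```

Breadth-first search visits people in order of distance, so the first time the goal is seen the hop count is minimal.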
Exploiting the Network Effect • eBay – price discovery through auctions • Netflix – recommendations based on others’ preferences • Reddit – reputation based on others’ opinions of your posts • Crowd-sourcing • is there value in the aggregate opinion? • examples: ratings on Amazon or TripAdvisor or YouTube • combines multiple experts (as well as non-experts) • filters out the bias of a few extreme opinions (since you don’t know whom to trust)
Google Search • PageRank algorithm • crawling (follow hyperlinks embedded in HTML): >50 billion pages indexed as of 2012, not counting intranets (source: http://www.statisticbrain.com/total-number-of-pages-indexed-by-google/) • indexing • assessing relevance: • number of times the keyword is mentioned • proximity/order of keywords • title/heading, bold/font size • what makes a page “authoritative”? • users only look at the top 3-10 hits, so what gets ranked at the top is crucial
Inverted Index • Basic document retrieval • Build an index that maps each search term to all pages that contain it • For multi-word searches, like “functional programming languages”, take the intersection of the page sets for each search term • Does it matter how many times a page mentions a search term? (does this reflect importance? No) • what about dealing with spelling errors, stemming, synonyms, semantic relationships? • more complex Boolean queries (or, not) • How do you do this for 50 billion pages? • Google distributes the computation over a cluster of computers using MapReduce • programming functions to distribute tasks and assemble results
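The index-then-intersect idea can be sketched in a few lines. This is a single-machine toy, not how a production search engine is built: it assumes whitespace tokenization and lowercase matching, with no stemming or spelling correction, and the sample documents are invented.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical mini-corpus.
docs = {
    1: "functional programming languages like Haskell",
    2: "programming languages for the web",
    3: "functional analysis in mathematics",
}

if __name__ == "__main__":
    index = build_index(docs)
    print(search(index, "functional programming languages"))
    print(search(index, "programming languages"))
```

Storing sets rather than counts reflects the slide's point that the number of mentions is ignored here; an "or" query would use a union in place of the intersection.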
Which search hits are most important? • having many Twitter followers does not make you an expert (popularity ≠ expertise) • similarly, having lots of hyperlinks to a page does not mean it is authoritative • The web-graph: G=(V,E) • hyperlinks = directed edges • strongly connected components • adjacency matrix (sparse)
[Figure: example web-graph of hyperlinked pages: “Texas A&M Bowling League” (Members ... Joe), www.tamu.edu, “Joe Student’s Home page” (“I am a student at Texas A&M”, “I write code in Java”), and java.sun.com]
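A sparse web-graph is usually stored as an adjacency list rather than a full matrix, since most pages link to only a handful of others. The sketch below uses four made-up pages (`a` to `d`, not the pages in the slide's figure) and shows out-degree, incoming links, and a simple reachability test for whether two pages sit in the same strongly connected component.

```python
# Hypothetical directed web-graph: page -> set of pages it links to.
links = {
    "a": {"b", "c"},
    "b": {"c"},
    "c": {"a"},
    "d": {"c"},
}


def out_degree(links, page):
    """Number of hyperlinks leaving a page."""
    return len(links.get(page, ()))


def in_links(links, page):
    """Set of pages that link to the given page."""
    return {src for src, targets in links.items() if page in targets}


def reachable(links, start):
    """All pages reachable from start by following hyperlinks (DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        page = stack.pop()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def same_scc(links, u, v):
    """True if u and v can each reach the other, i.e. they share a
    strongly connected component."""
    return v in reachable(links, u) and u in reachable(links, v)


if __name__ == "__main__":
    print(out_degree(links, "a"), sorted(in_links(links, "c")))
    print(same_scc(links, "a", "b"), same_scc(links, "a", "d"))
```

Page `d` links into the cycle formed by `a`, `b`, and `c`, but nothing links back to it, so it is not in their component.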
PageRank • need trust/reputation models? • “importance” of a node x_i is based on: • the importance of the neighbors who link to you (x_j) • weights 1/d_j distribute a node’s importance over the nodes it links to • modify the equations to handle unlinked pages
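The slide's equation did not survive as text. A form consistent with the bullets above, assuming the sum runs over the pages j that link to page i and that d_j is the number of outgoing links on page j, would be:

```latex
x_i = \sum_{j \,:\, j \to i} \frac{x_j}{d_j}
```

Each page j splits its importance x_j equally among the d_j pages it links to, and page i collects the shares sent its way.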
system of coupled equations • iterative solutions • algorithms that start with random importances and adjust them until all the x_i’s are mutually consistent (convergence) • in matrix form, this becomes an eigenvalue problem (hard to calculate): x = Hx • x is a vector of importances • H is the weighted adjacency matrix
[Figure: example graph with computed importances x1=0.128, x2=0.159, x3=0.202, x4=0.150, x5=0.106, x6=0.044, x7=0.060, x8=0.145]
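The iterative approach can be sketched as repeated application of x = Hx until the values stop changing. This is a minimal illustration, not Google's implementation. It assumes every page has at least one outgoing link (so no fix for unlinked pages is included), it starts from uniform rather than random importances to keep the run reproducible, and the three-page graph is invented.

```python
def pagerank_iterate(links, tol=1e-10, max_iter=1000):
    """Repeatedly redistribute importance along links until no value
    changes by more than tol. Assumes every page has outgoing links."""
    pages = sorted(links)
    x = {p: 1.0 / len(pages) for p in pages}  # uniform starting guess
    for _ in range(max_iter):
        new = {p: 0.0 for p in pages}
        for j, targets in links.items():
            share = x[j] / len(targets)  # weight 1/d_j per outgoing link
            for i in targets:
                new[i] += share
        if max(abs(new[p] - x[p]) for p in pages) < tol:
            return new
        x = new
    return x


# Hypothetical graph: a links to b and c, b links to c, c links to a.
links = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}

if __name__ == "__main__":
    for page, score in pagerank_iterate(links).items():
        print(page, round(score, 3))
```

For this toy graph the coupled equations can also be solved by hand (x_a = x_c, x_b = x_a/2, and the three values sum to 1), which gives a check on the iteration.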