
Social Networking Algorithms



Presentation Transcript


  1. Social Networking Algorithms related sections to read in Networked Life: 2.1, 2.3, 3.1, 4.1, 5.1, 6.1–6.2, 8.1, 9.1

  2. Google Search • PageRank algorithm • crawling (following hyperlinks embedded in HTML); >50 billion pages indexed (2012), not counting intranets (source: http://www.statisticbrain.com/total-number-of-pages-indexed-by-google/) • indexing • assessing relevance: • number of times a keyword is mentioned • proximity/order of terms • title/heading, bold/font size • what makes a page “authoritative”? • users only look at the top 3-10 hits, so what gets ranked at the top is crucial

  3. Inverted Index • document retrieval • intersection of search terms • what about spelling errors, stemming, synonyms, semantic relationships? • more complex Boolean queries (or, not) • computation distributed over many computers using MapReduce • programming functions to distribute tasks and assemble results • Document Collection (web pages): doc[0] = “all about the banana slug”, doc[1] = “nutritional content of bananas”, doc[2] = “bananas of the world”, doc[3] = “nutrition for athletes” • query: “banana nutrition” → {0,1,2} ∩ {1,3} = {1}
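The slide's lookup can be sketched in a few lines. The tokenizer and suffix-stripping "stemmer" below are my own simplifications (not from the slides), just enough to make "bananas"/"banana" and "nutritional"/"nutrition" match:

```python
# Sketch of an inverted index over the slide's toy document collection.
docs = {
    0: "all about the banana slug",
    1: "nutritional content of bananas",
    2: "bananas of the world",
    3: "nutrition for athletes",
}

def normalize(word):
    """Very naive stemmer: lowercase, then strip 'al' and 's' suffixes."""
    w = word.lower()
    for suf in ("al", "s"):
        if w.endswith(suf) and len(w) > len(suf) + 2:
            w = w[: -len(suf)]
    return w

# Build the inverted index: term -> set of doc IDs containing it.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(normalize(word), set()).add(doc_id)

def search(query):
    """AND query: intersect the posting sets of all query terms."""
    postings = [index.get(normalize(t), set()) for t in query.split()]
    return sorted(set.intersection(*postings)) if postings else []

print(search("banana nutrition"))  # {0,1,2} ∩ {1,3} -> [1]
```

A production index would of course use a real stemmer, handle OR/NOT queries, and shard the index across machines via MapReduce, as the slide notes.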

  4. the web-graph G=(V,E) • hyperlinks = directed edges • strongly connected components • adjacency matrix (sparse) • which pages are important? • number of connections (degree, centrality)? • number of in-edges (mentions/references)? [Figure: two example pages that mention “Joe” — a Texas A&M bowling-league roster and Joe Student’s home page at www.tamu.edu — with the home page linking to java.sun.com for Java.]

  5. PageRank • need trust/reputation models? • the “importance” of a node xi is based on the importance of the neighbors xj that link to it • weights 1/dj distribute a node’s importance evenly over the dj nodes it links to • modify the equations to handle unlinked pages

  6. system of coupled equations • iterative solutions • algorithms that start with random importances and adjust them until all the xi’s are mutually consistent (convergence) • in matrix form, this becomes an eigenvalue problem (hard to calculate): x = Hx, where x is the vector of importances and H is the weighted adjacency matrix • example solution: x1=0.128, x2=0.159, x3=0.202, x4=0.150, x5=0.106, x6=0.044, x7=0.060, x8=0.145
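The iterative solution the slide describes is power iteration on x = Hx. The slide's 8-node graph is not reproduced in the transcript, so this sketch uses a small hypothetical 4-node graph of my own:

```python
# Power-iteration sketch of PageRank-style importance, x = Hx.
links = {  # hypothetical web-graph: page -> pages it links to
    0: [1, 2],
    1: [2],
    2: [0],
    3: [2],  # every page links somewhere, so no dangling nodes
}
n = len(links)

def iterate(x):
    """One step of x <- Hx: each page's importance is split evenly
    among its out-links (weight 1/d_j) and summed at the targets."""
    new = [0.0] * n
    for j, outs in links.items():
        share = x[j] / len(outs)
        for i in outs:
            new[i] += share
    return new

x = [1.0 / n] * n  # start from uniform importance
for _ in range(100):  # adjust until the x_i's are mutually consistent
    x = iterate(x)

print([round(v, 3) for v in x])  # -> [0.4, 0.2, 0.4, 0.0]
```

Note page 3 ends up with importance 0: nothing links to it, illustrating why real PageRank adds a "teleportation" term to handle unlinked pages, as the previous slide hints.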

  7. The Network Effect • Metcalfe’s law: the value of a telecommunications network is proportional to the square of the number of connected users of the system (n²) • going viral (videos and memes) • if you tell two friends, and they each tell two friends... it scales exponentially to thousands of people in just a few steps • Small Worlds phenomenon • social networks are not the same as physical networks • also scale-free topology (Power Law) • 6 degrees-of-separation (Milgram); community structure • crowd-sourcing – is there value in the aggregate opinion? • combines multiple experts (as well as boneheads and malefactors) • filters out the bias of a few extreme opinions (since you don’t know who to trust)
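The "tell two friends" claim is just geometric growth; a trivial sketch (doubling per step is my simplification of the spreading model):

```python
# If the audience roughly doubles each step ("tell two friends..."),
# reach passes a thousand people in about ten steps.
reached, step = 1, 0
while reached < 1000:
    reached *= 2  # simplification: audience doubles per step
    step += 1
print(step, reached)  # -> 10 1024
```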

  8. Recommender Systems • Netflix, Pandora • how can we benefit from the evaluations of others? • long-tail distribution for media • there are MANY movies, songs, etc. • most are rarely listened to • yet each individual has eclectic tastes • if a person likes X and Y, how to predict some other item Z they will like? • similarity (collaborative filtering) • not just the intersection of common features of X and Y • exploit what other people with similar tastes like • each user makes sparse recommendations • merge, and extract correlations; latent factors?

  9. Machine Learning • other people who have watched movies with Ron Perlman tend to also like... • given a set of ratings of users u for movies i: {(u,i)} or {rui}, build a predictive model • accuracy measured by root-mean-square error (RMSE) • Netflix Prize • around 100 million anonymous ratings released as a training set (collected 1998–2005), 480k users, 17k movies • 2009: the grand prize of US$1,000,000 was given to the BellKor’s Pragmatic Chaos team, which bested Netflix’s own algorithm for predicting ratings by 10.06%
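The Netflix Prize scored submissions by RMSE between predicted and actual ratings; a minimal sketch, with made-up numbers:

```python
# Root-mean-square error, the Netflix Prize's accuracy metric.
from math import sqrt

def rmse(predicted, actual):
    """RMSE over parallel lists of ratings."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                / len(actual))

actual    = [4, 3, 5, 2, 1]          # illustrative held-out ratings
predicted = [3.8, 3.4, 4.5, 2.6, 1.3]
print(round(rmse(predicted, actual), 3))  # -> 0.424
```

The winning team's 10.06% improvement was measured on exactly this metric, against Netflix's own baseline RMSE.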

  10. Aggregating Ratings • reviews on Amazon, TripAdvisor, Rotten Tomatoes (movies)... • trust, reputation, shills • weight each reviewer by consistency? • wisdom of the crowd • Galton’s experiment (1906), guessing the weight of an ox • subjectivity of hotel recommendations • can you trust the average rating? • it also depends on the number of reviews, and the dispersion (does the number of 1-star ratings matter?)
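One way to act on the slide's "weight each reviewer by consistency?" question: down-weight reviewers far from the consensus. The specific weighting (inverse distance from the plain mean) is my own illustrative choice, not from the slides:

```python
# Down-weight outlier reviewers (possible shills) when aggregating.
ratings = {"r1": 4.0, "r2": 4.5, "r3": 1.0}  # r3 looks like an outlier

mean = sum(ratings.values()) / len(ratings)
# weight = 1 / (1 + distance from the plain mean); outliers count less
weights = {r: 1.0 / (1.0 + abs(v - mean)) for r, v in ratings.items()}
weighted = (sum(weights[r] * v for r, v in ratings.items())
            / sum(weights.values()))
print(round(mean, 2), round(weighted, 2))  # weighted mean moves toward 4
```

Real systems would iterate this (re-estimate the consensus from the new weights) or use reviewer history across many products.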

  11. Auctions • examples: • eBay • Google ad space (companies bid on search terms, position on page) • broadcast spectrum (airwaves, FCC) • an efficient, decentralized mechanism for resource allocation among many parties (exploit market forces) • goals: • maximize value for the auctioneer • minimize cost for buyers; make bidding simple, not strategic • fairness, free of manipulation • utility functions (values to self-interested agents)

  12. Auctions • types of auction mechanisms • public (open-outcry) vs. sealed-bid • ascending vs. descending • first-price vs. second-price • Vickrey (second-price, sealed-bid) auction • no incentive to under- or over-bid • no winner’s remorse • can show this is a Nash equilibrium strategy • current research: combinatorial auctions • bids for multiple items coupled together • algorithms for winner determination? (NP-hard)
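The Vickrey mechanism on the slide is short enough to write out directly (bidders and values are hypothetical):

```python
# Sketch of a Vickrey (second-price, sealed-bid) auction: the highest
# bidder wins but pays the second-highest bid, which is why truthful
# bidding is a dominant strategy (no incentive to under- or over-bid).
def vickrey(bids):
    """bids: dict bidder -> sealed bid. Returns (winner, price paid)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    price = ranked[1][1]  # winner pays the second-highest bid
    return winner, price

winner, price = vickrey({"alice": 120, "bob": 100, "carol": 80})
print(winner, price)  # alice wins but pays only 100, bob's bid
```

Since the price is set by someone else's bid, shading your bid below your true value can only lose you auctions you wanted to win, and it never lowers the price you pay, which is the intuition behind "no winner's remorse."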

  13. Electronic Voting • Rank Aggregation • a social choice mechanism • unlike the US system, imagine you can vote for N candidates by ranking them in order of preference • other applications: voting for Olympics venues or baseball all-stars out of a defined list of possibilities

  14. Another example: Meta-search • merging search-engine results • Cynthia Dwork (WWW, 2001) • by merging the top hits from Google, Bing, Yahoo, AltaVista, etc., could you get a better combined list? • search results are usually sparse – a given page might not be on every list of results • how should you rank a page ranked 2nd, 3rd, and 101st by different engines? • what if one of the engines is paid to rank certain sites highly? (web-search “spam”)
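A naive merge of partial rankings can be sketched as below (engines and pages are hypothetical, and averaging only observed ranks is one simple choice; it sidesteps, rather than answers, the slide's 2nd/3rd/101st question, since missing entries simply contribute nothing):

```python
# Naive meta-search merge: score each page by its average rank over
# the engines that actually listed it, then sort ascending.
lists = {  # engine -> ranked list of page IDs (sparse/partial)
    "engineA": ["p1", "p2", "p3"],
    "engineB": ["p2", "p1", "p4"],
    "engineC": ["p2", "p3", "p1"],
}

scores = {}
for ranking in lists.values():
    for rank, page in enumerate(ranking, start=1):
        scores.setdefault(page, []).append(rank)

merged = sorted(scores, key=lambda p: sum(scores[p]) / len(scores[p]))
print(merged)  # -> ['p2', 'p1', 'p3', 'p4']
```

This is exactly the kind of heuristic Dwork's work improves on: average-rank merging is easily gamed by one paid engine, which motivates the Kemeny-style aggregation on the following slides.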

  15. among the many possible orderings (A&lt;B&lt;C, B&lt;A&lt;C...) is there a final ranking that is “most similar” to the voters’ preferences (most representative)? • the Borda count • add up the voted ranks as weights • pros: simple, anonymous, neutral, consistent • cons: can be influenced by extreme votes that drag good candidates down
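The Borda count is easy to sketch. The ballots below are hypothetical, chosen so they reproduce the phenomenon on the next slides (a majority prefers P over Q, yet Q wins the Borda count):

```python
# Borda count: with n candidates, a candidate ranked k-th on a ballot
# earns n-k points (best rank = n-1 points); highest total wins.
def borda(ballots):
    """ballots: list of full rankings (best first). Returns score dict."""
    n = len(ballots[0])
    scores = {}
    for ballot in ballots:
        for pos, cand in enumerate(ballot):
            scores[cand] = scores.get(cand, 0) + (n - 1 - pos)
    return scores

ballots = [["P", "Q", "R", "S"],
           ["P", "Q", "S", "R"],
           ["Q", "R", "S", "P"]]   # one voter ranks P dead last

print(borda(ballots))  # -> {'P': 6, 'Q': 7, 'R': 3, 'S': 2}
```

Q wins with 7 points even though two of three voters rank P first: the single extreme ballot placing P last drags P's total down, exactly the "con" the slide lists.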

  16. Condorcet alternative: the candidate that beats all others in pairwise comparisons • in this example, candidate Q wins based on the Borda count, even though the majority of voters preferred P over Q

  17. pairwise comparisons for the same example: P vs. Q: 2/3 prefer P • P vs. R: 2/3 prefer P • P vs. S: 2/3 prefer P • Q vs. R: 3/3 prefer Q • Q vs. S: 3/3 prefer Q • R vs. S: majority prefer R • so P beats every other candidate head-to-head: P is the Condorcet winner

  18. generalization: the Condorcet criterion • for each pair of candidates A and B, A must be ranked over B if the majority prefer A over B • Dwork showed there is a polynomial-time algorithm based on computing “locally Kemeny-optimal” rankings
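Checking for a Condorcet winner is a direct pairwise-majority test. The ballots are the same hypothetical ones used in the Borda sketch above (Q wins the Borda count, yet P beats everyone head-to-head):

```python
# Condorcet winner: a candidate preferred to every other candidate by
# a majority of ballots. No such candidate need exist (cycles).
ballots = [["P", "Q", "R", "S"],
           ["P", "Q", "S", "R"],
           ["Q", "R", "S", "P"]]

def beats(a, b):
    """True if a majority of ballots rank a above b."""
    wins = sum(1 for bal in ballots if bal.index(a) < bal.index(b))
    return wins > len(ballots) / 2

def condorcet_winner(candidates):
    for c in candidates:
        if all(beats(c, other) for other in candidates if other != c):
            return c
    return None  # no Condorcet winner (preference cycle)

print(condorcet_winner(["P", "Q", "R", "S"]))  # -> P
```

The gap between the two outcomes, Q by Borda versus P by Condorcet, is why the Condorcet criterion and Kemeny-optimal rankings matter for rank aggregation.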

  19. Electronic Voting • complex (weighted) votes of preferences for multiple outcomes • example: voting on funding of public projects to maximize public welfare (sample ballot: a% new stadium, b% new library, c% fix roads, d% hire new police, where a+b+c+d = 100% = 1 vote) • avoid the “free-rider” syndrome • “VCG” mechanism: penalize the winner by charging a tax based on how much he influenced the result over alternative outcomes • encourages voters to vote their true beliefs

  20. Summary • The value of networks grows more than linearly (quadratically?) with the number of people participating. • Algorithms like PageRank can identify “important” nodes in networks by analyzing connectivity (small-worlds topology). • There is “wisdom” in crowds. • Algorithms can aggregate preferences or rankings or ratings over multiple users to allow robust methods for determining combined/community opinion.
