Social Search Laks V.S. Lakshmanan
How are web search queries distributed? Taken from Damon Horowitz’s talk slides.
How are web search queries distributed? • Web search works well! • Web search is a good start; more effort needed, possibly on top. • Based on opinion of friends. (Adapted from Damon Horowitz’s talk slides.)
Social Search General Remarks • Search in text corpora (IR). • Search in a linked environment (authority, hub, PageRank). • What if my social context should impact search results? • E.g.: users in a SN post reviews/ratings on items. • “Items” = anything users want to talk about, share their opinions on with their friends, and implicitly recommend (or recommend against).
Social Search – The Problem & Issues • Search results for a user should be influenced by how the user and their friends rated the items, in addition to quality of match as determined by IR methods and/or by PageRank-like methods. • Transitive friends’ ratings may matter too, up to some distance. • Users may just comment on an item w/o explicitly rating it.
More Issues • Factoring in transitive friends is somewhat similar to Katz: the longer the geodesic from u to v, the less important v’s rating is to u. • Trust may be a factor. • There is vast literature on trust computation. • May need to analyze opinions (text) and translate into strength (score) and polarity (good or bad?).
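The distance-based discounting of transitive friends can be sketched as follows. This is a minimal illustration, not the Katz measure itself: Katz sums over all paths, whereas this sketch only uses the geodesic (shortest-path) distance mentioned on the slide. The function name `friend_weights`, the `decay` factor, and the `max_dist` cutoff are all assumptions made for the example.

```python
from collections import deque

def friend_weights(graph, source, decay=0.5, max_dist=3):
    """Weight every user within max_dist hops of `source` by decay**distance."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_dist:
            continue  # do not expand beyond the distance cutoff
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1  # BFS gives the geodesic distance
                queue.append(v)
    return {v: decay ** d for v, d in dist.items() if v != source}

# Hypothetical friendship graph: u -> a, b; a -> c; c -> d; d -> e
graph = {"u": ["a", "b"], "a": ["c"], "b": [], "c": ["d"], "d": ["e"]}
weights = friend_weights(graph, "u")
```

With these settings, direct friends get weight 0.5, friends of friends 0.25, and users more than three hops away are ignored.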
Other approaches • Google Social Search • Search for “barcelona” returns results including searcher’s friends’ blogs. • Relevant users need to connect their Facebook, Twitter, ... accounts to their Google profile. • Of particular value when searching for local resources such as shows and restaurants. • But this does not use user-generated content for ranking. • Aardvark – part of Google Labs that was shut down – is an interesting approach to social search. • There are other companies such as sproose.com. See Wikipedia for a list and check them out. (sproose seems to take reviews into account in ranking.) [defunct now?] • Some papers develop notions of SocialRank, UserRank, FolkRank, similar to PageRank (see references in Schenkel et al. 2008 [details later]). • Part I based on: Damon Horowitz and Sepandar D. Kamvar. The Anatomy of a Large-Scale Social Search Engine. WWW 2010.
The Aardvark Approach • Classic web search – roots in IR; authority centric: return most relevant docs as answers to a search query. • Alternative paradigm: consult village wise people. • Web search – keyword based. • Social search – natural language; social intimacy/trust instead of authority. • E.g.: what’s a good bakery in the Mag Mile area in Chicago? • What’s a good handyman, who is not too expensive, is punctual, and honest? • Can you think of similar systems that already exist? Hint: what do you do when you encounter difficulties with a new computer, system, software, tool? • Note: long, subjective, and contextualized queries. These queries are normally handled offline, by asking real people. Social search seeks to bring them online.
Aardvark Modules • Crawler and Indexer. • Query Analyzer. • Ranking Function. • UI.
Index what? • User’s existing social habitat – LI, FB contacts; common groups such as school attended, employer, …; can invite additional contacts. • Topics/areas of expertise: learned from • Self declaration • Peer endorsement (a la LI) • Activities on LI, FB, Twitter, etc. • Activities (asking/answering [or not] questions) on Aardvark. • Forward Index: user (id), topics of expertise sorted by strength, answer quality, response time, … • Inverted Index: for each topic, list of users sorted on expertise, plus answer quality, response time, etc.
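The two indexes can be pictured with a small sketch. This assumes expertise is already available as a per-user dictionary of topic strengths; the data, the function names, and the omission of answer quality and response time are simplifications for illustration, not Aardvark's actual data structures.

```python
from collections import defaultdict

# Hypothetical expertise scores: user -> {topic: strength}
expertise = {
    "alice": {"baking": 0.9, "chicago": 0.4},
    "bob": {"baking": 0.3, "plumbing": 0.8},
}

def build_forward_index(expertise):
    """Forward index: user -> topics sorted by decreasing strength."""
    return {
        user: sorted(topics.items(), key=lambda kv: kv[1], reverse=True)
        for user, topics in expertise.items()
    }

def build_inverted_index(expertise):
    """Inverted index: topic -> users sorted by decreasing expertise."""
    inverted = defaultdict(list)
    for user, topics in expertise.items():
        for topic, strength in topics.items():
            inverted[topic].append((user, strength))
    for topic in inverted:
        inverted[topic].sort(key=lambda kv: kv[1], reverse=True)
    return dict(inverted)

forward = build_forward_index(expertise)
inverted = build_inverted_index(expertise)
```

The inverted index is what a routing step would consult: given the topics of a question, it yields candidate answerers in expertise order.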
Query Life Cycle (figure: Transport Layer, Conversation Manager, Routing Engine)
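Query Answering Model • $p(u_i \mid t)$: prob. that u_i is an expert in topic t. • $p(t \mid q)$: prob. that question q is in topic t. • $p(u_i \mid u_j)$: prob. that u_i can successfully answer a question from u_j; usually based on strength of social connections/trust etc. • $s(u_i, u_j, q) = p(u_i \mid u_j) \cdot \sum_{t \in T} p(u_i \mid t)\, p(t \mid q)$: prob. that u_i can successfully answer question q from u_j (form as given in Horowitz and Kamvar, WWW 2010). • The expertise and connection terms (red and cyan on the slide) can be computed offline and updated periodically. The question-topic term (purple) is computed online using soft classification. • The computation is parallelizable. • All this is fine, but it is important to engage a large number of high-quality question askers and answerers to make and keep the system useful.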
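A minimal sketch of how the three probabilities could be combined into a routing score. It assumes the combination rule from the Horowitz and Kamvar paper, s(u_i, u_j, q) = p(u_i | u_j) · Σ_t p(u_i | t) · p(t | q); the dictionaries, user names, and numbers are made up for illustration.

```python
def answer_score(p_expert, p_topic_given_q, p_connect, answerer, asker):
    """s(u_i, u_j, q) = p(u_i | u_j) * sum_t p(u_i | t) * p(t | q)."""
    topical = sum(
        p_expert.get(answerer, {}).get(topic, 0.0) * p_tq
        for topic, p_tq in p_topic_given_q.items()
    )
    return p_connect.get((answerer, asker), 0.0) * topical

# Offline, hypothetical: p(u_i | t) and p(u_i | u_j)
p_expert = {"alice": {"baking": 0.5, "travel": 0.25}, "bob": {"baking": 0.25}}
p_connect = {("alice", "carol"): 0.5, ("bob", "carol"): 1.0}

# Online, hypothetical: p(t | q) from soft classification of carol's question
p_topic_given_q = {"baking": 0.5, "travel": 0.5}

scores = {
    u: answer_score(p_expert, p_topic_given_q, p_connect, u, "carol")
    for u in p_expert
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because each candidate's score depends only on that candidate's own entries, the loop over users can be split across machines.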
Indexing Users • For each user and topic learn from -- Positive Signals: • Self declaration • Peer endorsement • Online profiles – e.g., FB, home pages etc. (linear SVM is used.) • Parse online activities (FB, LI, Twitter, etc.) • Negative Signals: • Muting a topic. • Declining to answer question on a topic. • Getting negative f/b on an answer from other users. • Topic Strengthening: • If your expertise in a topic is non-zero, add up expertise of your neighbors and renormalize. • Normalize probabilities across topics, for a user. • Finally, Pr • Connection strength: cosine similarity over feature space – e.g., social distance, demographics, vocabulary similarity, response time similarity, etc. But (artificially) forced to probability via normalization: • As users interact, update these two probabilities.
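One possible reading of the topic strengthening and normalization steps is sketched below. The slide does not spell out the exact update, so this is an assumption: for every topic in which a user has non-zero expertise, the neighbors' expertise in that topic is added, and the user's scores are then normalized to sum to 1 across topics. All names and data are hypothetical.

```python
def strengthen_topics(expertise, neighbors):
    """One pass of topic strengthening followed by per-user normalization."""
    updated = {}
    for user, topics in expertise.items():
        boosted = {}
        for topic, score in topics.items():
            if score > 0:
                # Only topics the user already knows are reinforced by neighbors.
                score += sum(
                    expertise.get(n, {}).get(topic, 0.0)
                    for n in neighbors.get(user, ())
                )
            boosted[topic] = score
        total = sum(boosted.values())
        # Normalize across topics so each user's scores form a distribution.
        updated[user] = (
            {t: s / total for t, s in boosted.items()} if total else boosted
        )
    return updated

expertise = {"alice": {"baking": 1.0, "travel": 1.0}, "bob": {"baking": 2.0}}
neighbors = {"alice": ["bob"], "bob": ["alice"]}
updated = strengthen_topics(expertise, neighbors)
```

Here alice's baking score rises relative to travel because her neighbor bob is also a baking expert.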
Question Analysis • Semi-automated: • Soft classification into topics – • Filter out non-questions and inappropriate or trivial questions. • KeywordMatchTopicMapper maps keywords/terms in the question to topics in user profile. • TaxonomyTopicMapper places question on a taxonomy covering popular topics. • LocationMatching. • Human judges assign scores to topics (evaluation).
Overall ranking • Aggregation of three kinds of scores: • Topic expertise. • Social proximity/match between asker and answerer. • Availability of answerer (can be learned from online activity patterns, load, etc.) • Answerers contacted in priority order. • Variety of devices supported. • See paper for more details and for experimental results.
Social Wisdom for Search and Recommendation. Ralf Schenkel et al. IEEE Data Eng. Bull., June 2008. • Expand scope of RecSys by storing (in a relational DB) other info.:
Users(username, location, gender, . . .)
Friendships(user1, user2, ftype, fstrength)
Documents(docid, description, . . .)
Linkage(doc1, doc2, ltype, lweight)
Tagging(user, doc, tag, tweight)
Ontology(tag1, tag2, otype, oweight)
Rating(user, doc, assessment)
• Just modeling/scoring aspects; scalability ignored for now.
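As a rough illustration, the relations above can be declared in an in-memory SQLite database. The column types and primary keys are assumptions (the slide gives only names), and the columns elided with ". . ." are left out rather than guessed.

```python
import sqlite3

# Column types are assumed; only the column names come from the slide.
SCHEMA = """
CREATE TABLE Users(username TEXT PRIMARY KEY, location TEXT, gender TEXT);
CREATE TABLE Friendships(user1 TEXT, user2 TEXT, ftype TEXT, fstrength REAL);
CREATE TABLE Documents(docid TEXT PRIMARY KEY, description TEXT);
CREATE TABLE Linkage(doc1 TEXT, doc2 TEXT, ltype TEXT, lweight REAL);
CREATE TABLE Tagging(user TEXT, doc TEXT, tag TEXT, tweight REAL);
CREATE TABLE Ontology(tag1 TEXT, tag2 TEXT, otype TEXT, oweight REAL);
CREATE TABLE Rating(user TEXT, doc TEXT, assessment REAL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created, in alphabetical order.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
]
```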
Friendship types and search modes • Social – computed from explicit social graph, say using inverse distance. Could be based on other measures such as Katz. • Spiritual – derived based on overlap in activities (rating, reviews, tagging, ...). • Global – all users given equal weight = 1/|U|. • All measures normalized so the weights on all outgoing edges from a user sum to 1. • Combos possible: F(u,u’) = a·F_so(u,u’) + b·F_sp(u,u’) + c·F_gl(u,u’), with a+b+c = 1.
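The weighted combination of the three friendship measures can be sketched as follows. The raw social and spiritual strengths, the user names, and the helper names are hypothetical; the sketch only shows the normalization of outgoing weights and the mix with a + b + c = 1.

```python
def normalize(weights):
    """Scale outgoing weights so they sum to 1 (empty if there are none)."""
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()} if total else {}

def combined_friendship(f_social, f_spiritual, users, u, a, b):
    """F(u, v) = a*F_so(u, v) + b*F_sp(u, v) + c*F_gl(u, v), with c = 1 - a - b."""
    c = 1.0 - a - b
    f_global = 1.0 / len(users)  # every user gets the same global weight 1/|U|
    so = normalize(f_social.get(u, {}))
    sp = normalize(f_spiritual.get(u, {}))
    return {
        v: a * so.get(v, 0.0) + b * sp.get(v, 0.0) + c * f_global
        for v in users
    }

users = ["u", "v", "w", "x"]
f_social = {"u": {"v": 1.0, "w": 1.0}}      # e.g. inverse graph distance
f_spiritual = {"u": {"w": 3.0, "x": 1.0}}   # e.g. overlap in tagging activity
F = combined_friendship(f_social, f_spiritual, users, "u", a=0.5, b=0.25)
```

Since each component sums to 1 over all users, the combined weights also sum to 1. Setting a = b = 0 reduces to the purely global mode.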
Scoring documents for tags – digress into BM25 • BM25 – state of the art IR model.
• $$\mathrm{score}(D, t_i) = \mathrm{idf}(t_i) \cdot \frac{(k_1 + 1)\,\mathrm{tf}(D, t_i)}{\mathrm{tf}(D, t_i) + k_1\left(1 - b + b \cdot \frac{\mathrm{len}(D)}{\mathrm{avgdl}}\right)}$$
• k1, b tunable parameters.
• $$\mathrm{idf}(t_i) = \log \frac{\#\mathrm{docs} - n(t_i) + 0.5}{n(t_i) + 0.5}$$
• tf = term frequency, idf = inverse doc frequency; avgdl = avg doc length, n(ti) = #docs containing ti.
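The BM25 term score can be written directly from the formula. The default values k1 = 1.2 and b = 0.75 are commonly used settings, not values given on the slide, and the example numbers are made up.

```python
import math

def idf(num_docs, n_t):
    """idf(t) = log((#docs - n(t) + 0.5) / (n(t) + 0.5))."""
    return math.log((num_docs - n_t + 0.5) / (n_t + 0.5))

def bm25_term(tf, doc_len, avgdl, num_docs, n_t, k1=1.2, b=0.75):
    """BM25 score of one term for one document."""
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avgdl)
    return idf(num_docs, n_t) * (k1 + 1) * tf / (tf + norm)

# Hypothetical collection: 100 docs, the term occurs in 10 of them.
s1 = bm25_term(tf=1, doc_len=100, avgdl=100, num_docs=100, n_t=10)
s3 = bm25_term(tf=3, doc_len=100, avgdl=100, num_docs=100, n_t=10)
s1_long = bm25_term(tf=1, doc_len=200, avgdl=100, num_docs=100, n_t=10)
```

The score grows with term frequency but saturates below idf · (k1 + 1), and the same frequency counts for less in a longer document.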
Adapt to social search
• $$s_u(d, t) = \frac{(k_1 + 1) \cdot |U| \cdot sf_u(d, t)}{k_1 + |U| \cdot sf_u(d, t)} \cdot \mathrm{idf}(t)$$ where |U| = #users.
• $$\mathrm{idf}(t) = \log \frac{|D| - df(t) + 0.5}{df(t) + 0.5}$$ where |D| = #docs, df(t) = #docs tagged t.
• $$sf_u(d, t) = \sum_{v \in U} F_u(v) \cdot tf_v(d, t)$$
• BTW, when we say docs, think items!
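A small sketch of the socially weighted score. The friendship weights F_u(v), the tagging counts, and the default k1 = 1.2 are hypothetical; tf_v(d, t) is taken to be the number of times user v tagged document d with t.

```python
import math

def social_frequency(friend_weights, tag_counts, doc, tag):
    """sf_u(d, t) = sum over users v of F_u(v) * tf_v(d, t)."""
    return sum(
        w * tag_counts.get((v, doc, tag), 0) for v, w in friend_weights.items()
    )

def social_score(sf, num_users, num_docs, df_t, k1=1.2):
    """s_u(d, t) = (k1 + 1) * |U| * sf / (k1 + |U| * sf) * idf(t)."""
    idf = math.log((num_docs - df_t + 0.5) / (df_t + 0.5))
    x = num_users * sf
    return (k1 + 1) * x / (k1 + x) * idf

# F_u(v) for searcher u: v is a close friend, w a distant one.
friend_weights = {"v": 0.75, "w": 0.25}
tag_counts = {
    ("v", "d1", "wizard"): 1,
    ("w", "d1", "wizard"): 1,
    ("w", "d2", "wizard"): 1,
}

sf_d1 = social_frequency(friend_weights, tag_counts, "d1", "wizard")
sf_d2 = social_frequency(friend_weights, tag_counts, "d2", "wizard")
score_d1 = social_score(sf_d1, num_users=2, num_docs=10, df_t=2)
score_d2 = social_score(sf_d2, num_users=2, num_docs=10, df_t=2)
```

Document d1 outranks d2 for this searcher because the close friend v tagged it, even though both documents carry the tag.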
Tag expansion • Sometimes (often?) users may use related tags: e.g., tag an automobile as “Ferrari” and as “car”.
• $$tsim(t, t') = P[t \mid t'] = \frac{df(t \wedge t')}{df(t')}$$ (Note: error in the paper.)
• Then $$sf_u^*(d, t) = \max_{t' \in T} \; tsim(t, t') \cdot sf_u(d, t')$$ Plug in sf*_u(d,t) in place of sf_u(d,t) and we are all set.
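Tag expansion can be illustrated with a toy collection. The documents, tags, and social frequencies below are invented; `doc_tags` maps each document to the set of tags it carries, and `sf` holds precomputed sf_u(d, t) values for one searcher.

```python
def tsim(doc_tags, t, t_prime):
    """tsim(t, t') = P[t | t'] = df(t and t') / df(t')."""
    df_tp = sum(1 for tags in doc_tags.values() if t_prime in tags)
    df_both = sum(
        1 for tags in doc_tags.values() if t in tags and t_prime in tags
    )
    return df_both / df_tp if df_tp else 0.0

def expanded_sf(sf, doc_tags, all_tags, doc, t):
    """sf*_u(d, t) = max over t' of tsim(t, t') * sf_u(d, t')."""
    return max(
        tsim(doc_tags, t, tp) * sf.get((doc, tp), 0.0) for tp in all_tags
    )

doc_tags = {"d1": {"ferrari", "car"}, "d2": {"car"}, "d3": {"ferrari"}}
all_tags = {"car", "ferrari"}
sf = {("d3", "ferrari"): 0.5}  # d3 was only ever tagged "ferrari"

similarity = tsim(doc_tags, "car", "ferrari")
sf_star = expanded_sf(sf, doc_tags, all_tags, "d3", "car")
```

A query for “car” now gives d3 a non-zero social frequency through the related tag “ferrari”, although nobody tagged d3 with “car”.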
Socially aware Tag Expansion • Who tagged the documents and what is the strength of their connection to u?
• $$tsim_u(t, t') = \sum_{v \in U} F_u(v) \cdot \frac{df_v(t \wedge t')}{df_v(t')}$$
• Score for a query: $$s_u^*(d, t_1, \ldots, t_n) = \sum_{i=1}^{n} s_u^*(d, t_i)$$
• Experiments – see paper: librarything.com, mixed results. • Measured improvement in precision@top-10 and NDCG@top-10.
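The socially aware similarity weights each user's own co-tagging behaviour by their connection to the searcher. In this sketch, users who never used t' contribute nothing, which is an assumption (the ratio is undefined for them); the data and names are hypothetical.

```python
def social_tsim(friend_weights, user_doc_tags, t, t_prime):
    """tsim_u(t, t') = sum over v of F_u(v) * df_v(t and t') / df_v(t')."""
    total = 0.0
    for v, weight in friend_weights.items():
        docs = user_doc_tags.get(v, {})
        df_tp = sum(1 for tags in docs.values() if t_prime in tags)
        if df_tp == 0:
            continue  # assumption: users who never used t' are skipped
        df_both = sum(
            1 for tags in docs.values() if t in tags and t_prime in tags
        )
        total += weight * df_both / df_tp
    return total

def query_score(term_scores):
    """s*_u(d, t_1..t_n) is the sum of the per-tag scores s*_u(d, t_i)."""
    return sum(term_scores)

friend_weights = {"v": 0.75, "w": 0.25}
user_doc_tags = {
    "v": {"d1": {"car", "ferrari"}, "d2": {"ferrari"}},
    "w": {"d3": {"car", "ferrari"}},
}
similarity = social_tsim(friend_weights, user_doc_tags, "car", "ferrari")
total_score = query_score([0.5, 0.25])
```

Here v co-tags “car” with “ferrari” on half of their “ferrari” documents and w on all of them, so the result is 0.75 · 0.5 + 0.25 · 1.0.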
Lessons and open challenges • Socializing search across the board is a bad idea. • Need to understand which kind of queries can benefit from what kind of settings (a, b, c values). Examples below. 1. Queries w/ global information need: perform best when a= b= 0; e.g., “Houdini”, “search engines”, “English grammar”; fairly precise queries; reasonably clear what are quality results.
Lessons & Challenges (contd.) • 2. Queries with a subjective taste (a social aspect): perform best when a≈1; e.g., “wizard”; produces a large number of results but user may like only particular types of novels such as “Lord of the Rings”; the tag “wizard” may be globally infrequent but frequent among user’s friends. • 3. Queries with a spiritual information need: perform best when b ≈ 1; e.g., “Asia travel guide”; very general, need to make full use of users similar (in taste) to searcher. (Think recommendations.)
Lessons & Challenges (contd.) • 4. Queries with a mixed information need: perform best when a≈b≈0.5; e.g.,“mystery magic”. • Challenges: The above is an ad hoc classification. Need more thorough studies and deeper insights. • Can the system “learn” the correct setting (a,b,c values) for a user or for a group? • The usual scalability challenges: see following references. • Project opportunity here.
Follow-up Reading (Efficiency) • S. Amer-Yahia, M. Benedikt, P. Bohannon. Challenges in Searching Online Communities. IEEE Data Eng. Bull. 30(2), 2007. • R. Schenkel, T. Crecelius, M. Kacimi, S. Michel, T. Neumann, J.X. Parreira, G. Weikum. Efficient Top-k Querying over Social-Tagging Networks. SIGIR 2008. • M.V. Vieira, B.M. Fonseca, R. Damazio, P.B. Golgher, D. de Castro Reis, B. Ribeiro-Neto. Efficient Search Ranking in Social Networks. CIKM 2007.
Follow-up Reading (Temporal Evolution, Events, Networks, ...) • N. Bansal, N. Koudas. Searching the Blogosphere. WebDB 2007. • M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, A. Tomkins. Visualizing Tags over Time. ACM Transactions on the Web, 1(2), 2007. • S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei. Optimizing Web Search Using Social Annotations. WWW 2007. • A. Das Sarma, A. Jain, C. Yu. Dynamic Relationship and Event Discovery. WSDM 2011, Hong Kong, China. • S. Amer-Yahia, M. Benedikt, L.V.S. Lakshmanan, J. Stoyanovich. Efficient Network-aware Search in Collaborative Tagging Sites. VLDB 2008. We will revisit social search later in your talks.