Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services
Jae-Gil Lee, Department of Knowledge Service Engineering, KAIST
Contents • Background and Motivation • Overview of the Methodology • Detailed Methodology • Experimental Evaluation • Conclusions
This paper received the Best Paper Award at AAAI ICWSM-13.
Community-Based Question Answering (CQA) Services • CQA services handle huge volumes of questions, e.g., 50,000 and 160,000 new questions per day on major services • Search engines are weak at recently updated information, personalized information, and advice & opinions [Budalakoti et al. 2010] • Current problems in CQA services: too many questions, and it is hard to find questions to answer • Solutions: expert finding and question routing [Zhou et al. 2009]
Question Routing • Graph-based: HITS, PageRank; finds influential answerers • Content-based: language modeling; matches questions and answerers • Profile-based: user profiles; finds experts based on their profiles • Also, hybrid methods • Two important factors in question routing: expertise (answerers need proper knowledge of the question area) and availability (answerers need time to answer) [Horowitz et al. 2010, Li et al. 2010, Zhang et al. 2007] • There is a trade-off between expertise and availability
Short Tail vs. Long Tail • Most contributions (i.e., answers) in CQA services are made by a small number of heavy users • Many questions won't be answered if such heavy users become unavailable • A system is not robust if it heavily relies on a small number of users
On the other hand, recently-joined users are prone to leave CQA services • Example: the appearances of the 9,874 answerers who wrote answers in the Computers category of KiN • Only 8.4% of answerers remained after a year
Comparison with Traditional Question Routing • Motivating such recently-joined users to become heavy users, by routing proper questions to them so that they can easily contribute, is of prime importance for the success of the services • [Figure: existing methodologies vs. our methodology] • Which users should we take care of? Recently-joined expert users!
Problem Setting • Developing a methodology for measuring the likelihood that a light user becomes a contributive (i.e., heavy) user in the future in CQA services • Input: (i) the statistics of each heavy user, (ii) the answers written by heavy users, and (iii) the answers written by light users • Output: the likelihood of each light user becoming a heavy user in the future, called the answer affordance
Contents • Background and Motivation • Overview of the Methodology • Detailed Methodology • Experimental Evaluation • Conclusions
Challenges • There is not sufficient information (i.e., answers) to judge the expertise of recently-joined users! A kind of cold-start problem • How can we cope with this lack of information?
Intuition • A person's active vocabulary reveals his/her knowledge • Vocabulary has sharable characteristics, so domain-specific words are repeatedly used by expert answerers • Idea: use the active vocabulary of a user to infer his/her expertise, i.e., use vocabulary to bridge the gap between heavy users and light users
Details: Vocabulary Level • Vocabulary knowledge: "Vocabulary knowledge should at least comprise two dimensions, which are vocabulary breadth (or size), and depth (or quality)" [Marjorie et al. 1996] • Three dimensions of lexical competence: "(a) partial to precise knowledge, (b) depth of knowledge, and (c) receptive to productive use ability" [Henriksen 1999] • Productive vocabulary ability: "It implies degrees of knowledge. A learner may be reluctant to use an infrequent word, using a simpler, more frequent word of a similar meaning. Such reluctance is often a result of uncertainty about the word's usage. Lack of confidence is a reflection of imperfect knowledge. We refer to the ability to use a word at one's free will as free productive ability" [Laufer et al. 1999]
Details: Domain Experts' Vocabulary Usage • "Experts generated queries containing words from domain-specific lexicons fifty percent more often than non-experts. In addition to being able to generate more technically-sophisticated queries, experts also generated longer queries in terms of tokens and characters. It may be because domain experts are more familiar with the domain vocabulary." [White et al. 2009] • "Behavior of software engineers is quite distinct from general web search behavior. They use longer and more detailed queries. They make heavy use of specialized terms and search syntax. ... Controlled vocabulary look-up lists or query processing tools should be in place to deal with acronyms, product names, and other technical terms" [Freund et al. 2006] • "When searching, experts found slightly more relevant documents. Experts issued more queries per task and longer queries, and their vocabulary overlapped somewhat more with thesaurus entries" [Zhang et al. 2005] • Takeaway: domain experts use specialized but formatted/standardized words
Details: Domain Experts' Vocabulary Durability • "One important change in behavior was the use of a more specific vocabulary as students learned more about their research topic" [Vakkari et al. 2003] • "Experts' use of domain-specific vocabulary changes only slightly over the duration of the study. However, many non-expert users exhibit an increase in their usage of domain-specific vocabulary" [White et al. 2009] • Takeaway: a domain expert's unique word set remains stable for a long time
Usage of the Vocabulary: Overview • [Figure: heavy users → words → light users]
Contents • Background and Motivation • Overview of the Methodology • Detailed Methodology • Experimental Evaluation • Conclusions
Basics of CQA Services • Top-level categories (e.g., Computers, Travel): our methodology defines the expertise of a user on a top-level category • User profile statistics: selection count A (answers chosen as the best answer), selection ratio B = A/D (where D is the user's total number of answers), and recommendation count C
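As a rough illustration of these profile statistics, here is a minimal Python sketch; the class and field names are my own, not from the paper:

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    """Per-user statistics from a KiN-style profile (field names are mine)."""
    total_answers: int         # D: all answers the user has written
    selection_count: int       # A: answers chosen as the best answer
    recommendation_count: int  # C: recommendations received from other users

    @property
    def selection_ratio(self) -> float:
        """B = A / D: the fraction of the user's answers chosen as best."""
        return self.selection_count / self.total_answers if self.total_answers else 0.0

profile = UserProfile(total_answers=200, selection_count=60, recommendation_count=150)
print(profile.selection_ratio)  # 0.3
```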
Answer Affordance • The answer affordance of a light user considers both expertise and availability
Estimated Expertise • Four steps bridge the heavy users UH and the light users UL through their shared vocabulary: Step 1 computes Expertise(u) for each heavy user; Step 2 computes WordLevel(w) for each word; Step 3 propagates word levels to the words used by each light user; Step 4 computes EstimatedExpertise(u) for each light user
Step 1: the expert score Expertise(uh) of a heavy user is calculated from the abundant historical data • The expertise of a user becomes higher (i) as the user's answers are more concentrated on the target category and (ii) as the user has a higher selection count, selection ratio, and recommendation count
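A minimal sketch of a score with properties (i) and (ii); the slide does not give the exact formula, so the multiplicative combination and the log damping below are assumptions:

```python
import math

def expertise(cat_answers: int, total_answers: int,
              sel_count: int, sel_ratio: float, rec_count: int) -> float:
    """Sketch of Step 1: a heavy user's expert score on one category.

    Grows with (i) the concentration of the user's answers on the target
    category and (ii) the selection count, selection ratio, and
    recommendation count. The multiplicative form and the log damping
    are illustrative assumptions, not the paper's exact formula.
    """
    concentration = cat_answers / max(total_answers, 1)                  # property (i)
    quality = math.log1p(sel_count) * sel_ratio * math.log1p(rec_count)  # property (ii)
    return concentration * quality

# 160 of 200 answers in the target category, 60 best answers, ratio 0.3, 150 recommendations
print(expertise(160, 200, 60, 0.3, 150))
```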
Step 2: the level of a word, WordLevel(wi), is determined by the expert scores of the heavy users who have used the word • The level of a word becomes higher as the word is used by more expert users and more frequently • Decomposing answers into words is reliable even for a small number of answers, because each answer typically contains quite a few words
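One plausible instantiation of Step 2, assuming an expertise-weighted usage sum; the aggregation is my choice, not necessarily the paper's:

```python
from collections import defaultdict

def word_levels(heavy_answers, expertise_of):
    """Sketch of Step 2: the level of each word, from heavy users' answers.

    `heavy_answers` is an iterable of (user_id, words-in-one-answer);
    `expertise_of` maps each heavy user to his/her expert score. Summing
    the writer's expertise over every usage makes a word's level rise both
    with the expertise of its users and with its frequency; this
    aggregation is an illustrative assumption.
    """
    level = defaultdict(float)
    for user_id, words in heavy_answers:
        for w in words:
            level[w] += expertise_of[user_id]
    return dict(level)

answers = [("u1", ["router", "firmware"]), ("u2", ["router", "reboot"])]
print(word_levels(answers, {"u1": 2.0, "u2": 0.5}))
```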
Step 3: these word levels are propagated to the set of words used by a light user in his/her answers • This step is supported by the observation that the vocabulary of an expert stays mostly unchanged despite a temporal gap [White, Dumais, and Teevan 2009]
Example: sample words in the Travel category with their WordLevel(wi) values [table not shown]
Step 4: the expert score of the light user, EstimatedExpertise(ul), is calculated in reverse from his/her vocabulary
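A minimal sketch of Steps 3 and 4 under the same assumptions; averaging the levels of a light user's known words is my choice of aggregation:

```python
def estimated_expertise(light_user_words, word_level):
    """Sketch of Steps 3-4: estimate a light user's expertise in reverse.

    `light_user_words` is the set of words the light user has used in
    his/her few answers; `word_level` holds the WordLevel values learned
    from heavy users in Step 2. Averaging the levels of the known words
    is an illustrative aggregation, not the paper's exact formula.
    """
    known = [word_level[w] for w in light_user_words if w in word_level]
    return sum(known) / len(known) if known else 0.0

word_level = {"router": 2.5, "firmware": 2.0, "reboot": 0.5}
print(estimated_expertise({"router", "firmware", "hello"}, word_level))
```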
Availability • Simply the number of a user's answers, with each answer weighted in proportion to its recency
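One way to realize such a recency-weighted count, with a hedged combination into the final affordance score; the exponential decay, the 30-day half-life, and the product form are all assumptions, since the slides do not give the exact formulas:

```python
import math
from datetime import date

def availability(answer_dates, today, half_life_days=30.0):
    """Sketch: a recency-weighted answer count. Each answer contributes
    exp(-lambda * age_in_days); the exponential decay and the half-life
    are assumptions, chosen only so that recent answers weigh more."""
    lam = math.log(2) / half_life_days
    return sum(math.exp(-lam * (today - d).days) for d in answer_dates)

def affordance(est_expertise, avail):
    """Sketch: answer affordance combining expertise and availability.
    The product is an illustrative combination, not the paper's formula."""
    return est_expertise * avail

today = date(2012, 8, 31)
a = availability([date(2012, 8, 30), date(2012, 6, 1)], today)
print(a, affordance(1.67, a))
```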
Contents • Background and Motivation • Overview of the Methodology • Detailed Methodology • Experimental Evaluation • Conclusions
Data Set • Collected from Naver Knowledge-In (KiN), http://kin.naver.com • Ranging from September 2002 to August 2012 (ten years) • Includes two categories: Computers (factual information) and Travel (subjective opinions) • Entropy is used for measuring the expertise of a user; it works especially well for categories where factual expertise is primarily sought [Adamic et al. 2008] • [Statistics table not shown]
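For concreteness, the entropy in question is computed over a user's answer distribution across categories; a short sketch:

```python
import math

def category_entropy(answer_counts):
    """Entropy of a user's answer distribution over categories
    [Adamic et al. 2008]: low entropy means a focused user,
    high entropy a scattered one."""
    total = sum(answer_counts)
    probs = (c / total for c in answer_counts if c > 0)
    return -sum(p * math.log2(p) for p in probs)

print(category_entropy([90, 5, 5]))    # focused user: low entropy
print(category_entropy([34, 33, 33]))  # scattered user: high entropy (~log2(3))
```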
Period Division • Dividing the ten-year period into three periods: resource, training, and test • The resource period is sufficiently long to learn the expertise of users, as is the test period; in contrast, the training period is not • Heavy users: those who joined during the resource period • Light users: those who joined during the training period (only one year) • We assume that the end of the training period is the present
Accuracy of Expertise Prediction: Preliminary Tests • Extracting the main interest declared by each user in the CQA service • Measuring the ratio of such self-declared experts on the target category among the top-k light users sorted by EstimatedExpertise() • [Figure: the ratio of users who expressed their interests; (a) Computers, (b) Travel]
Accuracy of Expertise Prediction: Evaluation Method • Finding the top-k users by EstimatedExpertise() from the training period (our prediction) • Finding the top-k users by KiN's ranking scheme from the test period (ground truth) • KiN's ranking scheme is a weighted sum of the selection count and the selection ratio • Measuring (i) P@k and (ii) R-precision • Repeating the same procedure for comparison with the following approaches: Expertise(), our way of ranking heavy users rather than light users; SelCount(), the selection count; RecommCount(), the recommendation count
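For reference, the two metrics follow their standard definitions; the function names are mine:

```python
def precision_at_k(predicted, ground_truth, k):
    """P@k: the fraction of the top-k predicted users that also appear
    in the ground-truth top users."""
    return sum(1 for u in predicted[:k] if u in ground_truth) / k

def r_precision(predicted, ground_truth):
    """R-precision: P@R, where R is the size of the ground-truth set."""
    return precision_at_k(predicted, ground_truth, len(ground_truth))

predicted = ["u3", "u7", "u1", "u9"]
ground_truth = {"u3", "u1"}
print(precision_at_k(predicted, ground_truth, 2))  # 0.5
print(r_precision(predicted, ground_truth))        # 0.5
```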
Accuracy of Expertise Prediction: Results • [Figures: the precision performance for the Computers category and for the Travel category]
Accuracy of Answer Affordance: Evaluation Method • Finding the top-k light users by Affordance() (our methodology) • Finding the top-k users managed by KiN (competitor) • Measuring the user availability and the answer possession for the next one month • User availability: the ratio of the number of top-k users who appeared on a day to the total number of users who appeared on that day • Answer possession: the ratio of the number of answers posted by the top-k users on a day to the total number of answers posted on that day
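A short sketch of the two daily metrics as defined above; the function names and the data layout are mine:

```python
def user_availability(top_k, active_today):
    """Ratio of the top-k users who appeared today to all users who
    appeared today."""
    active = set(active_today)
    return sum(1 for u in top_k if u in active) / len(active)

def answer_possession(top_k, answer_authors_today):
    """Ratio of today's answers posted by the top-k users to all answers
    posted today. `answer_authors_today` lists one author id per answer."""
    chosen = set(top_k)
    return sum(1 for a in answer_authors_today if a in chosen) / len(answer_authors_today)

top_k = ["u1", "u2"]
print(user_availability(top_k, ["u1", "u5", "u6"]))        # 1/3
print(answer_possession(top_k, ["u1", "u1", "u5", "u6"]))  # 2/4
```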
Accuracy of Answer Affordance: Results • [Figures: the user availability results and the answer possession results; (a) Computers, (b) Travel]
Contents • Background and Motivation • Overview of the Methodology • Detailed Methodology • Experimental Evaluation • Conclusions
Conclusions • Developed a new methodology that can make CQA services more active and robust • Verified the effectiveness of our methodology using a real data set spanning ten years • Quote from the reviews: "I'm sold. If these results hold on another CQA site, this will be a very significant contribution to online communities. The study is well done, it's incredibly readable and clear, and the evaluation dataset is impeccable (10 years of data from one of the top 3 sites)."