
Expertise Finding for Question Answering (QA) Services


Presentation Transcript


  1. Expertise Finding for Question Answering (QA) Services October 16, 2014 Department of Knowledge Service Engineering Prof. Jae-Gil Lee

  2. Brief Bio • Currently, an associate professor at the Department of Knowledge Service Engineering, KAIST • Homepage: http://dm.kaist.ac.kr/jaegil • Lab homepage: http://dm.kaist.ac.kr/ • Previously, worked at IBM Almaden Research Center and the University of Illinois at Urbana-Champaign • Areas of Interest: Data Mining and Big Data

  3. Table of Contents • Community-based Question Answering (CQA) Services • Background and Motivation • Methodology Overview • Evaluation Results • Social Search Engines for Location-Based Questions • Background and Motivation • System Architecture and User Interface • Evaluation Results

  4. Question Answering (QA) Services • QA services are good at: recently updated information, personalized information, and advice & opinions [Budalakoti et al., 2010] • [Diagram labels: Experts, Knowledge Base, Questions, Answers, Search]

  5. Community-based Question Answering (CQA) Services • Naver Knowledge-In: 50,000 questions per day • Yahoo! Answers: 160,000 questions per day

  6. Motivation of Our Study • Recently-joined users tend to leave CQA services very soon; only 8.4% of answerers remained after a year • Most contributions (i.e., answers) in CQA services are made by a small number of heavy users • Making the long tail stay longer before they leave is of prime importance to the success of the services

  7. Problem Setting • To whom does the service provider need to pay special attention? → Recently-joined (i.e., light) users who are likely to become contributive (i.e., heavy) users • Goal: estimating the likelihood of a light user becoming a heavy user (mainly from his/her expertise) • Challenge: lack of information about the light user • 어장관리? ("fishpond management," Korean slang for keeping people interested so they stay around)

  8. Intuition behind Our Methodology • A person's active vocabulary reveals his/her knowledge • Vocabulary has sharable characteristics, so domain-specific words are repeatedly used by expert answerers • [Diagram: Q&A 1 by Answerer 1 and Q&A 2 by Answerer 2 share domain-specific vocabulary (NAND, SSD, RAM, ECC), which sits at a different level from common vocabulary (computer, operation, device, drive, memory, data)]

  9. Estimated Expertise • The more expert a user is, the higher the level of the words he/she uses • [Diagram: heavy users use higher-level words than light users]
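
As a rough illustration of this idea, the sketch below assigns each word a level from how often heavy users (as opposed to light users) use it, and scores a user by the mean level of his/her words. This is a minimal reconstruction, not the paper's actual formulation; the function names and the heavy-usage ratio are assumptions.

```python
from collections import defaultdict

def estimate_word_levels(answers_by_user, is_heavy):
    """Assumed proxy: a word's level is the fraction of its
    occurrences that appear in heavy users' answers."""
    heavy, total = defaultdict(int), defaultdict(int)
    for user, answers in answers_by_user.items():
        for answer in answers:
            for word in answer.lower().split():
                total[word] += 1
                if is_heavy(user):
                    heavy[word] += 1
    return {w: heavy[w] / total[w] for w in total}

def estimate_expertise(user_answers, word_levels):
    """Score a user by the mean level of the words he/she used."""
    words = [w for a in user_answers for w in a.lower().split()
             if w in word_levels]
    return sum(word_levels[w] for w in words) / len(words) if words else 0.0
```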

  10. Availability • Simply measuring the number of a user's answers, with each answer's importance proportional to its recency
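
One natural way to realize recency-proportional importance is exponential decay over answer age; the sketch below takes this route, and the half-life is an illustrative assumption rather than a value from the paper.

```python
import math
from datetime import datetime

def availability(answer_times, now=None, half_life_days=30.0):
    """Recency-weighted answer count: each answer's weight halves
    every half_life_days (an assumed parameter)."""
    now = now or datetime.now()
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * (now - t).days) for t in answer_times)
```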

  11. Answer Affordance • Defined as the likelihood of a light user becoming a heavy user if he/she is treated specially • Considering both expertise and availability
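
The slide does not give the combining function, so the sketch below simply takes a weighted sum of the two (normalized) scores; the weight alpha is a placeholder assumption, and the actual formula is in the ICWSM-13 paper cited later.

```python
def affordance(expertise, availability, alpha=0.5):
    """Hypothetical combination of expertise and availability
    into an affordance score; alpha is an assumed weight."""
    return alpha * expertise + (1.0 - alpha) * availability
```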

  12. Data Set • Collected from Naver Knowledge-In (KiN, 지식인) • Spanning ten years (Sept. 2002 to Aug. 2012) • Including two categories: Computers (factual information) and Travel (subjective opinions) • Entropy was used to measure a user's expertise; it works especially well for categories where factual expertise is primarily sought [Adamic et al., 2008] • [Table: dataset statistics]
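
The entropy measure from Adamic et al. [2008] scores a user by the spread of the categories he/she answers in; a lower entropy means a more focused, more expert-like user. A minimal sketch of that measure:

```python
import math
from collections import Counter

def category_entropy(answer_categories):
    """Shannon entropy of a user's answer-category distribution
    [Adamic et al., 2008]; lower means more focused."""
    counts = Counter(answer_categories)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```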

  13. Evaluation Setting (1/2) • Finding the top-k users by Affordance() for light users (our methodology) • Retrieving the top-k directory experts managed by KiN (the competitor) • Measuring the following two metrics for the next one month • User availability: the ratio of the number of top-k users who appeared on a day to the total number of users who appeared on that day • Answer possession: the ratio of the number of answers posted by the top-k users on a day to the total number of answers posted on that day
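
Both metrics reduce to simple daily ratios; the sketch below assumes user IDs are held in sets and each answer is represented by its author ID.

```python
def user_availability(topk_users, users_today):
    """Fraction of the users active today that belong to the top-k
    set (both arguments are sets of user IDs)."""
    return len(topk_users & users_today) / len(users_today) if users_today else 0.0

def answer_possession(topk_users, answer_authors_today):
    """Fraction of today's answers posted by top-k users
    (answer_authors_today holds one author ID per answer)."""
    if not answer_authors_today:
        return 0.0
    owned = sum(1 for author in answer_authors_today if author in topk_users)
    return owned / len(answer_authors_today)
```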

  14. Evaluation Setting (2/2) • Ten-year period (Sept. 2002 to Aug. 2012) • Sept. 2002 to July 2011: used for deriving the word levels • July 2011 to July 2012: used for finding the top-k experts by our methodology; the top-k directory experts managed by KiN were also picked up here • July 2012 to Aug. 2012: monitored the user availability and answer possession

  15. [Charts: the results of the user availability and the answer possession for top-200 and top-400 in (a) Computers and (b) Travel]

  16. See the paper for the technical details. Sung, J., Lee, J., and Lee, U., "Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services," In Proc. 7th Int'l AAAI Conf. on Weblogs and Social Media (ICWSM), Cambridge, Massachusetts, July 2013. This paper received the Best Paper Award at AAAI ICWSM-13.

  17. Table of Contents • Community-based Question Answering (CQA) Services • Background and Motivation • Methodology Overview • Evaluation Results • Social Search Engines for Location-Based Questions • Background and Motivation • System Architecture and User Interface • Evaluation Results

  18. Social Search (1/2) • A new paradigm of knowledge acquisition that relies on the people in a questioner's social network

  19. Social Search (2/2) • If you want to get opinions or advice from your online friends, what do you do? • [Diagram: one approach when not knowing whom to ask, another when knowing whom to ask; social search takes advantage of both approaches]

  20. KiN Here (지식인 위치질문) • A query is routed by finding a match between the target location of a query and a relevant location of a user • Locations added at the dong (neighborhood) level

  21. Location-Based Questions • Informally defined as “search for a business or place of interest that is tied to a specific geographical location” [Amin et al., 2009] • Very popular, especially in mobile search, and typically subjective • Mobile search is estimated to comprise 10%∼30% of all searches • About 9∼10% of the queries from Yahoo! mobile search, over 15% of 1 million Google queries from PDA devices, and about 10% of 10 million Bing mobile queries were identified as location-based questions • In a set of location-based questions, 63% were non-factual and the remaining 37% were factual → Mobile social search is the best way to process location-based questions

  22. Glaucus: A Social Search Engine for Location-Based Questions • 1. Asking a question to Glaucus 2. Selecting proper experts 3. Routing the question to the experts 4. Returning an answer to the questioner 5. (Optional) Rating the answer • [Diagram: the questioner sends a query (1) to the Glaucus social search engine, which selects experts (2) from a user database populated by crawling users, routes the query (3) to them, returns an answer (4) to the questioner, and collects feedback (5)]

  23. User Interface • An Android app has been developed and is under (closed) beta testing • [Screenshots: questioner and answerer views]

  24. Data Collection • Able to collect who visited where and when on geosocial networking services such as Foursquare • Users check in to a venue and may also leave a tip • Our crawler collects such information upon user approval
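
A check-in record therefore carries at least a user, a venue, and a timestamp, plus an optional tip. The dataclass below is an illustrative schema for the crawled records; the field names are assumptions, not Foursquare's actual API fields.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class CheckIn:
    """One 'who visited where and when' record (illustrative schema)."""
    user_id: str
    venue_id: str
    venue_category: str        # e.g., "Korean Restaurant"
    latitude: float
    longitude: float
    time: datetime
    tip: Optional[str] = None  # free-text tip left at the venue, if any
```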

  25. Expert Finding • [Diagram: the aspects of a question (venue, location, category, time, misc.) are matched against the location aspect models of other users; a similarity calculation yields a score per aspect, an “online friend?” check adds to the score, and the top-k users are returned to the questioner]
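
The diagram suggests per-aspect similarities that are aggregated into a single score per candidate, with a bonus for the questioner's online friends. The sketch below uses Jaccard overlap as a stand-in similarity and assumed aspect weights; none of these specifics come from the paper.

```python
def jaccard(a, b):
    """Set overlap, used here as a stand-in per-aspect similarity."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_experts(question, candidates, k=3, friend_bonus=0.1, weights=None):
    """Hypothetical aspect-based ranking: a weighted sum of similarities
    between the question's aspects and each candidate's aspect model,
    plus a small boost for online friends (weights are assumptions)."""
    weights = weights or {"location": 0.4, "category": 0.4, "time": 0.2}
    ranked = []
    for user in candidates:  # each candidate: dict of aspect -> tokens
        score = sum(w * jaccard(question[a], user[a])
                    for a, w in weights.items())
        if user.get("is_friend", False):
            score += friend_bonus
        ranked.append((user["id"], score))
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)[:k]
```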

  26. Evaluation Setting • Collected check-ins and tips from Foursquare (foursquare.com) • Confined to the places in the Gangnam District • Ranging from April 2012 to December 2012 • [Table: dataset statistics]

  27. Evaluation Results • Qualification of the experts: two human judges investigated the profiles of the experts selected by the three systems (Aardvark, SocialTelescope, and Glaucus) for 30 questions (distributed into 3 sets) and gave a score on a 3-point scale • [Chart: DCG of the selected experts on Sets 1-3; values shown: 8.82, 8.25, 7.78, 6.68, 6.61, 6.31, 4.07, 3.99, 3.94] • Quality of the answers: two human judges examined the quality of the answers, both from experts and non-experts, and gave a score on a 3-point scale • [Chart: answer quality scores of 2.37 and 1.97]

  28. Mobile User Availability • Motivation • Study Methodology • [Diagram: 26 context features extracted from smartphone logs, together with external information (time, date), are used to train an availability classifier (decision tree, SVM, random forest, etc.); the learned classification model then predicts the availability class label]
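
Given a pre-extracted feature matrix, the training step is a standard supervised-learning loop. The sketch below is a minimal scikit-learn version using a random forest (one of the model families named on the slide) and the 10-fold cross validation mentioned on slide 30; the hyperparameters are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def train_availability_model(X, y):
    """X: (n_samples, 26) matrix of context features from smartphone
    logs; y: availability class labels. Returns the fitted model and
    its mean 10-fold cross-validation accuracy."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    cv_accuracy = cross_val_score(model, X, y, cv=10).mean()
    model.fit(X, y)
    return model, cv_accuracy
```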

  29. User Behavior Collection

  30. Preliminary Evaluation Results • Accuracy: measured by 10-fold cross validation on data from 10 users over 5 weeks • Important Features • 1st: Time, Day of Week • 2nd: Running Apps • 3rd: Wi-Fi SSID, # of Apps (30 mins), Time of Day

  31. See the paper for the technical details. Choy, M., Lee, J., Gweon, G., and Kim, D., "Glaucus: Exploiting the Wisdom of Crowds for Location-Based Queries in Mobile Environments," In Proc. 8th Int'l AAAI Conf. on Weblogs and Social Media (ICWSM), Ann Arbor, Michigan, June 2014.

  32. Thank you very much! Any questions? E-mail: jaegil@kaist.ac.kr Homepage: http://dm.kaist.ac.kr/
