Question-Answering on Yahoo!Answers: Preliminary Results
Rong Tang, Sheila Denn
OCLC/ALISE LIS Research Grant Presentation
ALISE 2009, January 23, 2009
Background
• Yahoo!Answers
  • Social Q&A
  • 25+ pre-defined categories
  • Users post questions, answer questions, rate answers, provide comments
  • One best answer chosen by the asker or through vote
  • Users may provide comments
Our Research Project
• Funded by OCLC/ALISE Grant Program and Simmons College President’s Fund for Research
• Project Staff:
  • Rong Tang (PI)
  • Sheila Denn (Co-PI)
  • Sam Kalat (technology consultant, programmer)
  • Laura Saunders (Research Assistant)
• The project wiki page documents the relevant literature and project progression, with extensive meeting notes on coding decisions
Research Questions
• Are existing question taxonomies (such as those in Graesser et al. (1994) and Freed (1994)) valid in a social Q&A environment?
• What are the relationships between the linguistic characteristics, functional properties, and subject content of the questions and the kinds of responses that they receive?
• What are the characteristics of answers that are chosen as “best” answers?
• What is the role of the social function vs. the information function in social Q&A?
• What are the implications of the above for provision of library and information services?
Previous Research
• Question classification
  • Wh- questions (Robinson & Rackstraw, 1972)
  • Conceptual question categories (Lehnert, 1978)
  • Content-based question categories (Graesser et al., 1994)
  • Reference question classification (Pomerantz, 2005)
  • Questions in Dynamic Semantics (Aloni, Butler, & Dekker, 2007)
• Answer classification
  • Much less research here than on question classification
  • Answer selection rules (Lehnert, 1978)
  • Criteria based on Yahoo!Answers comments (Kim et al., 2007)
Previous Research (cont.)
• Formal studies of online Q&A
  • Answerers: “specialists” vs. “synthesists” (Gazan, 2006)
  • Questioners: “seekers” vs. “sloths” (Gazan, 2007)
• Question purpose (Graesser et al., 1994)
  • Filling knowledge gaps
  • Establishing and monitoring common ground
  • Coordinating social action
  • Directing the conversation and controlling attention
Research Plan
• Data collection and sampling
  • Gathered a stratified random sample of 3,000 question-answer sets, including any comments
  • Stratified by 25 top-level categories assigned by Yahoo!Answers
• Data coding
  • Content analysis at multiple levels
    • Syntactic
    • Semantic
    • Pragmatic
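The sampling step can be illustrated with a minimal Python sketch. It is not the project's actual collection code: the equal allocation of 120 sets per category (3,000 / 25), the record fields `id` and `category`, and the synthetic population are all assumptions made for illustration. The slides do not say whether allocation was equal or proportional.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_stratum, seed=0):
    """Draw up to `per_stratum` records at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)

    sample = []
    for name in sorted(strata):
        pool = strata[name]
        k = min(per_stratum, len(pool))  # small strata contribute all they have
        sample.extend(rng.sample(pool, k))
    return sample


# Hypothetical population: 25 categories with 500 question-answer sets each.
population = [
    {"id": f"c{c}-q{q}", "category": f"category_{c:02d}"}
    for c in range(25)
    for q in range(500)
]

# Assumed equal allocation: 120 sets per category gives 3,000 in total.
sample = stratified_sample(population, key="category", per_stratum=120)
print(len(sample))  # 3000
```

Sampling within each category separately guarantees that low-traffic categories are represented, which a simple random draw over all questions would not.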
Research Plan (cont.)
• Data Analysis
  • Descriptive statistics will be produced for:
    • Frequency of answers provided per question
    • Average length of time to first answer
    • Distribution of subject categories
    • Distribution of question and answer types
    • Distribution of chosen answer types
  • Correlation analysis will be performed for:
    • Linguistic characteristics of questions and answers
    • Functional categories of questions and answers
    • Subject categories of questions and answers
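A few of the planned descriptive statistics can be sketched in Python as follows. The three records, their field names (`category`, `asked`, `answers`), and their timestamps are hypothetical placeholders, not project data; the sketch only shows the shape of the computation.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical records: each question has a category, a posting time,
# and the posting times of its answers.
questions = [
    {"category": "Health", "asked": datetime(2008, 6, 1, 12, 0),
     "answers": [datetime(2008, 6, 1, 12, 10), datetime(2008, 6, 1, 13, 0)]},
    {"category": "Pets", "asked": datetime(2008, 6, 2, 9, 0),
     "answers": [datetime(2008, 6, 2, 9, 30)]},
    {"category": "Health", "asked": datetime(2008, 6, 3, 8, 0),
     "answers": []},
]

# Frequency of answers provided per question.
answers_per_question = Counter(len(q["answers"]) for q in questions)

# Average time to first answer, in minutes (unanswered questions excluded).
minutes_to_first = [
    (min(q["answers"]) - q["asked"]).total_seconds() / 60
    for q in questions
    if q["answers"]
]
mean_minutes_to_first = mean(minutes_to_first)

# Distribution of subject categories.
category_distribution = Counter(q["category"] for q in questions)

print(answers_per_question)
print(mean_minutes_to_first)
print(category_distribution)
```

Excluding unanswered questions from the time-to-first-answer average is one possible choice; the slides do not state how such questions are handled.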
Progress to Date
• Sample has been collected
• Preliminary coding has begun
• Syntactic coding of questions is complete
  • Wh- questions
  • Inversion questions
  • Other questions
  • Multiparts
  • Double coding
• Syntactic coding of question descriptions is complete
  • Number of questions included in description text
  • Type of questions
Data Coding
• Two coders code each question independently, then review the codes together to reach consensus on the final coding
• Use of informal language presents a challenge for coding
  • Is it a question if it doesn’t include a question mark? Is it a question simply because it has a question mark at the end?
  • Should “WTF” be coded as a “what” question or as an other question? Or not at all?
• Coding multiparts of a question, e.g., “Why do husbands feel they have to lie to other women about being married, and when the other woman finds out?”
• Double coding questions such as “Is there anywhere you can listen to citizen band radio online?”
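To make the syntactic categories concrete, here is a naive first-word heuristic in Python. It is only an illustration: the project's coding was done by two human coders, and the word lists and the function name `syntactic_code` are assumptions. The sketch deliberately ignores question marks and does not attempt multipart or double coding, which is exactly where the challenges above arise.

```python
import re

# Assumed word lists for illustration; not the project's coding scheme.
WH_WORDS = {"what", "why", "how", "who", "whom", "whose",
            "which", "when", "where"}
AUXILIARIES = {"is", "are", "was", "were", "am", "do", "does", "did",
               "can", "could", "will", "would", "shall", "should",
               "may", "might", "must", "have", "has", "had"}


def syntactic_code(text):
    """Assign one code based only on the first word of the text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return "other"
    first = words[0]
    if first in WH_WORDS:
        return f"wh:{first}"
    if first in AUXILIARIES:
        return "inversion"  # auxiliary verb placed before the subject
    return "other"


print(syntactic_code("Why do husbands feel they have to lie?"))
print(syntactic_code("Is there anywhere you can listen to "
                     "citizen band radio online?"))
print(syntactic_code("WTF"))
```

The second example receives a single code ("inversion") here, although the slide lists it as a question needing double coding, and "WTF" falls through to "other". Both cases show why a rule this simple cannot replace the coders' consensus judgment.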
Wh-question frequency • “What” Questions
Wh-question frequency • “Why” Questions
Wh-question frequency • “How” Questions
Wh-question frequency • “Inversion” Questions
Next Steps
• Start semantic and pragmatic analysis of questions
• Start answer analysis
• Start comment coding
• Explore the associations among the features of questions, answers, and comments
• Develop a conceptual and analytical model for social Q&A