This research project analyzes the question-answering process on Yahoo! Answers, focusing on the linguistic characteristics, functional properties, and subject content of questions and their corresponding responses. The findings are expected to have implications for the provision of library and information services.
Question-Answering on Yahoo! Answers: Preliminary Results
Rong Tang, Sheila Denn
OCLC/ALISE LIS Research Grant Presentation
ALISE 2009, January 23, 2009
Background
• Yahoo! Answers
  • Social Q&A
  • 25+ pre-defined categories
  • Users post questions, answer questions, rate answers, and provide comments
  • One best answer is chosen by the asker or through a vote
Our Research Project
• Funded by the OCLC/ALISE Grant Program and the Simmons College President’s Fund for Research
• Project staff:
  • Rong Tang (PI)
  • Sheila Denn (Co-PI)
  • Sam Kalat (technology consultant, programmer)
  • Laura Saunders (research assistant)
• The project wiki documents the relevant literature and the project’s progress, with extensive meeting notes on coding decisions
Research Questions
• Are existing question taxonomies (such as those in Graesser et al. (1994) and Freed (1994)) valid in a social Q&A environment?
• What are the relationships between the linguistic characteristics, functional properties, and subject content of the questions and the kinds of responses that they receive?
• What are the characteristics of answers that are chosen as “best” answers?
• What is the role of the social function vs. the information function in social Q&A?
• What are the implications of the above for provision of library and information services?
Previous Research
• Question classification
  • Wh- questions (Robinson & Rackstraw, 1972)
  • Conceptual question categories (Lehnert, 1978)
  • Content-based question categories (Graesser et al., 1994)
  • Reference question classification (Pomerantz, 2005)
  • Questions in Dynamic Semantics (Aloni, Butler, & Dekker, 2007)
• Answer classification
  • Much less research here than on question classification
  • Answer selection rules (Lehnert, 1978)
  • Criteria based on Yahoo! Answers comments (Kim et al., 2007)
Previous Research (cont.)
• Formal studies of online Q&A
  • Answerers: “specialists” vs. “synthesists” (Gazan, 2006)
  • Questioners: “seekers” vs. “sloths” (Gazan, 2007)
• Question purpose (Graesser et al., 1994)
  • Filling knowledge gaps
  • Establishing and monitoring common ground
  • Coordinating social action
  • Directing the conversation and controlling attention
Research Plan
• Data collection and sampling
  • Gathered a stratified random sample of 3,000 question-answer sets, including any comments
  • Stratified by the 25 top-level categories assigned by Yahoo! Answers
• Data coding
  • Content analysis at multiple levels:
    • Syntactic
    • Semantic
    • Pragmatic
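A stratified random sample of this kind can be sketched as follows. This is a minimal illustration, not the project's actual procedure. The slide does not say how the 3,000 sets were allocated across categories, so the sketch assumes equal allocation (3,000 / 25 = 120 per category). The record fields `id` and `category` and the function name `stratified_sample` are likewise hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, per_stratum, seed=None):
    """Draw up to `per_stratum` records at random, without replacement, from each stratum.

    items:      iterable of records (here: question-answer sets)
    stratum_of: function mapping a record to its stratum label
                (here: its top-level category; hypothetical field)
    """
    rng = random.Random(seed)  # seeded generator, so a sample can be redrawn exactly
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    sample = []
    for label in sorted(strata):  # fixed order keeps results reproducible for a given seed
        members = strata[label]
        k = min(per_stratum, len(members))  # small categories contribute all they have
        sample.extend(rng.sample(members, k))
    return sample


if __name__ == "__main__":
    # Hypothetical pool: 25 categories with 500 questions each.
    pool = [{"id": f"q{c}-{i}", "category": f"cat{c:02d}"}
            for c in range(25) for i in range(500)]
    sample = stratified_sample(pool, lambda q: q["category"],
                               per_stratum=3000 // 25, seed=42)
    print(len(sample))  # 3000
```

Equal allocation gives every category the same weight regardless of its size on the site. Proportional allocation, the usual alternative, would instead set each category's quota from its share of the population.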
Research Plan (cont.)
• Data analysis
  • Descriptive statistics will be produced for:
    • Frequency of answers provided per question
    • Average length of time to first answer
    • Distribution of subject categories
    • Distribution of question and answer types
    • Distribution of chosen answer types
  • Correlation analysis will be performed for:
    • Linguistic characteristics of questions and answers
    • Functional categories of questions and answers
    • Subject categories of questions and answers
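The first few descriptive statistics listed for the data analysis could be computed along the lines below. This is a sketch under assumed conditions, not the project's analysis code. The record layout (`category`, `posted`, `answers`) and the function name `describe` are hypothetical. Time to first answer is measured in minutes and covers only questions that received at least one answer.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def describe(qa_sets):
    """Basic descriptive statistics over a non-empty list of question-answer sets.

    Each set is a dict with (hypothetical) keys:
      "category": top-level category label
      "posted":   datetime the question was posted
      "answers":  list of dicts, each with its own "posted" datetime
    """
    answers_per_question = [len(q["answers"]) for q in qa_sets]
    # Minutes from the question to its earliest answer; unanswered questions are skipped.
    minutes_to_first = [
        (min(a["posted"] for a in q["answers"]) - q["posted"]).total_seconds() / 60
        for q in qa_sets
        if q["answers"]
    ]
    return {
        "mean_answers_per_question": mean(answers_per_question),
        "answers_per_question_freq": Counter(answers_per_question),
        "mean_minutes_to_first_answer": mean(minutes_to_first) if minutes_to_first else None,
        "category_distribution": Counter(q["category"] for q in qa_sets),
    }
```

Response-time distributions are often skewed by a few very slow answers. For that reason, a median (`statistics.median`) may be worth reporting alongside the mean.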
Progress to Date
• Sample has been collected
• Preliminary coding has begun
  • Syntactic coding of questions is complete:
    • Wh- questions
    • Inversion questions
    • Other questions
    • Multipart questions
    • Double-coded questions
  • Syntactic coding of question descriptions is complete:
    • Number of questions included in the description text
    • Types of questions
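The syntactic categories could be approximated by a rule-based first pass, for example to pre-sort questions before human coding. This is a rough sketch under our own assumptions and not the project's coding scheme, which was applied by human coders. The word lists, the code labels (such as `wh-why`), and the rule that "and"/"or" followed by a wh-word marks a multipart question are all hypothetical. A second helper counts question-mark-terminated sentences in a description.

```python
import re

# Hypothetical word lists for a rough first-pass classifier.
WH_WORDS = {"what", "why", "how", "when", "where", "who", "whom", "whose", "which"}
# Auxiliary and modal verbs that, at the start of a question, usually signal
# subject-auxiliary inversion ("Is there ...", "Can I ...", "Do you ...").
AUXILIARIES = {
    "is", "are", "am", "was", "were", "do", "does", "did",
    "have", "has", "had", "can", "could", "will", "would",
    "shall", "should", "may", "might", "must",
}


def code_question(text):
    """Return a set of syntactic codes for one question title."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"other"}
    first = tokens[0]
    if first in WH_WORDS:
        codes = {"wh-" + first}
    elif first in AUXILIARIES:
        codes = {"inversion"}
    else:
        codes = {"other"}
    # Treat "and"/"or" followed by a wh-word as the start of a second question part.
    for prev, tok in zip(tokens, tokens[1:]):
        if prev in {"and", "or"} and tok in WH_WORDS:
            codes.add("multipart")
    return codes


def count_questions(description):
    """Count question-mark-terminated sentences in a description ("??" counts once)."""
    return len(re.findall(r"[^.!?]*\?+", description))
```

A first-token rule like this cannot settle the hard cases. It treats "WTF" as `other`. It codes "Is there anywhere you can listen to citizen band radio online?" only as `inversion`, whereas human coders may assign such a question two codes. Its question count also relies entirely on question marks.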
Data Coding
• Two coders code each question independently, then review their coding together to reach consensus on the final coding of each question
• The use of informal language makes coding difficult:
  • Is it a question if it doesn’t include a question mark? Is it a question simply because it ends with a question mark?
  • Should “WTF” be coded as a “what” question or as an other question? Or not coded at all?
• Coding multipart questions, e.g., “Why do husbands feel they have to lie to other women about being married, and when the other woman finds out?”
• Double coding of questions such as “Is there anywhere you can listen to citizen band radio online?”
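The two-coder consensus workflow could be supported by a small helper. It lists the questions on which the coders differ and then builds the final coding once those cases are resolved. This is a minimal sketch that assumes each coder's work is stored as a dict mapping a (hypothetical) question ID to a set of codes. The discussion that resolves each disagreement remains a human step.

```python
def disagreements(coder_a, coder_b):
    """List (question_id, codes_from_a, codes_from_b) wherever the two coders differ.

    A question coded by only one coder appears with None on the other side,
    so it is not skipped during the consensus review.
    """
    ids = sorted(set(coder_a) | set(coder_b))
    return [
        (qid, coder_a.get(qid), coder_b.get(qid))
        for qid in ids
        if coder_a.get(qid) != coder_b.get(qid)
    ]


def merge_with_resolutions(coder_a, coder_b, resolutions):
    """Build the final coding: agreed codes as-is, disputed ones from `resolutions`.

    resolutions: dict mapping question_id -> set of codes agreed on in discussion.
    Raises ValueError if any disagreement has not been resolved yet.
    """
    final = {}
    for qid in sorted(set(coder_a) | set(coder_b)):
        if qid in resolutions:
            final[qid] = resolutions[qid]
        elif coder_a.get(qid) == coder_b.get(qid):
            final[qid] = coder_a[qid]
        else:
            raise ValueError(f"unresolved disagreement on question {qid!r}")
    return final
```

The merge step refuses to proceed while any disagreement remains open. This keeps a final data set from silently mixing one coder's choices into it.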
Wh-question frequency [charts not reproduced]
• “What” questions
• “Why” questions
• “How” questions
• “Inversion” questions
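Frequency counts like those charted on the wh-question slides can be tabulated directly from the final coding. This is a minimal sketch that assumes each coded question is stored as a (category, set of codes) pair with hypothetical code labels. The charts' actual breakdown cannot be recovered from this text, so the per-category split is our assumption.

```python
from collections import Counter


def code_frequencies(coded_questions):
    """Count syntactic codes overall and per category.

    coded_questions: iterable of (category, set_of_codes) pairs.
    A double-coded question contributes one count to each of its codes.
    """
    overall = Counter()
    by_category = Counter()
    for category, codes in coded_questions:
        for code in codes:
            overall[code] += 1
            by_category[(category, code)] += 1
    return overall, by_category


def share_of_questions(overall, n_questions, code):
    """Percentage of questions carrying `code` (denominator: questions, not codes)."""
    return 100 * overall[code] / n_questions
```

Double-coded questions are counted once under each of their codes, so the code totals can exceed the number of questions. For that reason, percentages are computed against the number of questions rather than the sum of code counts.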
Next Steps
• Start semantic and pragmatic analysis of questions
• Start answer analysis
• Start comment coding
• Explore the associations and features of questions (Q), answers (A), and comments (C)
• Develop a conceptual and analytical model for social Q&A