Building Resources for an Open Task on Question Generation. Mihai Lintean. Outline. Background Proposed Tasks Building the Data Set Conclusions. Background. 1 st Workshop on The Question Generation Shared Task and Evaluation Challenge (September 2008, Arlington, VA)
Building Resources for an Open Task on Question Generation • Mihai Lintean
Outline • Background • Proposed Tasks • Building the Data Set • Conclusions
Background • 1st Workshop on The Question Generation Shared Task and Evaluation Challenge (September 2008, Arlington, VA) • The Text-to-Question task: generate good questions for which a given input text (raw or annotated) contains, implies, or needs answers
Proposed Tasks • Open Task: Identify the question • Input: a paragraph that answers an unknown, previously posed question • Output: one or more questions for which the input paragraph is, contains, or implies the answer • Example input: When you open up your recording options you need to select what input you are using for recording. assuming you have loaded the driver software for your edirol...> open control panel...>open sounds and audio devices....>select the audio tab...>select the edirol device for recording... > select the volume tab...> select advanced... make sure the microphone tab is not muted and the volume is up • Example output: How do I make my microphone work when conecting it to a recording interface?
Proposed Tasks (cont) • Subtask (easier): Identify the question type • Input: same as above • Output: the most likely type of the question that was posed • For the given example, the question type is How • Use community Question Answering sources to build a high-quality dataset • Yahoo!Answers, WikiAnswers, etc.
Data Collection • The dataset should be comprised of Q-A pairs (answers with their associated questions) • Collecting the data: • Identify (or create) a good source for efficient collection of instances • Automatically collect Q-A pairs • Automatically filter the Q-A pairs • Manually filter for high quality Q-A pairs
1. Identify a Good Data Source • Yahoo!Answers • Contains open-domain, general-knowledge content • Both questions and associated answers are available • An API is available to collect Q-A pairs • Microsoft offers a similar service • Wikipedia is another source generally preferred in the QG community • Offers huge amounts of open-domain, general knowledge • The problem is that only the text is available, with no associated questions
2. Automatic Collection of Q-A pairs • The goal at this step is to collect a large number of questions and answers that can later be filtered for high-quality instances • Yahoo!Answers • Questions are grouped in categories (e.g. Allergies, Dogs, Software, Garden and Landscape); 244 categories were selected • For each category, try to extract 150 questions for each of the 6 question types • To query for each type, we look for the wh-word that defines the question type (how, who, what, when, where, and why) • The maximum number of questions that could be retrieved: 150 × 244 × 6 = 219,600 (the actual number is somewhat smaller)
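The collection plan above can be sketched in a few lines of Python. This is only an illustration of the category × wh-word query grid and of the upper bound on retrieved questions; it makes no calls to the Yahoo!Answers API. The names `question_type` and `query_plan` are hypothetical, and labeling a question by the first wh-word it contains is an assumption, not necessarily the rule the authors used.

```python
import re
from itertools import product

# The six question types named on the slide.
WH_WORDS = ("how", "who", "what", "when", "where", "why")
NUM_CATEGORIES = 244        # categories selected on Yahoo!Answers
QUESTIONS_PER_QUERY = 150   # questions requested per (category, type)


def question_type(question):
    """Return the first wh-word found in the question, or None.

    Assumption: the first wh-word decides the type.
    """
    for token in re.findall(r"[a-z]+", question.lower()):
        if token in WH_WORDS:
            return token
    return None


def query_plan(categories):
    """Build one (category, wh_word) query per combination."""
    return list(product(categories, WH_WORDS))


# Upper bound on the number of questions that can be retrieved.
max_questions = QUESTIONS_PER_QUERY * NUM_CATEGORIES * len(WH_WORDS)
print(max_questions)  # 219600
```

With two categories, `query_plan(["Dogs", "Software"])` yields 12 queries, one per category and question type.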
3. Automatic Filtering • Purpose: automatically eliminate “easy” instances that are not appropriate, based on: • question length (min 3 words, e.g. What is <object>?) • answer length (min 10 words) • presence of curse words (e.g. […]), words that refer to sexual explicitness (e.g. […]), and words that are ethically intolerant
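A minimal sketch of these automatic checks, assuming a simple regular-expression tokenizer and a caller-supplied blocklist. The slide elides the actual word lists, so `BLOCKLIST` below holds a harmless placeholder; `passes_auto_filter` and `tokens` are hypothetical names.

```python
import re

MIN_QUESTION_WORDS = 3   # e.g. "What is <object>?"
MIN_ANSWER_WORDS = 10

# Placeholder only: the real curse/explicit/intolerant word lists
# are not given in the source.
BLOCKLIST = frozenset({"badword"})


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def passes_auto_filter(question, answer, blocklist=BLOCKLIST):
    """Return True if the Q-A pair survives the length and blocklist checks."""
    q_tokens = tokens(question)
    a_tokens = tokens(answer)
    if len(q_tokens) < MIN_QUESTION_WORDS:
        return False
    if len(a_tokens) < MIN_ANSWER_WORDS:
        return False
    # Reject the pair if any blocklisted word appears in either text.
    return blocklist.isdisjoint(q_tokens) and blocklist.isdisjoint(a_tokens)
```

Pairs that pass here are only candidates; they still go through the manual filtering step.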
4. High-Quality Manual Filtering • Use human raters to manually filter the dataset for high-quality instances • We built a software tool to ease the work • Easy relabeling and removal of unacceptable instances (incorrect, improper, too difficult) • The task is mentally challenging • It takes on average about 10 hours to extract 100 good instances • On average about 10% of candidate instances are retained (for some question types this ratio is even lower, about 2%)
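The rates above give a rough way to estimate the manual effort for a given number of good instances. The sketch below assumes the effort scales linearly, which the slides do not state; `rating_hours` and `candidates_needed` are hypothetical helper names.

```python
import math

HOURS_PER_100_GOOD = 10  # about 10 rater hours per 100 good instances


def rating_hours(target_good):
    """Estimated rater hours to obtain target_good instances (linear assumption)."""
    return target_good * HOURS_PER_100_GOOD / 100


def candidates_needed(target_good, retention_percent):
    """Candidates to review for target_good instances at a given retention rate."""
    return math.ceil(target_good * 100 / retention_percent)


print(rating_hours(100))           # 10.0
print(candidates_needed(100, 10))  # 1000
print(candidates_needed(100, 2))   # 5000
```

At the 2% retention seen for some question types, the same 100 good instances require five times as many candidates as at the 10% average.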
Modified XML to accommodate the shared task
<FilteredQuestions SavedFrom="LONEMC-PC">
  <Question id="20090104084819AAR971h">
    <Subject>How do you back all songs in iTunes onto an external storage device?</Subject>
    <Content>I have a Kingston 4gb device which I heard I can back up all my songs in itunes to. That also includes songs from CDs and songs from neither Cds or the itunes store. </Content>
    <Date>2009-01-04</Date>
    <Category>Add-ons</Category>
    <ChosenAnswer>[…]</ChosenAnswer>
    <Type>how</Type>
    <needReview>False</needReview>
  </Question>
  […]
XML Output from Yahoo!Answers API
<Question id="20090104084819AAR971h" type="Answered">
  <Subject>How do you back all songs in iTunes onto an external storage device?</Subject>
  <Content>I have a Kingston 4gb device which I heard I can back up all my songs in itunes to. That also includes songs from CDs and songs from neither Cds or the itunes store. </Content>
  <Date>2009-01-04 08:48:19</Date>
  <Timestamp>1231087699</Timestamp>
  <Link>http://answers.yahoo.com/question/?qid=20090104084819AAR971h</Link>
  <Category id="396545669">Add-ons</Category>
  <UserId>XLu9yWRXaa</UserId>
  <UserNick>ORP</UserNick>
  <UserPhotoURL>http://…/photo3_48x48.gif</UserPhotoURL>
  <NumAnswers>4</NumAnswers>
  <NumComments>0</NumComments>
  <ChosenAnswer>[…]</ChosenAnswer>
  <ChosenAnswererId>ECxLbx31aa</ChosenAnswererId>
  <ChosenAnswererNick>Abatage</ChosenAnswererNick>
  <ChosenAnswerTimestamp>1231089203</ChosenAnswerTimestamp>
  <ChosenAnswerAwardTimestamp>1231607105</ChosenAnswerAwardTimestamp>
</Question>
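The step from the API record to the reduced shared-task record can be sketched with Python's standard `xml.etree.ElementTree`. The field selection below is inferred from the two XML samples on this slide; the function name `reduce_question` is hypothetical, and the authors' actual tool may work differently. The sample input is a shortened copy of the API record, with the answer text left elided as in the source.

```python
import xml.etree.ElementTree as ET

# Fields that appear in the reduced record on the slide.
KEPT_FIELDS = ("Subject", "Content", "Date", "Category", "ChosenAnswer")

API_XML = """
<Question id="20090104084819AAR971h" type="Answered">
  <Subject>How do you back all songs in iTunes onto an external storage device?</Subject>
  <Content>I have a Kingston 4gb device.</Content>
  <Date>2009-01-04 08:48:19</Date>
  <Category id="396545669">Add-ons</Category>
  <UserId>XLu9yWRXaa</UserId>
  <ChosenAnswer>[…]</ChosenAnswer>
</Question>
"""


def reduce_question(api_question, qtype):
    """Copy the kept fields into a new element and add Type and needReview."""
    reduced = ET.Element("Question", {"id": api_question.get("id", "")})
    for name in KEPT_FIELDS:
        text = api_question.findtext(name, default="")
        if name == "Date":
            text = text.split(" ")[0]  # keep the date, drop the time of day
        ET.SubElement(reduced, name).text = text
    ET.SubElement(reduced, "Type").text = qtype
    ET.SubElement(reduced, "needReview").text = "False"
    return reduced


reduced = reduce_question(ET.fromstring(API_XML), "how")
print(ET.tostring(reduced, encoding="unicode"))
```

User identifiers, timestamps, and links are dropped simply because they are not in `KEPT_FIELDS`.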
Conclusions and Future Work • So far we have collected around 500 clean question-answer instances that could be used in a shared task • We plan to extend this initial dataset to 5000, but manual filtering requires a lot of work • Further validation of the dataset is planned (test whether human subjects choose the same question type as the one recorded in the dataset) • Further annotate the texts with deep representations (e.g. PropBank) in order to ease the task • Collecting data from community QA sources is more challenging than it seems • It becomes comparable to collecting data from sources of general interest (such as Wikipedia)
Evaluation Criteria • For the proposed subtask (identify the question type) • Straightforward • Accuracy: percentage of correct guesses • For question type: 6 possible answers • For the more general task • Evaluation is a bit tricky • Textual similarity (LSA, simple n-gram overlap) between the system answer and the gold answer (a list of correct questions)
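Both criteria can be illustrated with a short sketch. Accuracy follows the slide directly. For the general task, the slide names n-gram overlap without defining it, so the Jaccard overlap of word bigrams used here, and taking the best score over the gold list, are assumptions; LSA is not covered. All function names are hypothetical.

```python
def accuracy(predicted, gold):
    """Percentage of predicted question types that match the gold types."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


def ngrams(text, n):
    """Set of word n-grams in the lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(candidate, reference, n=2):
    """Jaccard overlap of n-gram sets (one possible overlap measure)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not cand or not ref:
        return 0.0
    return len(cand & ref) / len(cand | ref)


def best_overlap(candidate, gold_questions, n=2):
    """Score a system question against the closest question in the gold list."""
    return max((ngram_overlap(candidate, g, n) for g in gold_questions),
               default=0.0)


print(accuracy(["how", "why", "what", "who"], ["how", "why", "when", "who"]))  # 75.0
```

Scoring against the closest gold question reflects that the gold answer is a list of acceptable questions, any one of which a system might reasonably produce.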
Some Rules on Filtering Q-A Pairs • Question is a compound question • How do you figure out who your video card manufacturer is? • Question is not in interrogative form • I want a webcam and headset etc to chat to my friends who moved away? • Question has poor grammar or spelling • Yo peeps who kno about comps take a look? and Who the memory eaten is bigger? • Question does not solicit a reasonable answer for our purposes • Who knows something about digital cameras? • Question is ill-posed • When did the ancient city of Mesopotamia flourish? (Answer: Mesopotamia wasn't a city.)