Evaluating the Habitability of Q&A With User-Generated Tasks
Bill Ogden, Ron Zacharski, Jim McDonald, Roger Chadwick
New Mexico State University
Habitability • Watt (1968) • A language is considered habitable if users can express everything that is needed for a task using language they would expect the system to understand. • If there are 26 ways that a user population would be likely to ask a question, a habitable system will process all 26. • Goal is to improve NL query systems by achieving habitability
How to Achieve Habitability • Discover all ways a user population will likely ask a question • Will depend on: • User characteristics, knowledge, expectations • Subject Domain Knowledge • Past experience w/ Q&A • Perceived capability of the system • Interface presentation, visibility • Feedback, error handling
Evaluating Interactive IR and Q&A • “You are not sure about the safety of genetically engineered foods, and would like to find more information and research on this topic. Name four potential types of safety problems that have been raised.” • 8 tasks, 6-12 users, 2-3 sessions. • Recorded screen and voice.
Evaluating Interactive IR and Q&A • Using surrogate users/tasks may not capture ‘real’ Q&A user behavior • We are looking for ways to observe users who are working on their own questions. • Lack of control will be offset by richness of behavior
How many users? • Typical usability evaluation • Defined tasks are given to representative users • A widely held theory is that five users will detect almost all software usability problems (Jakob Nielsen and Thomas Landauer, 1993) • A recent usability evaluation of a web CD shopping site suggests otherwise (Perfetti and Landesman, http://www.uie.com/Articles/eight_is_not_enough.htm) • 18 users, each with a list of CDs they wanted to purchase, found 247 total obstacles-to-purchase, with only 35% found by the first five users • Self-generated tasks lead to more discovery.
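The five-user claim rests on a simple discovery model: if each user independently exposes a given problem with probability λ, then n users are expected to expose a fraction 1 - (1 - λ)^n of all problems. The Python sketch below evaluates that model. The rate 0.31 is the average commonly quoted in support of the five-user claim, not a figure from this study, and the function names are hypothetical. The second function inverts the model to show what per-user rate would be consistent with 35% of obstacles being found by the first five users.

```python
def proportion_found(n_users: int, rate: float) -> float:
    """Expected fraction of problems seen by n_users, each finding a
    given problem independently with probability `rate`."""
    return 1.0 - (1.0 - rate) ** n_users


def implied_rate(fraction_found: float, n_users: int) -> float:
    """Invert the model: the per-user rate that yields fraction_found."""
    return 1.0 - (1.0 - fraction_found) ** (1.0 / n_users)


if __name__ == "__main__":
    # Assumed rate of 0.31: the average often quoted for the five-user claim.
    print(f"5 users at rate 0.31: {proportion_found(5, 0.31):.2f}")
    # Rate implied by 35% of obstacles found in the first five users.
    print(f"implied per-user rate: {implied_rate(0.35, 5):.3f}")
```

Under this model, a per-user rate near 0.08 rather than 0.31 is what it takes for five users to reveal only about a third of the problems, which is the pattern reported for the self-generated shopping tasks.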
Self-generated Tasks • Provide broader coverage for habitability studies • Many more usability issues will emerge • Reflect real information needs • Motivate test participants to reach real conclusions (no artificial satisfaction) • Demonstrate query drift (the information need changes through searching)
Problems With Self-generated Tasks • System comparisons are difficult • But user and task variability in user studies makes system comparisons difficult anyway • Users will generate questions outside of the system’s capabilities • But isn’t this the point of habitability testing?
Our New Study • Users are asked to generate information needs • 8 things they think they would like to know that may be on the web • Each user is assigned to the LCC web demo • The user session is recorded • Screen video, think-aloud audio, automatically captured queries and web results
Participants (LCC) • 7 graduate psychology students at NMSU • 6 female and 1 male, ages 18 to 37 • Experience with computers and search engines: • Mean experience using computers: 6.6 (1-7 scale) • Mean self-rated computer expertise: 5.1 (1-7 scale) • Mean experience using the World Wide Web: 6.6 (1-7 scale) • Mean frequency of computer use: 6.6 (1-7 scale) • Mean years of online search experience: 7.8 • Mean rated success at searching: 6.0 (1-7 scale) • Mean years of schooling: 17.5 (3 BA, 4 MA)
Questionnaire
• How satisfied are you with the results of your search? In other words, to what extent did this search answer your question? (1 = Dissatisfied, 7 = Satisfied)
• How useful did you find the retrieval system you used to accomplish your search? In other words, to what extent do you feel the retrieval system helped you accomplish your goal? Keep in mind that you could be Dissatisfied with your results because you feel the Internet simply doesn't contain an answer to your question, and still find the retrieval system Useful. (1 = Not Useful, 7 = Useful)
• Did you change your question as the search progressed? In other words, did it become narrower (more focused), stay the same, or broader over time? (1 = Narrower, 7 = Broader)
User generated questions
• What is the proposed smoking ordinance in Las Cruces?
• What is the Senate reaction to the phone-in campaign?
• How can I find out more about the AIDS vaccine?
• How do people around the world feel about the impending war?
• Can I find a map of the Middle East?
• How does an individual qualify for the NCCA National Championships?
• Which Senators are opposed to the War?
• Who died in the Korean Subway Fire? Spec. K.W.R [initials]
• What is tuition at all state universities?
• Have any new planets been discovered?
• How can I stop my cat from clawing furniture?
• When will Apple release the new Powermac?
• From where can I order out of print records?
• Did the Bruins win today?
• Is there an instrument store within 10 miles?
• Where is the nearest comic book store?
All results published on Web
Focus question for LCC interface • Does LCC provide cues that prompt users to enter NL queries? • Two subjects used NL throughout • Two subjects mostly used keywords • Two subjects started out with NL but switched to keywords • One used NL with an occasional switch to keywords
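Tallying NL versus keyword use requires some rule for labeling each logged query. A minimal Python sketch of one such rule follows. The cue-word list and the function name `query_style` are assumptions for illustration, not the coding scheme used in the study.

```python
# Hypothetical cue words; the study does not specify how queries were coded.
QUESTION_STARTERS = {
    "what", "when", "where", "which", "who", "why", "how",
    "is", "are", "was", "were", "do", "does", "did", "can", "have",
}


def query_style(query: str) -> str:
    """Label a query 'nl' if it looks like a question, else 'keyword'."""
    words = query.strip().lower().split()
    if not words:
        return "keyword"
    if query.strip().endswith("?") or words[0] in QUESTION_STARTERS:
        return "nl"
    return "keyword"


if __name__ == "__main__":
    for q in [
        "Is there a comic book store in Las Cruces nm?",
        "Comic book store Las Cruces NM",
        "Why was Today Sponge taken off the market?",
        "contraceptives sponge safety",
    ]:
        print(f"{query_style(q):8} {q}")
```

Imperative requests such as "List comic book stores in New Mexico" would be labeled keyword by this rule, so any real tally would still need a human check of borderline queries.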
Participant 2 – 1st Question • Need: • What is tuition at all state universities? • Queries: • What is tuition at all state universities? • List of tuition costs at U.S. Universities • What are the tuition costs at state universities in the united states? • Does consumer reports list college tuition? • College ranking by tuition • Is a college ranking list published?
Observations • People seem to struggle to think of an NL question for a need they can easily express with keywords • One user said it was like playing Jeopardy • Time-critical questions predominated, but the LCC demo had no time-processing capability
Controlling query function with NL • Controlling the date of the information being reported. • Example 1. Saddam alive • also example of benefits of user-generated tasks • Example 2. Did the Bruins win
Highest satisfaction ratings
• Is aspartame bad for your health?
• Can you rent a house boat in NM?
• Weather in St. Petersburg Russia
• What is the value of my year old car?
• How often do I need to water roses?
• What are some of the theatrical performances coming to NMSU this semester?
• Where is Westminster Colorado?
• How do I get a passport?
Lowest satisfaction ratings
• Is Sadaam alive?
• Easiest ways to lose weight.
• What is the name of the new Acura?
• What new games are out on X-box?
• What are the additional requirements that I need to fulfill for a PhD at NMSU?
• What is some of the new information on the new "female viagra" (ie elevil)?
• What is the treasury forcast for interest rates?
• When does the NCAA tournament start?
Other Successful questions • Have any new planets been discovered? • "no we still have the same 9" • How can I stop my cat from clawing furniture? • "provide your cat with a scratching post."
Spell checking in LCC demo • Works well in some cases • sadam • But confusing in others • el paso, tx
The NL Interface Quandary • What do users of NL systems need to know? • If you need to train NL what is the point?
One user’s comments
• "You have to think too much about what you are looking for"
• "I'm not used to it, with other search engines..[you don't have to think so much]"
• "I can't figure out the system yet, with other engines..."
• "After a few trials you can figure out what they want from you"
• "I can't pinpoint the best search technique"
• "Like when you use MSN, it has much more demands than Yahoo"
• "Yahoo you can type anything and you get results"
• "I can feel what the system wants from me, here I don't get a feeling"
"Which features of the retrieval system made it more useful?" • Question based queries • Short descriptions [of results] • Nothing • The natural language method was nice but it didn't work well. • You could type phrases • None • Suggestions to misspellings • The system seemed capable of handling fairly narrow searches via the use of questions instead of just phrases.
"Were there any features of the retrieval system that could be improved?" • Organize the results chronologically or alphabetically • Avoid chat groups and personal emails [subjects did not like results that were not credible] • Bold the keywords • Include the website for each result [done] • Have a general menu with topics like "education" • Add [instructions ?] that you can use keywords not just questions. • No system feedback when you click on a link. • Didn't like that a new window opens • Web address not given to you in results [done] • Broader searches • No repeat websites • Have it look for keywords, not the question.
Too Early? • Is the Q&A technology too primitive to worry about the user interface?
Conclusions • Real information needs are often different from the needs expressed in the original question • NL query could be more useful if users knew when or how it could be used. • The user-generated tasks approach directly addresses the goal of improving the habitability of Q&A systems.
Project Goals for Collaboration • Identify the characteristics of habitable Aquaint systems • Use prototype Aquaint systems for iterative formative evaluation
Questions? Please form your question in Natural Language
User generated questions
• What is some of the new information on the new "female viagra" (ie elevil)?
• What are the additional requirements that I need to fulfill for a PhD at NMSU?
• How safe is the "sponge" and why was it taken off of the market in the first place?
• What are some recent findings regarding gender stereotype enforcement in advertising?
• What are some of the theatrical performances coming to NMSU this semester?
• What are the upcoming tour dates for Ani Difranco?
• Are there any lab retrievers for sale near Indiana for my dad?
• How many hours are required for an M.A. in rehab.counseling at SFSU?
• How far in the universe have we studied?
• Does Venus have water?
• When did humans arrive in the U.S.?
• What re reasons my feet hurt?
• Where can I take flying lessons?
• What would someone in an African tribe do on a daily basis?
• How do Americans compare to African tribes about love?
• How do USA feel about war with Iraq?
User generated questions
• What classes and labs does UNM have to offer for a PhD in BioPsyc?
• Are there relevant articles on parental bonding and advise on data collection?
• How do you paint tile and waterproof it?
• How many grams of protein does a woman of my height and activity level need?
• What new exercise can I do for my lower back?
• Is there a recipe for Rosemary Creme' Brulee?
• Is there a link between anthrax and cancer?
• What special prices are offered on gifts for Mother's Day?
• Where can I find info about visiting Alaska?
• What schools have PhD programs in HF? [human factors]
• When does the NCAA tournament start?
• What is playing at the Met right now?
• Where can I find info about bands touring schedules?
• Where can I find info about uses of VR? [virtual reality]
• Can you rent a house boat in NM?
• When does the new Matrix movie come out?
LCC user 6
• Where can I find info about visiting Alaska?
  1.1 where can I find out information about visiting alaska?
  1.2 where does the alaska ferry leave from?
  1.3 where does the Alaska Marine Highway System depart from?
  1.4 where in alaska is denali national park?
• What schools have Phd programs in human factors?
  2.1 what schools have Phd programs in human factors?
  2.2 what graduate schools have engineering psychology programs?
• When does the NCAA tournament start?
  3.1 when does the NCAA tournament start?
  3.2 when does the men's NCAA basketball tournament start?
  3.3 when does the men's NCAA basketball tournament start in 2003?
• What is on display at the met right now?
  4.1 what is on display at the metropitan museum of art?
  4.2 what is on display in March 2003 at the metropitan museum of art in new york?
  4.3 what is on display in March 2003 at the metropolitan museum of art in new york?
LCC user 6 (cont)
• Where can I find information about band's touring schedules?
  5.1 where can I find information about bands touring schedules?
  5.2 what bands are playing in new york city in march 2003?
  5.3 who is playing at the bowery blallroom in march 2003?
  5.4 who is playing at the bowery ballroom in march 2003?
• Where can I find information about the uses of virtual reality?
  6.1 where can I find information about the uses of virtual reality?
• Can you rent a houseboat in new mexico?
  7.1 can you rent a house boat in New Mexico?
• When does the new matrix movie come out?
  8.1 when does the new Matrix movie come out?
  8.2 what does the matrix reloaded come out?
  8.3 when is the matrix reloaded opening?
Same Person. 8th Question. • Need: • Where is the nearest comic book store • Queries: • Is there a comic book store in Las Cruces nm? • List comic book stores in New Mexico • Comic book store Las Cruces NM
Participant 3 – 3rd Question • Need • How safe is the "sponge" and why was it taken off of the market in the first place? • Queries • contraceptives sponge safety • Why was Today Sponge taken off the market?
Participant 3 – 7th Question • Need: • Are there any lab retrievers for sale near Indiana for my dad? • Queries: • laboratory retriever breeders midwest • breeders laboratory retrievers • laboratory retrievers • purchasing laboratory retrievers • Note: LCC does well with ‘lab retrievers’