QA for the Web
Language Computer Corporation
www.languagecomputer.com
Dallas, Texas
PI: Dan Moldovan, moldovan@languagecomputer.com
Motivation
• In the US alone, there are more than 100 million Internet users per day
• Each user asks on average 5 questions
• Each user spends about half an hour to find answers
Tasks
• Task 1 – Adapt QA technology to the universality of Web hypertext
• Task 2 – Interface the QA system with the emerging Semantic Web technologies
Task 1: Adapt QA Technology to the Web
• Two approaches:
  • use available search engines
  • gather documents from the Web and form a local collection
QA on Top of a Search Engine
[Architecture diagram. Labels: Question Processing, Keywords, Search Engine, Documents, Format Manager, Normalized Documents, Paragraph Retrieval, Answer Processing]
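The components named in this diagram could be chained roughly as in the following Python sketch. It is an illustration only: the function names, the stop-word list, the overlap-based paragraph scoring, and the `search` callable standing in for a Web search engine are all assumptions, since the slides do not describe the system's actual interfaces.

```python
import re
from typing import Callable, List, Optional

# Assumed stop-word list; the real keyword selection is not described in the slides.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "to", "how", "what",
              "who", "where", "when", "does", "do", "can", "i"}


def question_processing(question: str) -> List[str]:
    """Reduce a question to lowercase content keywords."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def format_manager(raw: str) -> str:
    """Normalize a document: drop HTML-like tags, keep blank-line paragraph breaks."""
    text = re.sub(r"<[^>]+>", " ", raw)
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    return "\n\n".join(p for p in paragraphs if p)


def paragraph_retrieval(documents: List[str], keywords: List[str]) -> List[str]:
    """Rank paragraphs by how many distinct query keywords they contain."""
    scored = []
    for doc in documents:
        for para in doc.split("\n\n"):
            words = set(re.findall(r"[a-z0-9]+", para.lower()))
            score = sum(1 for k in keywords if k in words)
            if score:
                scored.append((score, para))
    scored.sort(key=lambda item: -item[0])  # stable: ties keep document order
    return [para for _, para in scored]


def answer_processing(paragraphs: List[str]) -> Optional[str]:
    """Placeholder answer extraction: return the best paragraph, if any."""
    return paragraphs[0] if paragraphs else None


def answer(question: str, search: Callable[[List[str]], List[str]]) -> Optional[str]:
    """Run the whole chain; `search` is a hypothetical search-engine wrapper."""
    keywords = question_processing(question)
    documents = [format_manager(d) for d in search(keywords)]
    return answer_processing(paragraph_retrieval(documents, keywords))
```

Passing the search engine in as a callable keeps the sketch independent of any particular engine; a real answer-processing stage would extract a short answer span rather than return a whole paragraph.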
QA on Top of a Database Engine
[Architecture diagram. Labels: Question Processing, Keywords, Query Builder, Query, Database Engine, Database Records, Format Manager, Normalized Documents, Paragraph Retrieval, Answer Processing]
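As a rough illustration of the Query Builder step, the sketch below turns question keywords into a parameterized SQL query and runs it against an in-memory SQLite database. The `records` table, its columns, and the choice of SQLite are assumptions made for the example; the slides do not name the database engine or schema. The two sample rows reuse answers shown in the demo slides.

```python
import sqlite3
from typing import List, Tuple


def build_query(keywords: List[str]) -> Tuple[str, List[str]]:
    """Build a parameterized query matching records that contain any keyword."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    clauses = " OR ".join("body LIKE ?" for _ in keywords)
    sql = f"SELECT title, body FROM records WHERE {clauses} ORDER BY id"
    return sql, [f"%{k}%" for k in keywords]


def fetch_records(conn: sqlite3.Connection, keywords: List[str]) -> List[Tuple[str, str]]:
    """Run the built query and return (title, body) rows."""
    sql, params = build_query(keywords)
    return conn.execute(sql, params).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Create a small in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO records (title, body) VALUES (?, ?)",
        [
            ("FUSE", "FUSE indicates a problem with the fuse"),
            ("HH", "Hardware Handler (HH) error"),
        ],
    )
    return conn
```

Keywords are bound as parameters rather than spliced into the SQL string, so a question containing quote characters cannot alter the query. The returned rows would then pass through the Format Manager like any other document.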
Technical Challenges
• Different formats: PDF, HTML, DOC, PS
• Document layout
• Pages dynamically generated
• Password protection
• Subscription required
• Cookies
Build Local Collections of Documents
• Gather documents from a specific site, and cache them locally
• Transform them into a canonical text form, then index the documents
• Maintain the document collection: update constantly, avoid redundant documents, garbage-collect obsolete files, etc.
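The steps above can be sketched as a small in-memory collection in Python. Everything here is a simplified assumption for illustration: the `LocalCollection` class, tag stripping as the "canonical form", SHA-256 content hashes for redundancy detection, and a plain inverted index are not described in the slides.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, Set


def canonical_text(raw: str) -> str:
    """Strip HTML-like tags and collapse whitespace (assumed canonical form)."""
    return " ".join(re.sub(r"<[^>]+>", " ", raw).split())


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _terms(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class LocalCollection:
    """Hypothetical local cache with duplicate detection and an inverted index."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}      # url -> canonical text
        self.hashes: Dict[str, str] = {}    # content hash -> url
        self.index: Dict[str, Set[str]] = defaultdict(set)  # term -> urls

    def add(self, url: str, raw: str) -> bool:
        """Cache and index a document; return False if the content is redundant."""
        text = canonical_text(raw)
        digest = _digest(text)
        if digest in self.hashes:
            return False                    # identical content already stored
        if url in self.docs:
            self.remove(url)                # an updated page replaces the old copy
        self.docs[url] = text
        self.hashes[digest] = url
        for term in _terms(text):
            self.index[term].add(url)
        return True

    def remove(self, url: str) -> None:
        """Drop a document from cache and index; raises KeyError if absent."""
        text = self.docs.pop(url)
        del self.hashes[_digest(text)]
        for term in _terms(text):
            self.index[term].discard(url)

    def lookup(self, term: str) -> Set[str]:
        """Return the URLs of documents containing the term."""
        return set(self.index.get(term.lower(), set()))
```

Hashing the canonical text rather than the raw bytes means two pages that differ only in markup or whitespace count as redundant, which is one plausible reading of "avoid redundant documents".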
Experiments
• Business: InterVoice Brite Product Manuals
• Community: City of Irving
• NEWS: cnn.com, abcnews.com, dallasnews.com, time.com, washingtonpost.com
InterVoiceBrite
• Collection:
  • product manuals
  • size: 38 MB
  • files: 802
  • format: PDF
  • layout: specific to manuals
  • changes occur at large time intervals
Peculiarities of Their Needs
• The question is in the form of a problem description
• The expected answer is a solution to the problem
• The answer is compiled from different parts of documents and given in the form of a procedure to be followed
• Follow-up questions frequently lead to a dialogue
An Example
• Question: “I would like to have the caller be able to control the playback of a long set of instructions with speech recognition. While the message is playing the caller may say “stop”, “go back”, “forward”, “start over” and have the system respond appropriately. Can this be done? The SpeechAccess engine is Nuance.”
• Answer: “Yes, this can be done. Play a lead-in message to tell the caller to say “next”, “backup” or “done”. Then with the loop play the first instruction you want the caller to hear in keyover mode. To obtain line balancing procedure and the required files please visit the continuing engineering web page.”
Our Demo
• Q: How can I obtain line balancing information?
• A: READ DSLAC Request AI1 DSLAC line balancing information
• Q: How can I modify a message?
• A: Your Voice The feature that enables a voice mail user to change specific voice messages
• Q: What is the runtime engine?
• A: ISINIT, the runtime engine,
Our Demo
• Q: What type of error is HH?
• A: Hardware Handler (HH) error
• Q: What causes telephony connection problems?
• A: Telephony connection problems can be caused by the InterSoft system or by the telephony equipment (PBX)
• Q: What does FUSE mean?
• A: FUSE Indicates a problem with the fuse
City of Irving
• Collection:
  • heterogeneous, city information
  • size: 96 MB
  • files: 1097
  • format: HTML, PDF, DOC
  • layout: WWW space
  • small daily changes
Examples
• Q: When does the Farmer’s Market take place?
• A: Irving Farmers’ Market: 1st and 3rd Saturdays in Downtown Irving
• Q: What is Irving’s news source?
• A: Irving’s news source is the City Spectrum
• Q: Where does Irving’s water supply come from?
• A: The City of Irving purchases its entire water supply from the City of Dallas
Examples
• Q: Where can I pay traffic fines?
• A: Irving Municipal Court Criminal Justice Center 305 N. O’Connor Rd
• Q: How do I apply for a job with the City?
• A: Applications are accepted from 8 a.m. to 5 p.m. Monday – Friday at the Civic Center Complex, 825 W. Irving Blvd. Job listings are available on the city’s Web site, www.ci.irving.tx.us, or by calling the city’s 24-hour job line at (972) 721 3773
NEWS
• Collection:
  • sources: CNN.COM, TIME.COM, ABCNEWS.COM, DALLASNEWS.COM, WASHINGTONPOST.COM
  • size: 531 MB
  • files: 55880
  • format: HTML, PDF, DOC
  • frequent changes
Issues
• broken links
• garbage collection for obsolete files
• cumulative NEWS
• updates depending on the type of source (TIME.COM - weekly)
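One way to handle source-dependent updates and obsolete files is sketched below in Python. Only the weekly interval for TIME.COM comes from the slide; the daily default for other sources, the age-based notion of "obsolete", and both function names are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set

# Only the weekly interval for time.com is stated on the slide;
# the daily default for every other source is an assumption.
REFRESH_INTERVALS = {"time.com": timedelta(weeks=1)}
DEFAULT_INTERVAL = timedelta(days=1)


def sources_due(last_fetched: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the sources whose refresh interval has elapsed, sorted by name."""
    due = []
    for source, fetched_at in last_fetched.items():
        interval = REFRESH_INTERVALS.get(source, DEFAULT_INTERVAL)
        if now - fetched_at >= interval:
            due.append(source)
    return sorted(due)


def collect_garbage(files: Dict[str, datetime], live_urls: Set[str],
                    now: datetime, max_age: timedelta) -> List[str]:
    """Return cached URLs to delete: link is broken or the file is too old.

    `live_urls` is a hypothetical set of URLs that still resolve; treating
    files older than `max_age` as obsolete is an assumed policy.
    """
    return sorted(
        url for url, fetched_at in files.items()
        if url not in live_urls or now - fetched_at > max_age
    )
```

For a cumulative news archive, `max_age` could be set very large so that only broken links are pruned.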
Examples
• Q: How many soldiers died in Afghanistan?
• A: The US military has opened an investigation into last week’s friendly fire incident in Afghanistan that killed four Canadian soldiers and injured eight others
• Q: How much did President Bush increase aid for poor countries?
• A: Bush said the US will increase its initial pledge of $200 million only after the fund proves successful
• Q: Who is the owner of Dallas Mavericks?
• A: Mark Cuban, Internet entrepreneur and owner of the NBA’s Dallas Mavericks
QA and Semantic Web
• QA technology can contribute to the development of the Semantic Web
• Possible architectures:
• 1. QA as an interface between the Intelligent Agent and the Semantic Web
[Diagram. Labels: Agent, Human, QA, Web]
QA and Semantic Web
• 2. QA works on a local collection
[Diagram, two configurations. Labels: Agent, Human, QA, Local Collection, Web]
Technical Challenges to Be Addressed
• 1. Make the QA system compatible with Semantic Web languages (e.g., XML, RDF, DAML, OIL)
• 2. Make QA ontologies compatible with the Semantic Web ontology
• 3. Interface the QA system with Intelligent Agents
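As one hedged illustration of the first challenge, a QA answer could be exported as RDF so that agents can consume it. The Python sketch below writes N-Triples by hand; the `http://example.org/qa#` namespace, the property names, and the function itself are hypothetical and do not come from the slides.

```python
# Hypothetical namespace for illustration; not a published vocabulary.
EX = "http://example.org/qa#"


def escape_literal(value: str) -> str:
    """Escape the characters N-Triples requires inside a quoted literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def answer_to_ntriples(answer_id: str, question: str,
                       answer: str, source_url: str) -> str:
    """Serialize one question/answer pair as three N-Triples statements."""
    subject = f"<{EX}{answer_id}>"
    lines = [
        f'{subject} <{EX}question> "{escape_literal(question)}" .',
        f'{subject} <{EX}answerText> "{escape_literal(answer)}" .',
        f'{subject} <{EX}source> <{source_url}> .',
    ]
    return "\n".join(lines)
```

N-Triples is used here only because it can be produced without a library; an RDF/XML or DAML+OIL serialization would carry the same triples.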