Querying Web Data – The WebQA Approach • Authors: Sunny K.S. Lam and M. Tamer Özsu • CSI5311 Presentation by Dongmei Jiang and Zhiping Duan
Agenda • Properties of Web Data • Approaches of Web Data Searching • WebQA Introduction • WebQA System Architecture • WebQA Implementation • WebQA System Evaluation • Conclusion
Web Data Searching • Is a search engine enough? • Is Web data querying necessary?
Characteristics of Web Data • Properties of Web data • Wide distribution, large volume • High volatility • Lack of structure, redundancy, inconsistency among redundant copies • Heterogeneous representation • Dynamism • Difficulties of querying Web data from a DB perspective • No schema • Searching the whole Web does not scale • No exact Web query language
Web Data Searching Approaches • Information Retrieval Approach • Search engine and Metasearchers • Database-Oriented Web Querying • Information Integration • Semistructured Data Querying • Special Web Query Languages • Question-Answer
Question-Answer Approach • Basic principle • Web pages that could contain the answer to the user query are retrieved • The answer is extracted from these pages • Uses NLP and Information Retrieval (IR) technologies • Answers are extracted by Information Extraction (IE) techniques • Example systems • Mulder [Kwok et al., 2001] • WebQA [Lam & Özsu, 2002]
WebQA • Question-answer approach • Accepts short factual queries • Returns exact answers • Aims to: • Accept fuzziness in user queries • Return actual answers, not URLs • Query the entire Web and scale easily with new data sources
WebQA System Architecture • [Figure: system architecture diagram] • Components: User, Query Parser, Answer Formatter, Semantic Cache Manager, Cache, Resource Locator/Decomposer, Complex Query Evaluation, Answer Collector • Data sources: Search Engines, Web Data Source, Web Site • Reference [1]
WebQA Prototype Architecture • [Figure: prototype architecture diagram] • Components: User, Query Parser (QP), Summary Retriever (SR), Answer Extractor (AE), Cache • Data flow labels: Valid WebQAL Query, Keywords, Category, Keywords/Description, List of Ranked Records • Data sources: Search Engines, Web Data Source, Web Site • Reference [1]
Query Parser • [Figure: query parser diagram] • Components: WebQAL Categorizer, WebQAL Checker, WebQAL Generator • Diagram labels: User query, NL question, WebQAL, NL question category, Valid WebQAL • Categories: Name, Place, Time, Quantity, Abbreviation, Weather, and Other • Query example: Which country produced the most computers in the world? • WebQAL syntax: <category> [-output <output option>] -keywords <keyword list> • Example: place -output country -keywords producer most computers • Reference [1]
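The WebQAL syntax shown on the Query Parser slide can be illustrated with a small checker. This is only a sketch, not the WebQA implementation: the names `WebQALQuery` and `parse_webqal` are hypothetical, the category names are taken from the slide, and the code assumes the output option is a single token and that keywords are whitespace-separated.

```python
from dataclasses import dataclass
from typing import List, Optional

# Categories listed on the Query Parser slide (lowercased here by assumption).
CATEGORIES = {"name", "place", "time", "quantity", "abbreviation", "weather", "other"}


@dataclass
class WebQALQuery:
    """Hypothetical container for a parsed WebQAL query."""
    category: str
    keywords: List[str]
    output: Optional[str] = None


def parse_webqal(text: str) -> WebQALQuery:
    """Parse `<category> [-output <output option>] -keywords <keyword list>`.

    Assumption: the output option is one token; everything after
    `-keywords` is treated as the keyword list.
    """
    # Tolerate en dashes that word processors substitute for hyphens.
    tokens = text.replace("\u2013", "-").split()
    if not tokens or tokens[0].lower() not in CATEGORIES:
        raise ValueError("query must start with a known category")
    category = tokens[0].lower()

    output = None
    i = 1
    if i < len(tokens) and tokens[i] == "-output":
        if i + 1 >= len(tokens) or tokens[i + 1].startswith("-"):
            raise ValueError("-output requires a value")
        output = tokens[i + 1]
        i += 2

    if i >= len(tokens) or tokens[i] != "-keywords":
        raise ValueError("missing -keywords")
    keywords = tokens[i + 1:]
    if not keywords:
        raise ValueError("-keywords requires at least one keyword")
    return WebQALQuery(category, keywords, output)
```

Applied to the slide's example, `parse_webqal("place -output country -keywords producer most computers")` yields category `place`, output `country`, and the three keywords in order.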
Summary Retriever • [Figure: summary retriever diagram] • Components: Keyword Generator, Source Ranker, Record Retrievers, Wrapper #1 ... Wrapper #N, Record Consolidator/Ranker • Diagram labels: WebQAL, List of Ranked Records • Data sources: Search Engine, Web Site, Remote Database • The Source Ranker identifies the data sources best suited to answering certain types of questions. • Records are ranked by source ranking first and by local ranking second. • Reference [1]
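The ordering rule above (source ranking first, local ranking second) amounts to a two-key sort. The following is a minimal sketch under stated assumptions: `Record`, `consolidate`, and the `source_rank` mapping are hypothetical names, lower rank numbers mean better, and sources missing from the mapping are placed last. How WebQA actually computes source ranks is not shown on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    """Hypothetical record returned by a wrapper."""
    source: str      # name of the data source that produced the record
    local_rank: int  # position within that source's own result list (1 = best)
    text: str


def consolidate(records: List[Record], source_rank: Dict[str, int]) -> List[Record]:
    """Order records by source rank first, then by local rank.

    Assumption: lower numbers are better; unknown sources sort after
    every ranked source.
    """
    worst = max(source_rank.values(), default=0) + 1
    return sorted(
        records,
        key=lambda r: (source_rank.get(r.source, worst), r.local_rank),
    )
```

Because the sort key is a tuple, a record from a better-ranked source always precedes records from worse-ranked sources, regardless of local rank.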
Answer Extractor • [Figure: answer extractor diagram] • Stages: Candidate Retriever, Rearrange, Output Converter • Diagram labels: List of Ranked Records, Top ten answers (user readable) • Candidates are retrieved based on • the frequency of occurrence of the answer and the score of the rule that adds it to the candidate list. • The higher the score, the more likely the candidate is the answer to the user's query. • The shorter the answer, the higher the score. • Reference [1]
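The scoring criteria above can be illustrated with a toy ranking function. The slide does not give the actual formula, so the combination used here is an assumption made purely for illustration: each occurrence of a candidate contributes the score of the rule that found it (which captures both frequency and rule score), and the total is divided by the candidate's word count so that shorter answers score higher. The name `rank_candidates` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_candidates(
    hits: Iterable[Tuple[str, float]], top_n: int = 10
) -> List[Tuple[str, float]]:
    """Rank candidate answers from (candidate, rule_score) occurrences.

    Assumed formula (not from the source): sum of rule scores over all
    occurrences, divided by the number of words in the candidate.
    """
    totals: Dict[str, float] = defaultdict(float)
    for candidate, rule_score in hits:
        # Normalise case and whitespace so repeated mentions accumulate.
        totals[candidate.strip().lower()] += rule_score

    scored = [
        (cand, total / len(cand.split()))
        for cand, total in totals.items()
        if cand  # skip empty candidates
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

The default `top_n=10` mirrors the "top ten answers" label on the slide; any other weighting of frequency, rule score, and length would fit the slide's description equally well.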
WebQA Implementation Architecture • [Figure: implementation architecture diagram] • Components: Client #1 ... Client #N, Web Server (JSPs, HTMLs), QA Server, QA Server Threads, QA Engine • Links: Question/Answer (HTTP), Question/Answer (string) • Reference [3]
System Evaluation • The evaluation uses TREC-9 and measures two aspects: accuracy and efficiency • Reference [3]
Conclusion • WebQA follows the question-answer approach. • Query input, exact answer output • Uses NLP, IR and IE technologies • Independent of any data schema. • Queries multiple Web sources: • Search engines • Data sources (CIA World Factbook) • Web sites.
Future work • Develop a full-fledged Web query system • Execution algorithms for more complex queries • Common aggregation functions over retrieved answers • Consider other query types • Continuous query, e.g.: Notify me whenever Ottawa's temperature drops below zero • Procedural query, e.g.: How do I make pancakes?
References • [1] S. Lam and M. T. Özsu. Querying Web Data - The WebQA Approach. WISE 2002. • [2] D. Florescu, A. Levy, and A. Mendelzon. Database techniques for the World Wide Web: A survey. SIGMOD Record, 27(3):59-74, 1998. • [3] M. T. Özsu. Web Data Management - Some Issues. Course slides.