Querying Web Data – The WebQA Approach

Authors: Sunny K.S. Lam and M. Tamer Özsu. CSI5311 presentation: Dongmei Jiang and Zhiping Duan.


Presentation Transcript


  1. Querying Web Data – The WebQA Approach • Authors: Sunny K.S. Lam and M. Tamer Özsu • CSI5311 presentation: Dongmei Jiang and Zhiping Duan

  2. Agenda • Properties of Web Data • Approaches to Web Data Searching • WebQA Introduction • WebQA System Architecture • WebQA Implementation • WebQA System Evaluation • Conclusion

  3. Web Data Searching • Is a search engine enough? • Is Web data querying necessary?

  4. Characteristics of Web Data • Properties of Web data • Wide distribution, large volume • High volatility • Lack of structure, redundancy, inconsistency among redundant copies • Heterogeneous representation • Dynamism • Difficulties of querying Web data from a DB perspective • No schema • Lack of scalability when searching the whole Web • No exact Web query language

  5. Web Data Searching Approaches • Information Retrieval Approach • Search engines and metasearchers • Database-Oriented Web Querying • Information Integration • Semistructured Data Querying • Special Web Query Languages • Question-Answer

  6. Question-Answer Approach • Basic principle • Web pages that could contain the answer to the user query are retrieved • The answer is extracted from these pages • Relies on NLP and Information Retrieval (IR) technologies • The answer is extracted by Information Extraction (IE) techniques • Example systems • Mulder [Kwok et al., 2001] • WebQA [Lam & Özsu, 2002]

  7. WebQA • Question-answer approach • Accepts short factual queries • Returns the exact answers • Aims to: • Accept fuzziness in user queries • Return actual answers, not URLs • Query the entire Web and scale easily with new data sources

  8. WebQA System Architecture (diagram) • Components: User, Query Parser, Answer Formatter, Semantic Cache Manager, Cache, Resource Locator/Decomposer, Complex Query Evaluation, Answer Collector • Sources: Search Engines, Web Data Source, Web Site • Reference [1]

  9. WebQA Prototype Architecture (diagram) • Components: User, Query Parser (QP), Summary Retriever (SR), Answer Extractor (AE), Cache • Data passed between components: Valid WebQAL Query, Keywords, Category, Keywords/Description, List of Ranked Records • Sources: Search Engines, Web Data Source, Web Site • Reference [1]

  10. Query Parser • Input: user query, either a natural-language (NL) question or WebQAL • Components: Categorizer (takes the NL question), WebQAL Generator (takes the NL question and its category), WebQAL Checker (takes WebQAL) • Categories: <Name, Place, Time, Quantity, Abbreviation, Weather, and Other> • Output: valid WebQAL • WebQAL syntax: <category> [-output <output option>] -keywords <keyword list> • Query example: Which country produced the most computers in the world? • WebQAL: place -output country -keywords producer most computers • Reference [1]
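The WebQAL syntax on the slide above can be illustrated with a small parser. This is only a sketch under stated assumptions, not the WebQA implementation: the names `WebQALQuery` and `parse_webqal` are hypothetical, the output option is assumed to be a single token, and keywords are assumed to be separated by whitespace. The category list is taken from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional

# Categories listed on the Query Parser slide, lowercased for matching.
CATEGORIES = {"name", "place", "time", "quantity",
              "abbreviation", "weather", "other"}


@dataclass
class WebQALQuery:
    """Hypothetical container for a parsed WebQAL query."""
    category: str
    output: Optional[str]
    keywords: List[str]


def parse_webqal(text: str) -> WebQALQuery:
    """Parse '<category> [-output <option>] -keywords <keyword list>'."""
    # Slides often turn hyphens into en dashes, so normalise them first.
    tokens = text.replace("\u2013", "-").split()
    if not tokens or tokens[0].lower() not in CATEGORIES:
        raise ValueError("query must start with a known category")
    category = tokens[0].lower()
    rest = tokens[1:]

    output = None
    if rest and rest[0] == "-output":
        # Assumption: the output option is exactly one token.
        if len(rest) < 2 or rest[1].startswith("-"):
            raise ValueError("-output needs an option")
        output = rest[1]
        rest = rest[2:]

    if not rest or rest[0] != "-keywords":
        raise ValueError("expected -keywords")
    keywords = rest[1:]
    if not keywords:
        raise ValueError("-keywords needs at least one keyword")
    return WebQALQuery(category, output, keywords)
```

Calling `parse_webqal("place -output country -keywords producer most computers")` on the slide's example yields the category `place`, the output option `country`, and three keywords.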

  11. Summary Retriever • Input: WebQAL • Components: Keyword Generator, Source Ranker, Record Retrievers, Wrappers #1, #2, ..., #N, Record Consolidator/Ranker • Sources: Search Engine, Web Site, Remote Database • Output: List of Ranked Records • The Source Ranker identifies the better data resources for answering certain types of questions. • Records are ranked by source ranking first and local ranking second. • Reference [1]
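The "source ranking first, local ranking second" rule can be sketched as a two-key sort. The `Record` type, the `consolidate` function, and the source names in the usage note are hypothetical; the slide does not say how the Source Ranker computes its ordering, so the sketch simply takes that ordering as an input.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical record returned by one wrapper."""
    source: str       # which data source produced the record
    local_rank: int   # rank assigned by that source (0 = best)
    summary: str


def consolidate(records: List[Record],
                source_rank: Dict[str, int]) -> List[Record]:
    """Order records by source rank first, then by local rank.

    Assumption: sources missing from `source_rank` sort after all
    ranked sources.
    """
    unknown = len(source_rank)
    return sorted(
        records,
        key=lambda r: (source_rank.get(r.source, unknown), r.local_rank),
    )
```

For a question in the place category, for example, a caller might pass `{"factbook": 0, "search_engine": 1}` so that every record from the first source precedes every record from the second, with each source's own order preserved.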

  12. Answer Extractor • Input: List of Ranked Records • Components: Candidate Retriever, Rearrange, Output Converter • Output: top ten answers (user readable) • A candidate is retrieved based on how frequently the answer occurs and on the score of the rule that adds it to the candidate list. • The higher the score, the more likely the candidate is the answer to the user's query. • The shorter the answer, the higher the score. • Reference [1]
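The slide names three scoring signals (frequency of occurrence, rule score, answer length) but gives no formula. The sketch below combines them in one possible way, purely as an assumption: rule scores are summed over every occurrence of a candidate, and the total is divided by the candidate's word count so that shorter answers score higher. The function name `rank_candidates` and its input shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_candidates(hits: Iterable[Tuple[str, float]],
                    top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank candidate answers.

    `hits` holds one (answer_text, rule_score) pair per occurrence.
    Assumed formula (not from the slides):
        score = sum of rule scores over occurrences / number of words
    """
    totals: Dict[str, float] = defaultdict(float)
    for answer, rule_score in hits:
        # Treat answers that differ only in case or padding as the same.
        totals[answer.strip().lower()] += rule_score

    scored = [(a, total / len(a.split()))
              for a, total in totals.items() if a]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

The default of ten results mirrors the "top ten answers" returned by the Output Converter on the slide.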

  13. WebQA Implementation Architecture (diagram) • Clients #1, #2, ..., #N • Web Server (JSPs, HTMLs) • QA Server with one QA Server Thread per client, backed by the QA Engine • Connection labels: Question/Answer (HTTP); Q/A (string) • Reference [3]
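The thread-per-client layout in the diagram can be sketched with Python's standard `socketserver` module. This is an illustration only: the slides mention JSPs, so the actual system is presumably Java-based, and the line-based wire format, the `answer_question` placeholder, and the class names here are all assumptions.

```python
import socket
import socketserver
import threading


def answer_question(question: str) -> str:
    """Hypothetical stand-in for the QA engine."""
    return f"(no answer found for: {question})"


class QAHandler(socketserver.StreamRequestHandler):
    """Runs in its own thread for each connection (one 'QA Server Thread')."""

    def handle(self) -> None:
        # Assumed wire format: one question per line, one answer per line.
        question = self.rfile.readline().decode("utf-8").strip()
        reply = answer_question(question) + "\n"
        self.wfile.write(reply.encode("utf-8"))


class QAServer(socketserver.ThreadingTCPServer):
    """TCP server that spawns a new thread per client connection."""
    allow_reuse_address = True
    daemon_threads = True


def ask(host: str, port: int, question: str) -> str:
    """Send one question string and return the answer string."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((question + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()
```

A caller would create `QAServer(("127.0.0.1", 0), QAHandler)`, run `serve_forever()` in a background `threading.Thread`, and have the web tier call `ask` with the address from `server.server_address`.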

  14. System Evaluation • The system is evaluated using TREC-9 and measured on two aspects: accuracy and efficiency • Reference [3]

  15. Conclusion • WebQA follows the question-answer approach • Query input, exact answer output • Uses NLP, IR and IE technologies • Independent of data schema • Queries multiple Web sources: • Search engines • Data sources (CIA World Factbook) • Web sites

  16. Future work • Develop a full-fledged Web query system • Execution algorithms for more complex queries • Common aggregation functions over retrieved answers • Consider other query types • Continuous query, e.g. "Notify me whenever Ottawa's temperature drops below zero" • Procedural query, e.g. "How do I make pancakes?"

  17. References • S. Lam and M.T. Özsu. "Querying Web Data - The WebQA Approach." WISE 2002. • D. Florescu, A. Levy, and A. Mendelzon. "Database techniques for the World Wide Web: A survey." SIGMOD Record, 27(3):59-74, 1998. • M.T. Özsu. "Web Data Management - Some Issues." Course slides.
