
FACT: A Learning Based Web Query Processing System

This paper presents FACT, a prototype system for learning-based web query processing. It aims to relieve users from the tedious browsing process by accurately and concisely presenting query results. The system analyzes user queries, learns from their browsing behavior, and retrieves only the queried segments containing the desired information. The paper outlines the system's architecture, learning process, training strategies, and preliminary evaluation.



Presentation Transcript


  1. FACT: A Learning Based Web Query Processing System. Hongjun Lu, Yanlei Diao (Hong Kong U. of Science & Technology); Songting Chen, Zengping Tian (Fudan University)

  2. Outline • Introduction • Learning Based Web Query Processing • FACT: A Prototype System • Preliminary System Evaluation • Conclusions

  3. How Do We Query the Web? • Use a search engine • Form query key words • An example: Find room rates of hotels in Hong Kong • used search engine www.yahoo.com • keywords: Hong Kong+hotel

  4. (Screenshot; visible labels: forward, Hotel 1, 3, Hotel 2, done) Look at the number!

  5. Query the Web -- Current Situation • Search engines return a long list of URLs, and the user must browse the web pages to find the information. • The required information is often not on the returned page -- navigation through hyperlinks is often required (and those links may or may not be obvious). • The target information comes in different forms (paragraphs, lists, tables …) • A lot of web pages must be browsed • Are we happy with this?

  6. Efforts to Improve the Situation • Search engines • better index, improve precision/recall, metasearch engines, better presentation of results, …. • IR techniques to Web • document clustering/indexing, better model, similarity functions, documents ranking, ... • Intelligent agent • user profiling, hyperlink recommendation, ... • Database approach • wrappers, query languages, …

  7. Our Dream • Querying the Web as easy as querying a relational database • SQL query returns a table of hotel prices SELECT room rates FROM web.hotel WHERE city = “hong kong” • May remain a dream for a while :-(

  8. A Practical Goal • Use keywords to express query requirements • simple, no need to know the schema of the data • inaccurate • Relieve users from tedious browsing as much as possible • Not URLs, not Web sites, not even Web pages • Present query results to users as accurately and concisely as possible • Tables, lists, paragraphs, … containing the information the user requires

  9. Query Results -- Queried Segments • Return query results as accurately and concisely as possible. • Basic idea: • Break a Web page into segments: a row in a table, a table, an item in a list, a list, a paragraph • Return only queried segments to users • Queried segments: segments that contain the information the user is interested in.
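The segment idea on this slide can be illustrated with a minimal Python sketch. It is not FACT's implementation: the tag set, the `SegmentExtractor` class, and the `queried_segments` helper are hypothetical names, and plain keyword matching stands in for the knowledge that FACT learns from the user.

```python
from html.parser import HTMLParser

# Tags treated as segment boundaries. This set is an assumption that mirrors
# the slide's examples: table row, table, list item, list, paragraph.
SEGMENT_TAGS = {"tr", "table", "li", "ul", "ol", "p"}


class SegmentExtractor(HTMLParser):
    """Collects (tag, text) pairs for every segment-level element."""

    def __init__(self):
        super().__init__()
        self._open = []      # stack of [tag, text_parts] for open segments
        self.segments = []   # finished (tag, text) pairs, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in SEGMENT_TAGS:
            self._open.append([tag, []])

    def handle_data(self, data):
        # Text belongs to every enclosing segment (a row is part of its table).
        for _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        # Only well-nested closing tags finish a segment in this sketch.
        if tag in SEGMENT_TAGS and self._open and self._open[-1][0] == tag:
            _, parts = self._open.pop()
            text = " ".join(" ".join(parts).split())  # normalize whitespace
            self.segments.append((tag, text))


def queried_segments(html, keywords):
    """Return segments whose text contains every keyword (case-insensitive)."""
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    wanted = [k.lower() for k in keywords]
    return [(tag, text) for tag, text in parser.segments
            if all(k in text.lower() for k in wanted)]
```

Because a row's text is also part of its table, both the row and the enclosing table can match. Choosing the most concise matching segment is left out of this sketch.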

  10. Outline • Introduction • Learning Based Web Query Processing • FACT: A Prototype System • Preliminary System Evaluation • Conclusions

  11. Learning Based Query Processing • The fundamental difficulties in Web query processing: • Web is a huge, ever growing, heterogeneous, semi-structured data source • Most users of Web are naïve users issuing ad hoc queries • Learn the knowledge for query processing from the User!

  12. A Learning Based Technique • Learn from the user as they browse the first few URLs • how to navigate through the web pages • how to identify the required information in a web page • Process the remaining URLs automatically and retrieve queried segments

  13. (Screenshot; visible labels: forward, Hotel 1, 3, Hotel 2, done) User browses it!

  14. Back User clicks here!

  15. Room information User marks it!

  16. back FACT starts here!

  17. roomrates FACT chooses it!

  18. xxx FACT finds it!

  19. Outline • Introduction • Learning Based Web Query Processing • FACT: A Prototype System • Preliminary System Evaluation • Conclusions

  20. A Query Processing System A learning based query processing system: • User Interface: accepts user queries, presents query results, a browser capable of capturing user actions • Query Analyzer: analyzes and transforms user queries • Session Controller: coordinates learning and locating • Learner: generates knowledge from captured user actions • Locator: applies knowledge and locates query results • Crawler & Parser: retrieves pages and parses to trees • Knowledge Base: stores learned knowledge

  21. Reference Architecture (diagram; components shown: User, User Interface, Learner, Knowledge Base, Session Controller, Query Analyzer, Locator, Crawler & Parser, Search Engine, Web)

  22. A Query Session (diagram; labels shown: Learning Process, Scripts, Learner, Browser, User Actions, Session Controller, URLs, Knowledge Base, Result Buffer, Training Strategy, Segment Graph, Query Results, Checking, Locating Process, Locator, Query Result Presenter)

  23. Training Strategies • Sequential • First n sites: user browses and system learns • Next N - n sites: system processes • Random • Randomly choose n sites: user browses and system learns • The system processes the rest • Interleaved • First n0 sites: user browses and system learns • Next n - n0 sites: system makes decisions; for incorrect ones, user browses and system re-learns • Next N - n sites: system processes
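The three strategies differ only in how the N sites are divided between user training and automatic processing. A minimal sketch of that division follows. The function names and the `system_is_correct` callback are hypothetical; the callback stands in for the user checking each of the system's decisions.

```python
import random


def sequential_split(sites, n):
    """First n sites are training sites; the remaining N - n are automatic."""
    return sites[:n], sites[n:]


def random_split(sites, n, seed=None):
    """Randomly choose n training sites; the system processes the rest."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(sites)), n))
    train = [s for i, s in enumerate(sites) if i in chosen]
    rest = [s for i, s in enumerate(sites) if i not in chosen]
    return train, rest


def interleaved_split(sites, n0, n, system_is_correct):
    """First n0 sites: user trains. Next n - n0 sites: the system decides and
    the user re-trains on the incorrect ones. Remaining N - n sites: automatic.

    system_is_correct is a hypothetical callback (site -> bool).
    """
    train = list(sites[:n0])
    accepted = []
    for site in sites[n0:n]:
        if system_is_correct(site):
            accepted.append(site)   # system's decision stands
        else:
            train.append(site)      # user browses, system re-learns
    return train, accepted, sites[n:]
```

With the interleaved strategy the user only re-trains where the system was wrong, so the number of training sites can end up smaller than n.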

  24. Outline • Introduction • Learning Based Web Query Processing • FACT: A Prototype System • Preliminary System Evaluation • Conclusions

  25. System Evaluation • Functionality • Performance • precision, recall, correctness • efficiency: in a site, how many pages the system visits to find a result • training efficiency: how many training samples are needed • User interface

  26. System Evaluation - Effectiveness • Given a set of keywords, the system makes N decisions: N = N1 + N2 + N3 + N4 • Precision = N1 / (N1 + N3) • Recall = N1 / # relevant sites • Correctness = (N1 + N2) / N
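The three measures can be computed directly from the four counts. The transcript does not say what N1 to N4 stand for. The sketch below assumes, based only on the shape of the formulas, that N1 and N2 count correct decisions and N3 and N4 count incorrect ones, with N1 and N3 being the cases where the system returned a segment. The function name is hypothetical.

```python
def effectiveness(n1, n2, n3, n4, relevant_sites):
    """Return (precision, recall, correctness) from the decision counts.

    Assumed meaning (not stated in the slides): n1 = correct segment returned,
    n2 = correctly returned nothing, n3 = wrong segment returned,
    n4 = segment missed. relevant_sites is the number of relevant sites.
    """
    n = n1 + n2 + n3 + n4
    precision = n1 / (n1 + n3) if (n1 + n3) else 0.0
    recall = n1 / relevant_sites if relevant_sites else 0.0
    correctness = (n1 + n2) / n if n else 0.0
    return precision, recall, correctness
```

For example, with the made-up counts N1 = 6, N2 = 2, N3 = 1, N4 = 1 and 8 relevant sites, precision is 6/7, recall is 6/8, and correctness is 8/10.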

  27. System Evaluation - Efficiency • How efficiently does the system find a queried segment in a site? • Level of a queried segment = the length of the shortest path to find it • Absolute path length = # crawled pages • Relative path length = # crawled pages / level of the queried segment
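The level of a queried segment is a shortest-path length, so it can be found with a breadth-first search over a site's link graph. The sketch below assumes that the level counts pages on the path, so a segment on the entry page has level 1. The slides do not say whether pages or links are counted. The function names and the dictionary-based link graph are hypothetical.

```python
from collections import deque


def segment_level(links, start, target):
    """Number of pages on the shortest path from start to the target page.

    links maps each page to the pages it links to. Returns None if the
    target page cannot be reached from start.
    """
    seen = {start}
    queue = deque([(start, 1)])  # assumption: the entry page has level 1
    while queue:
        page, level = queue.popleft()
        if page == target:
            return level
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, level + 1))
    return None


def relative_path_length(crawled_pages, level):
    """Crawled pages divided by the level of the queried segment."""
    return crawled_pages / level
```

Under this assumption a relative path length of 1 means the system visited no page outside the shortest path, and larger values mean extra pages were crawled.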

  28. Basic Performance • Q11: Hong Kong Hotel Room Rate • Q12: Hong Kong Hotel • Sequential training

  29. Query Q12 • Effects of training strategies

  30. Improved Performance Interleaved training

  31. Outline • Introduction • Learning Based Web Query Processing • FACT: A Prototype System • Preliminary System Evaluation • Conclusions

  32. Conclusions • Proposed and implemented learning based Web query processing with the following features • Returning succinct results: segments of pages; • No a priori knowledge or preprocessing, suited for ad hoc queries; • Exploiting page formatting and linkage information simultaneously. • The preliminary results are promising

  33. Future Work • Better knowledge • key factor that affects system performance • Dynamic web pages? • Integrating results from another project • System evaluation • Prototype -> product -> dot-com company $$$ ???
