
Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Enhancing Internet Search Engines to Achieve Concept-based Retrieval. F. Lu, T. Johnsten, V. Raghavan, and D. Traylor. InForum ’99, May 5-6, 1999. Agenda: Information on the Internet; Boolean Retrieval Model and the Internet; Concept-Based Retrieval (RUBRIC / CS3).


Presentation Transcript


  1. Enhancing Internet Search Engines to Achieve Concept-based Retrieval • F. Lu, T. Johnsten, V. Raghavan, and D. Traylor • InForum ’99, May 5-6, 1999

  2. Agenda • Information on the Internet. • Boolean Retrieval Model and the Internet. • Concept-Based Retrieval (RUBRIC / CS3). • CS3 and Boolean Search Engines. • Future Work.

  3. Information on the Internet • Large volume. • Rapid growth rate. • Wide variations in quality and type.

  4. Boolean Retrieval Model and the Internet • Most Internet search engines are based on the Boolean Retrieval Model. • Boolean Retrieval Model is relatively easy to implement. • Limitations: • Inability to assign weights to query or document terms. • Inability to rank retrieved documents. • Naïve users have difficulty in using

  5. Concept-Based Retrieval • Address shortcomings of Boolean Retrieval Model. • Search Requests specified in terms of concepts structured as rule-base trees.

  6. Development of Rule-Base Trees (General) • Top-down refinement strategy. • Support for AND / OR relationships. • Support for user-defined weights.

  7. Development of Rule-Base Trees (CS3) • Concept-Set Structuring System (CS3) • CS3 supports the creation, storage and modification of user-defined concepts • Post-processing of results of sub-queries • CS3 user-interface.

  8. CS3 User Interface

  9. Evaluation of Rule-Base Trees (RUBRIC) • Run-time, bottom-up analysis. • Propagation of weight values (MIN / MAX). • Disadvantage of run-time analysis.
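
The bottom-up MIN / MAX propagation named on slide 9 can be sketched roughly as follows. This is a minimal illustration, not the RUBRIC implementation: the `Node` class, the convention that a term leaf scores 1.0 when present and 0.0 otherwise, the multiplication of a child's score by its edge weight, and all weights in the example tree are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical rule-base tree."""
    kind: str                                     # "TERM", "AND", or "OR"
    term: str = ""                                # used only by TERM leaves
    children: list = field(default_factory=list)  # (edge_weight, Node) pairs


def evaluate(node, document_terms):
    """Score a document bottom-up: AND takes the MIN, OR takes the MAX."""
    if node.kind == "TERM":
        return 1.0 if node.term in document_terms else 0.0
    # Assumption: each child's score is scaled by the weight on its edge.
    scores = [w * evaluate(child, document_terms) for w, child in node.children]
    return min(scores) if node.kind == "AND" else max(scores)


# Hypothetical tree for a "hydrogeology" concept (weights are invented).
tree = Node("OR", children=[
    (1.0, Node("TERM", "hydrogeology")),
    (0.8, Node("TERM", "dnapl")),
    (0.6, Node("AND", children=[
        (1.0, Node("TERM", "colloid")),
        (1.0, Node("TERM", "environmental transport")),
    ])),
])
```

Because the whole tree is walked again for every document, this also illustrates the run-time cost that the slide lists as a disadvantage.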

  10. Evaluation of Rule-Base Trees (CS3) • Static, bottom-up analysis. • Construct Minimal Term Set (MTS). • Propagation of terms. • CS3 user-interface.

  11. MTS (Minimal Term Set) • An MTS for a topic is a set of terms such that, if every term in the set appears in a document, the document receives a retrieval status value (RSV) greater than 0; otherwise, the RSV is 0. • A topic can have more than one MTS. • A user can choose among a topic's MTSs to run the search that best fits their needs.
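
One way to derive minimal term sets statically from an AND / OR tree is sketched below. It is an illustration under stated assumptions rather than the CS3 algorithm: the tuple-based tree encoding and both function names are invented here, and every edge weight is assumed to be positive so that weights do not affect which term sets yield a non-zero RSV.

```python
from itertools import product


def term_sets(node):
    """Return every set of terms that gives the node a non-zero score.

    Hypothetical encoding: ("TERM", "word"), ("AND", [children]),
    or ("OR", [children]).
    """
    kind = node[0]
    if kind == "TERM":
        return [frozenset([node[1]])]
    child_sets = [term_sets(child) for child in node[1]]
    if kind == "OR":
        # Satisfying any one child is enough.
        return [s for sets in child_sets for s in sets]
    # AND: choose one satisfying set per child and merge the choices.
    return [frozenset().union(*combo) for combo in product(*child_sets)]


def minimal_term_sets(node):
    """Drop duplicates and any set that strictly contains another set."""
    candidates = set(term_sets(node))
    return [s for s in candidates if not any(other < s for other in candidates)]
```

For a tree such as `("OR", [("TERM", "a"), ("AND", [("TERM", "a"), ("TERM", "b")])])`, the set `{"a", "b"}` is discarded because `{"a"}` alone already guarantees a non-zero score.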

  12. Concept-Based Retrieval and Boolean Search Engines • CS3 is designed to interface with existing Boolean search engines. • U.S. Department of Energy’s “Information-Bridge” search engine. • U.S. Department of Transportation’s “National Transportation Library” search engine.

  13. System Architecture (diagram; components as labeled) • Client (Java applet) • CORBA • CGI • Server (Java) • Server (Java/C++) • JDBC • DOE InfoBridge, etc. … • Oracle

  14. Information-Bridge and CS3 • Search request: Boolean vs. concept. • Output: non-ranked vs. ranked. • Calculation of RSV: • Given a document D and the set S of MTS expressions satisfied by D, the RSV of D equals the sum of the weights in S plus the maximum weight in S.
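
The RSV rule on slide 14 can be written out as a short function. This is a sketch of the rule as worded on the slide; the function name, the `(terms, weight)` pair representation, and the weights in the usage note are assumptions for illustration only.

```python
def rsv(document_terms, weighted_mts):
    """Sum of the weights of all satisfied MTSs plus the largest such weight.

    weighted_mts is a list of (terms, weight) pairs, where terms is a set.
    An MTS is satisfied when all of its terms occur in the document.
    """
    satisfied = [weight for terms, weight in weighted_mts
                 if terms <= document_terms]
    if not satisfied:
        return 0.0  # no MTS satisfied, so the document is not retrieved
    return sum(satisfied) + max(satisfied)
```

With hypothetical weights 0.5 for `{"hydrogeology"}` and 0.25 for `{"dnapl"}`, a document containing both terms would score (0.5 + 0.25) + 0.5 = 1.25. Sorting documents by this value in descending order gives the ranked output that the slide contrasts with the non-ranked Boolean result.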

  15. Information-Bridge and CS3 (Example) • Boolean search request (“Environmental Science Network” Form): • (“Hydrogeology” OR “Dnapl” OR (“Colloid*” AND “Environmental Transport”)). • Concept (CS3): • “Hydrogeology”. • Rule-Base Tree.
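
The Boolean request on slide 15 has the shape of a disjunction of conjunctions, which is also the shape a list of minimal term sets would take if sent to a Boolean engine. The sketch below shows that mapping; it is an assumption about how a concept could be translated, not a description of what CS3 actually sends to Information-Bridge. The function name is hypothetical, and the quoting and operator syntax are copied from the slide's example (using straight quotes).

```python
def to_boolean_query(term_sets):
    """Join terms within a set with AND, and the sets themselves with OR."""
    clauses = []
    for terms in term_sets:
        if not terms:
            continue  # skip empty sets rather than emit an empty clause
        quoted = [f'"{term}"' for term in terms]
        if len(quoted) == 1:
            clauses.append(quoted[0])
        else:
            clauses.append("(" + " AND ".join(quoted) + ")")
    return "(" + " OR ".join(clauses) + ")"


query = to_boolean_query([
    ["Hydrogeology"],
    ["Dnapl"],
    ["Colloid*", "Environmental Transport"],
])
```

Here `query` reproduces the structure of the slide's request; a user of the concept interface would only need to name the "Hydrogeology" concept.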

  16. CS3 Hydrogeology Rule Base

  17. CS3 search results

  18. Current and Future Work • Conduct experiments to evaluate effectiveness (future). • Investigate alternative methods to compute RSVs [KADR00, KDR01*]. • Learn edge weights through relevance feedback [KR00]. • Thesaurus-based rule-base generation [KLR00].

  19. Relevant URLs • www.cacs.usl.edu/~linc-projects/cs3/ • [LJRT99*]: Raghavan home page, Publications since 1991
