Enhancing Internet Search Engines to Achieve Concept-based Retrieval • F. Lu, T. Johnsten, V. Raghavan, and D. Traylor • InForum ’99, May 5-6, 1999
Agenda • Information on the Internet. • Boolean Retrieval Model and the Internet. • Concept-Based Retrieval (RUBRIC / CS3). • CS3 and Boolean Search Engines. • Future Work.
Information on the Internet • Large volume. • Rapid growth rate. • Wide variations in quality and type.
Boolean Retrieval Model and the Internet • Most Internet search engines are based on the Boolean Retrieval Model. • The Boolean Retrieval Model is relatively easy to implement. • Limitations: • Inability to assign weights to query or document terms. • Inability to rank retrieved documents. • Naïve users have difficulty using Boolean operators.
Concept-Based Retrieval • Addresses the shortcomings of the Boolean Retrieval Model. • Search requests are specified in terms of concepts structured as rule-base trees.
Development of Rule-Base Trees (General) • Top-down refinement strategy. • Support for AND / OR relationships. • Support for user-defined weights.
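To make the tree structure concrete, here is a minimal Python sketch of a rule-base tree node. The `Node` class, its `refine` and `terms` methods, and the example concept and weights are all hypothetical; the slides do not say how RUBRIC or CS3 represent trees internally. The sketch assumes each node carries the user-defined weight of the edge to its parent and combines its children with AND or OR.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One node of a rule-base tree (hypothetical representation)."""
    label: str                       # concept name or search term
    op: Optional[str] = None         # "AND" / "OR" for a concept, None for a term
    weight: float = 1.0              # user-defined weight on the edge to the parent
    children: List["Node"] = field(default_factory=list)

    def refine(self, op: str, *children: "Node") -> "Node":
        """Top-down refinement: expand this concept into weighted sub-concepts."""
        if op not in ("AND", "OR"):
            raise ValueError(f"unsupported operator: {op}")
        self.op = op
        self.children.extend(children)
        return self

    def terms(self) -> List[str]:
        """Collect the leaf search terms of this subtree, left to right."""
        if not self.children:
            return [self.label]
        return [t for child in self.children for t in child.terms()]

# Hypothetical concept, refined top-down into weighted sub-concepts.
concept = Node("contamination").refine(
    "AND",
    Node("pollutant", weight=1.0).refine(
        "OR", Node("dnapl", weight=0.9), Node("solvent", weight=0.6)
    ),
    Node("groundwater", weight=0.8),
)
```

Refinement returns the node itself, so a concept can be built in one nested expression that reads from the root downward, mirroring the top-down strategy.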
Development of Rule-Base Trees (CS3) • Concept-Set Structuring System (CS3). • CS3 supports the creation, storage, and modification of user-defined concepts. • Post-processing of sub-query results. • CS3 user interface.
Evaluation of Rule-Base Trees (RUBRIC) • Run-time, bottom-up analysis. • Propagation of weight values (MIN / MAX). • Disadvantage of run-time analysis.
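A minimal sketch of RUBRIC-style run-time, bottom-up evaluation follows. It assumes (the slides do not say) that a leaf scores 1.0 when its term occurs in the document and 0.0 otherwise, that each child's score is multiplied by its edge weight, and that AND nodes take the MIN and OR nodes the MAX of their children's weighted scores. The tuple encoding, `evaluate`, and the sample documents are hypothetical.

```python
from typing import List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a term string; an inner node is
# (op, [(edge_weight, child), ...]) with op in {"AND", "OR"}.
Tree = Union[str, Tuple[str, List[Tuple[float, "Tree"]]]]

def evaluate(tree: Tree, doc_terms: Set[str]) -> float:
    """Score one document by propagating values bottom-up at run time."""
    if isinstance(tree, str):
        # Assumed leaf score: 1.0 if the term occurs in the document, else 0.0.
        return 1.0 if tree in doc_terms else 0.0
    op, children = tree
    # Scale each child's score by the weight of its edge to this node.
    scores = [weight * evaluate(child, doc_terms) for weight, child in children]
    if op == "AND":
        return min(scores)
    if op == "OR":
        return max(scores)
    raise ValueError(f"unsupported operator: {op}")

topic: Tree = ("AND", [
    (1.0, ("OR", [(0.9, "dnapl"), (0.6, "solvent")])),
    (0.8, "groundwater"),
])

docs = {
    "d1": {"dnapl", "groundwater"},
    "d2": {"solvent", "groundwater"},
    "d3": {"dnapl"},
}
scores = {doc_id: evaluate(topic, terms) for doc_id, terms in docs.items()}
```

Note that `evaluate` walks the entire tree once for every candidate document at query time; this per-document, per-query cost is the kind of work a static analysis, done once per topic, can avoid.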
Evaluation of Rule-Base Trees (CS3) • Static, bottom-up analysis. • Construction of Minimal Term Sets (MTSs). • Propagation of terms. • CS3 user interface.
MTS: Minimal Term Set • An MTS for a topic is a set of terms such that, if every term in the set appears in a document, the document receives an RSV (retrieval status value) greater than 0; if any term in the set is missing, that MTS is not satisfied and contributes nothing to the RSV. • A topic can have more than one MTS. • A user can choose among these MTSs to tailor a search to his or her needs.
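The static, bottom-up construction of MTSs can be sketched as follows, under the assumption that it amounts to propagating term sets up the tree: a term contributes the single set containing itself, an OR node passes up the term sets of all its children, and an AND node merges one term set from each child. Sets that strictly contain another set are then discarded, which keeps only the minimal ones. The encoding and `minimal_term_sets` are hypothetical; CS3's actual procedure may differ.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a term; an inner node is (op, children).
# Edge weights are omitted: assuming they are all positive, a document scores
# above 0 exactly when the AND/OR structure is satisfied, so weights do not
# change which term sets qualify.
Tree = Union[str, Tuple[str, List["Tree"]]]
TermSet = FrozenSet[str]

def minimal_term_sets(tree: Tree) -> Set[TermSet]:
    """Static, bottom-up construction of the MTSs of a rule-base tree."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    op, children = tree
    child_sets = [minimal_term_sets(child) for child in children]
    if op == "OR":
        # Any one child suffices, so its term sets propagate upward unchanged.
        candidates = set().union(*child_sets)
    elif op == "AND":
        # Every child is required: pick one term set per child and merge them.
        candidates = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unsupported operator: {op}")
    # Keep only minimal sets: drop any set that strictly contains another.
    return {s for s in candidates if not any(other < s for other in candidates)}

topic: Tree = ("AND", [("OR", ["dnapl", "solvent"]), "groundwater"])
mts = minimal_term_sets(topic)
```

For this hypothetical topic the sketch yields two MTSs, {dnapl, groundwater} and {solvent, groundwater}, which matches the slide's point that a topic can have more than one.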
Concept-Based Retrieval and Boolean Search Engines • CS3 is designed to interface with existing Boolean search engines. • U.S. Department of Energy’s “Information-Bridge” search engine. • U.S. Department of Transportation’s “National Transportation Library” search engine.
System Architecture • [Figure: system architecture diagram. Components shown: Client (Java applet); CORBA; CGI; Server (Java); Server (Java/C++); JDBC; Oracle; DOE InfoBridge, etc.]
Information-Bridge and CS3 • Search request: Boolean vs. concept. • Output: non-ranked vs. ranked. • Calculation of RSV: • Given a document D and the set S of MTS expressions satisfied by D, the RSV of D equals the sum of the weights in S plus the maximum weight in S.
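The RSV rule on this slide can be sketched in Python, together with one possible form of the post-processing of sub-query results. The sketch assumes each MTS is submitted to the Boolean engine as its own sub-query and that the returned document ids are merged, so that each document's set S of satisfied MTS expressions is known. The names `rsv` and `rank`, and the MTS names, weights, and document ids, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def rsv(weights: Sequence[float]) -> float:
    """RSV as stated on the slide: sum of the weights of the satisfied
    MTS expressions plus the maximum of those weights (0 if none)."""
    return sum(weights) + max(weights) if weights else 0.0

def rank(subquery_hits: Dict[str, Tuple[float, List[str]]]) -> List[Tuple[str, float]]:
    """Merge sub-query results into a ranked list.

    `subquery_hits` maps a hypothetical MTS name to its weight and the
    document ids the Boolean engine returned for that MTS's sub-query.
    """
    satisfied: Dict[str, List[float]] = defaultdict(list)
    for weight, doc_ids in subquery_hits.values():
        for doc_id in set(doc_ids):        # ignore duplicate hits
            satisfied[doc_id].append(weight)
    scored = [(doc_id, rsv(ws)) for doc_id, ws in satisfied.items()]
    # Highest RSV first; ties broken by document id for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))

hits = {
    "mts1": (0.5, ["d1", "d2"]),
    "mts2": (0.25, ["d2", "d3"]),
    "mts3": (0.75, ["d3"]),
}
ranking = rank(hits)
```

With the sample data, d3 satisfies two MTSs with weights 0.25 and 0.75, so its RSV is 0.25 + 0.75 + 0.75 = 1.75. A document that satisfies a single MTS of weight w scores 2w.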
Information-Bridge and CS3 (Example) • Boolean search request (“Environmental Science Network” Form): • (“Hydrogeology” OR “Dnapl” OR (“Colloid*” AND “Environmental Transport”)). • Concept (CS3): • “Hydrogeology”. • Rule-Base Tree.
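Because each MTS is a conjunction of terms, a collection of term sets can be expressed in the Boolean syntax these engines accept. The sketch below renders each term set as an AND clause (usable as one sub-query) and joins the clauses with OR into a single request. Taking the three disjuncts of the Boolean request above as hypothetical term sets reproduces that request; the quoting and operator syntax simply mirror the slide and are not a documented Information-Bridge interface. The functions `quote`, `to_clause`, and `to_boolean_query` are hypothetical.

```python
from typing import Sequence

def quote(term: str) -> str:
    """Wrap a term in double quotes, as in the slide's request."""
    return f'"{term}"'

def to_clause(terms: Sequence[str]) -> str:
    """Render one term set as a conjunctive clause (one sub-query)."""
    if not terms:
        raise ValueError("a term set must contain at least one term")
    if len(terms) == 1:
        return quote(terms[0])
    return "(" + " AND ".join(quote(t) for t in terms) + ")"

def to_boolean_query(term_sets: Sequence[Sequence[str]]) -> str:
    """OR the clauses together into a single Boolean request."""
    return "(" + " OR ".join(to_clause(ts) for ts in term_sets) + ")"

# Hypothetical term sets: the three disjuncts of the Boolean request above.
term_sets = [["Hydrogeology"], ["Dnapl"], ["Colloid*", "Environmental Transport"]]
sub_queries = [to_clause(ts) for ts in term_sets]
query = to_boolean_query(term_sets)
```

Sending `sub_queries` one at a time keeps track of which term set each hit satisfied, which a ranked output needs; sending `query` alone returns the same documents but loses that information.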
Current and Future Work • Conduct experiments to evaluate effectiveness (future). • Investigate alternative methods to compute RSVs [KADR00, KDR01*]. • Learn edge weights through relevance feedback [KR00]. • Thesaurus-based rule-base generation [KLR00].
Relevant URLs • www.cacs.usl.edu/~linc-projects/cs3/ [LJRT99*] • Raghavan home page: publications since 1991.