Full Text Indexing Based On Lexical Relations; An Application: Software Libraries
by Y. S. Maarek and F. A. Smadja
Presented by: AKHIL GADA
CSCI 572, University of Southern California
REQUIREMENT FOR SEARCH IN SOFTWARE LIBRARY
• Search for functionally similar components
• E.g. both the Yahoo Search API and the Google Search API for the query “I want to search pages”
A.I. OR KNOWLEDGE BASE APPROACH
• Domain knowledge must be entered
• Manual or semi-automatic
• Specific and difficult to scale to a new domain
• Semantic understanding of documents
I.R. OR FREE TEXT BASED APPROACH
• No prior knowledge required
• Completely automatic
• Generic and very easy to scale to a new domain
• No semantic understanding of documents
SINGLE KEYWORD VS LEXICAL RELATION
SINGLE KEYWORD
• Context information is lost, e.g. Apple the fruit vs Apple Computers
• High-frequency generic terms might introduce noise, e.g. the word “file” in the UNIX manual does not characterize the functionality of any command
LEXICAL RELATION
• Reveals context information
• A high-frequency lexical relation provides a lot of information about the functionality of a document, e.g. “copy file” in UNIX
• Clustering IR using HAC (Hierarchical Agglomerative Clustering)
• Linear IR using an inverted index
LEXICAL RELATIONS
• Two words in a sentence that have a syntactic relationship between them: subject-verb, verb-direct object, verb-indirect object, etc.
• Open class words: nouns, verbs, adjectives, and adverbs. These are meaning bearing.
• Closed class words: conjunctions (and, or), articles (the, a), demonstratives (this, that), and prepositions (to, from, at, with). These do not convey any meaning to the sentence.
EXTRACT [1] LEXICAL RELATIONS ALGORITHM [2]
• 5-word window: W1 W2 W3 W4 W5
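The slides do not reproduce the extraction algorithm itself, so the following is only a minimal sketch of the 5-word window idea: it counts unordered pairs of open class words that fall inside the same window of five consecutive words. Plain co-occurrence stands in for a real syntactic relationship here, and the `CLOSED_CLASS` word list, the function name `extract_relations`, and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny closed class word list (not from the slides).
CLOSED_CLASS = {"and", "or", "the", "a", "an", "this", "that",
                "to", "from", "at", "with", "of", "in"}

def extract_relations(sentence, window=5):
    """Count unordered pairs of open class words that co-occur
    within `window` consecutive words of one sentence."""
    words = sentence.lower().split()  # naive tokenizer: no punctuation handling
    pairs = Counter()
    for i, w1 in enumerate(words):
        if w1 in CLOSED_CLASS:
            continue
        # The window starting at position i covers words i .. i + window - 1.
        for w2 in words[i + 1 : i + window]:
            if w2 in CLOSED_CLASS or w2 == w1:
                continue
            pairs[tuple(sorted((w1, w2)))] += 1
    return pairs

relations = extract_relations("copy the file to the target directory")
for pair, count in sorted(relations.items()):
    print(pair, count)
```

In this sample, "copy" and "target" are six positions apart, so they never share a window and no pair is recorded for them.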
OUTPUT FROM EXTRACT [1] ALGORITHM [0]
• Resolving power of each extracted lexical relation
CREATE INVERTED INDEX [2]
• For each document, select the top N most informative lexical relations (ranked by resolving power); these form the profile of the document.
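As a rough illustration of this step, the sketch below keeps the top N relations per document as its profile and then inverts the profiles into a relation-to-documents map. The resolving power values are made up, and `build_profile` and `build_inverted_index` are hypothetical helper names, not part of GURU.

```python
from collections import defaultdict

def build_profile(scored_relations, n):
    """Keep the n lexical relations with the highest resolving power."""
    top = sorted(scored_relations.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return dict(top)

def build_inverted_index(profiles):
    """Map each lexical relation to the documents whose profile contains it,
    together with the relation's resolving power in that document."""
    index = defaultdict(dict)
    for doc_id, profile in profiles.items():
        for relation, power in profile.items():
            index[relation][doc_id] = power
    return dict(index)

# Made-up resolving power values for two UNIX-style commands (illustrative only).
scores = {
    "cp": {("copy", "file"): 3.1, ("directory", "file"): 1.2, ("file", "name"): 0.4},
    "mv": {("file", "move"): 2.8, ("directory", "file"): 1.5, ("file", "name"): 0.3},
}
profiles = {doc: build_profile(rels, n=2) for doc, rels in scores.items()}
index = build_inverted_index(profiles)
print(index[("directory", "file")])
```

With N = 2, the low-scoring relation ("file", "name") drops out of both profiles and therefore never reaches the index.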
SIMILARITY MEASURE BETWEEN TWO DOCUMENTS [2]
• Let X = set of the top N lexical relations, by resolving power, for document dx
• Let Y = set of the top N lexical relations, by resolving power, for document dy
• (X ∩ Y) = set of lexical relations common to dx and dy
$$\partial(d_x, d_y) = \sum_{i \in X \cap Y} P_i(d_x)\, P_i(d_y)$$
Where Pi(dx) and Pi(dy) are the resolving power of lexical relation i with respect to documents dx and dy, respectively.
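The measure above can be sketched in a few lines, assuming each profile is a dictionary from lexical relation to resolving power. The profiles and their values are invented for the example.

```python
def similarity(profile_x, profile_y):
    """Sum, over the relations common to both profiles, of the product
    of their resolving powers in the two documents."""
    common = profile_x.keys() & profile_y.keys()
    return sum(profile_x[r] * profile_y[r] for r in common)

# Invented profiles: only ("directory", "file") is shared.
dx = {("copy", "file"): 2.0, ("directory", "file"): 1.0}
dy = {("directory", "file"): 3.0, ("file", "move"): 2.0}

score = similarity(dx, dy)  # 1.0 * 3.0
print(score)
```

Documents with no lexical relation in common get a similarity of 0, since the sum runs over an empty set.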
CLUSTER SIMILAR FUNCTIONAL COMPONENTS USING HIERARCHICAL AGGLOMERATIVE CLUSTERING [2]
[Figure: dendrogram over documents d1 to d5, with merges labeled ∂({d1},{d2}), ∂({d3},{d4}), and ∂({d3,d4},{d5})]
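A minimal sketch of agglomerative clustering over document profiles follows. The slides do not say how the similarity between two clusters is computed, so average pairwise similarity (average linkage) is used here purely as an assumption; the profiles and the helper names are also invented.

```python
from itertools import combinations

def similarity(px, py):
    """Sum of products of resolving powers over shared relations."""
    return sum(px[r] * py[r] for r in px.keys() & py.keys())

def cluster_similarity(a, b, profiles):
    """Average pairwise similarity between clusters a and b.
    Average linkage is an assumption, not taken from the slides."""
    total = sum(similarity(profiles[x], profiles[y]) for x in a for y in b)
    return total / (len(a) * len(b))

def hac(profiles):
    """Repeatedly merge the two most similar clusters.
    Returns the merged clusters in the order they were formed."""
    clusters = [frozenset([d]) for d in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: cluster_similarity(pair[0], pair[1], profiles))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(sorted(a | b))
    return merges

# Invented profiles for five documents.
profiles = {
    "d1": {("copy", "file"): 2.0},
    "d2": {("copy", "file"): 2.0},
    "d3": {("page", "print"): 3.0, ("job", "send"): 1.0},
    "d4": {("page", "print"): 3.0},
    "d5": {("job", "send"): 1.0},
}
merges = hac(profiles)
for step in merges:
    print(step)
```

The recorded merge order is what a dendrogram draws from the bottom up; a different linkage choice could change that order.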
INFORMATION RETRIEVAL [2]
• User specifies a free text query
• Search and return results: linear I.R. using the inverted index
• User satisfied? If no, allow the user to traverse the clustered hierarchy
LINEAR INFORMATION RETRIEVAL [2]
[Figure: the query dq is compared with each document d1, d2, ..., dn through ∂(dq,d1), ∂(dq,d2), ..., ∂(dq,dn)]
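One way to picture linear retrieval is to treat the query as a small profile and accumulate ∂(dq, d) for every document through the inverted index, so that only documents sharing at least one lexical relation with the query are touched. This is a sketch under assumptions: the index contents are invented, `rank` is a hypothetical name, and giving every query relation a weight of 1.0 is a simplification not taken from the slides.

```python
def rank(query_profile, index):
    """Accumulate the similarity between the query and each document
    by walking the inverted index, then sort by decreasing score."""
    scores = {}
    for relation, q_power in query_profile.items():
        for doc_id, d_power in index.get(relation, {}).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + q_power * d_power
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented inverted index: relation -> {document: resolving power}.
index = {
    ("copy", "file"): {"cp": 3.0},
    ("directory", "file"): {"cp": 1.0, "mv": 2.0},
    ("file", "move"): {"mv": 3.0},
}
# Assumed query profile with uniform weights.
query = {("copy", "file"): 1.0, ("directory", "file"): 1.0}

results = rank(query, index)
print(results)
```

Because the sum only runs over relations present in both the query and a document's profile, the accumulated score matches the document-to-document similarity measure applied to dq and each di.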
GURU: WORKING SYSTEM SNAPSHOT [2]
EVALUATION [2]
• Maintenance cost: incremental insertion [3] of new components is easy
• Efficiency: 2.5 secs on RT; 0.15 secs on IBM RISC for a query containing 5 to 15 lexical relations
• Retrieval effectiveness: Contd…
EVALUATION: PRECISION-RECALL CURVE [2]
• c = total number of records retrieved after executing query q
• R = total number of expected correct results, determined before the query is executed
• r = total number of correct results retrieved after executing query q
• Recall = r/R
• Precision = r/c
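The two ratios can be checked with a short sketch. The retrieved and expected result lists below are invented, and returning 0.0 when a denominator is zero is a convention chosen for this example only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) using c, R and r as defined above."""
    retrieved, relevant = set(retrieved), set(relevant)
    c = len(retrieved)             # records retrieved for the query
    R = len(relevant)              # expected correct results, fixed beforehand
    r = len(retrieved & relevant)  # correct results actually retrieved
    precision = r / c if c else 0.0
    recall = r / R if R else 0.0
    return precision, recall

# Invented example: 4 results retrieved, 3 expected, 2 in common.
precision, recall = precision_recall(["cp", "mv", "ls", "rm"], ["cp", "mv", "ln"])
print(precision, recall)
```

A precision-recall curve is obtained by repeating this computation at different cut-off points in the ranked result list.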
PROS:
• Very simple and elegant approach
• Easy to extend to any domain, i.e. a generic approach
• The paper provides adequate background by describing past research
CONS:
• May fail in the following case
• E.g. ‘xcalc’ and ‘bc’
FURTHER RESEARCH:
• Combine the knowledge base approach with this technique
• E.g. the knowledge bc = calculator can be added to GURU to increase recall
• Improved algorithms for incremental updating of indices
References
• [0] Yoelle S. Maarek and Frank A. Smadja. Full Text Indexing Based On Lexical Relations; An Application: Software Libraries.
• [1] F. de Saussure. Cours de Linguistique Générale, Quatrième édition. Librairie Payot, Paris, France, 1949.
• [2] Y. S. Maarek, Daniel M. Berry, and Gail E. Kaiser. GURU: Information Retrieval for Reuse.
• [3] Kaplan and Maarek, 1990. Incremental maintenance of semantic links in dynamically changing hypertext systems. Interacting with Computers.
Q & A