Presented by: AKHIL GADA CSCI 572 University of Southern California

Full Text Indexing Based on Lexical Relations. An Application: Software Libraries, by Y.S. Maarek and F.A. Smadja. Presented by: AKHIL GADA, CSCI 572, University of Southern California. Requirement for search in a software library: search for functionally similar components.


Presentation Transcript


  1. Full Text Indexing Based on Lexical Relations. An Application: Software Libraries, by Y.S. Maarek and F.A. Smadja. Presented by: AKHIL GADA, CSCI 572, University of Southern California

  2. Requirement for search in a software library
     • Search for functionally similar components
     • E.g. Yahoo Search API and Google Search API for the query “I want to search pages”

  3. A.I. or knowledge-base approach vs. I.R. or free-text-based approach
     A.I. or knowledge-base approach:
     • Domain knowledge must be entered
     • Manual or semi-automatic
     • Specific and difficult to scale to a new domain
     • Semantic understanding of documents
     I.R. or free-text-based approach:
     • No prior knowledge required
     • Completely automatic
     • Generic and very easy to scale to a new domain
     • No semantic understanding of documents

  4. Single keyword vs. lexical relation
     Single keyword:
     • Context information is lost. E.g. Apple (fruit) vs. Apple (computers)
     • High-frequency generic terms might introduce noise. E.g. the word “file” in the UNIX manual does not characterize the functionality of any command
     Lexical relation:
     • Reveals context information
     • A high-frequency lexical relation provides high functional information about the document. E.g. “copy file” in UNIX

  5. • Clustering IR using HAC (hierarchical agglomerative clustering)
     • Linear IR using an inverted index

  6. Lexical relations
     • Two words in a sentence that have a syntactic relationship between them: subject-verb, verb-direct object, verb-indirect object, etc.
     • Open-class words: nouns, adjectives, adverbs. These are meaning-bearing.
     • Closed-class words: conjunctions (and, or), articles (the, a), demonstratives (this, that), and prepositions (to, from, at, with). These do not convey any meaning to the sentence.

  7. EXTRACT [1] lexical relations algorithm [2]: 5-word window (W1 W2 W3 W4 W5)

  8. EXTRACT [1] lexical relations algorithm [2] (continued)

  9. EXTRACT [1] lexical relations algorithm [2] (continued)
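The 5-word window from the EXTRACT slides can be illustrated with a rough sketch. This is not the EXTRACT algorithm from the paper: the stop list `CLOSED_CLASS`, the function name `extract_pairs`, the choice to measure the window over raw word positions, and the use of ordered pairs are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: a tiny illustrative stop list of closed-class words.
CLOSED_CLASS = {"and", "or", "the", "a", "an", "this", "that",
                "to", "from", "at", "with", "of", "in"}

def extract_pairs(sentence, window=5):
    """Count ordered pairs of open-class words that fall in the same window."""
    words = [w.lower().strip(".,;:") for w in sentence.split()]
    pairs = Counter()
    for i, w1 in enumerate(words):
        if not w1 or w1 in CLOSED_CLASS:
            continue
        # Partners are the words that follow w1 inside a window of `window` words.
        for w2 in words[i + 1 : i + window]:
            if w2 and w2 not in CLOSED_CLASS and w2 != w1:
                pairs[(w1, w2)] += 1
    return pairs

pairs = extract_pairs("Copy the file to the directory")
```

Here "copy" and "file" share a window, while "copy" and "directory" are too far apart to be paired.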

  10. Output from the EXTRACT [1] algorithm [0]: resolving power

  11. Create inverted index [2]: select the top N most informative lexical relations (ranked by resolving power) for each document. These form the profile of the document.
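A minimal sketch of the profile and inverted-index step, assuming resolving-power scores have already been computed for each document. The function names and the numeric scores are hypothetical; the slides do not specify how GURU stores its index.

```python
from collections import defaultdict

def build_profiles(scores_by_doc, n):
    """Keep the n highest-scoring lexical relations of each document."""
    profiles = {}
    for doc, scores in scores_by_doc.items():
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
        profiles[doc] = dict(top)
    return profiles

def build_inverted_index(profiles):
    """Map each lexical relation to the documents whose profile contains it."""
    index = defaultdict(dict)
    for doc, profile in profiles.items():
        for relation, power in profile.items():
            index[relation][doc] = power
    return dict(index)

# Hypothetical resolving-power values, for illustration only.
scores_by_doc = {
    "cp": {("copy", "file"): 3.0, ("file", "directory"): 1.2, ("file", "name"): 0.4},
    "mv": {("move", "file"): 2.8, ("file", "directory"): 1.5, ("file", "name"): 0.3},
}
profiles = build_profiles(scores_by_doc, n=2)
index = build_inverted_index(profiles)
```

With N = 2, the low-scoring relation ("file", "name") is dropped from both profiles and never reaches the index.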

  12. Similarity measure between two documents [2]: ∂(dx, dy)
     • Let X = set of the top N lexical relations, by resolving power, for document dx
     • Let Y = set of the top N lexical relations, by resolving power, for document dy
     • (X ∩ Y) = set of lexical relations common to dx and dy
     $$\partial(d_x, d_y) = \sum_{i \in X \cap Y} P_i(d_x) \cdot P_i(d_y)$$
     where P_i(dx) and P_i(dy) are the resolving power of lexical relation i with respect to documents dx and dy, respectively.
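The similarity measure can be written directly from its definition. The sketch below assumes each profile is a dictionary from lexical relation to resolving power; the example values are hypothetical.

```python
def similarity(profile_x, profile_y):
    """Sum of products of resolving powers over the relations shared by both profiles."""
    common = profile_x.keys() & profile_y.keys()
    return sum(profile_x[r] * profile_y[r] for r in common)

# Hypothetical profiles, for illustration only.
px = {("copy", "file"): 3.0, ("file", "directory"): 1.0}
py = {("move", "file"): 2.0, ("file", "directory"): 1.5}
score = similarity(px, py)  # only ("file", "directory") is shared
```

Documents with no lexical relation in common get a similarity of zero.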

  13. Cluster functionally similar components using hierarchical agglomerative clustering [2]. Dendrogram: leaves {d1}, {d2}, {d3}, {d4}, {d5}; merges at ∂({d1},{d2}), ∂({d3},{d4}) and ∂({d3,d4},{d5}).
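A naive sketch of hierarchical agglomerative clustering over document profiles. The slides do not say how the similarity between two clusters is defined, so average pairwise similarity (average link) is used here as an assumption; the profiles and relation names are hypothetical.

```python
from itertools import combinations

def profile_similarity(px, py):
    """Document similarity: products of resolving powers over shared relations."""
    return sum(px[r] * py[r] for r in px.keys() & py.keys())

def cluster_similarity(a, b, profiles):
    """Assumption: average pairwise similarity between members (average link)."""
    total = sum(profile_similarity(profiles[x], profiles[y]) for x in a for y in b)
    return total / (len(a) * len(b))

def hac(profiles):
    """Repeatedly merge the two most similar clusters; return the merge order."""
    clusters = [frozenset([d]) for d in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: cluster_similarity(pair[0], pair[1], profiles))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
        merges.append((set(a), set(b)))
    return merges

# Hypothetical profiles keyed by placeholder relation names.
profiles = {
    "d1": {"r1": 2.0},
    "d2": {"r1": 2.0, "r2": 0.5},
    "d3": {"r3": 1.0},
    "d4": {"r3": 1.0, "r2": 0.5},
}
merges = hac(profiles)
```

The merge order defines the hierarchy a user can later browse. This version recomputes every cluster pair on each pass, which is fine for a small example but too slow for a large library.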

  14. Information retrieval [2]
     • User specifies a free-text query
     • Search and return results: linear I.R. using the inverted index
     • User satisfied? If no, allow the user to traverse the clustered hierarchy

  15. Linear information retrieval [2]: the query dq is compared with each document d1, d2, ..., dn through ∂(dq,d1), ∂(dq,d2), ..., ∂(dq,dn)
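A sketch of linear retrieval, assuming the query has been turned into a profile in the same way as the documents and that the inverted index maps each lexical relation to per-document resolving powers. The function name `rank` and all values are hypothetical.

```python
from collections import defaultdict

def rank(query_profile, index):
    """Score every document sharing a relation with the query, best match first."""
    scores = defaultdict(float)
    for relation, q_power in query_profile.items():
        # Only documents listed under this relation can contribute to the sum.
        for doc, d_power in index.get(relation, {}).items():
            scores[doc] += q_power * d_power
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical inverted index and query profile, for illustration only.
index = {
    ("copy", "file"): {"cp": 3.0},
    ("file", "directory"): {"cp": 1.0, "mv": 1.5},
    ("move", "file"): {"mv": 2.5},
}
query = {("copy", "file"): 1.0, ("file", "directory"): 1.0}
results = rank(query, index)
```

Walking the index accumulates the same sum of products as comparing dq with each document in turn, but skips documents that share nothing with the query.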

  16. GURU: working system snapshot [2]

  17. Evaluation [2]
     • Maintenance cost: incremental insertion [3] of new components is easy
     • Efficiency: 2.5 secs on RT; 0.15 secs on IBM RISC for a query containing 5 to 15 lexical relations
     • Retrieval effectiveness: contd…

  18. Evaluation: precision-recall curve [2]
     • c = total number of records retrieved after executing query q
     • R = total number of expected correct results, determined before the query is executed
     • r = total number of correct results retrieved after executing query q
     Then Recall = r/R and Precision = r/c
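The two ratios can be checked with a small worked example. The function name and the sets of retrieved and relevant components are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) with c = |retrieved|, R = |relevant|, r = |overlap|."""
    retrieved, relevant = set(retrieved), set(relevant)
    r = len(retrieved & relevant)
    precision = r / len(retrieved) if retrieved else 0.0
    recall = r / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query result: 4 components retrieved, 3 expected, 2 in common.
precision, recall = precision_recall(
    retrieved=["cp", "mv", "ls", "rm"],
    relevant=["cp", "mv", "ln"],
)
```

In this example c = 4, R = 3 and r = 2, so precision is 2/4 and recall is 2/3.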

  19. Pros:
     • Very simple and elegant approach
     • Easy to extend to any domain, i.e. a generic approach
     • The paper provides adequate background by describing past research

  20. Cons:
     • May fail in the following case
     • E.g. ‘xcalc’ and ‘bc’

  21. Further research:
     • Combine the knowledge-base approach with this technique
     • E.g. the knowledge bc = calculator can be added to GURU to increase recall
     • Improved algorithms for incremental updating of indices

  22. References
     • 0 - Yoelle S. Maarek and Frank A. Smadja. Full Text Indexing Based on Lexical Relations. An Application: Software Libraries.
     • 1 - F. De Saussure. Cours de Linguistique Générale, quatrième édition. Librairie Payot, Paris, France, 1949.
     • 2 - Y.S. Maarek, Daniel M. Berry, and Gail E. Kaiser. GURU: Information Retrieval for Reuse.
     • 3 - Kaplan and Maarek, 1990. Incremental maintenance of semantic links in dynamically changing hypertext systems. Interacting with Computers.

  23. Q & A
