1 / 13

Semantic Sentence Similarity: A Novel Approach

This report introduces a method for measuring sentence similarity using semantic nets and corpus statistics. By incorporating lexical databases and corpus statistics, the proposed methodology provides adaptable and efficient similarity measures consistent with human knowledge. The approach combines semantic and word order information to model human common sense knowledge and is evaluated using similarity correlation. While the method offers advantages in categorical data similarity measures and knowledge retrieval, it currently lacks word sense disambiguation for polysemous words.

toddortiz
Download Presentation

Semantic Sentence Similarity: A Novel Approach

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sentence Similarity Based on Semantic Nets and Corpus Statistics Author : Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett Reporter : Tze Ho-Lin 2006/1/3 TKDE, 2006

  2. Outline • Motivation • Objectives • Methodology • Evaluation • Conclusion • Personal Comments

  3. Motivation • Existing methods for computing sentence similarity have been adopted from approaches used for long text documents. • These methods process sentences in a very high-dimensional space and are consequently • Inefficient • Require human input • Not adaptable to some application domains

  4. Objectives • Using lexical database to enable our method to model human common sense knowledge. • Incorporation of corpus statistics allows our method to be adaptable to different domains.

  5. Methodology (WordNet) (Brown Corpus)

  6. Evaluation Similarity correlation (Pearson)

  7. Conclusion • A method for measuring the semantic similarity based on semantic and word order information. • The proposed method provides similarity measures that are fairly consistent with human knowledge.

  8. Personal Comments • Application • Categorical data similarity measure. • Knowledge Retrieval • Advantage • This function has a simple, easy-to-implement formula. • Disadvantage • This technique compares words on a word-by-word basis. • The proposed method does not currently conduct word sense disambiguation for polysemous words.

  9. Method: Semantic Similarity between Words

  10. Method: Semantic Similarity between Sentences One element Is a word in the joint word set. Is its associated word in the sentence. n is the frequency of the word w in the corpus N is the total number of words in the corpus

  11. Method: Word Order Similarity between Sentences T1: RAM keeps things being worked with. T2: The CPU uses RAM as a short-term memory store. T={RAM keeps things being worked with The CPU uses as a short-term memory store}.

  12. Process for Deriving the Semantic Vector (preset threshold: 0.2) T1: RAM keeps things being worked with. T2: The CPU uses RAM as a short-term memory store. T={RAM keeps things being worked with The CPU uses as a short-term memory store}.

  13. Overall sentence similarity

More Related