
A Taxonomy of Similarity Mechanisms for Case-Based Reasoning

A Taxonomy of Similarity Mechanisms for Case-Based Reasoning. Pádraig Cunningham, TKDE, Vol. 21, 2009, pp. 1532–1543. Presenter: Wei-Shen Tai, 2009/11/17. Outline: Introduction, Representation, Similarity measures, Direct similarity mechanisms, Transformation-based measures


Presentation Transcript


  1. A Taxonomy of Similarity Mechanisms for Case-Based Reasoning. Pádraig Cunningham, TKDE, Vol. 21, 2009, pp. 1532–1543. Presenter: Wei-Shen Tai, 2009/11/17

  2. Outline • Introduction • Representation • Similarity measures • Direct similarity mechanisms • Transformation-based measures • Information-theoretic measures • Emergent measures • Implications for CBR research • Conclusion • Comments

  3. Motivation • Similarity is central to CBR • More recently, a number of novel mechanisms have emerged that introduce interesting alternative perspectives on similarity.

  4. Objective • Review novel similarity mechanisms • Present a taxonomy of similarity mechanisms that places these new techniques in the context of established CBR techniques.

  5. Feature value representation • Cases (instances) are described in terms of attribute values. • Enhancement • Discover word associations in a text corpus and then use these associations to add terms to the representation. • Bill Gates -> software, CEO, Microsoft • Allows texts to be represented by more features.

  6. Structural representations • Hierarchical structure • Feature values themselves reference nonatomic objects. • Network structure • Typically a semantic network • The Semantic Web describes the relationships between things (e.g., a tire is part of a car, and John Lennon was a member of the Beatles) and the properties of things (such as size, weight, age, and price). • Flow structure • Shares many of the characteristics of hierarchical and network representations. For example, workflows or jobs.

  7. String and sequence representations • The most straightforward representation for free text (unstructured data). • A representation that supports similarity assessment is the bag-of-words strategy from information retrieval.
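
As a rough illustration of how a bag-of-words representation supports similarity assessment, the sketch below counts terms and compares two texts by cosine similarity. The whitespace tokenizer and the choice of cosine similarity are assumptions made for this example; the slide only names the bag-of-words strategy, and the function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count term frequencies."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (Counters)."""
    # Only terms present in both bags contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = bag_of_words("the cat sat on the mat")
    doc2 = bag_of_words("the cat lay on the rug")
    print(cosine_similarity(doc1, doc2))
```

Word order is discarded by construction, which is why two texts containing the same words in a different order score as identical under this representation.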

  8. Direct similarity mechanisms • Similarity and distance metrics • k-NN • Set-theoretic measures • Jaccard index, Dice similarity • Kullback-Leibler divergence and the χ² statistic • Compare two images described as histograms. • Symbolic attributes in taxonomies • Feature values are organized into a taxonomy of is-a relationships.
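
The direct measures named on this slide can be sketched in a few lines. This is an illustrative sketch, not the paper's code: the function names are hypothetical, histograms are assumed to be normalized lists of equal length, and the χ² form used here (comparing each histogram bin with the mean of the two bins) is only one of several variants in the literature.

```python
import math


def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    """Dice similarity: 2 |A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def kl_divergence(p, q):
    """KL(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chi_square(p, q):
    """Chi-square statistic against the mean histogram m = (p + q) / 2."""
    total = 0.0
    for pi, qi in zip(p, q):
        mi = (pi + qi) / 2
        if mi > 0:  # skip bins that are empty in both histograms
            total += (pi - mi) ** 2 / mi
    return total


if __name__ == "__main__":
    print(jaccard({"red", "round", "small"}, {"red", "round", "large"}))
    print(dice({"red", "round", "small"}, {"red", "round", "large"}))
    print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
    print(chi_square([0.5, 0.5], [0.9, 0.1]))
```

Note that KL divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, whereas the χ² form above is symmetric in its arguments.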

  9. Transformation-based measures I • Edit distance • The number of edit operations (insertions, deletions, substitutions) needed to transform one string into another. • From cat to rat is 1; from cats to cat is 1. • Alignment measures for biological sequences • A variety of sequence alignment methods are used in biology (e.g., for DNA).
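
The edit distance on this slide can be computed with the standard dynamic-programming recurrence. The sketch below assumes unit cost for insertion, deletion, and substitution (the Levenshtein variant); the slide does not state the costs, and the function name is hypothetical.

```python
def edit_distance(s, t):
    """Levenshtein distance between strings s and t with unit edit costs."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string is i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # substitute cs with ct, or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("cat", "rat"))   # the slide's first example
    print(edit_distance("cats", "cat"))  # the slide's second example
```

Keeping only two rows of the table keeps memory proportional to the length of the second string, which matters when comparing long sequences.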

  10. Transformation-based measures II • Earth mover distance • A transformation-based distance for image data.
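
To give a feel for the earth mover distance, the sketch below handles only a simplified special case: two one-dimensional histograms with equal total mass, where moving one unit of mass between adjacent bins costs 1. Under those assumptions the distance reduces to a sum of absolute cumulative differences. Image applications generally need a full transportation solver over multi-dimensional signatures, which this example does not attempt; the function name is hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth mover distance between two 1-D histograms of equal total mass.

    Assumes bins lie on a line with unit spacing, so the cost of moving
    mass is the number of bins it travels.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    # The running surplus after each bin is the mass that must be carried
    # across the boundary to the next bin.
    carried = accumulate(pi - qi for pi, qi in zip(p, q))
    return sum(abs(c) for c in carried)


if __name__ == "__main__":
    print(emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
    print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```

Unlike bin-by-bin measures, this distance grows with how far the mass has to move, so a histogram shifted by two bins is farther away than one shifted by a single bin.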

  11. Transformation-based measures III • Similarity for networks and graphs • Structure mapping engine (SME) • Identify the appropriate mapping between the two domains.

  12. Information-theoretic measures • They work directly on the raw case representation. • Compression-based similarity for text • For two very similar documents, the compressed size of both together will not be much greater than the compressed size of one. • Information-based similarity for biological sequences • Specialized algorithms are required to compress them. • Similarity in a taxonomy • Distinguishes the weight of each is-a relationship between features. • The information content of a node in the taxonomy can be quantified as the negative log likelihood. • Similarity is given by the common parent node with the highest value.
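
The compression-based idea can be tried out with a general-purpose compressor. The sketch below uses the normalized compression distance, one common formulation, with Python's `zlib` standing in for the compressor. Both choices are assumptions for illustration, since the slide does not say which formula or compressor is meant, and the function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: lower values mean more similar texts."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    # If y shares structure with x, compressing the concatenation costs
    # little more than compressing the larger of the two alone.
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog " * 20
    b = "the quick brown fox leaps over the lazy dog " * 20
    c = " ".join(str(i * i) for i in range(200))
    print(ncd(a, b), ncd(a, c))
```

Because a real compressor only approximates the ideal, the distance between a document and itself is small but usually not exactly zero, and results on very short texts are dominated by compressor overhead.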

  13. Emergent measures I • Random forests • An ensemble of decision trees. • For each ensemble member, build a decision tree on a resampled training set, considering only a small subset of the features at each split (m << M). • Track the frequency with which cases are located at the same leaf node. • Two cases that share a leaf node more frequently are more similar.
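
The leaf-sharing step can be sketched independently of how the forest is trained. The example below assumes the trees already exist and that, for each tree, we know which leaf every case lands in; the `leaf_ids` layout and the function name are hypothetical, chosen only for this illustration.

```python
def forest_proximity(leaf_ids):
    """Fraction of trees in which each pair of cases shares a leaf node.

    leaf_ids[t][i] is the identifier of the leaf reached by case i in tree t.
    Returns an n_cases x n_cases matrix with entries in [0, 1].
    """
    n_trees = len(leaf_ids)
    n_cases = len(leaf_ids[0])
    counts = [[0] * n_cases for _ in range(n_cases)]
    for leaves in leaf_ids:
        for i in range(n_cases):
            for j in range(n_cases):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


if __name__ == "__main__":
    # Three trees, three cases (made-up leaf assignments for illustration).
    leaf_ids = [
        [0, 0, 1],
        [2, 3, 3],
        [5, 5, 5],
    ]
    for row in forest_proximity(leaf_ids):
        print(row)
```

The similarity is emergent in the sense that it is never specified by hand: it falls out of how the trained trees happen to partition the cases.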

  14. Emergent measures II • Cluster kernels • A semi-supervised learning technique, where only some of the available data are labeled. • Assumes that class labels do not change in regions of high density. • Cluster kernels allow the unlabeled data to influence similarity. • The kernel combines K_orig(x_i, x_j), a basic neighborhood kernel, with K_bag(x_i, x_j), a kernel derived from repeated clustering of all the data.

  15. Emergent measures III • Web-based kernel • Measures the similarity of text snippets using the documents returned by a Web search.

  16. Implications for CBR research • Vocabulary knowledge container • In some circumstances (e.g., information-theoretic measures) the role of the similarity knowledge container is increased. • Speed-up techniques • Because the new methodologies are typically computationally intensive, strategies for speeding up case retrieval become more important.

  17. Conclusions • Similarity measurement taxonomy • Organizes the broad range of strategies for similarity assessment in CBR into a coherent taxonomy. • Improve effectiveness of CBR • An alternative metric may simply offer better accuracy because it embodies specific knowledge about the data.

  18. Comments • Advantage • This paper introduces and discusses alternative metrics for similarity assessment in CBR. • Drawback • . • Application • Similarity measurement.
