Similarity in CBR (Cont’d)

Sources: Chapter 4; www.iiia.csic.es/People/enric/AICom.html; www.ai-cbr.org

Simple-Matching-Coefficient (SMC)
• H(X,Y) = n – (A + D) = B + C
• Another distance-similarity compatible function is f(x) = 1 – x/max (where max is the maximum value for x)
• We can define the SMC similarity, simH: simH(X,Y) = 1 – ((n – (A+D))/n) = (A+D)/n = 1 – ((B+C)/n), where (B+C)/n is the proportion of the difference
• Solution (I): Show that f(x) is order inverting: if x < y then f(x) > f(y)
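The SMC formulas above can be sketched in a few lines of Python. This is only an illustration under assumptions the slide does not spell out: X and Y are equal-length binary vectors, A counts positions where both are 1, D where both are 0, and B and C count the two kinds of mismatch; H is read as the Hamming distance. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def contingency_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (A, B, C, D) for two equal-length binary vectors.

    Assumed meaning (not stated on the slide):
    A: both 1, B: x=1 and y=0, C: x=0 and y=1, D: both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)
    return a, b, c, d


def hamming_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """H(X,Y) = n - (A + D) = B + C."""
    _, b, c, _ = contingency_counts(x, y)
    return b + c


def smc_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    """simH(X,Y) = (A + D) / n = 1 - (B + C) / n."""
    a, b, c, d = contingency_counts(x, y)
    n = a + b + c + d
    if n == 0:
        raise ValueError("vectors must not be empty")
    return (a + d) / n


x = [1, 0, 1, 1, 0]
y = [1, 1, 0, 1, 0]
print(hamming_distance(x, y), smc_similarity(x, y))
```

Here simH is f applied to the Hamming distance with max = n, which is why a larger distance always gives a smaller similarity.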

Simple-Matching-Coefficient (SMC) (II)
• If we write simH(X,Y) = 1 – ((B+C)/n) = factor(A, B, C, D), then factor is:
• Monotonic:
  • If A ≤ A’ then: factor(A,B,C,D) ≤ factor(A’,B,C,D)
  • If B ≤ B’ then: factor(A,B’,C,D) ≤ factor(A,B,C,D)
  • If C ≤ C’ then: factor(A,B,C’,D) ≤ factor(A,B,C,D)
  • If D ≤ D’ then: factor(A,B,C,D) ≤ factor(A,B,C,D’)
• Symmetric:
  • simH(X,Y) = simH(Y,X)
• Solution (II): Show that simH(X,Y) is monotonic
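One way to sketch the monotonicity argument is shown below. It assumes factor(A,B,C,D) = (A+D)/(A+B+C+D) with a nonzero denominator, and reads the comparison in each condition as ≤; both are interpretations of the slide rather than something it states outright.

```latex
\begin{align*}
\mathrm{factor}(A',B,C,D) - \mathrm{factor}(A,B,C,D)
  &= \frac{A'+D}{A'+B+C+D} - \frac{A+D}{A+B+C+D} \\
  &= \frac{(A'-A)(B+C)}{(A'+B+C+D)(A+B+C+D)} \;\ge\; 0
  \qquad \text{if } A \le A', \\[6pt]
\mathrm{factor}(A,B,C,D) - \mathrm{factor}(A,B',C,D)
  &= \frac{A+D}{A+B+C+D} - \frac{A+D}{A+B'+C+D} \\
  &= \frac{(A+D)(B'-B)}{(A+B+C+D)(A+B'+C+D)} \;\ge\; 0
  \qquad \text{if } B \le B'.
\end{align*}
```

The cases for D and C follow by the same computation, since the formula treats A and D alike and B and C alike.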

Variations of SMC (III)
• simH(X,Y) = (A+D)/n = (A+D)/(A+B+C+D)
• We introduce a weight, α, with 0 < α < 1: simα(X,Y) = (α(A+D)) / (α(A+D) + (1 – α)(B+C))
• For which α is simα(X,Y) = simH(X,Y)? α = 0.5
• simα(X,Y) preserves the monotonic and symmetric conditions
• Solution (III): Show that simα(X,Y) is monotonic
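A small sketch of the weighted variant may help check the claim about the weight 0.5. It assumes the weight (written `alpha` here, a name chosen for illustration) multiplies the matches A+D while its complement multiplies the mismatches B+C; the function name is hypothetical.

```python
def weighted_smc(a: int, b: int, c: int, d: int, alpha: float) -> float:
    """Weighted SMC: alpha*(A+D) / (alpha*(A+D) + (1-alpha)*(B+C))."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    matches = alpha * (a + d)
    mismatches = (1.0 - alpha) * (b + c)
    if matches + mismatches == 0:
        raise ValueError("at least one attribute is required")
    return matches / (matches + mismatches)


# With alpha = 0.5 the weights cancel and the plain SMC (A+D)/n is recovered.
a, b, c, d = 2, 1, 1, 1
plain_smc = (a + d) / (a + b + c + d)
print(weighted_smc(a, b, c, d, alpha=0.5), plain_smc)

# A larger alpha emphasizes matches, so the similarity goes up.
print(weighted_smc(a, b, c, d, alpha=0.8))
```

Values of the weight above 0.5 make agreements count more than disagreements, and values below 0.5 do the opposite.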

Homework (Part IV): Attributes May Have Multiple Values
• X = (X1, …, Xn) where Xi ∈ Ti
• Y = (Y1, …, Yn) where Yi ∈ Ti
• Each Ti is finite
• Define a formula for the Hamming distance in this context
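One possible reading of this exercise, offered as a sketch rather than the intended answer, is to count the positions at which the two cases take different values. The function name and the example attribute values are made up for illustration.

```python
from typing import Hashable, Sequence


def hamming_multivalued(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Count the positions i where Xi differs from Yi."""
    if len(x) != len(y):
        raise ValueError("cases must have the same number of attributes")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


case_x = ("red", "small", "round")
case_y = ("red", "large", "round")
print(hamming_multivalued(case_x, case_y))
```

For binary attributes this reduces to B + C, so it agrees with the earlier definition of H(X,Y).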

Tversky Contrast Model
(Figure: sets P and S, with regions labeled A, B, and C)
• Defines a non-monotonic distance
• Comparison of a situation S with a prototype P (i.e., a case)
• S and P are sets of features
• The following sets:
  • A = S ∩ P
  • B = P – S
  • C = S – P

Tversky Contrast Model (2)
• Tversky-distance: T(P,S) = αf(A) – βf(B) – γf(C)
• Where f: Sets → [0, ∞), and α, β, and γ are constants
• f, α, β, and γ are fixed and defined by the user
• Example:
  • If f(A) = # elements in A
  • α = β = γ = 1
  • T counts the number of elements in common minus the differences
• The Tversky-distance is not symmetric
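The contrast model can be sketched with Python sets. This assumes the three constants are the usual weights on the common features and on each of the two difference sets (named `alpha`, `beta`, and `gamma` here), and that f defaults to the number of elements; the function name and the feature sets are hypothetical.

```python
from typing import Callable, Hashable, Set


def tversky_contrast(
    p: Set[Hashable],
    s: Set[Hashable],
    alpha: float = 1.0,
    beta: float = 1.0,
    gamma: float = 1.0,
    f: Callable[[Set[Hashable]], float] = len,
) -> float:
    """T(P,S) = alpha*f(A) - beta*f(B) - gamma*f(C)."""
    common = s & p   # A = S intersect P
    only_p = p - s   # B = P - S
    only_s = s - p   # C = S - P
    return alpha * f(common) - beta * f(only_p) - gamma * f(only_s)


prototype = {"wheels", "engine", "doors", "roof"}
situation = {"wheels", "engine", "sail"}

# Unit weights: 2 shared features minus 2 and 1 unshared ones.
print(tversky_contrast(prototype, situation))

# Unequal beta and gamma: swapping the arguments changes the score.
print(tversky_contrast(prototype, situation, beta=2.0, gamma=0.5))
print(tversky_contrast(situation, prototype, beta=2.0, gamma=0.5))
```

In this sketch the asymmetry shows up when the weights on B and C differ; with equal weights and a count for f, swapping P and S only exchanges B and C and leaves the score unchanged.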

Local versus Global Similarity Metrics
• In many situations we have similarity metrics between attributes of the same type (called local similarity metrics). Example: for a complex engine, we may have a similarity metric for the temperature of the engine
• In such situations, a reasonable and widely used approach to defining a global similarity sim(x,y) is to “aggregate” the local similarity metrics simi(xi,yi)
• What requirements should we place on sim(x,y) in terms of its use of simi(xi,yi)? sim(x,y) should increase monotonically with each simi(xi,yi)

Local versus Global Similarity Metrics (Formal Definitions)
• A local similarity metric on an attribute Ti is a similarity metric simi: Ti × Ti → [0,1]
• A function Φ: [0,1]^n → [0,1] is an aggregation function if:
  • Φ(0,0,…,0) = 0
  • Φ is monotonic non-decreasing on every argument
• Given a collection of n similarity metrics sim1, …, simn, for attributes taking values from T1, …, Tn, a global similarity metric is a similarity metric sim: V × V → [0,1], with V ⊆ T1 × … × Tn, such that there is an aggregation function Φ with:
  • sim(X,Y) = Φ(sim1(X1,Y1), …, simn(Xn,Yn))
• Homework: provide an example of an aggregation function and of a non-aggregation function, and prove it. Show a global similarity metric
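As an illustration of these definitions, here is a minimal sketch that uses a weighted average as the aggregation function. The weighted average is one common choice, not something the slides prescribe, and every name below (the helper functions, the engine attributes, the range of 100 degrees) is an assumption made for the example.

```python
from typing import Any, Callable, Sequence

LocalSim = Callable[[Any, Any], float]


def numeric_local_sim(max_diff: float) -> LocalSim:
    """Local similarity 1 - |a - b| / max_diff, clamped to [0, 1]."""
    if max_diff <= 0:
        raise ValueError("max_diff must be positive")

    def sim(a: float, b: float) -> float:
        return max(0.0, 1.0 - abs(a - b) / max_diff)

    return sim


def exact_match_sim(a: Any, b: Any) -> float:
    """Local similarity for symbolic values: 1 if equal, else 0."""
    return 1.0 if a == b else 0.0


def weighted_average(weights: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Build an aggregation function from non-negative weights.

    It maps (0, ..., 0) to 0 and is non-decreasing in every argument.
    """
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)

    def phi(local_values: Sequence[float]) -> float:
        if len(local_values) != len(weights):
            raise ValueError("one local similarity per weight is required")
        return sum(w * v for w, v in zip(weights, local_values)) / total

    return phi


def global_similarity(x, y, local_sims: Sequence[LocalSim], phi) -> float:
    """sim(X,Y) = phi(sim_1(X1,Y1), ..., sim_n(Xn,Yn))."""
    return phi([sim_i(xi, yi) for sim_i, xi, yi in zip(local_sims, x, y)])


temperature_sim = numeric_local_sim(max_diff=100.0)  # assumed temperature range
phi = weighted_average([1.0, 1.0])
engine_x = (90.0, "diesel")
engine_y = (70.0, "diesel")
score = global_similarity(engine_x, engine_y, [temperature_sim, exact_match_sim], phi)
print(score)
```

By contrast, a function such as 1 minus the average of its arguments would fail both conditions, since it maps all zeros to 1 and decreases as any local similarity grows.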

Solution
• Suppose that cases use an object oriented representation:
• Suppose that cases use a taxonomical representation. Describe how you would measure similarity, and give a concrete example illustrating the process you described
• Suppose that cases use a compositional representation. Describe how you would measure similarity, and give a concrete example illustrating the process you described
Suggestion: look at the book!

Frontiers of Knowledge
• Dealing with numerical and non-numerical values
• Aggregation of local similarity metrics into a global similarity metric helps, but sometimes we don’t have local similarity metrics

Homework (II)
• From Chapter 5, what is the difference between completion and adaptation functions? What is their role in adaptation? Provide an example
• Show that Graph coloring is NP-complete
  • Assume that Constraint-SAT is NP-complete
  • Definition. A constraint is a formula of the form:
    • (x = y)
    • (x ≠ y)
    where x and y are variables that can take values from a set (e.g., {yellow, white, black, red, …})
  • Definition. Constraint-SAT: given a conjunction of constraints, is there an instantiation of the variables that makes the conjunction true?
