Similarity in CBR (Cont’d)
Sources: Chapter 4; www.iiia.csic.es/People/enric/AICom.html; www.ai-cbr.org
Simple-Matching-Coefficient (SMC)
• H(X,Y) = n – (A + D) = B + C
• Proportion of the difference: H(X,Y)/n = (n – (A + D))/n
• Another distance-similarity compatible function is f(x) = 1 – x/max (where max is the maximum value for x)
• We can define the SMC similarity, simH: simH(X,Y) = 1 – ((n – (A+D))/n) = (A+D)/n = 1 – ((B+C)/n)
Solution (I): Show that f(x) is order inverting: if x < y then f(x) > f(y)
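As a rough illustration of the formulas on this slide, the sketch below computes the counts A, B, C, D for two binary attribute vectors, then the Hamming distance H and the SMC similarity simH. It assumes the usual convention, which this slide does not restate, that A counts positions where both vectors are 1, D positions where both are 0, and B and C the two kinds of disagreement. The function names and the example vectors are made up for illustration.

```python
def counts(x, y):
    """Return (A, B, C, D) for two equal-length binary vectors.

    Assumed convention: A = both 1, B = x is 1 and y is 0,
    C = x is 0 and y is 1, D = both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def hamming(x, y):
    """H(X,Y) = n - (A + D) = B + C."""
    _, b, c, _ = counts(x, y)
    return b + c


def sim_h(x, y):
    """simH(X,Y) = 1 - (B + C)/n, i.e. f(H) with f(x) = 1 - x/max and max = n."""
    a, b, c, d = counts(x, y)
    n = a + b + c + d
    if n == 0:
        raise ValueError("vectors must not be empty")
    return 1 - (b + c) / n


# Hypothetical example vectors
x = [1, 0, 1, 1, 0]
y = [1, 1, 0, 1, 0]
print(counts(x, y), hamming(x, y), sim_h(x, y))
```

Because sim_h is just f applied to the Hamming distance with max = n, a larger distance always gives a smaller similarity, which is the order-inverting property asked for in Solution (I).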
Simple-Matching-Coefficient (SMC) (II)
• Write simH(X,Y) = 1 – ((B+C)/n) = factor(A, B, C, D)
• Monotonic:
  • If A ≤ A’ then: factor(A,B,C,D) ≤ factor(A’,B,C,D)
  • If B ≤ B’ then: factor(A,B’,C,D) ≤ factor(A,B,C,D)
  • If C ≤ C’ then: factor(A,B,C’,D) ≤ factor(A,B,C,D)
  • If D ≤ D’ then: factor(A,B,C,D) ≤ factor(A,B,C,D’)
• Symmetric: simH(X,Y) = simH(Y,X)
Solution (II): Show that simH(X,Y) is monotonic
Variations of SMC (III)
• simH(X,Y) = (A+D)/n = (A+D)/(A+B+C+D)
• We introduce a weight, α, with 0 < α < 1: simα(X,Y) = (α(A+D)) / (α(A+D) + (1 – α)(B+C))
• For which α is simα(X,Y) = simH(X,Y)? α = 0.5
• simα(X,Y) preserves the monotonic and symmetric conditions
Solution (III): Show that simα(X,Y) is monotonic
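A small numeric check of the weighted variant may help here. The sketch below takes the counts A, B, C, D directly and uses a weight called alpha (the name is chosen for this example) in the formula alpha(A+D) / (alpha(A+D) + (1 – alpha)(B+C)). The counts in the example are hypothetical.

```python
def sim_h(a, b, c, d):
    """Plain SMC similarity: (A + D) / (A + B + C + D)."""
    return (a + d) / (a + b + c + d)


def sim_alpha(a, b, c, d, alpha):
    """Weighted SMC variant with weight 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    agree = alpha * (a + d)
    disagree = (1 - alpha) * (b + c)
    if agree + disagree == 0:
        raise ValueError("at least one attribute is required")
    return agree / (agree + disagree)


# Hypothetical counts: 3 agreements (A + D), 2 disagreements (B + C)
a, b, c, d = 2, 1, 1, 1
print(sim_h(a, b, c, d))            # plain SMC
print(sim_alpha(a, b, c, d, 0.5))   # coincides with plain SMC
print(sim_alpha(a, b, c, d, 0.8))   # agreements weighted more heavily
print(sim_alpha(a, b, c, d, 0.2))   # disagreements weighted more heavily
```

With alpha = 0.5 the two factors of 0.5 cancel, leaving (A+D)/(A+B+C+D). Larger values of alpha emphasize agreements, smaller values emphasize disagreements.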
Homework (Part IV): Attributes May Have Multiple Values
• X = (X1, …, Xn) where Xi ∈ Ti
• Y = (Y1, …, Yn) where Yi ∈ Ti
• Each Ti is finite
• Define a formula for the Hamming distance in this context
Tversky Contrast Model
[Figure: Venn diagram of the overlapping sets P and S, with regions A, B, and C]
• Defines a non-monotonic distance
• Comparison of a situation S with a prototype P (i.e., a case)
• S and P are sets of features
• The following sets:
  • A = S ∩ P
  • B = P – S
  • C = S – P
Tversky Contrast Model (2)
• Tversky-distance: T(P,S) = α f(A) – β f(B) – γ f(C)
• Where f: Sets → [0, ∞), and α, β, and γ are constants
• f, α, β, and γ are fixed and defined by the user
• Example:
  • If f(A) = # elements in A
  • α = β = γ = 1
  • T counts the number of elements in common minus the differences
• The Tversky-distance is not symmetric in general: swapping P and S swaps B and C, so T(P,S) ≠ T(S,P) whenever β ≠ γ and f(B) ≠ f(C)
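The contrast model can be sketched with Python sets. The code below assumes the form T(P,S) = alpha f(A) – beta f(B) – gamma f(C), with A = S ∩ P, B = P – S, C = S – P, and f defaulting to the element count as in the example on the slide. The feature sets are invented for illustration.

```python
def tversky(p, s, f=len, alpha=1.0, beta=1.0, gamma=1.0):
    """Tversky contrast value for prototype p and situation s (feature sets)."""
    p, s = set(p), set(s)
    common = s & p   # A = S ∩ P
    only_p = p - s   # B = P - S
    only_s = s - p   # C = S - P
    return alpha * f(common) - beta * f(only_p) - gamma * f(only_s)


# Hypothetical feature sets
prototype = {"wings", "feathers", "flies", "beak"}
situation = {"wings", "feathers", "swims"}

# With alpha = beta = gamma = 1: common features minus all differences
print(tversky(prototype, situation))

# With beta != gamma the two argument orders give different values
print(tversky(prototype, situation, beta=2.0))
print(tversky(situation, prototype, beta=2.0))
```

Note that with alpha = beta = gamma = 1 and f the element count, the value is the same in both argument orders. The asymmetry appears once beta and gamma differ.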
Local versus Global Similarity Metrics
• In many situations we have similarity metrics between attributes of the same type (called local similarity metrics). Example: for a complex engine, we may have a similarity for the temperature of the engine
• In such situations a reasonable approach to defining a global similarity sim(x,y) is to “aggregate” the local similarity metrics simi(xi,yi). This is a widely used practice
• What requirements should we place on sim(x,y) in terms of its use of simi(xi,yi)? sim(x,y) should increase monotonically with each simi(xi,yi)
Local versus Global Similarity Metrics (Formal Definitions)
• A local similarity metric on an attribute Ti is a similarity metric simi: Ti × Ti → [0,1]
• A function Φ: [0,1]^n → [0,1] is an aggregation function if:
  • Φ(0,0,…,0) = 0
  • Φ is monotonic non-decreasing in every argument
• Given a collection of n similarity metrics sim1, …, simn, for attributes taking values from Ti, a global similarity metric is a similarity metric sim: V × V → [0,1], V in T1 × … × Tn, such that there is an aggregation function Φ with:
  • sim(X,Y) = Φ(sim1(X1,Y1), …, simn(Xn,Yn))
Homework: provide an example of an aggregation function and of a non-aggregation function, and prove it. Show a global similarity metric
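To make the definitions concrete, here is a minimal sketch of how a global similarity might be assembled from local ones. The weighted average is only one possible aggregation function, and the two attributes (an engine temperature and a fuel type), their local metrics, the 100-degree scale, and the weights are all assumptions made for this example.

```python
def weighted_average(weights):
    """Build an aggregation function on [0,1]^n from non-negative weights."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")

    def phi(*values):
        if len(values) != len(weights):
            raise ValueError("expected one value per weight")
        return sum(w * v for w, v in zip(weights, values)) / total

    return phi


def global_sim(local_sims, phi):
    """Combine local metrics sim_i(x_i, y_i) into sim(X, Y) via phi."""
    def sim(x, y):
        return phi(*(s(xi, yi) for s, xi, yi in zip(local_sims, x, y)))
    return sim


def sim_temperature(t1, t2, max_diff=100.0):
    """Hypothetical local metric: linear drop-off over max_diff degrees."""
    return max(0.0, 1 - abs(t1 - t2) / max_diff)


def sim_fuel(f1, f2):
    """Hypothetical local metric for a symbolic attribute: exact match."""
    return 1.0 if f1 == f2 else 0.0


phi = weighted_average([2, 1])  # temperature counts twice as much as fuel
sim = global_sim([sim_temperature, sim_fuel], phi)
print(sim((90, "diesel"), (80, "diesel")))
print(sim((90, "diesel"), (80, "petrol")))
```

The weighted average satisfies both listed conditions: it maps (0, …, 0) to 0, and raising any local similarity can never lower the result because all weights are non-negative.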
Solution
• Suppose that cases use an object-oriented representation:
  • Suppose that cases use a taxonomical representation. Describe how you would measure similarity and give a concrete example illustrating the process you described
  • Suppose that cases use a compositional representation. Describe how you would measure similarity and give a concrete example illustrating the process you described
Suggestion: look at the book!
Frontiers of Knowledge
• Dealing with numerical and non-numerical values
• Aggregation of local similarity metrics into a global similarity metric helps
• But sometimes we don’t have local similarity metrics
Homework (II)
• From Chapter 5, what is the difference between completion and adaptation functions? What is their role in adaptation? Provide an example
• Show that Graph coloring is NP-complete
• Assume that Constraint-SAT is NP-complete
• Definition. A constraint is a formula of the form:
  • (x = y)
  • (x ≠ y)
  where x and y are variables that can take values from a set (e.g., {yellow, white, black, red, …})
• Definition. Constraint-SAT: given a conjunction of constraints, is there an instantiation of the variables that makes the conjunction true?
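To make the Constraint-SAT definition concrete, the following sketch decides small instances by exhaustive search. It assumes the two constraint forms are equality (x = y) and inequality (x ≠ y), encoded here as hypothetical triples such as ("x", "!=", "y"). It only illustrates what a satisfying instantiation is and runs in exponential time, so it says nothing about the reduction the homework asks for.

```python
from itertools import product


def constraint_sat(constraints, domain):
    """Return a satisfying assignment as a dict, or None if there is none.

    constraints: iterable of (x, op, y) with op "==" or "!=" (assumed encoding).
    domain: the values every variable may take.
    """
    constraints = list(constraints)
    for _, op, _ in constraints:
        if op not in ("==", "!="):
            raise ValueError("unknown operator: " + repr(op))
    variables = sorted({v for x, _, y in constraints for v in (x, y)})
    # Try every instantiation of the variables over the domain.
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all((assignment[x] == assignment[y]) == (op == "==")
               for x, op, y in constraints):
            return assignment
    return None


# Three mutually different variables: unsatisfiable with two values,
# satisfiable with three.
triangle = [("x", "!=", "y"), ("y", "!=", "z"), ("x", "!=", "z")]
print(constraint_sat(triangle, ["red", "white"]))
print(constraint_sat(triangle, ["red", "white", "black"]))
```

The triangle example already hints at the connection to graph coloring: each inequality behaves like an edge whose endpoints must receive different colors.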