210 likes | 365 Views
Source for Information Gain Formula. Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig Chapter 18: Learning from Observations . Similarity in CBR (Cont’d). Sources: Chapter 4 www.iiia.csic.es/People/enric/AICom.html www.ai-cbr.org. Other Similarity Metrics.
E N D
Source for Information Gain Formula Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig Chapter 18: Learning from Observations
Similarity in CBR (Cont’d) Sources: Chapter 4 www.iiia.csic.es/People/enric/AICom.html www.ai-cbr.org
Other Similarity Metrics • Suppose that we have cases represented as attribute-value pairs (e.g., the restaurant domain) • Suppose initially that the values are binary • We want to define similarity between two cases of the form: • X = (X1, …, Xn) where Xi = 0 or 1 • Y = (Y1, …,Yn) where Yi = 0 or 1
Preliminaries • Let: • A = (i=1,n)Xi•Yi • B = (i=1,n)Xi•(1-Yi) • C = (i=1,n)(1-Xi)•Yi • D = (i=1,n)(1-Xi) •(1-Yi) • Then, A + B + C + D = (number of attributes for which Xi =1 and Yi = 1) (number of attributes for which Xi =1 and Yi = 0) (number of attributes for which Xi =0 and Yi = 1) (number of attributes for which Xi =0 and Yi = 0) n “matching attributes” “mismatching attributes” A+D = B+C=
Hamming Distance H(X,Y) = n –(i=1,n)Xi•Yi–(i=1,n)(1-Xi)•(1-Yi) • Properties: • Range of H: • H counts the mismatch between the attribute values • H is a distance metric: • H((1-X1, …, 1-Xn), (1-Y1, …,1-Yn)) = [0,n] • H(X,X) = 0 • H(X,Y) = H(Y,X) H((X1, …, Xn), (Y1, …,Yn))
Proportion of the difference # of mismatches Simple-Matching-Coefficient (SMC) n – (A + D) = B + C • H(X,Y) = • Another distance-similarity compatible function is • f(x) = 1 – x/max (where max is the maximum value for x) • We can define the SMC similarity, simH: simH(X,Y) = 1 – ((n – (A+D))/n) = (A+D)/n = 1- ((B+C)/n) Homework (I): Show that f(x) is order inverting: if x < y then f(x) > f(y)
Simple-Matching-Coefficient (SMC) (II) • If we use on simH(X,Y) = (A+D)/n =1- ((B+C)/n) = factor(A, B, C, D) • Monotonic: • If A A’ then: • If B B’ then: • If C C’ then: • If D D’ then: factor(A,B,C,D) factor(A’,B,C,D) factor(A,B’,C,D) factor(A,B,C,D) factor(A,B,C’,D) factor(A,B,C,D) factor(A,B,C,D) factor(A,B,C,D’) • Symmetric: • simH (X,Y) = simH(Y,X)
Variations of the SMC • The hamming similarity assign equal value to matches (both 0 or both 1) • There are situations in which you want to count different when both match with 1 as when both match with 0 • Thus, sim((1-X1, …, 1-Xn), (1-Y1, …,1-Yn)) = sim((X1, …, Xn), (Y1, …,Yn)) may not hold • Example: Two symptoms of patients are similar if they both have fever (Xi = 1 and Yi = 1) but not similar if neither have fever (Xi = 0 and Yi = 0) • Specific attributes may be more important than other attributes Example: manufacturing domain: some parts of the workpiece are more important than others
Variations of SMC (III) • simH(X,Y) = (A+D)/n = (A+D)/(A+B+C+D) • We introduce a weight, , with 0 < < 1: sim(X,Y) = ((A+D))/ ((A+D) + (1 - )(B+C)) • For which is sim(X,Y) = simH(X,Y)? = 0.5 • sim(X,Y) preserves the monotonic and symmetric conditions Homework(II): Show that sim(X,Y) is monotonic
1 > 0.5 = 0.5 < 0.5 0 n 0 The similarity depends only from A, B, C and D (3) • What is the role of ? What happens if > 0.5? If < 0.5? sim(X,Y) = ((A+D))/ ((A+D) + (1 - )(B+C)) • If > 0.5 we give more weights to the matching attributes than to the miss-matching • If < 0.5 we give more weights to the miss-matching attributes than to the matching
Discarding 0-match • Thus, sim((1-X1, …, 1-Xn), (1-Y1, …,1-Yn)) = sim((X1, …, Xn), (Y1, …,Yn)) may not hold • Only when the attribute occurs (i.e., Xi = 1 and Yi = 1 ) will contribute to the similarity • Possible definition of the similarity: sim = A / (A+ B+C)
Specific Attributes may be More Important Than Other Attributes • Significance of the attributes varies • Weighted Hamming distance: • There is a weight vector: (1, …, n) such that • (i=1,n) i = 1 HW(X,Y) = 1 –(i=1,n) i • Xi•Yi–(i=1,n) i • (1-Xi)•(1-Yi) • Example: “Process planning: some features are more important than others”
Homework (Part III): Attributes May Have multiple Values • X = (X1, …, Xn) where Xi Ti • Y = (Y1, …,Yn) where Yi Ti • Each Ti is finite • Define a formula for the Hamming distance in this context
Non Monotonic Similarity • The monotony condition in similarity, formally, says that: sim(A,B) sim(A’,B) • always holds for any A and A’ such that A A’ • Informally the monotony condition can be expressed as: • For any X, Y, X’ attribute-value vectors, If we obtain X’ by modifying X on the value of one attribute such that X’ and Y have the same value on that attribute then: sim(X,Y) sim(X’,Y)
Non Monotonic Similarity (2) • Is the hamming distance monotonic? Yes simH(X,Y) = (i=1,n)eq(Xi,Yi) / n • Consider the XOR function: • (0,0) and (1,1) are on the same class (+) • (0,1) and (1,0) are on the same class (-) • Thus d((1,1),(1,0)) > d((1,1),(0,0)) • Is this monotonic? No
Suppose that we have two interconnected batteries B and B’ and 3 lamps X, Y and Z that have the following properties: • If X is on, B and B’ work • If Y is on, B or B’ work • If Z is on, B works Situation X Y Z B B’ • 0 1 1 Ok Fail • 0 1 0 Fail Ok • 0 0 0 Fail Fail Non Monotonic Similarity (3) • You may think: “well that was mathematics, how about real world?” • Thus: • sim(1,3) > sim(1,2) • Non monotonic!
P S A B C Tversky Contrast Model • Defines a non monotonic distance • Comparison of a situation S with a prototype P (i.e, a case) • S and P are sets of features • The following sets: • A = S P • B = P – S • C = S – P
Tversky Contrast Model (2) • Tversky-distance: • Where f: [0, ) • f, , , and are fixed and defined by the user • Example: • If f(A) = # elements in A • = = = 1 • T counts the number of elements in common minus the differences • The Tversky-distance is not symmetric T(P,S) = f(A) - f(B) - f(C)
Local versus Global Similarity Metrics • In many situations we have similarity metrics between attributes of the same type (called local similarity metrics). Example: For a complex engine, we may have a similarity for the temperature of the engine • In such situations a reasonable approach to define a global similarity sim(x,y) is to “aggregate” the local similarity metrics simi(xi,yi). A widely used practice • What requirements should we give to sim(x,y) in terms of the use of simi(xi,yi)? sim(x,y) to increate monotonically with each simi(xi,yi).
Local versus Global Similarity Metrics (Formal Definitions) • A local similarity metric on an attribute Ti is a similarity metric simi: Ti Ti [0,1] • A function : [0,1]n [0,1] is an aggregation function if: • (0,0,…,0) = 0 • is monotonic non-decreasing on every argument • Given a collection of n similarity metrics sim1, …, simn, for attributes taken values from Ti, a global similarity metric, is a similarity metric sim:V V [0,1], V in T1 … Tn, such that there is an aggregation function with: • sim(X,Y) = sim(X,Y) = (sim1(X1,Y1), …,simn(Xn,Yn)) Example: (X1,X2,…,Xn) = (X1+X2+…+Xn)/n