
Source for Information Gain Formula


darius

Presentation Transcript


  1. Source for Information Gain Formula Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig Chapter 18: Learning from Observations

  2. Similarity in CBR (Cont’d) Sources: Chapter 4 www.iiia.csic.es/People/enric/AICom.html www.ai-cbr.org

  3. Other Similarity Metrics
     • Suppose that we have cases represented as attribute-value pairs (e.g., the restaurant domain)
     • Suppose initially that the values are binary
     • We want to define the similarity between two cases of the form:
       • $X = (X_1, \dots, X_n)$ where $X_i = 0$ or $1$
       • $Y = (Y_1, \dots, Y_n)$ where $Y_i = 0$ or $1$

  4. Preliminaries
     • Let:
       • $A = \sum_{i=1}^{n} X_i Y_i$ (number of attributes for which $X_i = 1$ and $Y_i = 1$)
       • $B = \sum_{i=1}^{n} X_i (1 - Y_i)$ (number of attributes for which $X_i = 1$ and $Y_i = 0$)
       • $C = \sum_{i=1}^{n} (1 - X_i) Y_i$ (number of attributes for which $X_i = 0$ and $Y_i = 1$)
       • $D = \sum_{i=1}^{n} (1 - X_i)(1 - Y_i)$ (number of attributes for which $X_i = 0$ and $Y_i = 0$)
     • Then $A + B + C + D = n$
     • $A + D$ = number of "matching attributes"; $B + C$ = number of "mismatching attributes"
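
A minimal Python sketch of the four counts follows. It assumes that cases are given as equal-length tuples of 0/1 values. The function name `contingency_counts` and the example vectors are illustrative and do not come from the slides. The sketch also confirms that A + B + C + D = n.

```python
from typing import Sequence, Tuple


def contingency_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (A, B, C, D) for two binary cases of equal length."""
    if len(x) != len(y):
        raise ValueError("cases must have the same number of attributes")
    a = sum(xi * yi for xi, yi in zip(x, y))              # Xi = 1, Yi = 1
    b = sum(xi * (1 - yi) for xi, yi in zip(x, y))        # Xi = 1, Yi = 0
    c = sum((1 - xi) * yi for xi, yi in zip(x, y))        # Xi = 0, Yi = 1
    d = sum((1 - xi) * (1 - yi) for xi, yi in zip(x, y))  # Xi = 0, Yi = 0
    return a, b, c, d


X = (1, 1, 0, 0, 1)
Y = (1, 0, 1, 0, 1)
A, B, C, D = contingency_counts(X, Y)
print(A, B, C, D)                 # 2 1 1 1
print(A + B + C + D == len(X))    # True
```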

  5. Hamming Distance
     • $H(X,Y) = n - \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} (1 - X_i)(1 - Y_i)$
     • Properties:
       • Range of H: $[0, n]$
       • H counts the mismatches between the attribute values
       • H is a distance metric; for example, $H(X,X) = 0$ and $H(X,Y) = H(Y,X)$
       • $H((1-X_1, \dots, 1-X_n), (1-Y_1, \dots, 1-Y_n)) = H((X_1, \dots, X_n), (Y_1, \dots, Y_n))$
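
The following sketch computes H(X, Y) exactly as the formula above does, as n minus the 1-1 and 0-0 matches. It then spot-checks the listed properties on one hypothetical pair of cases. This is a sanity check under those assumptions, not a proof. The names `hamming` and `complement` are chosen here for illustration.

```python
from typing import Sequence, Tuple


def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """H(X,Y) = n - sum(Xi*Yi) - sum((1-Xi)*(1-Yi)) for binary cases."""
    if len(x) != len(y):
        raise ValueError("cases must have the same number of attributes")
    n = len(x)
    both_one = sum(xi * yi for xi, yi in zip(x, y))               # A
    both_zero = sum((1 - xi) * (1 - yi) for xi, yi in zip(x, y))  # D
    return n - both_one - both_zero                               # = B + C


def complement(v: Sequence[int]) -> Tuple[int, ...]:
    """Flip every binary value: (1 - V1, ..., 1 - Vn)."""
    return tuple(1 - vi for vi in v)


X = (1, 1, 0, 0, 1)
Y = (1, 0, 1, 0, 1)
print(hamming(X, Y))                                            # 2
print(hamming(X, X))                                            # 0
print(hamming(X, Y) == hamming(Y, X))                           # True
print(hamming(complement(X), complement(Y)) == hamming(X, Y))  # True
```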

  6. Simple-Matching-Coefficient (SMC)
     • $H(X,Y) = n - (A + D) = B + C$ (the number of mismatches)
     • Another distance-similarity compatible function is $f(x) = 1 - x/\max$, where max is the maximum value of x
     • We can define the SMC similarity, $\mathrm{sim}_H$, as one minus the proportion of the difference:
       $\mathrm{sim}_H(X,Y) = 1 - \frac{n - (A + D)}{n} = \frac{A + D}{n} = 1 - \frac{B + C}{n}$
     • Homework (I): Show that f(x) is order inverting: if x < y then f(x) > f(y)
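
Here is a small sketch of the SMC similarity. It is obtained by applying f(x) = 1 − x/max to the number of mismatches, with max = n. The function names and example vectors are hypothetical.

```python
from typing import Sequence


def mismatches(x: Sequence[int], y: Sequence[int]) -> int:
    """H(X,Y) = B + C: number of attributes whose binary values differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def f(value: float, max_value: float) -> float:
    """Distance-to-similarity transform from the slide: f(x) = 1 - x / max."""
    return 1 - value / max_value


def sim_h(x: Sequence[int], y: Sequence[int]) -> float:
    """SMC similarity: f(H(X,Y)) with max = n, i.e. (A + D) / n."""
    if len(x) != len(y) or not x:
        raise ValueError("cases must be non-empty and of equal length")
    return f(mismatches(x, y), len(x))


X = (1, 1, 0, 0, 1)
Y = (1, 0, 1, 0, 1)
print(sim_h(X, Y))  # 3 matching attributes out of 5
```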

  7. Simple-Matching-Coefficient (SMC) (II)
     • If we write $\mathrm{sim}_H(X,Y) = \frac{A + D}{n} = 1 - \frac{B + C}{n}$ as a function $\mathrm{factor}(A, B, C, D)$, then it is:
     • Monotonic:
       • If $A \le A'$ then $\mathrm{factor}(A,B,C,D) \le \mathrm{factor}(A',B,C,D)$
       • If $B \le B'$ then $\mathrm{factor}(A,B',C,D) \le \mathrm{factor}(A,B,C,D)$
       • If $C \le C'$ then $\mathrm{factor}(A,B,C',D) \le \mathrm{factor}(A,B,C,D)$
       • If $D \le D'$ then $\mathrm{factor}(A,B,C,D) \le \mathrm{factor}(A,B,C,D')$
     • Symmetric: $\mathrm{sim}_H(X,Y) = \mathrm{sim}_H(Y,X)$
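
One way to sanity-check the monotonic and symmetric properties is to evaluate factor(A, B, C, D) = (A + D)/(A + B + C + D) on a grid of small counts. This sketch tests only unit increments of each count up to a hypothetical bound (`limit`), so it illustrates the properties rather than proving them. It relies on the fact that swapping X and Y exchanges B and C.

```python
from itertools import product


def factor(a: int, b: int, c: int, d: int) -> float:
    """sim_H as a function of the counts: (A + D) / (A + B + C + D)."""
    return (a + d) / (a + b + c + d)


def is_monotonic(limit: int = 6) -> bool:
    """Check the four monotonicity conditions for unit increments of each count."""
    for a, b, c, d in product(range(limit), repeat=4):
        if a + b + c + d == 0:
            continue  # factor is undefined for n = 0
        base = factor(a, b, c, d)
        if factor(a + 1, b, c, d) < base:  # more 1-1 matches must not lower sim
            return False
        if factor(a, b + 1, c, d) > base:  # more 1-0 mismatches must not raise sim
            return False
        if factor(a, b, c + 1, d) > base:  # more 0-1 mismatches must not raise sim
            return False
        if factor(a, b, c, d + 1) < base:  # more 0-0 matches must not lower sim
            return False
    return True


def is_symmetric(limit: int = 6) -> bool:
    """Swapping X and Y swaps B and C; factor must not change."""
    return all(
        factor(a, b, c, d) == factor(a, c, b, d)
        for a, b, c, d in product(range(limit), repeat=4)
        if a + b + c + d > 0
    )


print(is_monotonic(), is_symmetric())  # True True
```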

  8. Variations of the SMC
     • The Hamming similarity assigns equal value to both kinds of match (both 0 or both 1)
     • There are situations in which a match on 1 should count differently from a match on 0
     • Thus, $\mathrm{sim}((1-X_1, \dots, 1-X_n), (1-Y_1, \dots, 1-Y_n)) = \mathrm{sim}((X_1, \dots, X_n), (Y_1, \dots, Y_n))$ may not hold
     • Example: two patients are similar if both have fever ($X_i = 1$ and $Y_i = 1$), but they are not made similar by the fact that neither has fever ($X_i = 0$ and $Y_i = 0$)
     • Specific attributes may be more important than other attributes. Example: in the manufacturing domain, some parts of the workpiece are more important than others

  9. Variations of SMC (III)
     • $\mathrm{sim}_H(X,Y) = \frac{A + D}{n} = \frac{A + D}{A + B + C + D}$
     • We introduce a weight α, with 0 < α < 1:
       $\mathrm{sim}_\alpha(X,Y) = \frac{\alpha(A + D)}{\alpha(A + D) + (1 - \alpha)(B + C)}$
     • For which α is $\mathrm{sim}_\alpha(X,Y) = \mathrm{sim}_H(X,Y)$? For α = 0.5
     • $\mathrm{sim}_\alpha(X,Y)$ preserves the monotonic and symmetric conditions
     • Homework (II): Show that $\mathrm{sim}_\alpha(X,Y)$ is monotonic
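
Below is a sketch of sim_α as a function of the counts. It assumes the counts A, B, C and D have already been computed. The name `sim_alpha` and the example counts are chosen here for illustration.

```python
def sim_alpha(a: int, b: int, c: int, d: int, alpha: float) -> float:
    """Weighted SMC: alpha*(A+D) / (alpha*(A+D) + (1-alpha)*(B+C))."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if a + b + c + d == 0:
        raise ValueError("at least one attribute is required")
    matches = a + d
    mismatches = b + c
    return alpha * matches / (alpha * matches + (1 - alpha) * mismatches)


counts = (2, 1, 1, 1)  # A, B, C, D for a hypothetical 5-attribute pair
for alpha in (0.2, 0.5, 0.8):
    print(alpha, round(sim_alpha(*counts, alpha), 3))
```

With A + D = 3 and B + C = 2, α = 0.5 reproduces sim_H = 3/5 = 0.6. A larger α raises the value and a smaller α lowers it.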

  10. The Similarity Depends Only on A, B, C and D (3)
     [Figure: similarity curves for α > 0.5, α = 0.5 and α < 0.5; vertical axis from 0 to 1, horizontal axis from 0 to n]
     • What is the role of α? What happens if α > 0.5? If α < 0.5?
       $\mathrm{sim}_\alpha(X,Y) = \frac{\alpha(A + D)}{\alpha(A + D) + (1 - \alpha)(B + C)}$
     • If α > 0.5, we give more weight to the matching attributes than to the mismatching ones
     • If α < 0.5, we give more weight to the mismatching attributes than to the matching ones

  11. Discarding 0-match
     • Again, $\mathrm{sim}((1-X_1, \dots, 1-X_n), (1-Y_1, \dots, 1-Y_n)) = \mathrm{sim}((X_1, \dots, X_n), (Y_1, \dots, Y_n))$ may not hold
     • Only attributes that occur in both cases (i.e., $X_i = 1$ and $Y_i = 1$) contribute to the similarity
     • Possible definition of the similarity: $\mathrm{sim}(X,Y) = \frac{A}{A + B + C}$
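
The definition A/(A + B + C) is the Jaccard coefficient. The sketch below uses hypothetical vectors to show that, unlike sim_H, this measure changes when every value is flipped, so the complement equality indeed fails. Raising an error when A + B + C = 0 is an assumption made here, because the slide does not define that case.

```python
from typing import Sequence, Tuple


def positive_match_sim(x: Sequence[int], y: Sequence[int]) -> float:
    """sim = A / (A + B + C): 0-0 matches (D) are ignored entirely."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    if a + b + c == 0:
        # Assumption: the slide leaves this case open, so we refuse to guess.
        raise ValueError("similarity is undefined when no attribute occurs")
    return a / (a + b + c)


def complement(v: Sequence[int]) -> Tuple[int, ...]:
    """Flip every binary value."""
    return tuple(1 - vi for vi in v)


X = (1, 1, 0, 0, 1)
Y = (1, 0, 1, 0, 1)
print(positive_match_sim(X, Y))                          # 0.5
print(positive_match_sim(complement(X), complement(Y)))  # 0.3333333333333333
```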

  12. Specific Attributes may be More Important Than Other Attributes
     • The significance of the attributes varies
     • Weighted Hamming distance:
       • There is a weight vector $(\omega_1, \dots, \omega_n)$ such that $\sum_{i=1}^{n} \omega_i = 1$
       • $H_W(X,Y) = 1 - \sum_{i=1}^{n} \omega_i X_i Y_i - \sum_{i=1}^{n} \omega_i (1 - X_i)(1 - Y_i)$
     • Example: "Process planning: some features are more important than others"
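
Here is a sketch of the weighted Hamming distance. It assumes the weights are supplied as a tuple that sums to 1, checked up to a small floating-point tolerance. The weight values are invented for illustration. Because the 1-1 and 0-0 terms subtract the weight of every matching attribute, H_W equals the total weight of the mismatching attributes.

```python
from typing import Sequence


def weighted_hamming(x: Sequence[int], y: Sequence[int], w: Sequence[float]) -> float:
    """H_W(X,Y) = 1 - sum(wi*Xi*Yi) - sum(wi*(1-Xi)*(1-Yi)), with sum(w) == 1."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("cases and weight vector must have the same length")
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    both_one = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    both_zero = sum(wi * (1 - xi) * (1 - yi) for wi, xi, yi in zip(w, x, y))
    return 1 - both_one - both_zero


# Hypothetical weights: the first attribute matters most.
weights = (0.4, 0.3, 0.1, 0.1, 0.1)
X = (1, 1, 0, 0, 1)
Y = (1, 0, 1, 0, 1)
print(weighted_hamming(X, Y, weights))  # about 0.4 = 0.3 + 0.1 (the mismatch weights)
```

With uniform weights ωi = 1/n, H_W reduces to H(X, Y)/n.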

  13. Homework (Part III): Attributes May Have Multiple Values
     • $X = (X_1, \dots, X_n)$ where $X_i \in T_i$
     • $Y = (Y_1, \dots, Y_n)$ where $Y_i \in T_i$
     • Each $T_i$ is finite
     • Define a formula for the Hamming distance in this context

  14. Non Monotonic Similarity
     • Formally, the monotonicity condition for similarity says that $\mathrm{sim}(A,B) \le \mathrm{sim}(A',B)$ always holds for any A and A′ such that $A \le A'$
     • Informally: for any attribute-value vectors X, Y and X′, if we obtain X′ by modifying the value of one attribute of X so that X′ and Y have the same value on that attribute, then $\mathrm{sim}(X,Y) \le \mathrm{sim}(X',Y)$

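  15. Non Monotonic Similarity (2)
     • Is the Hamming similarity monotonic? Yes: $\mathrm{sim}_H(X,Y) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{eq}(X_i, Y_i)$, where $\mathrm{eq}(a,b) = 1$ if $a = b$ and $0$ otherwise
     • Consider the XOR function:
       • (0,0) and (1,1) are in the same class (+)
       • (0,1) and (1,0) are in the same class (−)
     • Thus $d((1,1),(1,0)) > d((1,1),(0,0))$
     • Is this monotonic? No: (1,0) agrees with (1,1) on the first attribute while (0,0) agrees on none, yet (0,0) is the closer one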
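
The XOR argument can be replayed in code. Here `class_distance` is a hypothetical distance that is 0 for points in the same XOR class and 1 otherwise. The slide does not define d precisely, so this is only one possible choice. X′ is obtained from X by changing one attribute to agree with Y, which is exactly the situation the monotonicity condition covers.

```python
from typing import Tuple

Point = Tuple[int, int]


def xor_class(p: Point) -> int:
    """XOR label: 0 for (0,0)/(1,1) (class +), 1 for (0,1)/(1,0) (class -)."""
    return p[0] ^ p[1]


def class_distance(p: Point, q: Point) -> int:
    """Hypothetical class-based distance: 0 if same XOR class, else 1."""
    return 0 if xor_class(p) == xor_class(q) else 1


def hamming(p: Point, q: Point) -> int:
    """Number of attributes on which the two points differ."""
    return sum(1 for a, b in zip(p, q) if a != b)


Y = (1, 1)
X = (0, 0)
X_prime = (1, 0)  # X with its first attribute changed to agree with Y

# Monotonicity demands that X_prime be no farther from Y than X is.
print(hamming(X_prime, Y) <= hamming(X, Y))                # True
print(class_distance(X_prime, Y) <= class_distance(X, Y))  # False: violated
```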

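  16. Non Monotonic Similarity (3)
     • You may think: "Well, that was mathematics; how about the real world?"
     • Suppose that we have two interconnected batteries B and B′ and three lamps X, Y and Z with the following properties:
       • If X is on, B and B′ work
       • If Y is on, B or B′ works
       • If Z is on, B works
     • Situations:

         Situation   X   Y   Z   B      B′
         1           0   1   1   Ok     Fail
         2           0   1   0   Fail   Ok
         3           0   0   0   Fail   Fail

     • Thus sim(1,3) > sim(1,2): situation 1 differs from situation 2 only in lamp Z and from situation 3 in lamps Y and Z, yet, judged by the battery states, situations 1 and 3 share the failure of B′ while situations 1 and 2 share no battery state
     • Non monotonic!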
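
The following sketch recomputes both views of the battery example. It makes two assumptions. First, each lamp rule is read as an equivalence (for example, Z is on exactly when B works), which matches every row of the table. Second, similarity is judged by agreement on the battery states. The helper names are hypothetical.

```python
from typing import Dict, Tuple

# Situations from the table: lamp readings (X, Y, Z) and battery states (B, B').
SITUATIONS: Dict[int, Tuple[Tuple[int, int, int], Tuple[str, str]]] = {
    1: ((0, 1, 1), ("Ok", "Fail")),
    2: ((0, 1, 0), ("Fail", "Ok")),
    3: ((0, 0, 0), ("Fail", "Fail")),
}


def lamps(b_ok: bool, b2_ok: bool) -> Tuple[int, int, int]:
    """Assumed reading of the rules: X = B and B', Y = B or B', Z = B."""
    return int(b_ok and b2_ok), int(b_ok or b2_ok), int(b_ok)


def lamp_mismatches(i: int, j: int) -> int:
    """Hamming distance between the lamp readings of two situations."""
    return sum(a != b for a, b in zip(SITUATIONS[i][0], SITUATIONS[j][0]))


def battery_agreement(i: int, j: int) -> int:
    """Number of batteries in the same state in both situations."""
    return sum(a == b for a, b in zip(SITUATIONS[i][1], SITUATIONS[j][1]))


# The table is consistent with the assumed lamp rules.
for readings, (b, b2) in SITUATIONS.values():
    assert lamps(b == "Ok", b2 == "Ok") == readings

print(lamp_mismatches(1, 2), lamp_mismatches(1, 3))      # 1 2
print(battery_agreement(1, 2), battery_agreement(1, 3))  # 0 1
```

By lamp readings, situation 1 looks closer to situation 2; by battery states, it is closer to situation 3.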

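  17. Tversky Contrast Model
     [Figure: Venn diagram of P and S, with A the overlap, B the part of P outside S, and C the part of S outside P]
     • Defines a non monotonic distance
     • Comparison of a situation S with a prototype P (i.e., a case)
     • S and P are sets of features
     • The following sets:
       • $A = S \cap P$
       • $B = P - S$
       • $C = S - P$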

  18. Tversky Contrast Model (2)
     • Tversky-distance: $T(P,S) = \theta f(A) - \alpha f(B) - \beta f(C)$
     • where f maps sets of features to $[0, \infty)$
     • f, θ, α and β are fixed and defined by the user
     • Example:
       • If f(A) = the number of elements in A
       • and θ = α = β = 1,
       • then T counts the number of elements in common minus the number of differences
     • In general, the Tversky-distance is not symmetric: $T(P,S) \ne T(S,P)$ when α ≠ β
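
Below is a sketch of the contrast model with f = set cardinality, as in the example above. The feature names are made up. Setting α ≠ β exposes the asymmetry.

```python
from typing import Callable, Set


def tversky(p: Set[str], s: Set[str], theta: float = 1.0, alpha: float = 1.0,
            beta: float = 1.0, f: Callable[[Set[str]], float] = len) -> float:
    """T(P,S) = theta*f(A) - alpha*f(B) - beta*f(C), with A = S&P, B = P-S, C = S-P."""
    common = s & p     # A
    only_in_p = p - s  # B
    only_in_s = s - p  # C
    return theta * f(common) - alpha * f(only_in_p) - beta * f(only_in_s)


P = {"f1", "f2", "f3"}        # prototype (case); illustrative features
S = {"f2", "f3", "f4", "f5"}  # situation
print(tversky(P, S), tversky(S, P))                      # -1.0 -1.0 (alpha == beta)
print(tversky(P, S, beta=0.5), tversky(S, P, beta=0.5))  # 0.0 -0.5
```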

  19. Local versus Global Similarity Metrics
     • In many situations we have similarity metrics between attributes of the same type (called local similarity metrics). Example: for a complex engine, we may have a similarity metric for the temperature of the engine
     • In such situations, a reasonable and widely used approach to defining a global similarity sim(x,y) is to "aggregate" the local similarity metrics $\mathrm{sim}_i(x_i, y_i)$
     • What requirements should we impose on sim(x,y) in terms of the $\mathrm{sim}_i(x_i, y_i)$? sim(x,y) should increase monotonically with each $\mathrm{sim}_i(x_i, y_i)$

  20. Local versus Global Similarity Metrics (Formal Definitions)
     • A local similarity metric on an attribute $T_i$ is a similarity metric $\mathrm{sim}_i : T_i \times T_i \to [0,1]$
     • A function $\Phi : [0,1]^n \to [0,1]$ is an aggregation function if:
       • $\Phi(0, 0, \dots, 0) = 0$
       • Φ is monotonic non-decreasing in every argument
     • Given a collection of n similarity metrics $\mathrm{sim}_1, \dots, \mathrm{sim}_n$ for attributes taking values from $T_i$, a global similarity metric is a similarity metric $\mathrm{sim} : V \times V \to [0,1]$, with $V \subseteq T_1 \times \dots \times T_n$, such that there is an aggregation function Φ with:
       $\mathrm{sim}(X,Y) = \Phi(\mathrm{sim}_1(X_1,Y_1), \dots, \mathrm{sim}_n(X_n,Y_n))$
     • Example: $\Phi(X_1, X_2, \dots, X_n) = (X_1 + X_2 + \dots + X_n)/n$
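
Here is a minimal sketch of a global similarity built from local similarity metrics using the averaging aggregation function from the example. Two local metrics are illustrative choices, not taken from the slides. The temperature metric is 1 − |x − y|/range, clipped at 0, with a hypothetical range of 100. The symbolic attribute uses an exact-match metric.

```python
from typing import Any, Callable, Sequence

LocalSim = Callable[[Any, Any], float]


def mean_aggregation(values: Sequence[float]) -> float:
    """Phi(v1, ..., vn) = (v1 + ... + vn) / n: Phi(0,...,0) = 0, non-decreasing."""
    return sum(values) / len(values)


def global_similarity(x: Sequence[Any], y: Sequence[Any], local_sims: Sequence[LocalSim],
                      aggregate: Callable[[Sequence[float]], float] = mean_aggregation) -> float:
    """sim(X,Y) = Phi(sim_1(X1,Y1), ..., sim_n(Xn,Yn))."""
    if not (len(x) == len(y) == len(local_sims)):
        raise ValueError("need one local similarity metric per attribute")
    return aggregate([sim_i(xi, yi) for sim_i, xi, yi in zip(local_sims, x, y)])


def temperature_sim(t1: float, t2: float, value_range: float = 100.0) -> float:
    """Hypothetical local metric for engine temperature, clipped to [0, 1]."""
    return max(0.0, 1.0 - abs(t1 - t2) / value_range)


def exact_match(a: Any, b: Any) -> float:
    """Local metric for a symbolic attribute: 1 if equal, else 0."""
    return 1.0 if a == b else 0.0


X = (80.0, "diesel")  # (temperature, fuel type): illustrative attributes
Y = (70.0, "diesel")
print(global_similarity(X, Y, [temperature_sim, exact_match]))  # about 0.95
```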
