Interestingness

Presentation Transcript


  1. Interestingness

  2. Interestingness Measures - Lift • Lift measures how dependent (correlated) two events are • Lift(B, C) = c(B → C)/s(C) = s(B ∪ C)/(s(B) × s(C)) • Lift(B, C) tells how B and C are correlated • Lift(B, C) = 1 => B and C are independent • > 1: positively correlated • < 1: negatively correlated • Lift is more telling than support (s) and confidence (c)

  3. Lift Example

  4. Lift Solution • Lift(B, C) = (400/1000)/((600/1000)*(750/1000)) = 0.89 • Lift(B, ^C) = (200/1000)/((600/1000)*(250/1000)) = 1.33 • Thus B & C are negatively correlated since Lift(B,C) < 1 • B and ^C are positively correlated since Lift(B, ^C) > 1

  5. Lift Calculations • s(B ∪ C) = 400/1000 = 2/5 = 0.4 • s(B) = 600/1000 = 3/5 = 0.6 • s(C) = 750/1000 = 3/4 = 0.75 • Lift(B, C) = 0.4/(0.6 × 0.75) = 0.4/0.45 = 0.89 • s(B ∪ ^C) = 0.2 • s(B) = 0.6 • s(^C) = 0.25 • Lift(B, ^C) = 0.2/0.15 = 1.33 • Lift(^B, C) = ? • Lift(^B, ^C) = ?
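
The arithmetic above can be sketched in Python. This is a minimal illustration and not part of the original slides: the helper names `lift` and `marginal` are hypothetical, and the counts for the ^B row (350 and 50) are assumed from the observed values used on slide 8, since the example table itself is missing from the transcript.

```python
# 2x2 contingency counts; "not B" / "not C" stand for ^B / ^C.
# The "not B" row is an assumption taken from the observed values on slide 8.
counts = {
    ("B", "C"): 400,
    ("B", "not C"): 200,
    ("not B", "C"): 350,
    ("not B", "not C"): 50,
}
n_total = sum(counts.values())  # 1000 transactions


def marginal(item):
    """Number of transactions whose cell contains the given item."""
    return sum(c for cell, c in counts.items() if item in cell)


def lift(n_xy, n_x, n_y, n):
    """Lift(X, Y) = s(X and Y) / (s(X) * s(Y)), computed from raw counts."""
    return (n_xy / n) / ((n_x / n) * (n_y / n))


lift_table = {
    cell: lift(c, marginal(cell[0]), marginal(cell[1]), n_total)
    for cell, c in counts.items()
}

for cell, value in lift_table.items():
    print(cell, f"{value:.2f}")
# ('B', 'C') 0.89
# ('B', 'not C') 1.33
# ('not B', 'C') 1.17
# ('not B', 'not C') 0.50
```

Under these assumed counts, the two cells left open on the slide come out above 1 for (^B, C) and below 1 for (^B, ^C).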

  6. Interestingness Measures - χ² • Another measure for testing correlated events: χ² • χ² = Σ (Observed − Expected)² / Expected • General rules • χ² = 0 => independent • χ² > 0 => correlated, either positively or negatively, so additional tests are needed to determine the direction • χ² is also more telling than support (s) and confidence (c)

  7. χ² Example

  8. χ² Solution • χ² = (400 − 450)²/450 + (350 − 300)²/300 + (200 − 150)²/150 + (50 − 100)²/100 = 55.56 • χ² shows that B and C are correlated because the value is > 0 • Since the expected value is 450 but only 400 is observed, B and C are negatively correlated.
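
As a rough check of the sum above, the expected counts and the statistic can be recomputed from the observed 2x2 table. The function name `chi_square` and the row/column layout (rows B and ^B, columns C and ^C) are assumptions made for this sketch; only the four observed counts come from the slide.

```python
def chi_square(observed):
    """Return (statistic, expected) for a table of observed counts.

    Expected count for a cell = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            expected_row.append(exp)
            statistic += (obs - exp) ** 2 / exp
        expected.append(expected_row)
    return statistic, expected


# Rows: B, ^B; columns: C, ^C (layout assumed for illustration).
observed = [[400, 200],
            [350, 50]]

statistic, expected = chi_square(observed)
print(expected)              # [[450.0, 150.0], [300.0, 100.0]]
print(round(statistic, 1))   # 55.6
```

The statistic alone only says the cells deviate from independence; comparing the observed 400 with the expected 450 is what gives the direction.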

  9. Are Lift and χ² Always Good? • Null transactions: transactions that contain neither B nor C • Let's examine another dataset D • BC (100) is much rarer than B^C (1000) and ^BC (1000), but there are many ^B^C (100000) • So B and C are unlikely to occur together! • But Lift(B, C) = 8.44 >> 1 (strong positive correlation) • χ² = 670: observed BC (100) >> expected value (11.85) • Too many null transactions may “spoil the soup”!
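
To see how strongly lift depends on the null transactions, the sketch below recomputes it for dataset D while varying only the ^B^C count. The function name and the two smaller null counts (100 and 10,000) are hypothetical values chosen for illustration; only the 100,000 case comes from the slide.

```python
def lift_from_counts(n_bc, n_b_only, n_c_only, n_null):
    """Lift(B, C) from the four cells of a 2x2 table.

    n_bc: both B and C, n_b_only: B without C,
    n_c_only: C without B, n_null: neither (null transactions).
    """
    n = n_bc + n_b_only + n_c_only + n_null
    s_b = (n_bc + n_b_only) / n
    s_c = (n_bc + n_c_only) / n
    return (n_bc / n) / (s_b * s_c)


# Same BC, B^C and ^BC counts every time; only the null count changes.
for n_null in (100, 10_000, 100_000):
    value = lift_from_counts(100, 1000, 1000, n_null)
    print(n_null, round(value, 2))
# 100 0.18
# 10000 1.0
# 100000 8.44

# Expected BC count under independence for the slide's dataset D.
n = 100 + 1000 + 1000 + 100_000
expected_bc = 1100 * 1100 / n
print(round(expected_bc, 2))  # 11.85
```

The relationship between B and C is identical in all three runs, yet lift moves from well below 1 to well above 1 purely because of the null transactions.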

  10. χ² & Lift With Null Example

  11. Other Interestingness Algorithms • Null invariance: the value does not change with the number of null transactions. • Null-invariant interestingness measures: • AllConf(A,B) • Jaccard(A,B) • Cosine(A,B) • Kulczynski(A,B) • MaxConf(A,B) • Not all null-invariant measures are created equal
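
The slide only names the measures. The sketch below uses their commonly cited textbook definitions, which are an assumption here because the transcript does not give the formulas. Note that none of them takes the total number of transactions as input, which is why null transactions cannot affect them. The function name and the example counts (borrowed from the dataset on slide 9, relabelled A and B) are illustrative.

```python
from math import sqrt


def null_invariant_measures(n_ab, n_a, n_b):
    """Five null-invariant measures from counts only.

    n_ab: transactions containing both A and B
    n_a, n_b: transactions containing A, containing B
    Definitions are the usual textbook ones (assumed, not from the slides).
    """
    conf_a_to_b = n_ab / n_a  # P(B|A)
    conf_b_to_a = n_ab / n_b  # P(A|B)
    return {
        "all_conf": n_ab / max(n_a, n_b),
        "jaccard": n_ab / (n_a + n_b - n_ab),
        "cosine": n_ab / sqrt(n_a * n_b),
        "kulczynski": (conf_a_to_b + conf_b_to_a) / 2,
        "max_conf": max(conf_a_to_b, conf_b_to_a),
    }


measures = null_invariant_measures(n_ab=100, n_a=1100, n_b=1100)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Because the total never enters the computation, adding or removing ^A^B transactions leaves every value unchanged.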

  12. Imbalance Ratio with Kulczynski • Imbalance Ratio: measures the imbalance of two itemsets A and B in rule implications
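
The transcript does not include the formula. A common textbook definition, assumed here, is IR(A, B) = |s(A) − s(B)| / (s(A) + s(B) − s(A ∪ B)). The sketch below applies it to raw counts; the function name is hypothetical, the balanced example reuses the counts from slide 9, and the imbalanced one reuses the counts from slide 13.

```python
def imbalance_ratio(n_a, n_b, n_ab):
    """IR(A, B) = |s(A) - s(B)| / (s(A) + s(B) - s(A u B)).

    Works on counts directly because the total cancels out.
    Definition assumed from the usual textbook formulation.
    """
    return abs(n_a - n_b) / (n_a + n_b - n_ab)


# Balanced case: A and B are equally frequent.
balanced = imbalance_ratio(n_a=1100, n_b=1100, n_ab=100)

# Imbalanced case: A is 100 times more frequent than B.
imbalanced = imbalance_ratio(n_a=1_000_000, n_b=10_000, n_ab=100)

print(balanced)              # 0.0
print(round(imbalanced, 2))  # 0.98
```

A value near 0 means the two directions of the rule are balanced; a value near 1 means one itemset dominates, which is the situation where a Kulczynski value is usually read together with IR.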

  13. Kulczynski • (P(B|A) + P(A|B))/2 < epsilon, where epsilon = 0.01 • A = milk, B = coffee • Total transactions: 1 billion = 1,000,000,000 • A occurs 1 million times = 1,000,000 • B occurs 10 thousand times = 10,000 • A and B occur together one hundred times = 100 • s(A) = 10^6/10^9 = 10^-3 = 1/1,000 • s(B) = 10^4/10^9 = 10^-5 = 1/100,000 • s(A ∪ B) = 10^2/10^9 = 10^-7 = 1/10,000,000 • s(A) × s(B) = 10^-3 × 10^-5 = 10^-8

  14. Kulczynski • P(B|A) = P(A ∪ B)/P(A) = 10^2/10^6 = 10^-4 • P(A|B) = P(A ∪ B)/P(B) = 10^2/10^4 = 10^-2 • (P(B|A) + P(A|B))/2 = (10^-4 + 10^-2)/2 = 0.00505 < 0.01 • Therefore this is a negative pattern
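
The milk/coffee numbers can be checked with a few lines of Python. The variable names are illustrative; the counts and the threshold epsilon = 0.01 come from the slides. The lift value is added only as a contrast and is not computed in the original deck.

```python
# Counts from the slides: A = milk, B = coffee.
n_total = 10**9   # all transactions
n_a = 10**6       # transactions containing A
n_b = 10**4       # transactions containing B
n_ab = 10**2      # transactions containing both A and B

p_b_given_a = n_ab / n_a   # 1e-4
p_a_given_b = n_ab / n_b   # 1e-2

kulczynski = (p_b_given_a + p_a_given_b) / 2
epsilon = 0.01
is_negative_pattern = kulczynski < epsilon

# For contrast: lift on the same counts, which depends on n_total.
lift = (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))

print(f"Kulczynski = {kulczynski:.5f}")            # Kulczynski = 0.00505
print(f"negative pattern: {is_negative_pattern}")  # negative pattern: True
print(f"lift = {lift:.1f}")                        # lift = 10.0
```

Since s(A ∪ B) = 10^-7 exceeds s(A) × s(B) = 10^-8, lift comes out near 10 and would suggest a positive correlation, while the null-invariant Kulczynski value flags the pair as a negative pattern.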
