
Association rule discovery (Market-basket analysis)

MIS2502 Data Analytics. Adapted from Tan, Steinbach, and Kumar (2004), Introduction to Data Mining. http://www-users.cs.umn.edu/~kumar/dmbook/


Presentation Transcript


  1. Association rule discovery (Market-basket analysis)
     MIS2502 Data Analytics
     Adapted from Tan, Steinbach, and Kumar (2004), Introduction to Data Mining. http://www-users.cs.umn.edu/~kumar/dmbook/

  2. What is Association Mining?
     • Discovering interesting relationships between variables in large databases (http://en.wikipedia.org/wiki/Association_rule_learning)
     • Finding out which items predict the occurrence of other items
     • Also known as “affinity analysis” or “market basket” analysis

  3. Examples of Association Mining
     • Market basket analysis/affinity analysis
       • What products are bought together?
       • Where to place items on grocery store shelves?
     • Amazon’s recommendation engine
       • “People who bought this product also bought…”
     • Telephone calling patterns
       • Whom does a set of people tend to call most often?
     • Social network analysis
       • Determine who you “may know”

  4. Market-Basket Analysis
     • Large set of items
       • e.g., things sold in a supermarket
     • Large set of baskets
       • e.g., things one customer buys in one visit
     • Supermarket chains keep terabytes of this data
       • Informs store layout
       • Suggests “tie-ins”
         • Place spaghetti sauce in the pasta aisle
         • Put diapers on sale and raise the price of beer

  5. Market-Basket Transactions
     • Association rules from these transactions take the form X → Y (antecedent → consequent)
       • {Diapers} → {Beer}
       • {Milk, Bread} → {Diapers}
       • {Beer, Bread} → {Milk}
       • {Bread} → {Milk, Diapers}

  6. Core idea: The itemset
     • Itemset: a group of items of interest, e.g., {Milk, Beer, Diapers}
     • This itemset is a “3-itemset” because it contains 3 items
     • An association rule expresses a relationship between itemsets
       • X → Y, where X and Y are two itemsets
       • {Milk, Diapers} → {Beer} means “when you have milk and diapers, you also have beer”

  7. Support
     • Support count (σ)
       • Frequency of occurrence of an itemset
       • σ({Milk, Beer, Diapers}) = 2 (i.e., it’s in baskets 4 and 5)
     • Support (s)
       • Fraction of transactions that contain all the items in the rule X → Y, i.e., the itemset X ∪ Y
       • s({Milk, Diapers, Beer}) = 2/5 = 0.4
     • You can also calculate support for X and Y separately
       • With X = {Milk, Diapers} and Y = {Beer}: support for X = 3/5 = 0.6; support for Y = 3/5 = 0.6
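The support calculations above can be sketched in a few lines of Python. The transcript does not reproduce the slide's transaction table, so the five baskets below are an assumption: a hypothetical dataset chosen only so that its counts agree with the figures quoted on the slides. The function names are likewise illustrative.

```python
# Hypothetical five-basket dataset (an assumption; the original table is
# not in the transcript). Chosen so the counts match the slide's numbers.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Bread", "Milk", "Diapers", "Coke"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if set(itemset) <= basket)


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return support_count(itemset, baskets) / len(baskets)


def baskets_containing(itemset, baskets):
    """1-based basket numbers in which the whole itemset appears."""
    return [i for i, basket in enumerate(baskets, start=1)
            if set(itemset) <= basket]


print(support_count({"Milk", "Beer", "Diapers"}, BASKETS))       # 2
print(baskets_containing({"Milk", "Beer", "Diapers"}, BASKETS))  # [4, 5]
print(support({"Milk", "Diapers", "Beer"}, BASKETS))             # 0.4
print(support({"Milk", "Diapers"}, BASKETS))                     # 0.6
print(support({"Beer"}, BASKETS))                                # 0.6
```

Representing each basket as a set makes the "contains all items" check a simple subset test.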

  8. Confidence
     • Confidence (c) is the strength of the association
     • Measures how often items in Y appear in transactions that contain X
     • c(X → Y) = s(X ∪ Y) / s(X)
     • c must be between 0 and 1: 1 is a complete association, 0 is no association
     • c({Milk, Diapers} → {Beer}) = 0.4 / 0.6 ≈ 0.67: 67% of the baskets that contain milk and diapers also contain beer
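As a rough illustration, confidence can be computed as the number of baskets containing both X and Y divided by the number containing X. The basket data is a hypothetical five-basket set (an assumption, since the slide's table is not in the transcript) chosen to agree with the quoted counts; `confidence` is an illustrative name.

```python
# Hypothetical baskets (assumed; consistent with the slide's counts).
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Bread", "Milk", "Diapers", "Coke"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if set(itemset) <= basket)


def confidence(x, y, baskets):
    """c(X -> Y): share of the baskets containing X that also contain Y."""
    x, y = set(x), set(y)
    return support_count(x | y, baskets) / support_count(x, baskets)


c = confidence({"Milk", "Diapers"}, {"Beer"}, BASKETS)
print(round(c, 2))  # 0.67
```

Dividing counts gives the same result as dividing supports, because the number of baskets cancels out.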

  9. Some sample rules
     • All of these sample rules are binary partitions of the same itemset: {Milk, Diapers, Beer}
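One way to see what "binary partitions of the same itemset" means is to enumerate them. The sketch below splits {Milk, Diapers, Beer} into every non-empty antecedent and consequent pair and reports support and confidence for each. The baskets are a hypothetical dataset (an assumption, picked to match the counts quoted in the slides), and the helper names are illustrative.

```python
from itertools import combinations

# Hypothetical baskets (assumed; consistent with the slide's counts).
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Bread", "Milk", "Diapers", "Coke"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if set(itemset) <= b) / len(baskets)


def binary_partitions(itemset):
    """Yield every (X, Y) split of `itemset` with both sides non-empty."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            x = set(antecedent)
            yield x, set(items) - x


rules = list(binary_partitions({"Milk", "Diapers", "Beer"}))
for x, y in rules:
    s = support(x | y, BASKETS)          # same for every rule
    c = s / support(x, BASKETS)          # varies with the antecedent
    print(f"{sorted(x)} -> {sorted(y)}  s={s:.2f}  c={c:.2f}")
```

A 3-itemset yields six such rules. With this data they all share a support of 0.4, while confidence ranges from 0.5 to 1.0.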

  10. But don’t blindly follow the numbers
      • Rules originating from the same itemset (X ∪ Y) will have identical support
        • Since the total elements (X ∪ Y) are always the same
      • But they can have different confidence
        • Depending on what is contained in X and Y
      • High confidence suggests a strong association
        • But this can be deceptive
        • Consider {Bread} → {Diapers}
        • Support for the total itemset is 0.6 (3/5)
        • And confidence is 0.75 (3/4), which is pretty high
        • But is this just because both are frequently occurring items (s = 0.8 each)?
        • You’d almost expect them to show up in the same baskets by chance
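The {Bread} → {Diapers} caveat can be checked by comparing the rule's confidence with how often the consequent appears in any basket at all. The counts below come from the slide (3 of 5 baskets hold both items, 4 of 5 hold each item alone); the function name is a hypothetical helper, and the comparison is a sketch of the reasoning rather than a method prescribed by the slides.

```python
def confidence_vs_baseline(count_xy, count_x, count_y, n_baskets):
    """Return (confidence of X -> Y, baseline frequency of Y)."""
    confidence = count_xy / count_x   # how often Y shows up when X is present
    baseline = count_y / n_baskets    # how often Y shows up regardless of X
    return confidence, baseline


# Counts quoted on the slide for {Bread} -> {Diapers}.
conf, base = confidence_vs_baseline(count_xy=3, count_x=4, count_y=4, n_baskets=5)
print(conf)          # 0.75
print(base)          # 0.8
print(conf > base)   # False: bread does not make diapers any more likely
```

A confidence of 0.75 looks high in isolation, but diapers already appear in 80% of all baskets, so the rule adds nothing beyond chance.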

  11. Lift
      • Takes into account how co-occurrence differs from what is expected by chance
        • i.e., if items were selected independently from one another
      • Based on the support metric
      • Lift(X → Y) = s(X ∪ Y) / (s(X) × s(Y))
        • Numerator: support for the total itemset X and Y
        • Denominator: support for X times support for Y

  12. Lift Example
      • What’s the lift of the association rule {Milk, Diapers} → {Beer}?
      • So X = {Milk, Diapers} and Y = {Beer}
        • s({Milk, Diapers, Beer}) = 2/5 = 0.4
        • s({Milk, Diapers}) = 3/5 = 0.6
        • s({Beer}) = 3/5 = 0.6
      • So Lift = 0.4 / (0.6 × 0.6) = 0.4 / 0.36 ≈ 1.11
      • When Lift > 1, X and Y occur together more often than you would expect by chance
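The lift calculation for this rule can be sketched as follows. The baskets are a hypothetical dataset (an assumption, since the transcript omits the slide's table) whose supports match the three values listed above; `lift` is an illustrative function name.

```python
# Hypothetical baskets (assumed; consistent with the slide's counts).
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Bread", "Milk", "Diapers", "Coke"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if set(itemset) <= b) / len(baskets)


def lift(x, y, baskets):
    """Observed support of X and Y together, relative to independence."""
    x, y = set(x), set(y)
    expected_if_independent = support(x, baskets) * support(y, baskets)
    return support(x | y, baskets) / expected_if_independent


value = lift({"Milk", "Diapers"}, {"Beer"}, BASKETS)
print(round(value, 2))  # 1.11
```

The denominator is the support the combined itemset would have if X and Y were chosen independently, so a result above 1 means the items co-occur more often than chance predicts.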

  13. Another example
      • Are people more likely to have a checking account if they have a savings account?
        • Support({Savings} → {Checking}) = 5000/10000 = 0.5
        • Support({Savings}) = 6000/10000 = 0.6
        • Support({Checking}) = 8500/10000 = 0.85
        • Confidence({Savings} → {Checking}) = 5000/6000 ≈ 0.83
        • Lift({Savings} → {Checking}) = 0.5 / (0.6 × 0.85) ≈ 0.98
      • Answer: No. In fact, it’s slightly less than what you’d expect by chance!
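The savings and checking figures can be reproduced directly from the raw counts on the slide. This is a minimal sketch; the function name and argument names are hypothetical.

```python
def rule_metrics(n_both, n_x, n_y, n_total):
    """Support, confidence, and lift of X -> Y from raw customer counts."""
    s_xy = n_both / n_total
    s_x = n_x / n_total
    s_y = n_y / n_total
    return {
        "support": s_xy,
        "confidence": n_both / n_x,
        "lift": s_xy / (s_x * s_y),
    }


# Counts from the slide: 10,000 customers, 6,000 with savings,
# 8,500 with checking, 5,000 with both.
m = rule_metrics(n_both=5000, n_x=6000, n_y=8500, n_total=10000)
print(round(m["support"], 2))     # 0.5
print(round(m["confidence"], 2))  # 0.83
print(round(m["lift"], 2))        # 0.98
```

Confidence of 0.83 looks strong, yet checking accounts are held by 85% of all customers, which is why the lift falls just below 1.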

  14. Selecting the rules
      • We know how to calculate the measures for each rule
        • Support
        • Confidence
        • Lift
      • Then we set up thresholds for the minimum rule strength we want to accept
        • For support, called minsup
        • For confidence, called minconf
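Threshold-based selection might look like the sketch below, which keeps only the rules from one itemset that meet both minsup and minconf. The threshold values (0.4 and 0.6) are arbitrary choices for illustration, the baskets are a hypothetical dataset consistent with the counts in the slides, and the function names are assumptions.

```python
from itertools import combinations

# Hypothetical baskets (assumed; consistent with the slide's counts).
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Bread", "Milk", "Diapers", "Coke"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def candidate_rules(itemset):
    """Yield every (X, Y) split of `itemset` with both sides non-empty."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            x = frozenset(antecedent)
            yield x, frozenset(items) - x


def select_rules(itemset, baskets, minsup, minconf):
    """Keep the rules whose support and confidence meet both thresholds."""
    kept = []
    for x, y in candidate_rules(itemset):
        s = support(x | y, baskets)
        c = s / support(x, baskets)
        if s >= minsup and c >= minconf:
            kept.append((x, y, s, c))
    return kept


# Illustrative thresholds only; real values depend on the application.
kept = select_rules({"Milk", "Diapers", "Beer"}, BASKETS, minsup=0.4, minconf=0.6)
for x, y, s, c in kept:
    print(f"{sorted(x)} -> {sorted(y)}  s={s:.2f}  c={c:.2f}")
```

With these thresholds, four of the six candidate rules survive; the two with a single-item antecedent of Milk or Diapers have confidence 0.5 and are dropped.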

  15. But this can be overwhelming
      • Imagine all the combinations possible in your local grocery store
        • Every product matched with every combination of other products
        • Tens of thousands of possible rule combinations
      • So where do you start?
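To get a feel for how quickly the number of possible rules grows, the sketch below counts every rule X → Y with non-empty, non-overlapping X and Y over d distinct items. The closed form 3^d − 2^(d+1) + 1 is a standard counting result and is not stated in the slides; the brute-force function is included only to check it for small d. Both function names are illustrative.

```python
from itertools import product


def count_rules_brute_force(d):
    """Count rules by assigning each item to X (1), Y (2), or neither (0)."""
    count = 0
    for assignment in product((0, 1, 2), repeat=d):
        # A valid rule needs at least one item on each side.
        if 1 in assignment and 2 in assignment:
            count += 1
    return count


def count_rules_formula(d):
    """Closed form: 3^d assignments, minus those with an empty X or Y."""
    return 3 ** d - 2 ** (d + 1) + 1


for d in (3, 6, 10):
    print(d, count_rules_formula(d))
# 3 12
# 6 602
# 10 57002
```

Ten products already allow more than 57,000 candidate rules, which is why thresholds such as minsup and minconf are needed to narrow the search.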

  16. Once you are confident in a rule, take action
      • {Milk, Diapers} → {Beer}
