Mining Quantitative Association Rules in Large Relational Databases

Mining Quantitative Association Rules in Large Relational Databases. Ramakrishnan Srikant, Rakesh Agrawal. ACM SIGMOD Conference on Management of Data, 1996. March 21, 2013. (Slides modified from Sasi Sekhar Kunta’s version.) Presented by: Sepehr Amir-Mohammadian

Outline • Association Rules and Quantitative Association Rules • Formal Study of Quantitative Association Analysis • Partitioning Quantitative Attributes • Identifying the Interesting Rules • Candidate Generation • Concluding Remarks • Q&A

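Association Rules • Itemsets $X$ and $Y$, with $X \cap Y = \emptyset$ • Rule: $X \Rightarrow Y$ • Support: the fraction of transactions that contain $X \cup Y$ • Confidence: the fraction of transactions containing $X$ that also contain $Y$ • Goal: find all rules whose support is at least MinSup and whose confidence is at least MinConf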
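
The two measures above can be computed directly from their definitions. The following is a minimal sketch, not the authors' implementation; the function names and the market-basket data are hypothetical and chosen only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Fraction of transactions containing `lhs` that also contain `rhs`."""
    both = support(transactions, set(lhs) | set(rhs))
    return both / support(transactions, lhs)


# Hypothetical transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

sup = support(transactions, {"bread", "milk"})        # 2 of 4 transactions
conf = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 with bread
```

A rule bread ⇒ milk would be reported only if `sup` reaches MinSup and `conf` reaches MinConf.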

Boolean Association Rules

Quantitative Association Rules

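Mapping to Boolean Association Rules • Use each ⟨attribute, value⟩ pair as a new Boolean attribute in place of a categorical attribute • Use each ⟨attribute, value⟩ pair as a new Boolean attribute in place of a quantitative attribute with a small domain • Use each ⟨attribute, interval⟩ pair as a new Boolean attribute in place of a quantitative attribute with a large domain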
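
As a rough sketch of this mapping, the function below turns one record into a set of Boolean items. The names `to_boolean_items`, `value_attrs`, and `intervals`, as well as the record and the interval boundaries, are assumptions made for this example.

```python
def to_boolean_items(record, value_attrs, intervals):
    """Map one record to a set of Boolean items.

    value_attrs: attributes mapped to (attribute, value) pairs, i.e.
                 categorical attributes and small-domain quantitative ones.
    intervals:   dict attribute -> list of inclusive (low, high) ranges for
                 large-domain quantitative attributes.
    """
    items = set()
    for attr, value in record.items():
        if attr in value_attrs:
            items.add((attr, value))
        else:
            for low, high in intervals[attr]:
                if low <= value <= high:
                    items.add((attr, (low, high)))
                    break
    return items


# Hypothetical record and partitioning.
record = {"Married": "Yes", "NumCars": 1, "Age": 23}
intervals = {"Age": [(20, 24), (25, 29), (30, 34), (35, 39)]}
items = to_boolean_items(record, {"Married", "NumCars"}, intervals)
```

Each resulting item can then be treated as an ordinary Boolean item by a standard association rule miner.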

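Problems • “MinSup”: if the number of intervals for a quantitative attribute is large, the support of any single interval can be low, so rules involving that attribute may fail to reach minimum support • “MinConf”: information is lost when values are partitioned into intervals. Some rules reach minimum confidence only for a single value or a small interval, so the loss grows as the intervals become fewer and larger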

Solution • Consider all combinations of adjacent values/intervals of each quantitative attribute • This solves the “MinSup” problem • Increase the number of values/intervals, which is now possible without running into the “MinSup” problem • This reduces the information loss • New problems: • Execution time: the number of items grows sharply, so combining stops at a maximum support threshold, MaxSup • Many rules: an interest measure is needed to filter them
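
The combination of adjacent intervals under a MaxSup cap can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: `adjacent_ranges` is a hypothetical name, supports are given as record counts to avoid rounding issues, and the choice to keep a range while its count "does not exceed" the cap is an assumption.

```python
def adjacent_ranges(base_counts, max_count):
    """Enumerate ranges (i, j, count) of consecutive base intervals, i <= j.

    A single base interval is always kept. A wider range is kept only while
    its combined count does not exceed max_count (the MaxSup cap).
    """
    ranges = []
    n = len(base_counts)
    for i in range(n):
        total = 0
        for j in range(i, n):
            total += base_counts[j]
            if j > i and total > max_count:
                break  # widening further can only increase the count
            ranges.append((i, j, total))
    return ranges


# Hypothetical record counts for four base intervals, with a cap of 3 records.
ranges = adjacent_ranges([1, 2, 1, 1], max_count=3)
```

Here the range covering intervals 0..2 is skipped because its count (4) exceeds the cap, while 0..1 and 1..2 are kept.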

Steps of Proposed Approach • Determine the number of partitions for each quantitative attribute • Map values/ranges to consecutive integer values such that the order is preserved • Find the support of each value of the attributes, and combine adjacent values/intervals as long as their combined support is less than MaxSup • Find the frequent itemsets, i.e. those whose support is at least MinSup • Use the frequent itemsets to generate association rules • Prune out uninteresting rules

Example Step 0: Initial set of records

Example – Cont. Step 1: Determine the partitions for each quantitative attribute

Example – Cont. Step 2: Mapping intervals/values to consecutive integers

Example – Cont. Step 2: Mapping intervals/values to consecutive integers
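
An order-preserving mapping of this kind could look like the sketch below. The interval boundaries, the ages, and the categorical codes are hypothetical, since the slide's tables are image content; `interval_index` is a name introduced for this example.

```python
import bisect


def interval_index(value, upper_bounds):
    """Return the 1-based index of the interval that contains `value`.

    upper_bounds: sorted inclusive upper bounds of consecutive intervals.
    Values below the first interval also map to index 1 in this sketch.
    """
    idx = bisect.bisect_left(upper_bounds, value)
    if idx == len(upper_bounds):
        raise ValueError(f"{value} lies above the last interval")
    return idx + 1


# Assumed partitioning of Age: 20..24, 25..29, 30..34, 35..39.
age_upper_bounds = [24, 29, 34, 39]
codes = [interval_index(age, age_upper_bounds) for age in (23, 25, 29, 34, 38)]

# Categorical values get arbitrary consecutive integers (assumed coding).
married_codes = {"Yes": 1, "No": 2}
```

Because the interval indices increase with the values, ranges of adjacent intervals stay contiguous after the mapping.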

Example – Cont. • Step 3: Extracting large (frequent) itemsets • Some of the resulting itemsets are shown, for MinSup = 0.4

Example – Cont. • Step 4: Rule generation • Some of the resulting rules are shown, for MinConf = 0.5

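Formal Study of Quantitative Association Analysis • $\mathcal{I} = \{i_1, i_2, \ldots, i_m\}$: set of attributes • $\mathcal{P}$: set of positive integers • $\mathcal{I}_V = \mathcal{I} \times \mathcal{P}$; a pair $\langle x, v \rangle \in \mathcal{I}_V$ denotes that attribute $x$ has value $v$ • $\mathcal{I}_R = \{\langle x, l, u \rangle \in \mathcal{I} \times \mathcal{P} \times \mathcal{P} \mid l \le u \text{ if } x \text{ is quantitative},\ l = u \text{ if } x \text{ is categorical}\}$: set of items • For any $X \subseteq \mathcal{I}_R$, $\text{attributes}(X) = \{x \mid \langle x, l, u \rangle \in X\}$ • $\mathcal{D}$: set of records • $R \subseteq \mathcal{I}_V$: a record, such that its attributes are distinct • A record $R$ supports itemset $X \subseteq \mathcal{I}_R$ if $\forall \langle x, l, u \rangle \in X,\ \exists \langle x, q \rangle \in R$ such that $l \le q \le u$ • $X \Rightarrow Y$: a quantitative association rule, where • $X \subset \mathcal{I}_R$, $Y \subset \mathcal{I}_R$, and $\text{attributes}(X) \cap \text{attributes}(Y) = \emptyset$

Formal Study of Quantitative Association Analysis – Cont. • $X \Rightarrow Y$ holds in $\mathcal{D}$ with support $s$, if $s\%$ of the records in $\mathcal{D}$ support $X \cup Y$. • $X \Rightarrow Y$ holds in $\mathcal{D}$ with confidence $c$, if $c\%$ of the records in $\mathcal{D}$ that support $X$ also support $Y$. • $\Pr(X)$: probability that all items in $X$ are supported by a given record • $\hat{X}$ is a generalization of $X$ (and $X$ a specialization of $\hat{X}$) if $\text{attributes}(X) = \text{attributes}(\hat{X})$ and, for every $x \in \text{attributes}(X)$, $\langle x, l, u \rangle \in X$ and $\langle x, l', u' \rangle \in \hat{X}$ imply $l' \le l \le u \le u'$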
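
The notions of a record supporting an itemset, and of rule support and confidence over records, can be made concrete with a short sketch. The data layout (a record as a dict, an item as an `(attribute, low, high)` tuple), the function names, and the five records are assumptions for illustration.

```python
def supports(record, itemset):
    """True if, for every item (attr, low, high), the record's value of
    `attr` lies in the inclusive range low..high."""
    return all(
        attr in record and low <= record[attr] <= high
        for attr, low, high in itemset
    )


def rule_stats(records, lhs, rhs):
    """Return (support, confidence) of the rule lhs => rhs as fractions."""
    n_lhs = sum(supports(r, lhs) for r in records)
    n_both = sum(supports(r, lhs | rhs) for r in records)
    confidence = n_both / n_lhs if n_lhs else 0.0
    return n_both / len(records), confidence


# Hypothetical records; Married is coded 1 = Yes, 2 = No.
records = [
    {"Age": 23, "Married": 2, "NumCars": 1},
    {"Age": 25, "Married": 1, "NumCars": 1},
    {"Age": 29, "Married": 2, "NumCars": 0},
    {"Age": 34, "Married": 1, "NumCars": 2},
    {"Age": 38, "Married": 1, "NumCars": 2},
]

lhs = {("Age", 30, 39), ("Married", 1, 1)}   # categorical item: low == high
rhs = {("NumCars", 2, 2)}
sup, conf = rule_stats(records, lhs, rhs)
```

A categorical item is simply a degenerate range with `low == high`, which matches the definition of the item set above.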

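Partitioning Quantitative Attributes • A measure of partial completeness quantifies the information lost in partitioning • It compares the set of rules obtained before partitioning with the set of rules obtained after partitioning • Partial completeness measures the distance between a rule in the first set and its closest generalization in the second • The distance is defined by the ratio of the supports • The measure is then used to find the partitioning with the minimal number of partitions for a desired level of completeness

Partial Completeness • $C$: the set of all frequent itemsets • For any $K \ge 1$, $P$ is $K$-complete w.r.t. $C$ if • $P \subseteq C$ • $X \in P$ and $X' \subseteq X$ imply $X' \in P$ • for every $X \in C$ there is an $\hat{X} \in P$ such that $\hat{X}$ is a generalization of $X$ with $\text{support}(\hat{X}) \le K \times \text{support}(X)$, and for every $Y \subseteq X$ there is a $\hat{Y} \subseteq \hat{X}$ that is a generalization of $Y$ with $\text{support}(\hat{Y}) \le K \times \text{support}(Y)$ • The smaller $K$ is, the less information is lost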

Example – K-Completeness Consider the following set of frequent itemsets: Then, itemsets 2, 3, 5 and 7 form a 1.5-complete set. But itemsets 3, 5 and 7 do not form a 1.5-complete set.
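
A brute-force check of $K$-completeness can be sketched as below. The seven itemsets and their supports (in percent) are assumed values chosen to reproduce the statement of this example and may differ from the slide's table; the function names are hypothetical, and the check assumes that every subset of an itemset in `C` is itself in `C`.

```python
from itertools import combinations


def generalizes(general, specific):
    """Same attributes, and each range in `general` contains the matching
    range in `specific`."""
    g = {attr: (low, high) for attr, low, high in general}
    s = {attr: (low, high) for attr, low, high in specific}
    return g.keys() == s.keys() and all(
        g[a][0] <= s[a][0] and s[a][1] <= g[a][1] for a in s
    )


def nonempty_subsets(itemset):
    items = sorted(itemset)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_close(x_hat, x, P, C, K):
    """x_hat generalizes x within factor K, and so do matching subsets."""
    if not (generalizes(x_hat, x) and P[x_hat] <= K * C[x]):
        return False
    return all(
        any(generalizes(y_hat, y) and P[y_hat] <= K * C[y]
            for y_hat in nonempty_subsets(x_hat))
        for y in nonempty_subsets(x)
    )


def is_k_complete(P, C, K):
    """P and C map frozenset itemsets to their support."""
    if any(x not in C for x in P):
        return False  # P must be a subset of C
    if any(sub not in P for x in P for sub in nonempty_subsets(x)):
        return False  # P must be closed under subsets
    return all(any(is_close(x_hat, x, P, C, K) for x_hat in P) for x in C)


# Assumed frequent itemsets 1..7 with supports in percent.
C = {
    frozenset({("Age", 20, 30)}): 5,
    frozenset({("Age", 20, 40)}): 6,
    frozenset({("Age", 20, 50)}): 8,
    frozenset({("Cars", 1, 2)}): 5,
    frozenset({("Cars", 1, 3)}): 6,
    frozenset({("Age", 20, 30), ("Cars", 1, 2)}): 4,
    frozenset({("Age", 20, 40), ("Cars", 1, 3)}): 5,
}


def pick(numbers):
    return {x: C[x] for i, x in enumerate(C, start=1) if i in numbers}


good = is_k_complete(pick({2, 3, 5, 7}), C, 1.5)
bad = is_k_complete(pick({3, 5, 7}), C, 1.5)
```

With these assumed supports, itemset 1 has only itemset 3 as a generalization within {3, 5, 7}, and 8% exceeds 1.5 times 5%, which is one way to see why the smaller set fails.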

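Confidence of Rules Generated from K-Complete Set If $P$ is a $K$-complete set w.r.t. $C$, then any rule $R$ obtained from $C$ has a generalization $\hat{R}$ obtained from $P$, such that $\text{conf}(\hat{R})$ is bounded by $\frac{1}{K} \times \text{conf}(R) \le \text{conf}(\hat{R}) \le K \times \text{conf}(R)$. In the previous example, $K = 1.5$, so the confidence of the close rule lies between $1/1.5$ and $1.5$ times the confidence of the original rule.

K-Completeness for a Single Attribute Consider $x$ as a quantitative attribute, partitioned into base intervals. Suppose that the support for each base interval is less than $\text{minsup} \times \frac{K-1}{2}$ (or that the interval consists of a single value). Let $P$ be the set of all combinations of base intervals that have minimum support. Then, $P$ is $K$-complete w.r.t. the set of all ranges over $x$ with minimum support.

K-Completeness for a Group of Attributes Consider a set of $n$ quantitative attributes, partitioned into base intervals. Suppose that the support for each base interval is less than $\text{minsup} \times \frac{K-1}{2n}$. Let $P$ be the set of all frequent itemsets over the partitioned attributes. Then, $P$ is $K$-complete w.r.t. the set of all frequent itemsets obtained without partitioning.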

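Equi-Depth Partitioning Equi-depth partitioning: splitting the values so that every base interval has the same support. Suppose that the number of intervals is given. Then, equi-depth partitioning minimizes the maximum support of a base interval, and so minimizes $K$. Suppose that $K$ is given and $K > 1$. Then, equi-depth partitioning with support $\text{minsup} \times \frac{K-1}{2n}$ in each base interval results in the minimum number of intervals: $\text{Number of intervals} = \frac{2n}{m \times (K-1)}$, where $n$ is the number of quantitative attributes, $m$ is the minimum support (as a fraction of the records), and $K$ is the partial completeness level.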
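
The interval count and an equi-depth split can be sketched as follows. This is an illustration under stated assumptions: `num_intervals` and `equi_depth_bounds` are hypothetical names, the ages are made-up data, and the split ignores the case where equal values straddle an interval boundary.

```python
def num_intervals(n, minsup, K):
    """Number of base intervals: 2n / (m * (K - 1)), with m = minsup as a fraction."""
    if K <= 1:
        raise ValueError("K must be greater than 1")
    return 2 * n / (minsup * (K - 1))


def equi_depth_bounds(values, num_parts):
    """Split the sorted values into num_parts groups of near-equal size and
    return the (low, high) bounds of each group."""
    ordered = sorted(values)
    if num_parts > len(ordered):
        raise ValueError("more intervals requested than values available")
    size, extra = divmod(len(ordered), num_parts)
    bounds, start = [], 0
    for part in range(num_parts):
        end = start + size + (1 if part < extra else 0)
        bounds.append((ordered[start], ordered[end - 1]))
        start = end
    return bounds


# Two quantitative attributes, 20% minimum support, K = 1.5.
needed = num_intervals(n=2, minsup=0.2, K=1.5)

# Hypothetical ages split into four equi-depth intervals.
bounds = equi_depth_bounds([23, 25, 29, 34, 38, 41, 45, 52], 4)
```

Lowering $K$ toward 1 or lowering the minimum support both increase the number of intervals required.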

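Identify Interesting Rules • Combining intervals results in many rules • For example, suppose a quarter of people in age group 20..30 are in the age group 20..25 • ⟨Age: 20..30⟩ ⇒ ⟨Cars: 1..2⟩, with 8% sup, 70% conf • ⟨Age: 20..25⟩ ⇒ ⟨Cars: 1..2⟩, with 2% sup, 70% conf • The second rule doesn’t give any additional information, and is less general than the first rule

Expected Value of Support and Confidence Interest: a rule is interesting if its support or confidence deviates from what would be expected from its generalization. Let $E_{\Pr(\hat{Z})}[\Pr(Z)]$ denote the expected value of $\Pr(Z)$ based on $\Pr(\hat{Z})$. Let $Z = \{\langle z_1, l_1, u_1 \rangle, \ldots, \langle z_n, l_n, u_n \rangle\}$ and $\hat{Z} = \{\langle z_1, l'_1, u'_1 \rangle, \ldots, \langle z_n, l'_n, u'_n \rangle\}$. The expected value of $\Pr(Z)$ based on $\hat{Z}$ would be $E_{\Pr(\hat{Z})}[\Pr(Z)] = \frac{\Pr(\langle z_1, l_1, u_1 \rangle)}{\Pr(\langle z_1, l'_1, u'_1 \rangle)} \times \cdots \times \frac{\Pr(\langle z_n, l_n, u_n \rangle)}{\Pr(\langle z_n, l'_n, u'_n \rangle)} \times \Pr(\hat{Z})$. Similarly, the expected value of the confidence for the rule $X \Rightarrow Y$ according to its generalization $\hat{X} \Rightarrow \hat{Y}$ would be $E_{\Pr(\hat{Y} \mid \hat{X})}[\Pr(Y \mid X)] = \frac{\Pr(\langle y_1, l_1, u_1 \rangle)}{\Pr(\langle y_1, l'_1, u'_1 \rangle)} \times \cdots \times \frac{\Pr(\langle y_n, l_n, u_n \rangle)}{\Pr(\langle y_n, l'_n, u'_n \rangle)} \times \Pr(\hat{Y} \mid \hat{X})$, where $Y = \{\langle y_1, l_1, u_1 \rangle, \ldots, \langle y_n, l_n, u_n \rangle\}$ and $\hat{Y} = \{\langle y_1, l'_1, u'_1 \rangle, \ldots, \langle y_n, l'_n, u'_n \rangle\}$.

Interest Measure • Itemset $X$ is $R$-interesting w.r.t. its generalization $\hat{X}$, if • $\text{support}(X) \ge R \times$ the expected support of $X$ based on $\hat{X}$, and • For any specialization $X'$ of $X$ with minimum support and $X - X' \subseteq \mathcal{I}_R$, $X - X'$ is $R$-interesting w.r.t. $\hat{X}$ • Rule $X \Rightarrow Y$ is $R$-interesting w.r.t. its generalization $\hat{X} \Rightarrow \hat{Y}$ if • $\text{support}(X \Rightarrow Y) \ge R \times$ the expected support based on $\hat{X} \Rightarrow \hat{Y}$, or • $\text{conf}(X \Rightarrow Y) \ge R \times$ the expected confidence based on $\hat{X} \Rightarrow \hat{Y}$ • Moreover, the itemset $X \cup Y$ is $R$-interesting w.r.t. $\hat{X} \cup \hat{Y}$.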
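
The expected support can be worked through numerically. The sketch below takes a general rule with 8% support and an age range 20..25 that holds a quarter of the people in 20..30; the marginal probabilities (0.4, 0.1, 0.5) and the function name are assumptions picked only to satisfy that ratio.

```python
def expected_support(pr_items, pr_general_items, pr_general_itemset):
    """Scale the support of the generalization by the ratio
    Pr(item) / Pr(generalized item) for each item."""
    ratio = 1.0
    for p, p_hat in zip(pr_items, pr_general_items):
        ratio *= p / p_hat
    return ratio * pr_general_itemset


# Items of the specialization: <Age: 20..25>, <Cars: 1..2>.
# Items of the generalization: <Age: 20..30>, <Cars: 1..2>.
expected = expected_support(
    pr_items=[0.1, 0.5],          # assumed Pr(Age 20..25), Pr(Cars 1..2)
    pr_general_items=[0.4, 0.5],  # assumed Pr(Age 20..30), Pr(Cars 1..2)
    pr_general_itemset=0.08,      # support of the general rule
)
```

The result is about 2%, so a specialization that actually has 2% support carries no information beyond its generalization.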
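
The support-or-confidence part of the rule test can be sketched in a few lines. This deliberately simplified check omits the additional itemset condition and the condition on specializations; the function name, the interest level $R = 1.5$, and the sample numbers are assumptions.

```python
def passes_rule_test(sup, conf, expected_sup, expected_conf, R):
    """First part of the R-interest test for a rule: support or confidence
    is at least R times its expected value."""
    return sup >= R * expected_sup or conf >= R * expected_conf


R = 1.5
# Specialization with 2% support and 70% confidence, expected 2% and 70%.
redundant = passes_rule_test(0.02, 0.70, 0.02, 0.70, R)
# Hypothetical specialization whose support is well above the expectation.
surprising = passes_rule_test(0.05, 0.70, 0.02, 0.70, R)
```

A full implementation would additionally verify that the itemset of the rule is itself $R$-interesting with respect to the generalization.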

Example of Interest

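Candidate Generation • Given $L_{k-1}$, the set of all frequent $(k-1)$-itemsets, generate the set of candidate $k$-itemsets $C_k$ • The process has three parts: • Join Phase • Subset Prune Phase • Interest Prune Phase

Join Phase • $L_{k-1}$ is joined with itself: two itemsets are joined if their first $k-2$ items (in lexicographic order) are the same and their last items belong to different attributes • Example, $L_2$: • Result of self-join, $C_3$:

Subset Prune Phase • Make sure every $(k-1)$-subset of each candidate in $C_k$ is in $L_{k-1}$, and delete the candidate otherwise • Example, $L_2$: • Result of self-join, $C_3$: • Delete the first itemset in $C_3$, since one of its 2-subsets is not in $L_2$.

Interest Prune Phase • Given a user-specified interest level $R$ • Delete any itemset that contains a quantitative item with support greater than $1/R$ • It is guaranteed that such itemsets cannot be $R$-interesting w.r.t. their generalizations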
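
The join and subset prune phases can be sketched together. The encoding is an assumption for this example: an item is a tuple `(attribute_id, low, high)`, an itemset is a tuple of items sorted by attribute id, and the contents of `L2` are illustrative values rather than the slide's table.

```python
from itertools import combinations

# Assumed items; attribute ids: 0 = Married, 1 = Age, 2 = NumCars.
MARRIED_YES = (0, 1, 1)
AGE_20_24 = (1, 20, 24)
AGE_20_29 = (1, 20, 29)
CARS_0_1 = (2, 0, 1)


def join(L_prev):
    """Join L_{k-1} with itself: same first k-2 items, and the last item of
    the second itemset has a strictly larger attribute id."""
    candidates = set()
    for p in L_prev:
        for q in L_prev:
            if p[:-1] == q[:-1] and p[-1][0] < q[-1][0]:
                candidates.add(p + (q[-1],))
    return candidates


def subset_prune(candidates, L_prev):
    """Keep a candidate only if all of its (k-1)-subsets are in L_{k-1}."""
    return {
        c for c in candidates
        if all(sub in L_prev for sub in combinations(c, len(c) - 1))
    }


L2 = {
    (MARRIED_YES, AGE_20_24),
    (MARRIED_YES, AGE_20_29),
    (MARRIED_YES, CARS_0_1),
    (AGE_20_29, CARS_0_1),
}

C3 = join(L2)
C3_pruned = subset_prune(C3, L2)
```

In this example the join produces two candidates, and the one containing `AGE_20_24` is pruned because `(AGE_20_24, CARS_0_1)` is not in `L2`.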
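
The interest prune can be sketched as a simple filter. The function name, the item encoding, and the supports are assumptions for illustration; items missing from `quantitative_support` (for instance categorical items) are left untested in this sketch.

```python
def interest_prune(candidates, quantitative_support, R):
    """Drop every candidate that contains a quantitative item whose support
    exceeds 1/R. Items absent from `quantitative_support` are not tested."""
    limit = 1.0 / R
    return {
        c for c in candidates
        if all(quantitative_support.get(item, 0.0) <= limit for item in c)
    }


# Hypothetical supports of two quantitative items.
quantitative_support = {
    ("Age", 20, 39): 0.8,
    ("Age", 20, 24): 0.2,
}

candidates = {
    (("Age", 20, 39), ("Married", "Yes")),
    (("Age", 20, 24), ("Married", "Yes")),
}

kept = interest_prune(candidates, quantitative_support, R=2)
```

With $R = 2$ the limit is 0.5, so the candidate containing the wide age range (support 0.8) is removed before its support is ever counted.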

Concluding Remarks • Introduced the problem of mining quantitative association rules • Dealt with quantitative attributes by fine-partitioning the values and combining adjacent partitions as necessary • Introduced partial completeness to quantify the information lost and to help decide the partitions • Gave an interest measure to identify interesting rules • Described candidate generation with join, subset prune and interest prune phases

Outline • Association Rules and Quantitative Association Rules • Formal Study of Quantitative Association Analysis • Partitioning Quantitative Attributes • Identifying the Interesting Rules • Extending the Apriori Algorithm • Concluding Remarks • Q&A

Exam Questions 1. What are the two problems with mapping quantitative associations to Boolean associations? Slide No. 8 2. Give the general steps to be followed in order to mine quantitative association rules. Slide No. 10 3. If $P$ is a $K$-complete set w.r.t. the set of all frequent itemsets, what constraint should the minimum confidence satisfy when generating rules from $P$, in order to guarantee that a close rule will be generated? It should be $1/K$ times the desired level of confidence. Slide No. 24.

Thank you. Questions?
