Mining Quantitative Association Rules in Large Relational Databases
Ramakrishnan Srikant, Rakesh Agrawal
ACM SIGMOD Conference on Management of Data, 1996
Presented by: Sepehr Amir-Mohammadian, March 21, 2013
(Slides modified from Sasi Sekhar Kunta’s version.)
Outline
• Association Rules and Quantitative Association Rules
• Formal Study of Quantitative Association Analysis
• Partitioning Quantitative Attributes
• Identifying the Interesting Rules
• Candidate Generation
• Concluding Remarks
• Q&A
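Association Rules
• Itemsets $X$ and $Y$, with $X \cap Y = \emptyset$
• Rule: $X \Rightarrow Y$
• Support: the fraction of records that contain $X \cup Y$
• Confidence: the fraction of records containing $X$ that also contain $Y$
• Find rules that have at least MinSup support and at least MinConf confidence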
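
As a small illustration of these two measures, the sketch below computes support and confidence over a toy list of boolean transactions. The function names and the sample records are assumptions made for illustration; they do not come from the slides.

```python
def support(records, itemset):
    """Fraction of records that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for record in records if itemset <= set(record)) / len(records)


def confidence(records, antecedent, consequent):
    """Fraction of records containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)


# Hypothetical transactions, used only to exercise the two functions.
records = [
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
rule_support = support(records, {"bread", "milk"})
rule_confidence = confidence(records, {"bread"}, {"milk"})
```

A rule would be reported only if `rule_support` reaches MinSup and `rule_confidence` reaches MinConf.
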
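Mapping to Boolean Association Rules
• Categorical attribute: use each $\langle \text{attribute}, \text{value} \rangle$ pair as a new boolean attribute
• Quantitative attribute with a small domain: use each $\langle \text{attribute}, \text{value} \rangle$ pair as a new boolean attribute
• Quantitative attribute with a large domain: partition the values into intervals and use each $\langle \text{attribute}, \text{interval} \rangle$ pair as a new boolean attribute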
Problems
• “MinSup”: if the number of intervals is large, the support of a single interval can fall below MinSup
• “MinConf”: information is lost when values are partitioned into intervals; the fewer (and wider) the intervals, the lower the confidence of a rule can become
Solution
• Consider all combinations of adjacent values/intervals of the quantitative attributes
  Solves the “MinSup” problem
• Increase the number of values/intervals without running into the “MinSup” problem
  Reduces information loss
• New problems:
  • Execution time: limited by a maximum support threshold, MaxSup
  • Many rules: filtered by the interestingness of rules
Steps of Proposed Approach
• Determine the number of partitions for each quantitative attribute
• Map values/ranges to consecutive integers such that the order is preserved
• Find the support of each value of the attributes, and combine adjacent values/intervals as long as their support is less than MaxSup
• Find the frequent itemsets, i.e. those whose support is at least MinSup
• Use the frequent itemsets to generate association rules
• Prune out the uninteresting rules
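
The combining step can be sketched as follows for a single quantitative attribute whose values are already coded as consecutive integers. This is a minimal sketch under assumptions: the function name, the use of record counts instead of fractions, and the exact stopping rule (stop extending a range once its count would exceed MaxSup, but always keep single values) are illustrative choices, not taken from the slides.

```python
def frequent_ranges(counts, min_count, max_count):
    """Return {(lo, hi): count} for ranges of adjacent coded values.

    counts[i] is the number of records with coded value i.
    A range is kept when its count reaches min_count (MinSup).
    A range is not extended once its count would exceed max_count (MaxSup).
    """
    result = {}
    for lo in range(len(counts)):
        total = 0
        for hi in range(lo, len(counts)):
            total += counts[hi]
            if total > max_count and hi > lo:
                break  # combining further would exceed MaxSup
            if total >= min_count:
                result[(lo, hi)] = total
    return result


# Hypothetical counts for four coded values, with MinSup = 3 and MaxSup = 5 records.
ranges = frequent_ranges([1, 2, 3, 4], min_count=3, max_count=5)
```

With these toy counts, value 0 alone is too rare, but the combined range (0, 1) reaches MinSup, which is the effect the combining step is meant to achieve.
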
Example Step 0: Initial set of records
Example – Cont. Step 1: Determine the partitions for each quantitative attribute
Example – Cont. Step 2: Mapping intervals/values to consecutive integers
Example – Cont. Step 2: Mapping intervals/values to consecutive integers
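
A minimal sketch of the mapping step, assuming hypothetical interval boundaries and attribute values (the tables of the original example are not reproduced here). Intervals are numbered from 1 in increasing order, so the order of the raw values is preserved.

```python
from bisect import bisect_right


def make_interval_mapper(lower_bounds):
    """Map a raw value to the 1-based index of the interval containing it.

    lower_bounds must be sorted; interval i starts at lower_bounds[i - 1].
    Values below the first bound map to 0.
    """
    def mapper(value):
        return bisect_right(lower_bounds, value)
    return mapper


def map_categorical(values):
    """Assign consecutive integers (from 1) to the sorted distinct values."""
    return {value: code for code, value in enumerate(sorted(set(values)), start=1)}


# Hypothetical intervals 20..24, 25..29, 30..34, 35..39 for an Age attribute.
age_to_code = make_interval_mapper([20, 25, 30, 35])
age_codes = [age_to_code(age) for age in [23, 25, 29, 34, 38]]
married_codes = map_categorical(["Yes", "No", "Yes"])
```
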
Example – Cont. • Step 3: Extracting the large (frequent) itemsets • Some of the itemsets found with MinSup = 0.4 are shown
Example – Cont. • Step 4: Rule generation • Some of the rules found with MinConf = 0.5 are shown
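Formal Study of Quantitative A. A.
• $\mathcal{I} = \{i_1, i_2, \dots, i_m\}$: set of attributes
• $\mathcal{P}$: set of positive integers
• $\mathcal{I}_V = \mathcal{I} \times \mathcal{P}$; $\langle x, v \rangle \in \mathcal{I}_V$ denotes that attribute $x$ has value $v$
• $\mathcal{I}_R = \{\langle x, l, u \rangle \in \mathcal{I} \times \mathcal{P} \times \mathcal{P} \mid l \le u \text{ if } x \text{ is quantitative};\ l = u \text{ if } x \text{ is categorical}\}$: set of items
• For any $X \subseteq \mathcal{I}_R$, $\text{attributes}(X) = \{x \mid \langle x, l, u \rangle \in X\}$
• $D$: set of records
• $R \subseteq \mathcal{I}_V$: a record, such that its attributes are distinct
• A record $R$ supports itemset $X$ if $\forall \langle x, l, u \rangle \in X$, $\exists \langle x, q \rangle \in R$ such that $l \le q \le u$
• $X \Rightarrow Y$: a quantitative association rule, where
• $X \subset \mathcal{I}_R$, $Y \subset \mathcal{I}_R$, and $\text{attributes}(X) \cap \text{attributes}(Y) = \emptyset$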
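Formal Definition of Quantitative A. A. – Cont.
• $X \Rightarrow Y$ holds in $D$ with support $s$ if $s\%$ of the records in $D$ support $X \cup Y$.
• $X \Rightarrow Y$ holds in $D$ with confidence $c$ if $c\%$ of the records in $D$ that support $X$ also support $Y$.
• $\Pr(X)$: probability that all items in $X$ are supported by a given record
• $\hat{X}$ is a generalization of $X$ if $\text{attributes}(X) = \text{attributes}(\hat{X})$ and, for every attribute $x$, $\langle x, l, u \rangle \in X$ and $\langle x, l', u' \rangle \in \hat{X}$ imply $l' \le l \le u \le u'$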
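
The two central predicates of these definitions, "a record supports an itemset" and "one itemset generalizes another", can be sketched in a few lines. The dictionary representation (attribute name mapped to a value, or to a `(lo, hi)` range) and the sample data are assumptions chosen for brevity.

```python
def supports(record, itemset):
    """record: {attribute: value}; itemset: {attribute: (lo, hi)}.

    True if every item's range contains the record's value for that attribute.
    """
    return all(
        attribute in record and lo <= record[attribute] <= hi
        for attribute, (lo, hi) in itemset.items()
    )


def is_generalization(general, specific):
    """True if both itemsets use the same attributes and every range of
    `specific` lies inside the corresponding range of `general`."""
    return general.keys() == specific.keys() and all(
        general[a][0] <= specific[a][0] and specific[a][1] <= general[a][1]
        for a in specific
    )


# Hypothetical record and itemsets.
record = {"Age": 23, "NumCars": 1}
narrow = {"Age": (20, 25)}
wide = {"Age": (20, 30)}
```

A categorical item fits the same representation with `lo == hi`.
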
Partitioning Quantitative Attributes
• Partial completeness: a measure of the information lost in partitioning
• $\mathcal{R}$: set of rules obtained before partitioning
• $\mathcal{R}'$: set of rules obtained after partitioning
• Partial completeness measures the distance between a rule in $\mathcal{R}$ and its closest generalization in $\mathcal{R}'$
• The distance is defined by the ratio of their supports
• The measure is used to find the partitioning with the minimum number of partitions for a desired level of partial completeness
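Partial Completeness
• $C$: the set of frequent itemsets
• For any $K \ge 1$, $P$ is $K$-complete w.r.t. $C$ if
  • $P \subseteq C$,
  • $X \in P$ and $X' \subseteq X$ imply $X' \in P$, and
  • for every $X \in C$ there is a generalization $\hat{X} \in P$ with $\text{support}(\hat{X}) \le K \times \text{support}(X)$, and for every $Y \subseteq X$ there is a generalization $\hat{Y} \subseteq \hat{X}$ with $\text{support}(\hat{Y}) \le K \times \text{support}(Y)$
• The smaller $K$ is, the less information is lost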
Example – K-Completeness
Consider the following set of frequent itemsets:
Then itemsets 2, 3, 5, and 7 form a 1.5-complete set, but itemsets 3, 5, and 7 do not.
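
The check behind such a claim can be sketched in code. The seven itemsets and their supports below are illustrative assumptions, chosen to be consistent with the statement about itemsets 2, 3, 5, 7; they are not recovered from the slide. The function tests only the support-ratio condition (every itemset has a generalization in the candidate set whose support is at most K times its own), which is a simplification of the full definition.

```python
def is_generalization(general, specific):
    """Same attributes, and each range of `specific` lies inside `general`."""
    return general.keys() == specific.keys() and all(
        general[a][0] <= specific[a][0] and specific[a][1] <= general[a][1]
        for a in specific
    )


def satisfies_support_ratio(chosen, all_itemsets, k):
    """Simplified K-completeness test: support-ratio condition only."""
    for items, sup in all_itemsets.values():
        covered = any(
            is_generalization(all_itemsets[n][0], items) and all_itemsets[n][1] <= k * sup
            for n in chosen
        )
        if not covered:
            return False
    return True


# Hypothetical frequent itemsets: number -> (items, support in percent).
ITEMSETS = {
    1: ({"Age": (20, 30)}, 5),
    2: ({"Age": (20, 40)}, 6),
    3: ({"Age": (20, 50)}, 8),
    4: ({"Cars": (1, 2)}, 5),
    5: ({"Cars": (1, 3)}, 6),
    6: ({"Age": (20, 30), "Cars": (1, 2)}, 4),
    7: ({"Age": (20, 40), "Cars": (1, 3)}, 5),
}
with_itemset_2 = satisfies_support_ratio({2, 3, 5, 7}, ITEMSETS, 1.5)
without_itemset_2 = satisfies_support_ratio({3, 5, 7}, ITEMSETS, 1.5)
```

With these assumed values, dropping itemset 2 leaves itemset 1 with only itemset 3 as a generalization, and 8% is more than 1.5 times 5%, so the smaller set fails the test.
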
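Confidence of Rules Generated from a K-Complete Set
• If $P$ is a $K$-complete set w.r.t. $C$, then any rule $X \Rightarrow Y$ obtained from $C$ has a generalization $\hat{X} \Rightarrow \hat{Y}$ obtained from $P$, such that $\text{conf}(\hat{X} \Rightarrow \hat{Y})$ is bounded by
  $$\frac{1}{K} \times \text{conf}(X \Rightarrow Y) \;\le\; \text{conf}(\hat{X} \Rightarrow \hat{Y}) \;\le\; K \times \text{conf}(X \Rightarrow Y)$$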
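
A short derivation sketch of where a bound of this kind comes from, assuming that $K$-completeness provides generalizations $\hat{X} \cup \hat{Y}$ of $X \cup Y$ and $\hat{X} \subseteq \hat{X} \cup \hat{Y}$ of $X$ whose supports are at most $K$ times the originals. The lower limits hold because a generalization is supported by every record that supports the original itemset.

```latex
\begin{align*}
\text{support}(X \cup Y) &\le \text{support}(\hat{X} \cup \hat{Y}) \le K \cdot \text{support}(X \cup Y) \\
\text{support}(X) &\le \text{support}(\hat{X}) \le K \cdot \text{support}(X) \\[4pt]
\text{conf}(\hat{X} \Rightarrow \hat{Y})
  &= \frac{\text{support}(\hat{X} \cup \hat{Y})}{\text{support}(\hat{X})}
  \ge \frac{\text{support}(X \cup Y)}{K \cdot \text{support}(X)}
  = \frac{1}{K}\,\text{conf}(X \Rightarrow Y) \\[4pt]
\text{conf}(\hat{X} \Rightarrow \hat{Y})
  &\le \frac{K \cdot \text{support}(X \cup Y)}{\text{support}(X)}
  = K \cdot \text{conf}(X \Rightarrow Y)
\end{align*}
```

The lower bound is what matters in practice: a rule with confidence $c$ over the raw values is only guaranteed a generalization with confidence $c / K$ over the partitions.
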
K-Completeness for a Single Attribute
• Consider a quantitative attribute $x$, partitioned into base intervals, and some $K > 1$.
• Suppose that the support of each base interval is less than $\text{MinSup} \times \frac{K-1}{2}$.
• Let $P$ be the set of all combinations of base intervals that have support of at least MinSup.
• Then $P$ is $K$-complete w.r.t. the set of all ranges over $x$ that have support of at least MinSup.
K-Completeness for a Group of Attributes
• Consider a set of $n$ quantitative attributes, partitioned into base intervals, and some $K > 1$.
• Suppose that the support of each base interval is less than $\text{MinSup} \times \frac{K-1}{2n}$.
• Let $P$ be the set of all frequent itemsets over the partitioned attributes.
• Then $P$ is $K$-complete w.r.t. the set of all frequent itemsets obtained without partitioning.
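Equi-Depth Partitioning
• Equi-depth partitioning: split the values so that every base interval has the same support
• Suppose that the number of intervals is given. Then equi-depth partitioning minimizes the maximum support of a base interval, and so minimizes $K$.
• Suppose that $K$ is given, and let $m$ be the minimum support and $n$ the number of quantitative attributes. Then equi-depth partitioning with support $\frac{m(K-1)}{2n}$ in each base interval results in the minimum number of intervals:
  $$\text{Number of intervals} = \frac{2n}{m(K-1)}$$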
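
A minimal sketch of both ideas: splitting a list of values into intervals of (nearly) equal depth, and computing the number of intervals from $n$, $m$ and $K$. The function names and sample values are assumptions, and ties between equal values that straddle an interval boundary are ignored for simplicity.

```python
import math


def equi_depth_intervals(values, num_intervals):
    """Split sorted values into chunks of nearly equal size and return
    the (lowest, highest) value of each non-empty chunk."""
    ordered = sorted(values)
    n = len(ordered)
    intervals = []
    for i in range(num_intervals):
        chunk = ordered[i * n // num_intervals:(i + 1) * n // num_intervals]
        if chunk:
            intervals.append((chunk[0], chunk[-1]))
    return intervals


def num_intervals_needed(n_attributes, min_support, k):
    """Number of intervals 2n / (m (K - 1)), rounded up to an integer."""
    return math.ceil(2 * n_attributes / (min_support * (k - 1)))


# Hypothetical ages split into three intervals of two records each.
age_intervals = equi_depth_intervals([23, 25, 29, 34, 38, 41], 3)
needed = num_intervals_needed(n_attributes=2, min_support=0.25, k=3)
```

Lowering $K$ toward 1 or lowering the minimum support both increase the number of intervals, which is the cost of losing less information.
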
Identify Interesting Rules
• Combining intervals results in many rules
• For example, suppose a quarter of the people in age group 20..30 are in age group 20..25, and consider two rules with the same consequent $Y$:
  • $\langle \text{Age}: 20..30 \rangle \Rightarrow Y$, with 8% sup, 70% conf
  • $\langle \text{Age}: 20..25 \rangle \Rightarrow Y$, with 2% sup, 70% conf
• The second rule gives no additional information, and is less general than the first rule
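Expected Value of Support and Confidence
• Interest: a rule is interesting if its support or confidence exceeds what is expected from its generalization
• Let $Z = \{\langle z_1, l_1, u_1 \rangle, \dots, \langle z_n, l_n, u_n \rangle\}$
• Let $\hat{Z} = \{\langle z_1, l'_1, u'_1 \rangle, \dots, \langle z_n, l'_n, u'_n \rangle\}$, with $l'_i \le l_i \le u_i \le u'_i$
• The expected value of $\Pr(Z)$ based on $\Pr(\hat{Z})$ would be
  $$E_{\Pr(\hat{Z})}[\Pr(Z)] = \frac{\Pr(\langle z_1, l_1, u_1 \rangle)}{\Pr(\langle z_1, l'_1, u'_1 \rangle)} \times \cdots \times \frac{\Pr(\langle z_n, l_n, u_n \rangle)}{\Pr(\langle z_n, l'_n, u'_n \rangle)} \times \Pr(\hat{Z})$$
• Similarly, the expected value of the confidence of the rule $X \Rightarrow Y$ according to its generalization $\hat{X} \Rightarrow \hat{Y}$ would be
  $$E_{\Pr(\hat{Y} \mid \hat{X})}[\Pr(Y \mid X)] = \frac{\Pr(\langle y_1, l_1, u_1 \rangle)}{\Pr(\langle y_1, l'_1, u'_1 \rangle)} \times \cdots \times \frac{\Pr(\langle y_n, l_n, u_n \rangle)}{\Pr(\langle y_n, l'_n, u'_n \rangle)} \times \Pr(\hat{Y} \mid \hat{X})$$
  where $Y = \{\langle y_1, l_1, u_1 \rangle, \dots, \langle y_n, l_n, u_n \rangle\}$, $\hat{Y} = \{\langle y_1, l'_1, u'_1 \rangle, \dots, \langle y_n, l'_n, u'_n \rangle\}$.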
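Interest Measure
• Itemset $X$ is $R$-interesting w.r.t. its generalization $\hat{X}$ if
  • $\Pr(X) \ge R \times E_{\Pr(\hat{X})}[\Pr(X)]$, and
  • for any specialization $X'$ of $X$ with $\Pr(X') \ge \text{MinSup}$, $X - X'$ is $R$-interesting w.r.t. $\hat{X}$
• Rule $X \Rightarrow Y$ is $R$-interesting w.r.t. its generalization $\hat{X} \Rightarrow \hat{Y}$ if
  • its support is at least $R$ times the expected support based on $\hat{X} \Rightarrow \hat{Y}$, or
  • its confidence is at least $R$ times the expected confidence based on $\hat{X} \Rightarrow \hat{Y}$
• Moreover, the itemset $X \cup Y$ must be $R$-interesting w.r.t. $\hat{X} \cup \hat{Y}$.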
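
The support part of this test can be sketched on the age-group example: a generalization with 8% support, a specialization with 2% support, and a quarter of the wider age group falling into the narrower one. The item probabilities 0.125 and 0.5 are hypothetical values chosen only so that their ratio is one quarter, and the interest level 1.1 is likewise an assumption. The recursive condition on specializations is not covered by this sketch.

```python
def expected_support(specific_item_probs, general_item_probs, general_support):
    """Expected support of a specialization, scaled from its generalization
    by the ratio of item probabilities, item by item."""
    expected = general_support
    for p_specific, p_general in zip(specific_item_probs, general_item_probs):
        expected *= p_specific / p_general
    return expected


def exceeds_expected(actual, expected, interest_level):
    """True if the actual value is at least R times the expected value."""
    return actual >= interest_level * expected


# Hypothetical probabilities: Pr(Age 20..25) = 0.125 and Pr(Age 20..30) = 0.5.
expected = expected_support([0.125], [0.5], general_support=0.08)
interesting = exceeds_expected(actual=0.02, expected=expected, interest_level=1.1)
```

The expected support equals the actual 2%, so the narrower rule adds nothing over its generalization at this interest level.
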
Candidate Generation
• Given $L_{k-1}$, the set of all frequent $(k-1)$-itemsets, generate $C_k$, the set of candidate $k$-itemsets
• The process has three parts:
  • Join Phase
  • Subset Prune Phase
  • Interest Prune Phase
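Join Phase
• $L_{k-1}$ is joined with itself: two itemsets are joined when their first $k-2$ items are identical and their last items belong to different attributes
• Example, $L_{k-1}$:
• Result of self-join, $C_k$: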
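Subset Prune Phase
• Make sure that every $(k-1)$-subset of each candidate is in $L_{k-1}$; delete the candidates for which this fails
• Example, $L_{k-1}$:
• Result of self-join, $C_k$:
• Delete the first itemset in $C_k$, since one of its $(k-1)$-subsets is not in $L_{k-1}$.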
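
The join and subset prune phases can be sketched as below. The four frequent 2-itemsets are illustrative assumptions (the slide's own example tables are not reproduced here), as are the function names and the fixed attribute order used to keep the items of an itemset sorted.

```python
from itertools import combinations

# Hypothetical attribute order; items inside an itemset follow this order.
ATTRIBUTE_ORDER = {"Married": 0, "Age": 1, "NumCars": 2}


def join(frequent):
    """Self-join: combine two itemsets that share all but their last item
    and whose last items are on different attributes."""
    candidates = set()
    for p in frequent:
        for q in frequent:
            last_p, last_q = p[-1], q[-1]
            if p[:-1] == q[:-1] and ATTRIBUTE_ORDER[last_p[0]] < ATTRIBUTE_ORDER[last_q[0]]:
                candidates.add(p + (last_q,))
    return candidates


def subset_prune(candidates, frequent):
    """Keep a candidate only if every subset with one item removed is frequent."""
    return {
        c for c in candidates
        if all(subset in frequent for subset in combinations(c, len(c) - 1))
    }


# Items are (attribute, lo, hi); a categorical value is coded with lo == hi.
MARRIED = ("Married", 1, 1)
AGE_20_24 = ("Age", 20, 24)
AGE_20_29 = ("Age", 20, 29)
CARS_0_1 = ("NumCars", 0, 1)

frequent_2 = {
    (MARRIED, AGE_20_24),
    (MARRIED, AGE_20_29),
    (MARRIED, CARS_0_1),
    (AGE_20_29, CARS_0_1),
}
candidates_3 = join(frequent_2)
pruned_3 = subset_prune(candidates_3, frequent_2)
```

In this toy run the join produces two candidates, and the one containing Age 20..24 is pruned because its subset (Age 20..24, NumCars 0..1) is not among the frequent 2-itemsets.
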
Interest Prune Phase
• Given a user-specified interest level $R$
• Delete any itemset that contains an item with support greater than $1/R$
• Such itemsets are guaranteed not to be $R$-interesting w.r.t. their generalizations
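
A minimal sketch of this pruning rule, assuming item supports are available as fractions in a dictionary. The item names, supports, and interest level are hypothetical.

```python
def interest_prune(candidates, item_support, interest_level):
    """Drop every candidate that contains an item whose support exceeds 1/R."""
    threshold = 1.0 / interest_level
    return [
        candidate for candidate in candidates
        if all(item_support[item] <= threshold for item in candidate)
    ]


# Hypothetical item supports; with R = 2 the threshold is 0.5.
item_support = {"a": 0.6, "b": 0.3, "c": 0.5}
kept = interest_prune([("a", "b"), ("b", "c")], item_support, interest_level=2)
```
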
Concluding Remarks
• Introduced the problem of mining quantitative association rules
• Dealt with quantitative attributes by fine-partitioning the values and combining adjacent partitions as necessary
• Introduced partial completeness to quantify the information lost and to help decide on the partitions
• Gave an interest measure to identify the interesting rules
• Described candidate generation, with join, subset prune, and interest prune phases
Exam Questions
1. What are the two problems with mapping quantitative associations to boolean associations? (Slide No. 8)
2. Give the general steps to be followed in order to mine quantitative association rules. (Slide No. 10)
3. If $P$ is a $K$-complete set w.r.t. the set of all frequent itemsets, what constraint should the minimum confidence follow when generating rules from $P$, in order to guarantee that a close rule will be generated? It should be $1/K$ times the desired level of confidence. (Slide No. 24)
Thank you. Questions?