
3.4 Improving the Efficiency of Apriori


Presentation Transcript


  1. 3.4 Improving the Efficiency of Apriori A hash-based technique can be used to reduce the size of the candidate k-itemsets, Ck, for k > 1. For example, when scanning each transaction in the database to generate the frequent 1-itemsets, L1, from the candidate 1-itemsets in C1, we can generate all of the 2-itemsets for each transaction, hash them into the different

  2. buckets of a hash table structure, and increase the corresponding bucket counts. h(x, y) = ((order of x) * 10 + (order of y)) mod 7
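Below is a minimal sketch of this hash-based counting, assuming a hypothetical item-to-order mapping and a toy transaction set (the items, transactions, and min_sup here are illustrative, not from the slides):

```python
from itertools import combinations

# Assumed ordering of items (hypothetical; any fixed ordering works).
order = {"I1": 1, "I2": 2, "I3": 3, "I4": 4, "I5": 5}

def h(x, y):
    # The hash function from the slide: ((order of x)*10 + (order of y)) mod 7.
    return (order[x] * 10 + order[y]) % 7

# Toy transactions (illustrative).
transactions = [
    ["I1", "I2", "I5"],
    ["I2", "I4"],
    ["I1", "I2", "I3"],
]

# While scanning transactions (the same scan that counts C1), hash every
# 2-itemset of each transaction into one of 7 buckets.
bucket_count = [0] * 7
for t in transactions:
    for x, y in combinations(sorted(t, key=order.get), 2):
        bucket_count[h(x, y)] += 1

# A 2-itemset can be frequent only if its bucket count reaches min_sup,
# so candidates that hash to light buckets are pruned from C2.
min_sup = 2
C2 = [(x, y) for x, y in combinations(order, 2)
      if bucket_count[h(x, y)] >= min_sup]
print(C2)
```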

  3. 3.5 Mining Multilevel Association Rules from Transaction Databases Example concept hierarchy: the root "all" has the children "computer", "software", and "printer"; "computer" has the children "desktop" and "laptop"; "software" has the children "educational" and "financial"; "printer" has the children "color" and "b/w".

  4. 1. Using uniform minimum support for all levels (referred to as uniform support): the same minimum support threshold is used when mining at each level of abstraction. Level 1 (min_sup = 5%): computer [support = 10%]. Level 2 (min_sup = 5%): laptop computer [support = 6%], desktop computer [support = 4%].

  5. 2. Using reduced minimum support at lower levels (referred to as reduced support): each level of abstraction has its own minimum support threshold, and the deeper the level, the smaller the threshold. Level 1 (min_sup = 5%): computer [support = 10%]. Level 2 (min_sup = 3%): laptop computer [support = 6%], desktop computer [support = 4%].
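The following toy check contrasts the two strategies on the supports from these slides (the dictionary encoding is an illustrative assumption):

```python
# Supports taken from the slides; the encoding is assumed.
support = {"computer": 0.10, "laptop computer": 0.06, "desktop computer": 0.04}
level = {"computer": 1, "laptop computer": 2, "desktop computer": 2}

def frequent(min_sup_per_level):
    # Keep an itemset if it meets the min_sup of its own level.
    return [i for i in support if support[i] >= min_sup_per_level[level[i]]]

print(frequent({1: 0.05, 2: 0.05}))  # uniform: desktop computer (4%) is pruned
print(frequent({1: 0.05, 2: 0.03}))  # reduced: all three itemsets survive
```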

  6. 3. Level-by-level independent: this is a full-breadth search, where no background knowledge of frequent itemsets is used for pruning; each node is examined. 4. Level-cross filtering by single item: an item at the ith level is examined if and only if its parent node at the (i-1)th level is frequent. Example: Level 1 (min_sup = 12%): computer [support = 10%]. Level 2 (min_sup = 3%): laptop (not examined), desktop (not examined).

  7. 5. Level-cross filtering by k-itemset: a k-itemset at the ith level is examined if and only if its corresponding parent k-itemset at the (i-1)th level is frequent. Example: Level 1 (min_sup = 5%): computer and printer [support = 7%]. Level 2 (min_sup = 2%): laptop computer and b/w printer [support = 1%], laptop computer and color printer [support = 2%], desktop computer and b/w printer [support = 1%], desktop computer and color printer [support = 3%].
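A short sketch of level-cross filtering by single item, using the numbers from slide 6 (the tree encoding is an assumption): with min_sup = 12% at level 1, "computer" (support 10%) is not frequent, so its children are never examined.

```python
# Concept hierarchy and per-level thresholds from slide 6; encoding assumed.
tree = {"computer": ["laptop computer", "desktop computer"]}
support = {"computer": 0.10}   # children's supports are never even counted
min_sup = {1: 0.12, 2: 0.03}

def examine(item, lvl):
    if support.get(item, 0.0) < min_sup[lvl]:
        print(f"{item}: not frequent, children not examined")
        return
    print(f"{item}: frequent, descending to level {lvl + 1}")
    for child in tree.get(item, []):
        examine(child, lvl + 1)

examine("computer", 1)
```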

  8. 3.6 Mining Multidimensional Association Rules for Data Warehouses 1. Multidimensional association rules. Example: age(X,"20..29") ^ occupation(X,"student") => buys(X,"laptop"). The rule contains three predicates (age, occupation, buys), each corresponding to a database attribute or warehouse dimension. Rules with no repeated predicates are called interdimensional association rules.

  9. Rules with repeated predicates are called hybrid-dimensional association rules. Example: age(X,"20..29") ^ buys(X,"laptop") => buys(X,"b/w printer"), where the predicate buys is repeated. Data attributes can be categorical or quantitative. Categorical attributes (also called nominal attributes) have a finite number of possible values, with no ordering among the values (e.g., occupation, color). Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, price).

  10. 2. Mining multidimensional association rules using static discretization of quantitative attributes. Quantitative attributes are discretized prior to mining using predefined concept hierarchies, where numeric values are replaced by ranges. If the resulting task-relevant data are stored in a relational table, then the Apriori algorithm requires just a slight modification so as to find all frequent predicate sets rather than frequent itemsets.
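A minimal sketch of static discretization, assuming hypothetical bin edges that stand in for a predefined concept hierarchy on age (the cut points and labels are assumptions):

```python
import bisect

# Assumed concept-hierarchy cut points and range labels for age.
age_bins = [20, 30, 40, 50]
age_labels = ["0..19", "20..29", "30..39", "40..49", "50+"]

def discretize_age(age):
    # Replace a numeric value by the label of the range that contains it.
    return age_labels[bisect.bisect_right(age_bins, age)]

print(discretize_age(25))  # -> "20..29"
print(discretize_age(34))  # -> "30..39"
```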

  11. Example:

  12. age(X,"31..40") ^ income(X,"high") => buys(X,"yes") A k-predicate set is a set containing k conjunctive predicates. For instance, the set of predicates {age, income, buys} is a 3-predicate set.

  13. 3. Mining quantitative association rules. Quantitative association rules are rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criteria. We will focus specifically on how to mine rules having two quantitative attributes on the left-hand side of the rule and one categorical attribute on the right-hand side, for example: Aquan1 ^ Aquan2 => Acat

  14. where Aquan1 and Aquan2 are tests on quantitative attribute ranges (where the ranges are dynamically determined), and Acat tests a categorical attribute from the task-relevant data. Example: age(X,"30..39") ^ income(X,"42K..48K") => buys(X,"high resolution TV") How can we find such rules? The idea is to map pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.
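A sketch of this grid-mapping step (the tuples and bin widths below are illustrative assumptions, not from the slides):

```python
from collections import Counter

# Toy task-relevant tuples (hypothetical).
tuples = [
    {"age": 34, "income": 35000, "buys": "high resolution TV"},
    {"age": 35, "income": 45000, "buys": "high resolution TV"},
    {"age": 52, "income": 45000, "buys": "other"},
]

# Bin tuples satisfying the categorical condition onto a 2-D (age, income)
# grid; here age is binned per year and income per 10K (assumed widths).
grid = Counter()
for t in tuples:
    if t["buys"] == "high resolution TV":
        cell = (t["age"], t["income"] // 10000)
        grid[cell] += 1

print(grid)  # cell counts; dense clusters of cells become rules
```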

  15. The grid is then searched for clusters of points, from which the association rules are generated. Example: (figure: a 2-D grid of tuples that purchase high-resolution TVs, with income on one axis and age on the other)

  16. The four Xs correspond to the rules:
  age(X,34) ^ income(X,"31K..40K") => buys(X,"high resolution TV")
  age(X,35) ^ income(X,"31K..40K") => buys(X,"high resolution TV")
  age(X,34) ^ income(X,"41K..50K") => buys(X,"high resolution TV")
  age(X,35) ^ income(X,"41K..50K") => buys(X,"high resolution TV")
  The four rules can be "clustered" together to form the following simpler rule:
  age(X,"34..35") ^ income(X,"31K..50K") => buys(X,"high resolution TV")
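A toy illustration of the clustering step on these four cells, merging adjacent dense cells into one rectangle (the cell encoding and label parsing are assumptions):

```python
# The four dense grid cells from the slide: (age, income-range label).
cells = {(34, "31K..40K"), (35, "31K..40K"), (34, "41K..50K"), (35, "41K..50K")}

ages = [a for a, _ in cells]
# Parse income bounds (in thousands) out of the bin labels.
bounds = [tuple(int(s.rstrip("K")) for s in r.split("..")) for _, r in cells]

age_range = f"{min(ages)}..{max(ages)}"
income_range = f"{min(lo for lo, _ in bounds)}K..{max(hi for _, hi in bounds)}K"
print(f'age(X,"{age_range}") ^ income(X,"{income_range}") '
      f'=> buys(X,"high resolution TV")')
# -> age(X,"34..35") ^ income(X,"31K..50K") => buys(X,"high resolution TV")
```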
