13/Sep/2006. S.P.Vimal, CS IS Group, BITS-Pilani.
1. Association Rule Mining: Multi Level and Multi Dimensional Association Rule Mining
S.P.Vimal
Assistant Lecturer
CSIS/BITS-Pilani
vimalsp@bits-pilani.ac.in
2. To discuss…
Multi Level Association Rules
Concepts
An Example
Mining
Uniform Support
Reduced Support
Redundant Rules
Mining Multi Dimensional Association Rules
Concepts
Mining using Static Discretization
Mining using Dynamic Discretization (ARCS)
Mining for Distance based Association rules
3. Multi Level Association Rules - Concepts
Rules generated by mining data at different levels of abstraction
Mining at different levels is essential for supporting business decision making
Massive amounts of data are highly sparse at the primitive level
Rules at high concept levels often only restate common sense
Rules at low concept levels may not always be interesting
4. Multi Level Association Rules - An Example
5. Multi Level Association Rules - An Example
Items in task-relevant data are primitive (lowest-level) items
Primitive data items occur least frequently
buys (hp-laptop computer) ⇒ buys (canon-inkjet printer)
Vs
buys (laptop computer) ⇒ buys (inkjet printer)
Vs
buys (computer) ⇒ buys (printer)
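The effect of the abstraction level on support can be sketched in a few lines of Python. The concept hierarchy, the transactions and the helper names (`PARENT`, `generalize`, `support`) are all made up for illustration; the slides only give the three rules above.

```python
# Hypothetical concept hierarchy: child item -> parent category.
PARENT = {
    "hp-laptop computer": "laptop computer",
    "ibm-laptop computer": "laptop computer",
    "laptop computer": "computer",
    "canon-inkjet printer": "inkjet printer",
    "hp-inkjet printer": "inkjet printer",
    "inkjet printer": "printer",
}

def generalize(item, steps):
    """Climb `steps` levels up the hierarchy (items at the top stay as they are)."""
    for _ in range(steps):
        item = PARENT.get(item, item)
    return item

def support(transactions, itemset, steps):
    """Fraction of transactions that contain every item of `itemset`
    once all transaction items are generalized `steps` levels up."""
    hits = 0
    for t in transactions:
        generalized = {generalize(i, steps) for i in t}
        if itemset <= generalized:
            hits += 1
    return hits / len(transactions)

# Hypothetical task-relevant data, recorded at the primitive level.
transactions = [
    {"hp-laptop computer", "canon-inkjet printer"},
    {"ibm-laptop computer", "hp-inkjet printer"},
    {"hp-laptop computer", "hp-inkjet printer"},
    {"ibm-laptop computer"},
]

s0 = support(transactions, {"hp-laptop computer", "canon-inkjet printer"}, 0)
s1 = support(transactions, {"laptop computer", "inkjet printer"}, 1)
s2 = support(transactions, {"computer", "printer"}, 2)
print(s0, s1, s2)
```

On this toy data the primitive-level itemset is the rarest, and support can only stay the same or grow as items are generalized.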
6. Multi Level Association Rules - Mining
Support-Confidence Framework
Top-down strategy for accumulating counts
Algorithms: Apriori & its variations
Variations include
Uniform support for all levels
Reduced support at lower levels
7. Multi Level Association Rules - Mining (UNIFORM SUPPORT)
Same support threshold for all levels of abstraction
Itemsets containing an item whose ancestors do not satisfy minimum support are not examined
Higher support threshold ⇒ interesting associations at lower abstractions are lost
Lower support threshold ⇒ many uninteresting associations at higher abstractions
8. Multi Level Association Rules - Mining (REDUCED SUPPORT)
Lower levels of abstraction are assigned lower support thresholds
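A minimal sketch of the difference between uniform and reduced support, assuming a two-level hierarchy and invented transactions (`PARENT`, `frequent_by_level` and the threshold values are illustrative, not taken from the slides):

```python
from collections import Counter

# Hypothetical hierarchy: level 1 = category, level 2 = primitive item.
PARENT = {"laptop": "computer", "desktop": "computer",
          "inkjet": "printer", "laser": "printer"}

def frequent_by_level(transactions, min_sup):
    """Return {level: set of frequent items}; `min_sup` maps level -> threshold."""
    n = len(transactions)
    # Count each item (or category) at most once per transaction.
    level2 = Counter(i for t in transactions for i in set(t))
    level1 = Counter(p for t in transactions for p in {PARENT[i] for i in t})
    counts = {1: level1, 2: level2}
    return {lvl: {i for i, c in counts[lvl].items() if c / n >= min_sup[lvl]}
            for lvl in counts}

transactions = [
    ["laptop", "inkjet"], ["laptop", "laser"], ["desktop", "inkjet"],
    ["laptop"], ["desktop", "laser"], ["laptop", "inkjet"],
    ["desktop"], ["laptop", "inkjet"], ["laser"], ["inkjet"],
]

uniform = frequent_by_level(transactions, {1: 0.5, 2: 0.5})   # same threshold everywhere
reduced = frequent_by_level(transactions, {1: 0.5, 2: 0.25})  # lower threshold at level 2
print(uniform[2], reduced[2])
```

With the uniform threshold only `laptop` and `inkjet` survive at level 2; the reduced threshold also keeps `desktop` and `laser`.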
9. Multi Level Association Rules - Mining (REDUCED SUPPORT)
Alternate Search Strategies
Level by level independent
Full breadth search
No background knowledge is used in pruning
Leads to examining a lot of infrequent items
Level-cross filtering by single item
Examine a node at level i only if its parent node at level i-1 is frequent
Misses frequent items at lower level abstractions (due to reduced support)
Level-cross filtering by k-itemset
Examine a k-itemset at level i only if its parent k-itemset at level i-1 is frequent
Misses frequent k-itemsets at lower level abstractions (due to reduced support)
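Level-cross filtering by single item, and the way it can miss frequent lower-level items, can be sketched as follows. The hierarchy and the support values are hypothetical and chosen so that one frequent child sits under an infrequent parent.

```python
# Hypothetical hierarchy and support values (fractions of all transactions).
CHILDREN = {"computer": ["laptop", "desktop"], "printer": ["inkjet", "laser"]}
SUPPORT = {"computer": 0.10, "laptop": 0.06, "desktop": 0.04,
           "printer": 0.04, "inkjet": 0.035, "laser": 0.005}

def level_cross_filter(min_sup_parent, min_sup_child):
    """Examine a child only if its parent is frequent.
    Returns the frequent children and the children that were never examined."""
    frequent, skipped = [], []
    for parent, kids in CHILDREN.items():
        if SUPPORT[parent] >= min_sup_parent:
            frequent += [k for k in kids if SUPPORT[k] >= min_sup_child]
        else:
            skipped += kids  # pruned without looking at their own support
    return frequent, skipped

frequent, skipped = level_cross_filter(min_sup_parent=0.05, min_sup_child=0.03)
# Children that would have been frequent at their own (reduced) threshold but were skipped.
missed = [k for k in skipped if SUPPORT[k] >= 0.03]
print(frequent, skipped, missed)
```

Here `inkjet` meets the reduced level-2 threshold but is never examined, because `printer` falls below the level-1 threshold.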
10. Multi Level Association Rules - Mining (REDUCED SUPPORT)
Controlled level-cross filtering by single item
A modified level-cross filtering by single item
Sets a level passage threshold for every level
Allows the inspection of lower abstractions even if their ancestor fails to satisfy the min_sup threshold
Computer ⇒ Printer
(At the same abstraction level)
Computer ⇒ InkJet Printer (cross-level association rule)
(At different abstraction levels)
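The controlled variant on this slide can be sketched by adding a passage threshold to the parent test. The data and the function name `controlled_level_cross_filter` are assumptions for illustration; the passage value is picked between the two min_sup thresholds, which is one common way to set it.

```python
# Hypothetical hierarchy and support values (fractions of all transactions).
CHILDREN = {"computer": ["laptop", "desktop"], "printer": ["inkjet", "laser"]}
SUPPORT = {"computer": 0.10, "laptop": 0.06, "desktop": 0.04,
           "printer": 0.04, "inkjet": 0.035, "laser": 0.005}

def controlled_level_cross_filter(min_sup_parent, min_sup_child, passage):
    """Children are examined when the parent's support reaches the level
    passage threshold, which may be lower than the parent's own min_sup."""
    frequent_parents = [p for p in CHILDREN if SUPPORT[p] >= min_sup_parent]
    frequent_children = []
    for parent, kids in CHILDREN.items():
        if SUPPORT[parent] >= passage:
            frequent_children += [k for k in kids if SUPPORT[k] >= min_sup_child]
    return frequent_parents, frequent_children

# passage == min_sup_parent behaves like plain level-cross filtering.
strict = controlled_level_cross_filter(0.05, 0.03, passage=0.05)
# A lower passage threshold lets the children of "printer" be inspected.
relaxed = controlled_level_cross_filter(0.05, 0.03, passage=0.04)
print(strict, relaxed)
```

In the relaxed run `printer` is still not reported as frequent, but its child `inkjet` is found.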
11. Multi Level Association Rules - Redundancy
Laptop computer ⇒ InkJet Printer
(Support = 10%, confidence = 70%)
Vs
HP Laptop Computer ⇒ InkJet Printer
(Support = 5%, confidence = 68%)
The second rule is redundant: the first rule is its ancestor (HP Laptop Computer is a kind of Laptop computer), so it adds little new information
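One common way to make this check concrete is to compare the descendant rule with the values expected from its ancestor. The sketch below assumes that half of all laptop sales are HP laptops; that share, the tolerance and the name `is_redundant` are assumptions, not figures from the slide.

```python
def is_redundant(ancestor, descendant, share, tol=0.1):
    """ancestor / descendant: (support, confidence) pairs.
    share: assumed fraction of the ancestor item's sales that belong to the
    descendant item (hypothetical, not given on the slide).
    The descendant is treated as redundant when its support and confidence
    are within a relative tolerance `tol` of the expected values."""
    expected_sup = ancestor[0] * share
    expected_conf = ancestor[1]
    sup_close = abs(descendant[0] - expected_sup) <= tol * expected_sup
    conf_close = abs(descendant[1] - expected_conf) <= tol * expected_conf
    return sup_close and conf_close

# Values from the slide, with an assumed 50% HP share among laptops.
redundant = is_redundant((0.10, 0.70), (0.05, 0.68), share=0.5)
# A hypothetical descendant that deviates strongly from the expectation.
surprising = is_redundant((0.10, 0.70), (0.09, 0.95), share=0.5)
print(redundant, surprising)
```

Under that assumed share, the HP rule matches the expected 5% support and roughly 70% confidence, so it is flagged as redundant.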
12. Multi Dimensional Association Rules - Concepts
Rules involving more than one dimension or predicate
buys (X, “IBM Laptop Computer”) ⇒ buys (X, “HP Inkjet Printer”)
(Single dimensional)
age (X, “20..25”) and occupation (X, “student”) ⇒ buys (X, “HP Inkjet Printer”)
(Multi Dimensional - Inter dimension Association Rule)
age (X, “20..25”) and buys (X, “IBM Laptop Computer”) ⇒ buys (X, “HP Inkjet Printer”)
(Multi Dimensional - Hybrid dimension Association Rule)
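The three rule types can be told apart by counting predicates. A small sketch, with a rule represented as two lists of hypothetical (predicate, value) pairs:

```python
def classify(antecedent, consequent):
    """Classify a rule by the predicates (dimensions) it uses."""
    predicates = [p for p, _ in antecedent + consequent]
    distinct = set(predicates)
    if len(distinct) == 1:
        return "single-dimensional"   # one predicate, possibly repeated
    if len(distinct) == len(predicates):
        return "inter-dimension"      # several predicates, none repeated
    return "hybrid-dimension"         # several predicates, some repeated

r1 = classify([("buys", "IBM Laptop Computer")], [("buys", "HP Inkjet Printer")])
r2 = classify([("age", "20..25"), ("occupation", "student")],
              [("buys", "HP Inkjet Printer")])
r3 = classify([("age", "20..25"), ("buys", "IBM Laptop Computer")],
              [("buys", "HP Inkjet Printer")])
print(r1, r2, r3)
```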
13. Multi Dimensional Association Rules - Concepts
Attributes can be categorical or quantitative
Quantitative attributes are numeric and incorporate a hierarchy (age, income, ...)
Numeric attributes must be discretized
3 different approaches to mining multi dimensional association rules
Using static discretization of quantitative attributes
Using dynamic discretization of quantitative attributes
Using distance-based discretization with clustering
14. Multi Dimensional Association Rules - Mining using Static Discretization
Discretization is static and occurs prior to mining
Discretized attributes are treated as categorical
Use the Apriori algorithm to find all frequent k-predicate sets
Every subset of a frequent predicate set must be frequent
In a data cube, if the 3-D cuboid (age, income, buys) is frequent, then the 2-D cuboids (age, income), (age, buys) and (income, buys) are also frequent
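The steps on this slide can be sketched as: discretize once, turn each record into a set of (attribute, value) predicates, then run a level-wise Apriori search. The bin edges, labels, records and function names below are invented for illustration.

```python
from itertools import combinations

def discretize(value, edges, labels):
    """Static discretization: label of the first interval whose upper edge covers value."""
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]

def to_predicates(row):
    """Turn one record into a set of (attribute, value) predicates."""
    return frozenset({
        ("age", discretize(row["age"], [25, 35], ["20..25", "26..35", "36+"])),
        ("income", discretize(row["income"], [30, 40], ["<=30K", "31K..40K", ">40K"])),
        ("buys", row["buys"]),
    })

def frequent_predicate_sets(rows, min_sup):
    """Level-wise Apriori over predicate sets; returns {predicate set: support}."""
    n = len(rows)
    def sup(s):
        return sum(s <= r for r in rows) / n
    level = {frozenset([p]) for r in rows for p in r}
    level = {s for s in level if sup(s) >= min_sup}
    result, k = {}, 1
    while level:
        result.update({s: sup(s) for s in level})
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if sup(c) >= min_sup}
    return result

# Hypothetical records; income is in thousands.
records = [
    {"age": 22, "income": 35, "buys": "laptop"},
    {"age": 24, "income": 38, "buys": "laptop"},
    {"age": 23, "income": 32, "buys": "laptop"},
    {"age": 40, "income": 60, "buys": "printer"},
    {"age": 30, "income": 35, "buys": "printer"},
]
rows = [to_predicates(r) for r in records]
frequent = frequent_predicate_sets(rows, min_sup=0.5)
for s, v in sorted(frequent.items(), key=lambda kv: len(kv[0])):
    print(sorted(s), v)
```

On this data the 3-predicate set over age, income and buys is frequent, and so is each of its 2-predicate subsets, as the slide states.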
15. Multi Dimensional Association Rules - Mining using Dynamic Discretization
Known as Mining Quantitative Association Rules
Numeric attributes are dynamically discretized
Consider rules of type
Aquan1 ∧ Aquan2 ⇒ Acat
(2D Quantitative Association Rules)
age(X, “20..25”) ∧ income(X, “30K..40K”) ⇒ buys(X, “Laptop Computer”)
ARCS (Association Rule Clustering System): an approach for mining quantitative association rules
16. Multi Dimensional Association Rules - ARCS
Map pairs of quantitative attributes onto a 2-D grid (use equiwidth binning for discretization)
Search the grid for clusters of points to generate association rules satisfying minimum support and confidence
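The gridding step can be sketched as below: each record is dropped into an equiwidth (age, income) cell, and a cell is kept when the rule "cell ⇒ buys(target)" meets the support and confidence thresholds. The records, bin origins, bin widths and function names are assumptions; the clustering of neighbouring cells is not shown.

```python
from collections import defaultdict

def equiwidth_bin(value, low, width):
    """Index of the equal-width bin that contains `value`."""
    return int((value - low) // width)

def strong_cells(records, target, min_sup, min_conf,
                 age_low=20, age_width=1, inc_low=20, inc_width=5):
    """records: (age, income_in_K, item_bought) tuples.
    Returns the grid cells (age_bin, income_bin) whose rule
    'cell => buys(target)' satisfies min_sup and min_conf."""
    total = defaultdict(int)  # records per cell
    hits = defaultdict(int)   # records per cell that bought `target`
    for age, income, item in records:
        cell = (equiwidth_bin(age, age_low, age_width),
                equiwidth_bin(income, inc_low, inc_width))
        total[cell] += 1
        if item == target:
            hits[cell] += 1
    n = len(records)
    return sorted(cell for cell in total
                  if hits[cell] / n >= min_sup
                  and hits[cell] / total[cell] >= min_conf)

# Hypothetical customer records: (age, income in K, item bought).
records = [
    (23, 22, "laptop"), (23, 24, "laptop"), (23, 27, "laptop"),
    (24, 33, "laptop"), (24, 34, "printer"), (30, 50, "printer"),
    (23, 23, "printer"), (24, 37, "laptop"),
]
cells = strong_cells(records, "laptop", min_sup=0.1, min_conf=0.6)
print(cells)
```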
17. Multi Dimensional Association Rules - ARCS
Let the rules generated be
age(X, 23) ∧ income(X, “20..25”) ⇒ buys(X, “Laptop Computer”)
age(X, 23) ∧ income(X, “26..30”) ⇒ buys(X, “Laptop Computer”)
age(X, 24) ∧ income(X, “31..35”) ⇒ buys(X, “Laptop Computer”)
age(X, 24) ∧ income(X, “36..40”) ⇒ buys(X, “Laptop Computer”)
The 4 rules above can be generalized using a clustering algorithm as
age(X, “23..24”) ∧ income(X, “20..40”) ⇒ buys(X, “Laptop Computer”)
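As a rough illustration of the merge, the sketch below takes the bounding rectangle of the four rule cells. This is a plain bounding-rectangle merge, not the ARCS clustering algorithm itself; the function name and the coverage measure are assumptions, and the code assumes age bins of width 1 and income intervals that tile the range.

```python
def generalize_cells(cells):
    """cells: list of (age, (income_low, income_high)) rule cells that all
    imply the same consequent. Returns the bounding age range, the bounding
    income range, and the fraction of grid cells inside that rectangle that
    actually carry a rule."""
    ages = [age for age, _ in cells]
    lows = [lo for _, (lo, _) in cells]
    highs = [hi for _, (_, hi) in cells]
    age_range = (min(ages), max(ages))
    income_range = (min(lows), max(highs))
    # Grid cells in the rectangle: one per age value times one per income interval.
    n_ages = age_range[1] - age_range[0] + 1
    n_intervals = len({interval for _, interval in cells})
    coverage = len(set(cells)) / (n_ages * n_intervals)
    return age_range, income_range, coverage

# The four rule cells from the slide.
cells = [(23, (20, 25)), (23, (26, 30)), (24, (31, 35)), (24, (36, 40))]
age_range, income_range, coverage = generalize_cells(cells)
print(age_range, income_range, coverage)
```

The result is the range age 23..24 and income 20..40. The coverage value shows what share of the rectangle is backed by a rule, which a clustering step would normally want to be high.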
18. Multi Dimensional Association Rules - Distance-based Association Rule
Item_type(X, “electronic”) ∧ Manufacturer(X, “foreign”) ⇒ price(X, $250)
Binning methods such as equidepth and equiwidth do not capture the semantics of interval data
19. Multi Dimensional Association Rules - Distance-based Association Rule
2-step mining process
Perform clustering to find the intervals of the attributes involved
Obtain association rules by searching for groups of clusters that occur together
The resultant rules must satisfy
Clusters in the rule antecedent are strongly associated with the clusters in the consequent
Clusters in the antecedent occur together
Clusters in the consequent occur together
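The two steps can be sketched with a very simple stand-in for the clustering: values are grouped whenever the gap to the previous value is small, and cluster pairs are then counted across records. The gap rule, the thresholds, the data and the function names are all assumptions; a real system would use a proper clustering method with density and diameter criteria.

```python
from collections import Counter

def cluster_1d(values, max_gap):
    """Step 1: group sorted values, starting a new cluster whenever the gap
    to the previous value exceeds max_gap. Returns (low, high) intervals."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [(c[0], c[-1]) for c in clusters]

def assign(value, intervals):
    """Interval that contains `value` (None if there is none)."""
    for lo, hi in intervals:
        if lo <= value <= hi:
            return (lo, hi)
    return None

def cooccurring_clusters(records, max_gap_x, max_gap_y, min_count):
    """Step 2: count how often an x-cluster and a y-cluster occur together."""
    xs = cluster_1d([x for x, _ in records], max_gap_x)
    ys = cluster_1d([y for _, y in records], max_gap_y)
    counts = Counter((assign(x, xs), assign(y, ys)) for x, y in records)
    return {pair: c for pair, c in counts.items() if c >= min_count}

# Distance-based intervals follow the data, unlike fixed-width bins.
price_intervals = cluster_1d([7, 20, 22, 50, 51, 53], max_gap=5)

# Hypothetical (age, price) records.
records = [(22, 250), (23, 255), (24, 248), (45, 900), (46, 910), (23, 905)]
pairs = cooccurring_clusters(records, max_gap_x=2, max_gap_y=20, min_count=2)
print(price_intervals, pairs)
```

Each surviving pair of intervals is a candidate rule of the form age cluster ⇒ price cluster; checking the strength of the association is left out of this sketch.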
==X==