Explore efficient methods for discovering association rules with numeric data. Study the OPUS search algorithm for impact rule detection. Evaluate and conclude findings. Personal insights and opinions shared.
Discovering associations with numeric variables
Advisor: Dr. Hsu
Graduate: Ching-Lung Chen
IDSL, Intelligent Database System Lab
Outline
• Motivation
• Objective
• Introduction
• Alternative numeric data mining techniques
• The OPUS search algorithm
• Search for impact rules
• Evaluation
• Conclusion
• Personal Opinion
Motivation
• To further develop Aumann and Lindell's proposal for a variant of association rules in which the consequent is a numeric variable.
• Traditional association rules cannot be discovered directly from numeric data.
Objective
• To improve the efficiency of the algorithms that enable these rules to be discovered in dense data sets.
Introduction
• Example impact rule:
  antecedent: prev-response>0.2 & region=E & socio-class=F
  target: profitability
  consequent: count=105211, mean=12.51, sum=1316190.61, max=205.31, min=-1.05
• Association rules have demonstrated the ability to detect interesting associations between fields in a database.
• This paper names such rules impact rules and characterizes them as follows:
  training set: a finite set of records.
  conditions: tests on records, where each record is an element of the training set.
  target: the variable of interest, which is associated with a numeric value.
  antecedent: a conjunction of conditions.
  consequent: statistics describing the impact on the target of selecting the training-set records that satisfy the antecedent.
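The relationship between an antecedent and its consequent statistics can be sketched in a few lines of Python. This is only an illustration: the four records and their values are invented, and the helper name `consequent` is hypothetical; only the field names and the shape of the output (count, mean, sum, max, min) follow the example rule on the slide.

```python
from statistics import mean

# Invented toy records; only the field names echo the slide's example rule.
records = [
    {"region": "E", "socio_class": "F", "prev_response": 0.3, "profitability": 10.0},
    {"region": "E", "socio_class": "F", "prev_response": 0.5, "profitability": -2.0},
    {"region": "W", "socio_class": "F", "prev_response": 0.4, "profitability": 7.0},
    {"region": "E", "socio_class": "A", "prev_response": 0.1, "profitability": 4.0},
]

# An antecedent is a conjunction of conditions, modelled here as predicates.
antecedent = [
    lambda r: r["prev_response"] > 0.2,
    lambda r: r["region"] == "E",
    lambda r: r["socio_class"] == "F",
]


def consequent(records, antecedent, target):
    """Summarise the target over the records that satisfy every condition."""
    covered = [r[target] for r in records if all(cond(r) for cond in antecedent)]
    if not covered:
        return {"count": 0}
    return {
        "count": len(covered),
        "mean": mean(covered),
        "sum": sum(covered),
        "max": max(covered),
        "min": min(covered),
    }


stats = consequent(records, antecedent, "profitability")
print(stats)
```

Only the first two toy records satisfy all three conditions, so the statistics describe just those two target values.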
Alternative numeric data mining techniques
• Regression techniques
  • can predict a numeric value for a given set of input parameters.
  • cannot provide an explicit segmentation of the data.
• Association rules
  • can provide statistics relating to the frequency of one sub-range.
  • cannot directly address the issue of impact.
The OPUS search algorithm
• OPUS, presented by G. I. Webb in the Journal of Artificial Intelligence Research, provides an efficient search for impact rules.
• It assigns an arbitrary order to the elements, so that each combination of elements is considered just once.
• For m elements, the search space has size $2^m$.
Fig. 1. A fixed-structure search space
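A small Python sketch may help show why a fixed order means each combination is visited exactly once. It is not taken from the paper: the function name `enumerate_combinations` is hypothetical, and the particular ordering convention (a node is extended only with elements that come later in the order) is one assumption among several equivalent choices.

```python
def enumerate_combinations(elements):
    """Yield every combination once by extending a node only with later elements."""

    def expand(current, start):
        yield current
        for i in range(start, len(elements)):
            # Elements before index i are never added here, which is what
            # prevents the same combination being reached by two paths.
            yield from expand(current + [elements[i]], i + 1)

    yield from expand([], 0)


nodes = list(enumerate_combinations(["a", "b", "c", "d"]))
print(len(nodes))
```

With four elements the enumeration visits 2^4 = 16 nodes, counting the empty combination at the root.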
The OPUS search algorithm
• Under fixed-structure search, algorithms typically seek to identify branches that cannot contain a solution and prune those branches.
• Figure annotation: if C can be a solution, then prune the nodes below {b}.
Fig. 2. Pruning a branch from a fixed-structure search space
The OPUS search algorithm
• If C can be a solution, then a pruning rule identifying that no node containing b may contain a solution would enable the removal of all nodes containing b.
Fig. 3. Pruning all nodes containing a single operator from a fixed-structure search space
Search for impact rules
The OPUS_IR algorithm
Algorithm: OPUS_IR(Current, Available)
1. SoFar := {}
2. FOR EACH P in Available
   (a) New := Current ∪ {P}
   (b) IF pruning rules cannot determine that ∀x ⊆ Available: ¬solution(x ∪ New) THEN
         record New and relevant stats
         OPUS_IR(New, SoFar)
         SoFar := SoFar ∪ {P}
       END IF
   END FOR
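The recursive structure of OPUS_IR can be sketched in Python as follows. This is a simplified illustration, not the author's implementation: the only pruning applied is a minimum-cover test (together with the empty-cover case), the recorded statistics are limited to count and mean, and the names `has`, `opus_ir`, `min_cover` and the toy records are all hypothetical.

```python
def has(field):
    """Build a named condition that holds when the record's field equals 1."""
    return (field, lambda record: record[field] == 1)


def opus_ir(current, available, records, target, min_cover, found):
    """Simplified OPUS_IR: explores each conjunction at most once."""
    so_far = []
    for p in available:
        new = current + [p]
        covered = [r for r in records if all(test(r) for _, test in new)]
        # Assumed pruning rule: adding conditions can only shrink the cover,
        # so no superset of `new` can reach min_cover if `new` does not.
        if not covered or len(covered) < min_cover:
            continue
        values = [r[target] for r in covered]
        found.append(([name for name, _ in new], len(values), sum(values) / len(values)))
        # Only conditions that survived pruning earlier in this loop are passed down.
        opus_ir(new, list(so_far), records, target, min_cover, found)
        so_far.append(p)
    return found


# Invented toy data: three binary fields and a numeric target "t".
records = [
    {"a": 1, "b": 1, "c": 0, "t": 5.0},
    {"a": 1, "b": 0, "c": 1, "t": 1.0},
    {"a": 1, "b": 1, "c": 1, "t": 3.0},
    {"a": 0, "b": 1, "c": 1, "t": -2.0},
]
rules = opus_ir([], [has("a"), has("b"), has("c")], records, "t", 2, [])
for names, count, rule_mean in rules:
    print(names, count, rule_mean)
```

Because a pruned condition is never added to `so_far`, it is never offered to later activations, which is how pruning at one node removes that condition from the remainder of the search below the current node.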
Search for impact rules
• $\overline{targ}$ is the mean of the target for all records.
• value(x) is the value of the target for a record x.
• cover(X) is the set of records for which the condition X is true.
• ante(ir) is the antecedent of impact rule ir.
• mean(ir) is the mean value of the target for records covered by ante(ir).
• $IRule_n$ is the impact rule with [...]
• $impact(ir) = (mean(ir) - \overline{targ}) \times |cover(ante(ir))|$
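As a worked instance of the impact measure, impact(ir) = (mean(ir) − mean of the target over all records) × |cover(ante(ir))|, the following Python sketch computes it for a toy data set. The records and the function name `impact` are invented for illustration.

```python
def impact(records, antecedent, target):
    """(mean over covered records - mean over all records) * number of covered records."""
    overall_mean = sum(r[target] for r in records) / len(records)
    covered = [r[target] for r in records if antecedent(r)]
    if not covered:
        return 0.0  # a rule that covers nothing has no impact in this sketch
    rule_mean = sum(covered) / len(covered)
    return (rule_mean - overall_mean) * len(covered)


# Invented toy data: the overall mean of "t" is 2.0.
records = [
    {"g": "x", "t": 5.0},
    {"g": "y", "t": 1.0},
    {"g": "x", "t": 3.0},
    {"g": "y", "t": -1.0},
]
print(impact(records, lambda r: r["g"] == "x", "t"))
```

Here the rule covers two records with mean 4.0 against an overall mean of 2.0, so its impact is (4.0 − 2.0) × 2 = 4.0; the complementary rule g = y has impact −4.0.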
Search for impact rules
• If any of the following conditions holds, then no superset of New may be a solution:
  • If [formula missing]
  • If [formula missing]
  • For all searches, if New covers no records
  • If |cover(New)| < minsup
Relative complexity
• Example: $n = 4$, $x_1 = a$, $x_2 = b$, $x_{n-1} = c$, $x_n = d$.
• Consider the case in which both $\{x_1, \ldots, x_{n-2}, x_{n-1}\}$ and $\{x_1, \ldots, x_{n-2}, x_n\}$ are frequent itemsets. In this case, both are generated by the activation of OPUS_IR with Current = $\{x_1, \ldots, x_{n-2}\}$.
Relative complexity
• If $\{x_1, \ldots, x_{n-2}, x_{n-1}\}$ is not a frequent itemset, then there is no call to OPUS_IR with Current = $\{x_1, \ldots, x_{n-2}, x_{n-1}\}$, so $\{x_1, \ldots, x_{n-2}, x_{n-1}, x_n\}$ will not be generated.
• If $\{x_1, \ldots, x_{n-2}, x_n\}$ is not a frequent itemset, then $x_n$ will not be added to SoFar and so will not be passed as a member of Available to the activation with Current = $\{x_1, \ldots, x_{n-2}, x_{n-1}\}$.
Relative complexity
• If both are frequent itemsets, then $x_{n-1}$ will be passed as a value in Available and $\{x_1, \ldots, x_{n-2}, x_{n-1}, x_n\}$ will be generated.
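For comparison, the candidate-generation step of a frequent itemset search can be sketched as below. This assumes the standard Apriori-style join, in which a candidate of size n is formed from two frequent itemsets of size n-1 that share their first n-2 items; whether the slides refer to exactly this procedure is an assumption, and the function name `join_candidates` is hypothetical.

```python
def join_candidates(frequent):
    """Join sorted (n-1)-item tuples that share their first n-2 items."""
    items = sorted(frequent)
    candidates = set()
    for i, left in enumerate(items):
        for right in items[i + 1:]:
            if left[:-1] == right[:-1]:
                # left sorts before right, so appending right's last item
                # keeps the candidate tuple in sorted order.
                candidates.add(left + (right[-1],))
    return candidates


# With n = 4 and items a, b, c, d: {a, b, c} and {a, b, d} share the prefix (a, b).
both_frequent = {("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d")}
one_missing = {("a", "b", "c"), ("a", "c", "d")}
print(join_candidates(both_frequent))
print(join_candidates(one_missing))
```

The four-item candidate appears only when both prefix-sharing three-item sets are frequent, which mirrors the condition under which OPUS_IR generates the same set.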
Evaluation
• OPUS_IR was implemented as a DOS program and run on an 800 MHz PIII Windows PC with 256 MB RAM.
• This implementation was applied to each data set to find the 1000 impact rules with the highest impact on the target.
[Tables: Data Sets; Results]
Example rules
• Each rule is preceded by the summary population statistics for the data set.
• Results are shown for the data set from the KDD Repository.
The frequent itemset approach
[Figure: Frequent Itemset Generation]
Conclusions
• Impact rules provide association-rule-like knowledge discovery for numeric data.
• This paper presents efficient techniques for impact rule discovery.
• These techniques avoid the need to set arbitrary constraints on the cover of the antecedent of an impact rule.
Personal Opinion
• Although this algorithm avoids the need to set arbitrary constraints, the depth of the recursive calls still has to be assigned. Perhaps an algorithm could be developed that searches automatically without this limit.