Identifying Interesting Association Rules with Genetic Algorithms
Elnaz Delpisheh
York University, Department of Computer Science and Engineering
January 4, 2020
Data mining
(Diagram labels: Too much data; Data; Data Mining; Association rules)
• I = {i1, i2, ..., in} is a set of items.
• D = {t1, t2, ..., tn} is a transactional database.
• Each transaction ti is a nonempty subset of I.
• An association rule has the form A → B, where A and B are itemsets, A ⊂ I, B ⊂ I, and A ∩ B = ∅.
• The Apriori algorithm is the most commonly used algorithm for association rule mining.
• Example: {milk, eggs} → {bread}.
Association rule mining
(Diagram labels: Too much data; Data; Data Mining; Too many association rules; Association rules)
Interestingness criteria
• Comprehensibility.
• Conciseness.
• Diversity.
• Generality.
• Novelty.
• Utility.
• ...
Interestingness measures
• Subjective measures
  • Both the data and the user’s prior knowledge are considered.
  • Comprehensibility, novelty, surprisingness, utility.
• Objective measures
  • The structure of an association rule is considered.
  • Conciseness, diversity, generality, peculiarity.
• Example: Support
  • It represents the generality of a rule.
  • It counts the transactions that contain both A and B (often reported as a fraction of all transactions).
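The support example above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names `support_count` and `support` and the five toy transactions are assumptions made up for the example.

```python
def support_count(transactions, a, b):
    """Number of transactions that contain every item of both A and B."""
    needed = set(a) | set(b)
    return sum(1 for t in transactions if needed <= set(t))


def support(transactions, a, b):
    """Support count expressed as a fraction of all transactions."""
    return support_count(transactions, a, b) / len(transactions)


# Hypothetical toy database, invented for illustration only.
transactions = [
    {"milk", "eggs", "bread"},
    {"milk", "eggs"},
    {"milk", "bread"},
    {"eggs", "bread", "butter"},
    {"milk", "eggs", "bread", "butter"},
]

count = support_count(transactions, {"milk", "eggs"}, {"bread"})
fraction = support(transactions, {"milk", "eggs"}, {"bread"})
print(count, fraction)
```

On this toy data, two of the five transactions contain milk, eggs, and bread together, so the rule {milk, eggs} → {bread} has a support count of 2.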
Drawbacks of objective measures
• Database dependence
  • Lack of knowledge about the database
  • Threshold dependence
• Solution
  • Multiple database reanalysis
• Problem
  • Large number of disk I/O operations
Database independence
Genetic algorithm-based learning (ARMGA)
• Initialize the population
• Evaluate the individuals in the population
• Repeat until a stopping criterion is met
  • Select individuals from the current population
  • Recombine them to obtain more individuals
  • Evaluate the new individuals
  • Replace some or all of the individuals in the current population with the offspring
• Return the best individual seen so far
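The loop on this slide can be sketched generically as follows. Everything specific here is an assumption chosen for illustration rather than taken from ARMGA: binary tournament selection, full generational replacement, a fixed generation budget as the stopping criterion, and a toy "count the ones" fitness on bit strings.

```python
import random


def tournament(rng, population, scores):
    """Select one individual: the better of two picked at random."""
    i = rng.randrange(len(population))
    j = rng.randrange(len(population))
    return population[i] if scores[i] >= scores[j] else population[j]


def genetic_search(fitness, random_individual, recombine,
                   pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # Initialize the population and evaluate every individual.
    population = [random_individual(rng) for _ in range(pop_size)]
    scores = [fitness(ind) for ind in population]
    best_score, best = max(zip(scores, population), key=lambda p: p[0])
    # Stopping criterion (assumed): a fixed number of generations.
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            parent1 = tournament(rng, population, scores)   # select
            parent2 = tournament(rng, population, scores)
            offspring.append(recombine(rng, parent1, parent2))  # recombine
        # Replace the whole population with the offspring and evaluate them.
        population = offspring
        scores = [fitness(ind) for ind in population]
        gen_score, gen_best = max(zip(scores, population), key=lambda p: p[0])
        if gen_score > best_score:
            best_score, best = gen_score, gen_best
    # Return the best individual seen so far.
    return best, best_score


# Toy problem (hypothetical): maximise the number of ones in a 16-bit string.
def random_bits(rng, n=16):
    return [rng.randint(0, 1) for _ in range(n)]


def one_point_with_mutation(rng, a, b):
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]          # new list, parents are left untouched
    if rng.random() < 0.1:             # occasional single-bit mutation
        child[rng.randrange(len(child))] ^= 1
    return child


best, best_score = genetic_search(sum, random_bits, one_point_with_mutation)
print(best, best_score)
```

Tracking `best` separately from the population is what makes "best individual seen so far" hold even when a generation is replaced wholesale.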
ARMGA Modeling
• Given an association rule X → Y
• Requirement
  • Conf(X → Y) > Supp(Y)
• Aim is to maximise (objective function not preserved in the transcript)
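The requirement Conf(X → Y) > Supp(Y) can be checked directly from a transaction list. The sketch below uses the standard definitions (support as a fraction of transactions, confidence as Supp(X ∪ Y) / Supp(X)); the helper names and the toy transactions are assumptions for illustration, and the objective that ARMGA maximises is not reproduced here.

```python
def supp(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def conf(transactions, x, y):
    """Confidence of X -> Y: Supp(X ∪ Y) / Supp(X), or 0.0 if X never occurs."""
    supp_x = supp(transactions, x)
    if supp_x == 0:
        return 0.0
    return supp(transactions, set(x) | set(y)) / supp_x


def meets_requirement(transactions, x, y):
    """True when Conf(X -> Y) > Supp(Y), the requirement stated on the slide."""
    return conf(transactions, x, y) > supp(transactions, y)


# Hypothetical toy database, invented for illustration only.
transactions = [
    {"milk", "eggs", "bread"},
    {"milk", "eggs"},
    {"milk", "bread"},
    {"eggs", "bread", "butter"},
    {"milk", "eggs", "bread", "butter"},
]

# Confidence 2/3 is below Supp({bread}) = 4/5, so this rule is rejected.
print(meets_requirement(transactions, {"milk", "eggs"}, {"bread"}))
# Confidence 1.0 is above Supp({bread}) = 4/5, so this rule is accepted.
print(meets_requirement(transactions, {"butter"}, {"bread"}))
```

The first rule shows why the requirement matters: bread is so common in the toy data that knowing a basket holds milk and eggs does not raise the chance of finding bread in it.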
ARMGA Encoding
• Michigan Strategy (each chromosome encodes a single rule)
• Given an association k-rule X → Y, where X, Y ⊂ I, I = {i1, i2, ..., in} is a set of items, and X ∩ Y = ∅.
• For example
  • {A1, ..., Aj} → {Aj+1, ..., Ak}
ARMGA Encoding (Cont.)
• The aforementioned encoding depends heavily on the length of the chromosome.
• We use another type of encoding:
  • Given a set of items {A, B, C, D, E, F}
  • The association rule ACF → B is encoded as follows
    • 00A11B00C01D10E00F
    • 00: Item is in the antecedent
    • 11: Item is in the consequent
    • 01/10: Item is absent
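A small sketch of this two-bit tagging scheme is given below. It assumes single-character item names and writes absent items as `01` by default (the slide allows either `01` or `10`); the function names `encode` and `decode` are hypothetical.

```python
ANTECEDENT = "00"
CONSEQUENT = "11"
ABSENT_CODES = {"01", "10"}


def encode(items, antecedent, consequent, absent_code="01"):
    """Tag every item with a two-character code and concatenate the result."""
    if set(antecedent) & set(consequent):
        raise ValueError("antecedent and consequent must be disjoint")
    if absent_code not in ABSENT_CODES:
        raise ValueError("absent_code must be '01' or '10'")
    parts = []
    for item in items:
        if item in antecedent:
            parts.append(ANTECEDENT + item)
        elif item in consequent:
            parts.append(CONSEQUENT + item)
        else:
            parts.append(absent_code + item)
    return "".join(parts)


def decode(chromosome):
    """Recover (antecedent, consequent) from a string of code+item triples."""
    antecedent, consequent = [], []
    for pos in range(0, len(chromosome), 3):
        code, item = chromosome[pos:pos + 2], chromosome[pos + 2]
        if code == ANTECEDENT:
            antecedent.append(item)
        elif code == CONSEQUENT:
            consequent.append(item)
        elif code not in ABSENT_CODES:
            raise ValueError(f"unknown code {code!r} for item {item!r}")
    return antecedent, consequent


chromosome = encode("ABCDEF", antecedent={"A", "C", "F"}, consequent={"B"})
print(chromosome)          # 00A11B00C01D01E00F
print(decode(chromosome))  # (['A', 'C', 'F'], ['B'])
```

Because every item always occupies one slot, the chromosome length is fixed by the item set rather than by the rule, which is the property the slide contrasts with the earlier encoding.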
ARMGA Operators
• Select
• Crossover
• Mutation
ARMGA Operators-Select
• Select(c, ps): Acts as a filter on the chromosome
  • c: chromosome
  • ps: pre-specified probability
ARMGA Operators-Crossover
• This operation uses a two-point strategy
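The slide gives no further detail, so the following is a generic two-point crossover rather than ARMGA's exact operator. It assumes chromosomes are equal-length lists of genes (for example, one two-bit code per item), and the names `swap_segment` and `two_point_crossover` are hypothetical.

```python
import random


def swap_segment(parent1, parent2, i, j):
    """Exchange the genes in positions i..j-1 between two parents."""
    child1 = parent1[:i] + parent2[i:j] + parent1[j:]
    child2 = parent2[:i] + parent1[i:j] + parent2[j:]
    return child1, child2


def two_point_crossover(parent1, parent2, rng):
    """Pick two distinct cut points at random and swap the segment between them."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    i, j = sorted(rng.sample(range(len(parent1) + 1), 2))
    return swap_segment(parent1, parent2, i, j)


# Genes are per-item codes for the items A..F (toy chromosomes for illustration).
p1 = ["00", "11", "00", "01", "01", "00"]
p2 = ["11", "00", "01", "00", "10", "11"]

print(swap_segment(p1, p2, 2, 4))
print(two_point_crossover(p1, p2, random.Random(0)))
```

Cutting on gene boundaries keeps every two-bit code intact. A child may still end up with an empty antecedent or consequent, so a real implementation would likely need a validity check after crossover.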
Empirical studies and Evaluation
• Implement the entire procedure using Visual C++
• Use WEKA to produce interesting association rules
• Compare the results