Association Rules. Hawaii International Conference on System Sciences (HICSS-40) January 2007 David L. Olson Yanhong Li. Fuzzy Association Rules. Association rules mining provides information to assess significant correlations in large databases IF X THEN Y Initial data mining analysis
Association Rules Hawaii International Conference on System Sciences (HICSS-40) January 2007 David L. Olson Yanhong Li
Fuzzy Association Rules • Association rules mining provides information to assess significant correlations in large databases • IF X THEN Y • Initial data mining analysis • Not predictive • SUPPORT: degree to which relationship appears in data • CONFIDENCE: probability that if X, then Y
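As a minimal sketch of the SUPPORT and CONFIDENCE definitions above, the following computes both for a crisp (binary) rule IF X THEN Y. The item names and the four toy transactions are hypothetical and are not taken from the presentation's data set.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = frozenset(items)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate of P(Y | X), computed as support(X and Y) / support(X)."""
    x = frozenset(antecedent)
    sup_x = support(transactions, x)
    if sup_x == 0.0:
        return 0.0
    return support(transactions, x | frozenset(consequent)) / sup_x


# Hypothetical toy data: each transaction is a set of binary items.
transactions = [
    frozenset({"credit_good", "on_time"}),
    frozenset({"credit_good", "on_time"}),
    frozenset({"credit_good", "default"}),
    frozenset({"credit_bad", "default"}),
]

sup = support(transactions, {"credit_good", "on_time"})
conf = confidence(transactions, {"credit_good"}, {"on_time"})
print(f"support={sup:.3f} confidence={conf:.3f}")
```

Support counts how often X and Y occur together across all cases, while confidence restricts the denominator to the cases where X holds.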
Association Rule Algorithms • Apriori • Agrawal et al., 1993; Agrawal & Srikant, 1994 • Find correlations among transactions, binary values • Weighted association rules • Cai et al., 1998; Lu et al., 2001 • Cardinal data • Srikant & Agrawal, 1996 • Partitions attribute domain, combines adjacent partitions until binary
Fuzzy Analysis Deal with vagueness & uncertainty • Fuzzy Set Theory • Zadeh [1965] • Probability Theory • Pearl [1988] • Rough Set Theory • Pawlak [1982] • Set Pair Theory • Zhao [2000]
Fuzzy Association Rules • Most based on Apriori algorithm • Treat all attributes as uniform • Can increase number of rules by decreasing minimum support, decreasing minimum confidence • Generates many uninteresting rules • Software takes much longer to run
Gyenesei (2000) • Studied weighted quantitative association rules in fuzzy domain • With & without normalization • NONNORMALIZED • Used product operator to define combined weight and fuzzy value • If weight small, support level small, tends to have data overflow • NORMALIZED • Used geometric mean of item weights as combined weight • Support then very small
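The two ways of combining item weights named above (product operator versus geometric mean) can be compared with a small sketch. It illustrates only the two operators, not Gyenesei's full support definitions; the weights and the function names `product_weight` and `geometric_mean_weight` are hypothetical.

```python
import math
from typing import Sequence


def product_weight(weights: Sequence[float]) -> float:
    """Combined weight as the plain product of the item weights."""
    return math.prod(weights)


def geometric_mean_weight(weights: Sequence[float]) -> float:
    """Combined weight as the geometric mean of the item weights."""
    return math.prod(weights) ** (1.0 / len(weights))


# Hypothetical item weights in (0, 1]; not taken from the presentation.
item_weights = [0.5, 0.6, 0.7]

for k in range(1, len(item_weights) + 1):
    subset = item_weights[:k]
    print(
        k,
        round(product_weight(subset), 4),
        round(geometric_mean_weight(subset), 4),
    )
```

With weights below 1, the product shrinks as items are added, whereas the geometric mean stays between the smallest and largest item weight.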
Algorithm • Get membership functions, minimum support, minimum confidence • Assign weight to each fuzzy membership for each attribute (categorical) • Calculate support for each fuzzy region • If support > minimum, OK • If confidence > minimum, OK • If both OK, generate rules
Fuzzified Age • Figure 2: The membership functions of attribute Age (horizontal axis: Age from 0 to 100, with breakpoints at 25, 35, 40, and 50; vertical axis: membership value; fuzzy sets: Young, Middle, Old)
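A possible encoding of the Age membership functions is sketched below. Only the breakpoints 25, 35, 40, and 50 and the labels Young, Middle, and Old come from Figure 2; the piecewise-linear (trapezoidal) shapes and the assignment of breakpoints to each set are assumptions, since the plotted curves are not recoverable from the text.

```python
from typing import Dict


def young(age: float) -> float:
    """Assumed shape: fully Young up to 25, falling linearly to 0 at 35."""
    if age <= 25:
        return 1.0
    if age >= 35:
        return 0.0
    return (35 - age) / 10


def middle(age: float) -> float:
    """Assumed shape: rises 25 to 35, flat 35 to 40, falls 40 to 50."""
    if age <= 25 or age >= 50:
        return 0.0
    if age < 35:
        return (age - 25) / 10
    if age <= 40:
        return 1.0
    return (50 - age) / 10


def old(age: float) -> float:
    """Assumed shape: 0 up to 40, rising linearly to fully Old at 50."""
    if age <= 40:
        return 0.0
    if age >= 50:
        return 1.0
    return (age - 40) / 10


def fuzzify_age(age: float) -> Dict[str, float]:
    """Membership degree of one age value in each fuzzy set."""
    return {"Young": young(age), "Middle": middle(age), "Old": old(age)}


print(fuzzify_age(30))
print(fuzzify_age(45))
```

Under these assumed shapes, an age between two breakpoints belongs partly to two adjacent sets, which is what lets a single case contribute to more than one fuzzy region.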
Calculate Support for Each Pair of Fuzzy Categories • Membership value • Identify weights for each attribute • Identify highest fuzzy membership category for each case • Membership value = minimum weight associated with highest fuzzy membership category • Support • Average membership value for all cases
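One reading of the steps above is sketched here. It assumes that a case contributes to a pair of fuzzy categories only when both are that case's highest-membership categories, and that it contributes 0 otherwise; the slide does not state this explicitly. The cases, the weights, and the names `top_category` and `pair_support` are hypothetical.

```python
from typing import Dict, List, Tuple

# One case: attribute name -> {fuzzy category: membership degree}
Case = Dict[str, Dict[str, float]]
# A fuzzy region is identified by (attribute, category).
Region = Tuple[str, str]


def top_category(memberships: Dict[str, float]) -> str:
    """Category with the highest fuzzy membership for one attribute."""
    return max(memberships, key=lambda category: memberships[category])


def pair_support(
    cases: List[Case],
    weights: Dict[Region, float],
    a: Region,
    b: Region,
) -> float:
    """Average membership value of the pair (a, b) over all cases."""
    if not cases:
        return 0.0
    total = 0.0
    for case in cases:
        a_matches = top_category(case[a[0]]) == a[1]
        b_matches = top_category(case[b[0]]) == b[1]
        if a_matches and b_matches:
            # Membership value = minimum weight of the two top categories.
            total += min(weights[a], weights[b])
    return total / len(cases)


# Hypothetical cases and weights, for illustration only.
cases = [
    {"Income": {"Low": 0.1, "Middle": 0.9, "High": 0.0}, "Credit": {"Good": 0.8, "Bad": 0.2}},
    {"Income": {"Low": 0.0, "Middle": 0.6, "High": 0.4}, "Credit": {"Good": 0.7, "Bad": 0.3}},
    {"Income": {"Low": 0.7, "Middle": 0.3, "High": 0.0}, "Credit": {"Good": 0.9, "Bad": 0.1}},
    {"Income": {"Low": 0.0, "Middle": 0.2, "High": 0.8}, "Credit": {"Good": 0.4, "Bad": 0.6}},
]
weights = {("Income", "Middle"): 0.8, ("Credit", "Good"): 0.6}

sup = pair_support(cases, weights, ("Income", "Middle"), ("Credit", "Good"))
print(f"support = {sup:.3f}")
```

Here two of the four cases have Middle income and Good credit as their top categories, each contributing min(0.8, 0.6) = 0.6, so the average over all four cases is 0.3.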
Support • If support for pair of categories is above minimum support, retain • Identifies all pairs of fuzzy categories with sufficiently strong relationship • For outcomes, R51 (On-Time) strong, R52 (Default) not
Quartets • None qualify, so algorithm stops
Confidence • Identify direction • For those training set cases involving the pair of attributes, what proportion came out as predicted?
4 Rules • IF Income is Middle THEN Outcome is On-Time • R22→R51 support 0.490 confidence 0.916 • IF Credit is Good THEN Outcome is On-Time • R41→R51 support 0.576 confidence 0.972 • IF Income is Middle AND Credit is Good THEN Outcome is On-Time • R22R41→R51 support 0.419 confidence 0.995 • IF Risk is High AND Credit is Good THEN Outcome is On-Time • R31R41→R51 support 0.266 confidence 0.993
Higher order combinations • Try triplets • If ambitious, sets of 4, and beyond • Here, none • Problems: • Computational complexity explodes • Doesn’t guarantee total coverage • That also would explode complexity • Coverage can be increased by lowering minsup, minconf
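To give a rough sense of why complexity explodes, the sketch below counts candidate combinations of fuzzy regions by size. The figure of 15 regions (for example, 5 attributes with 3 regions each) is a hypothetical assumption, and the count is an upper bound because it ignores the constraint that a combination should not use two regions of the same attribute.

```python
import math

# Hypothetical: 5 attributes with 3 fuzzy regions each.
N_REGIONS = 15


def candidate_count(n_regions: int, size: int) -> int:
    """Upper bound on the number of candidate combinations of a given size."""
    return math.comb(n_regions, size)


counts = {size: candidate_count(N_REGIONS, size) for size in (2, 3, 4)}
for size, count in counts.items():
    print(f"combinations of size {size}: {count}")
```

Each candidate needs its own pass over the cases to compute support, so the work grows with these counts unless infrequent combinations are pruned at each level.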
Simulation Testing • Selected 550 cases • Held out 100 • Randomly assigned weights to each fuzzy region of each attribute • minsup {0.35, 0.45, 0.55, 0.65} • minconf {0.7, 0.8, 0.9}
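The effect of the threshold grid can be illustrated by filtering the four rules reported earlier (with their stated support and confidence) against every minsup and minconf combination. This is only a sketch of how the thresholds prune rules; it does not reproduce the simulation, and the helper name `surviving` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# (support, confidence) of the four rules reported in the presentation.
rules: Dict[str, Tuple[float, float]] = {
    "R22 -> R51": (0.490, 0.916),
    "R41 -> R51": (0.576, 0.972),
    "R22 R41 -> R51": (0.419, 0.995),
    "R31 R41 -> R51": (0.266, 0.993),
}

MINSUPS = (0.35, 0.45, 0.55, 0.65)
MINCONFS = (0.7, 0.8, 0.9)


def surviving(
    rules: Dict[str, Tuple[float, float]], minsup: float, minconf: float
) -> List[str]:
    """Rules whose support and confidence both exceed the thresholds."""
    return [
        name
        for name, (sup, conf) in rules.items()
        if sup > minsup and conf > minconf
    ]


grid = {
    (minsup, minconf): surviving(rules, minsup, minconf)
    for minsup, minconf in product(MINSUPS, MINCONFS)
}

for (minsup, minconf), kept in grid.items():
    print(f"minsup={minsup:.2f} minconf={minconf:.1f}: {len(kept)} rule(s)")
```

Because all four reported confidences exceed 0.9, only minsup changes the outcome for these particular rules.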