Association Rules • Hawaii International Conference on System Sciences (HICSS-40), January 2007 • David L. Olson, Yanhong Li
Fuzzy Association Rules • Association rules mining provides information to assess significant correlations in large databases • IF X THEN Y • Initial data mining analysis • Not predictive • SUPPORT: degree to which relationship appears in data • CONFIDENCE: probability that if X, then Y
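As a minimal sketch of the two measures above for the crisp (non-fuzzy) case: support is taken here as the fraction of transactions containing every item of the rule, and confidence as support(X and Y) divided by support(X). The function names and the toy transactions are hypothetical and are not taken from the presentation.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) as support(X and Y) / support(X)."""
    sup_x = support(transactions, antecedent)
    if sup_x == 0:
        return 0.0  # the rule never fires, so confidence is reported as 0
    sup_xy = support(transactions, set(antecedent) | set(consequent))
    return sup_xy / sup_x


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})          # 2 of 4 transactions
rule_confidence = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 with bread
print(rule_support, rule_confidence)
```

A rule IF X THEN Y is usually kept only when both values clear user-chosen minimums.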
Association Rule Algorithms • APriori • Agrawal et al., 1993; Agrawal & Srikant, 1994 • Find correlations among transactions, binary values • Weighted association rules • Cai et al., 1998; Lu et al., 2001 • Cardinal data • Srikant & Agrawal, 1996 • Partitions attribute domain, combines adjacent partitions until binary
Fuzzy Analysis • Deals with vagueness & uncertainty • Fuzzy Set Theory • Zadeh [1965] • Probability Theory • Pearl [1988] • Rough Set Theory • Pawlak [1982] • Set Pair Theory • Zhao [2000]
Fuzzy Association Rules • Most based on APriori algorithm • Treat all attributes as uniform • Can increase number of rules by decreasing minimum support or minimum confidence • Generates many uninteresting rules • Run time increases substantially
Gyenesei (2000) • Studied weighted quantitative association rules in fuzzy domain • With & without normalization • NONNORMALIZED • Used product operator to define combined weight and fuzzy value • If weight small, support level small, tends to have data overflow • NORMALIZED • Used geometric mean of item weights as combined weight • Support then very small
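The two ways of combining item weights named above can be compared on a small example. This is a sketch under assumptions: the weights and membership values are made up, the function names are hypothetical, and one record's contribution is assumed to be the combined weight times the product of its fuzzy membership values. It is not a reproduction of Gyenesei's published formulas.

```python
import math


def combined_weight_product(weights):
    """Non-normalized variant: combined weight is the product of the item weights."""
    return math.prod(weights)


def combined_weight_geometric_mean(weights):
    """Normalized variant: combined weight is the geometric mean of the item weights."""
    return math.prod(weights) ** (1.0 / len(weights))


def record_contribution(weights, memberships, combine):
    """Assumed contribution of one record: combined weight times the
    product of the record's fuzzy membership values for the items."""
    return combine(weights) * math.prod(memberships)


# Hypothetical weights and membership values for a two-item set.
weights = [0.5, 0.8]
memberships = [0.9, 0.6]

print(combined_weight_product(weights))          # product of the two weights
print(combined_weight_geometric_mean(weights))   # square root of that product
print(record_contribution(weights, memberships, combined_weight_product))
```

With weights below 1, the product shrinks as more items are added, while the geometric mean stays on the scale of the individual weights.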
Algorithm • Get membership functions, minimum support, minimum confidence • Assign weight to each fuzzy membership for each attribute (categorical) • Calculate support for each fuzzy region • If support > minimum, OK • If confidence > minimum, OK • If both OK, generate rules
Fuzzified Age • Figure 2: The membership functions of attribute Age (x-axis: Age, with breakpoints at 0, 25, 35, 40, 50, 100; y-axis: membership value, 0 to 1; fuzzy sets: Young, Middle, Old)
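The figure itself is not recoverable, so the sketch below only assumes a common shape consistent with its axis breakpoints (25, 35, 40, 50): Young is 1 up to age 25 and falls to 0 at 35, Middle rises from 25 to 35, stays at 1 until 40 and falls to 0 at 50, and Old rises from 40 to 50. These piecewise-linear shapes are an assumption, not the authors' stated functions.

```python
def young(age):
    """Assumed shape: 1 up to age 25, linear decline to 0 at age 35."""
    if age <= 25:
        return 1.0
    if age >= 35:
        return 0.0
    return (35 - age) / 10


def middle(age):
    """Assumed shape: rises 25 to 35, flat at 1 from 35 to 40, falls 40 to 50."""
    if age <= 25 or age >= 50:
        return 0.0
    if age < 35:
        return (age - 25) / 10
    if age <= 40:
        return 1.0
    return (50 - age) / 10


def old(age):
    """Assumed shape: 0 up to age 40, linear rise to 1 at age 50."""
    if age <= 40:
        return 0.0
    if age >= 50:
        return 1.0
    return (age - 40) / 10


def fuzzify_age(age):
    """Return the membership value of `age` in each fuzzy category."""
    return {"Young": young(age), "Middle": middle(age), "Old": old(age)}


print(fuzzify_age(30))
print(fuzzify_age(45))
```

Under these assumed shapes a 30-year-old belongs partly to Young and partly to Middle, which is the kind of overlap the fuzzy categories are meant to capture.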
Calculate Support for Each Pair of Fuzzy Categories • Membership value • Identify weights for each attribute • Identify highest fuzzy membership category for each case • Membership value = minimum weight associated with highest fuzzy membership category • Support • Average membership value for all cases
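One possible reading of these steps, sketched in code: each case is assigned to its highest-membership category on each attribute; a case counts toward a pair of categories only if both are its top categories; its value is then the smaller of the two category weights; support is the average over all cases. The data layout, names, weights, and cases below are hypothetical, and the handling of non-matching cases (value 0) is an assumption.

```python
def top_category(memberships):
    """Label with the highest membership value for one attribute of one case."""
    return max(memberships, key=memberships.get)


def pair_support(cases, weights, cat_a, cat_b):
    """Average membership value of the category pair over all cases.

    `cat_a` and `cat_b` are (attribute, label) tuples. A case contributes
    min(weight_a, weight_b) when both labels are its top categories,
    and 0 otherwise (assumed).
    """
    attr_a, label_a = cat_a
    attr_b, label_b = cat_b
    total = 0.0
    for case in cases:
        if (top_category(case[attr_a]) == label_a
                and top_category(case[attr_b]) == label_b):
            total += min(weights[cat_a], weights[cat_b])
    return total / len(cases)


# Hypothetical weights and fuzzified cases, for illustration only.
weights = {("income", "Middle"): 0.8, ("credit", "Good"): 0.9}
cases = [
    {"income": {"Low": 0.1, "Middle": 0.9}, "credit": {"Good": 0.7, "Bad": 0.3}},
    {"income": {"Low": 0.6, "Middle": 0.4}, "credit": {"Good": 0.8, "Bad": 0.2}},
    {"income": {"Low": 0.2, "Middle": 0.8}, "credit": {"Good": 0.6, "Bad": 0.4}},
    {"income": {"Low": 0.3, "Middle": 0.7}, "credit": {"Good": 0.1, "Bad": 0.9}},
]

print(pair_support(cases, weights, ("income", "Middle"), ("credit", "Good")))
```

Here two of the four cases have Middle income and Good credit as their top categories, so the pair's support is 2 × 0.8 / 4 = 0.4.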
Support • If support for pair of categories is above minimum support, retain • Identifies all pairs of fuzzy categories with sufficiently strong relationship • For outcomes, R51(On Time) strong, R52(Default) not
Quartets • None qualify, so algorithm stops
Confidence • Identify direction • For those training set cases involving the pair of attributes, what proportion came out as predicted?
4 Rules • IF Income is Middle THEN Outcome is On-Time • R22→R51 support 0.490 confidence 0.916 • IF Credit is Good THEN Outcome is On-Time • R41→R51 support 0.576 confidence 0.972 • IF Income is Middle AND Credit is Good THEN Outcome is On-Time • R22 R41→R51 support 0.419 confidence 0.995 • IF Risk is High AND Credit is Good THEN Outcome is On-Time • R31 R41→R51 support 0.266 confidence 0.993
Higher order combinations • Try triplets • If ambitious, sets of 4, and beyond • Here, none • Problems: • Computational complexity explodes • Doesn’t guarantee total coverage • That also would explode complexity • Can control by lowering minsup, minconf
Simulation Testing • Selected 550 cases • Held out 100 • Randomly assigned weights to each fuzzy region of each attribute • minsup {0.35, 0.45, 0.55, 0.65} • minconf {0.7, 0.8, 0.9}
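The threshold grid above gives 4 × 3 = 12 (minsup, minconf) settings. As an illustration of how the thresholds act as filters, the sketch below re-screens the four rules reported on the "4 Rules" slide against each setting, using the strict "greater than" test from the Algorithm slide. It does not reproduce the simulation, which regenerated rules under randomly assigned weights; the names used are hypothetical.

```python
from itertools import product

MINSUP = [0.35, 0.45, 0.55, 0.65]
MINCONF = [0.7, 0.8, 0.9]

# (support, confidence) pairs as reported on the "4 Rules" slide.
RULES = {
    "R22->R51": (0.490, 0.916),
    "R41->R51": (0.576, 0.972),
    "R22 R41->R51": (0.419, 0.995),
    "R31 R41->R51": (0.266, 0.993),
}


def surviving(rules, minsup, minconf):
    """Names of rules whose support and confidence both exceed the minimums."""
    return [name for name, (sup, conf) in rules.items()
            if sup > minsup and conf > minconf]


# One entry per threshold combination.
grid = {(s, c): surviving(RULES, s, c) for s, c in product(MINSUP, MINCONF)}

for (s, c), names in grid.items():
    print(f"minsup={s:.2f} minconf={c:.1f}: {len(names)} rule(s) {names}")
```

Because all four reported confidences exceed 0.9, only minsup changes which of these rules survive.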