Fuzzy Versus Quantitative Association Rules: A Fair Data-Driven Comparison

Shih-Ming Bai and Shyi-Ming Chen, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

Outline 1. Introduction 2. Association Rule Mining 3. Experimental Approach 4. Conclusion

1. Introduction
• The discovery of knowledge in databases, also called data mining, is a promising and important research area. In data mining, association rules are often used to represent and identify dependencies between attributes in a database.
• In most real-life applications, databases contain many values other than 0 and 1. Quantitative attributes such as age or income are very common.

2. Association Rule Mining
• Table I and Table II present what could happen if we replace the quantitative attributes in a small database by either binary or fuzzy attributes.
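To make the contrast concrete, here is a minimal sketch of how support and confidence could be computed for a rule A ⇒ B when the attributes are binary (0 or 1) or fuzzy (degrees in [0, 1]). The toy ages and incomes, the cut-off values, the linear `ramp_up` membership function, and the choice of `min` as the conjunction are all assumptions made for illustration; none of them is taken from Table I or Table II.

```python
def support(a, b):
    """Support of A => B: mean degree to which records satisfy both A and B.

    min() is an assumed conjunction; on 0/1 values it reduces to logical AND.
    """
    return sum(min(x, y) for x, y in zip(a, b)) / len(a)


def confidence(a, b):
    """Confidence of A => B: (sum of degrees of A and B) / (sum of degrees of A)."""
    total_a = sum(a)
    if total_a == 0:
        return 0.0  # no record satisfies A to any degree
    return sum(min(x, y) for x, y in zip(a, b)) / total_a


def ramp_up(x, lo, hi):
    """Hypothetical membership function: 0 below lo, 1 above hi, linear between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


# Hypothetical records: age in years, income in thousands.
ages = [25, 38, 42, 55, 61]
incomes = [20, 45, 52, 80, 75]

# Binary attributes: a sharp cut-off decides membership.
crisp_old = [1.0 if a >= 40 else 0.0 for a in ages]
crisp_high = [1.0 if i >= 50 else 0.0 for i in incomes]

# Fuzzy attributes: records near the boundary belong partially.
fuzzy_old = [ramp_up(a, 30, 50) for a in ages]
fuzzy_high = [ramp_up(i, 40, 60) for i in incomes]

print("crisp:", support(crisp_old, crisp_high), confidence(crisp_old, crisp_high))
print("fuzzy:", support(fuzzy_old, fuzzy_high), confidence(fuzzy_old, fuzzy_high))
```

In this sketch the record with age 38 contributes nothing to the binary rule but contributes partially to the fuzzy one, which is the kind of boundary effect the two tables illustrate.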

3. Experimental Approach
A. Data Set: FAM95
• FAM95.DAT contains data for the 63,756 families that were interviewed in the March 1995 Current Population Survey (CPS).

B. Data-Driven Partition: Fuzzy c-means algorithm
• Formula: [not recovered]
• Cases shown: m = 1, m = 2, m = 3 [figures not recovered]
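The fuzzy c-means procedure named above can be sketched for a single quantitative attribute as follows. This is a minimal illustration using the textbook fuzzy c-means updates, which may differ in notation from the formula on the slide. The sample ages, the initial centers, and the function names `memberships` and `fuzzy_c_means` are assumptions. The case m = 1 is treated as the limit in which every value is assigned entirely to its nearest center, which yields a crisp partition.

```python
def memberships(x, centers, m):
    """Membership degrees of value x in each cluster, for fuzzifier m >= 1."""
    dists = [abs(x - c) for c in centers]
    # m = 1 (limit case), or x lying exactly on a center: crisp assignment.
    if m == 1 or min(dists) == 0.0:
        nearest = dists.index(min(dists))
        return [1.0 if i == nearest else 0.0 for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    # Textbook update: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def fuzzy_c_means(data, centers, m, iterations=100, tol=1e-9):
    """Alternate membership and center updates until the centers stop moving."""
    centers = list(centers)
    for _ in range(iterations):
        u = [memberships(x, centers, m) for x in data]
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            if total > 0:
                # Center = mean of the data weighted by u^m.
                new_centers.append(sum(w * x for w, x in zip(weights, data)) / total)
            else:
                new_centers.append(centers[i])  # empty cluster: keep old center
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Hypothetical ages and initial centers, for illustration only.
ages = [20, 22, 25, 40, 43, 45, 65, 68, 70]
for m in (1, 2, 3):
    centers = fuzzy_c_means(ages, [20, 45, 70], m)
    print(m, [round(c, 2) for c in centers], memberships(32, centers, m))
```

Larger values of m spread each value's membership more evenly across the clusters, so the partition becomes softer as m grows.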

C. Comparing Association Rules
• The rankings obtained by the quantitative and the fuzzy algorithms are compared using the Spearman rank correlation coefficient.
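As an illustration of this comparison, the sketch below computes the Spearman rank correlation coefficient as the Pearson correlation of the two rank vectors, with tied values receiving their average rank. The confidence values are hypothetical and do not come from the experiments; the helper names `ranks` and `spearman` are likewise assumptions.

```python
def ranks(values):
    """Rank values from 1 (largest) downward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def spearman(a, b):
    """Spearman rank correlation of two equally long score lists.

    Undefined (division by zero) if either list is constant.
    """
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical confidence values of the same five rules under each algorithm.
conf_quantitative = [0.95, 0.90, 0.85, 0.80, 0.70]
conf_fuzzy = [0.93, 0.91, 0.80, 0.82, 0.65]

print(spearman(conf_quantitative, conf_fuzzy))
```

A coefficient close to 1 means the two algorithms order the rules almost identically, while a value near 0 means the rankings are unrelated.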

D. Quantitative Versus Fuzzy Association Rules • Table III lists the 20 strongest rules obtained from the discrete (m = 1) and the fuzzy algorithm (m = 3) along with their confidence and support values.

4. Conclusion
• The typical argumentation or motivation for involving fuzzy set theory in association rule mining is as follows:
1) it allows the rules to be formulated using vague linguistic expressions, which are easier for humans to grasp;
2) it suppresses the unwanted effect that boundary cases might cause.

• But quantitative association rule mining also gives (the same strong) rules, formulated in the same way in natural language.
• The sharp boundary problem is already inherently suppressed and can be further minimized by using sensible partitioning methods, as is already being done in quantitative association rule mining.

Hence, we may expect rules obtained using a data-driven approach to be significantly different from the rules obtained using an expert-driven approach. The comparison of fuzzy and quantitative association rules using an expert-driven approach (for large databases) is certainly an interesting topic for future research. • In this case, however, experts should also define the crisp intervals that correspond best to human intuition! The common practice of comparing data-driven crisp data mining with expert-driven fuzzy data mining does not provide convincing arguments for the introduction of fuzzy association rules.

Thank You!
