SDS-Rules and Association Rules
Tomáš Karban (1), Jan Rauch (2), Milan Šimůnek (2)
(1) Charles University, Prague, Dept. of Software Engineering
(2) University of Economics, Prague, Dept. of Information and Knowledge Engineering
ACM Symposium on Applied Computing (SAC 2004)
March 17, 2004, Nicosia, Cyprus

Agenda
• Introduction to association rules
• Motivation of SDS-rules
• SDS-rules in detail
• SDS quantifiers
• Disjoint sets
• Implementation technique
• Application to medical data
• Conclusion

Association Rules (1)
• Express a relation φ ≈ ψ between a premise (antecedent) φ and a consequence (succedent) ψ
• φ and ψ are Boolean attributes derived as conjunctions from columns of the studied data table (rows = objects)
• ≈ stands for a quantifier: a truth condition of the association rule based on the contingency table of φ and ψ
• Example: account(low) & salary(low) ⇒(90%) loan_quality(bad)

Association Rules (2)
• Contingency table
• Founded implication
• Various quantifiers available: implications, double implications, equivalence, statistical hypothesis tests, above/outside average relations, etc.
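
The contingency table and the founded implication named on this slide can be sketched in Python. This is a minimal illustration, assuming the usual GUHA definitions: four frequencies a, b, c, d, and a founded implication that holds when a/(a+b) ≥ p and a ≥ Base. The names `FourfoldTable`, `fourfold_table` and `founded_implication`, and the toy data, are hypothetical and not taken from the authors' software.

```python
from typing import NamedTuple, Sequence


class FourfoldTable(NamedTuple):
    """Contingency table of an antecedent and a succedent (assumed layout)."""
    a: int  # antecedent true,  succedent true
    b: int  # antecedent true,  succedent false
    c: int  # antecedent false, succedent true
    d: int  # antecedent false, succedent false


def fourfold_table(antecedent: Sequence[bool],
                   succedent: Sequence[bool]) -> FourfoldTable:
    """Count the four combinations over all objects (rows)."""
    if len(antecedent) != len(succedent):
        raise ValueError("both attributes must cover the same objects")
    counts = [0, 0, 0, 0]
    for phi, psi in zip(antecedent, succedent):
        index = (0 if phi else 2) + (0 if psi else 1)
        counts[index] += 1
    return FourfoldTable(*counts)


def founded_implication(table: FourfoldTable, p: float, base: int) -> bool:
    """Assumed definition: a / (a + b) >= p and a >= base."""
    covered = table.a + table.b
    if covered == 0:
        return False  # no object satisfies the antecedent
    return table.a / covered >= p and table.a >= base


# Toy data: 10 objects, antecedent holds for 5, succedent for 4 of those.
antecedent = [True] * 5 + [False] * 5
succedent = [True, True, True, True, False, True, False, False, False, False]
table = fourfold_table(antecedent, succedent)
holds_at_75 = founded_implication(table, p=0.75, base=3)
holds_at_90 = founded_implication(table, p=0.90, base=3)
```

With this toy data 4 of the 5 objects satisfying the antecedent also satisfy the succedent, so the rule holds for p = 0.75 but not for p = 0.90.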

Motivation of SDS-rules
• Describe interesting relations between pairs of disjoint sets (usually capturing their difference)
• Use a similar approach and the same methods as for association rules
• Examples:
  • find pairs of sets that differ significantly in a selected property
  • find all properties that differ on a fixed pair of sets
  • a combination of both
• Motivation comes directly from the demands of the STULONG project (atherosclerosis risk factors)

SDS-Rules (1)
• SDS-rules can be understood as an extension of association rules
• An SDS-rule combines three Boolean attributes with an SDS-quantifier
• Two of the attributes define two disjoint sets A and B
• The third attribute defines some property
• The SDS-quantifier defines the relation of the two sets in that property

SDS-Rules (2)
• The table of frequencies is extended to a six-fold table (called the “SDS-table”)
• The SDS-table distinguishes objects in the first set, in the second set, and outside both sets

Asymmetric Multiplicative Difference Quantifier
• The share of objects with the property in the first set is at least k times the share in the second set
• Both sets are larger than Base

Symmetric Additive Difference Quantifier
• The percentage of objects with the property differs between the first and the second set by at least p
• Both sets are larger than Base
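
The two difference quantifiers can be written as predicates over the SDS-table. The sketch below is one possible reading of the verbal definitions, not the authors' formulas: it assumes the six frequencies are the counts of objects with and without the property in the first set, the second set, and outside both sets. The class `SDSTable`, its field names, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SDSTable:
    """Six frequencies (assumed layout: 3 groups x property present/absent)."""
    first_with: int
    first_without: int
    second_with: int
    second_without: int
    outside_with: int
    outside_without: int

    @property
    def first_size(self) -> int:
        return self.first_with + self.first_without

    @property
    def second_size(self) -> int:
        return self.second_with + self.second_without


def _sizes_ok(t: SDSTable, base: int) -> bool:
    """Both sets must be strictly larger than a non-negative Base."""
    if base < 0:
        raise ValueError("base must be non-negative")
    return t.first_size > base and t.second_size > base


def asymmetric_multiplicative(t: SDSTable, k: float, base: int) -> bool:
    """Share in the first set is at least k times the share in the second."""
    if not _sizes_ok(t, base):
        return False
    share_first = t.first_with / t.first_size
    share_second = t.second_with / t.second_size
    return share_first >= k * share_second


def symmetric_additive(t: SDSTable, p: float, base: int) -> bool:
    """Shares in the two sets differ by at least p, in either direction."""
    if not _sizes_ok(t, base):
        return False
    share_first = t.first_with / t.first_size
    share_second = t.second_with / t.second_size
    return abs(share_first - share_second) >= p


# Made-up frequencies: 30 of 40 (75%) versus 10 of 50 (20%).
example = SDSTable(30, 10, 10, 40, 5, 5)
swapped = SDSTable(10, 40, 30, 10, 5, 5)
```

Swapping the two sets changes the result of the asymmetric quantifier but not of the symmetric one, which is the difference the two names point at.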

Disjoint sets
• An empty intersection of the two sets can be ensured syntactically by forcing a common attribute into both set definitions
• If the coefficients (i.e. values) of the common attribute are disjoint, then the sets are disjoint
• Example: account(low) & salary(mid) versus salary(low) & sex(male)
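
The syntactic disjointness check can be illustrated with a few lines of Python. This is a sketch under an assumed representation: each set definition is a conjunction stored as a dictionary from attribute name to its set of coefficients. The function name `syntactically_disjoint` is hypothetical.

```python
from typing import Dict, Set

Conjunction = Dict[str, Set[str]]  # attribute name -> allowed values


def syntactically_disjoint(first: Conjunction, second: Conjunction) -> bool:
    """True if some common attribute has disjoint coefficients.

    This is a sufficient condition only: two definitions without such an
    attribute may still describe disjoint sets in a particular data table.
    """
    common = first.keys() & second.keys()
    return any(first[attr].isdisjoint(second[attr]) for attr in common)


# The example from the slide: salary is the common attribute.
set_a = {"account": {"low"}, "salary": {"mid"}}
set_b = {"salary": {"low"}, "sex": {"male"}}
slide_example = syntactically_disjoint(set_a, set_b)

# No common attribute, so disjointness cannot be guaranteed syntactically.
no_common = syntactically_disjoint({"account": {"low"}}, {"sex": {"male"}})
```

The check needs no access to the data: it only inspects the two definitions, which is why it can be applied while candidate rules are being generated.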

Implementation Technique
• Data representation: bit strings for every value of every attribute being used
• Bit string length = number of objects in the data table
• Value “1” in position i of bit string s(x) = object i has value x for the attribute s
• Fast operations on bit strings: AND, OR, NOT
• Building bit strings for the first set, the second set and the studied property
• Calculation of the SDS-table: counting “1” bits in bit strings
• Truth value of an SDS-rule: an expression on frequencies from the SDS-table
• Memory-conservative
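
The bit-string technique can be sketched in Python, using arbitrary-precision integers as bit strings. This is an illustration of the idea rather than the authors' implementation: the helper names, the 3 x 2 layout of the SDS-table, and the toy data table are all assumptions.

```python
from typing import Dict, List, Tuple

Literal = Tuple[str, str]  # (attribute, value)


def build_bit_strings(rows: List[Dict[str, str]]) -> Dict[Literal, int]:
    """One bit string per (attribute, value); bit i is set for object i."""
    bits: Dict[Literal, int] = {}
    for i, row in enumerate(rows):
        for attr, value in row.items():
            bits[(attr, value)] = bits.get((attr, value), 0) | (1 << i)
    return bits


def popcount(bit_string: int) -> int:
    """Number of '1' bits, i.e. number of objects selected."""
    return bin(bit_string).count("1")


def conjunction(bits: Dict[Literal, int], literals: List[Literal], n: int) -> int:
    """AND together the bit strings of all literals (n = number of objects)."""
    result = (1 << n) - 1  # start with all objects selected
    for literal in literals:
        result &= bits.get(literal, 0)
    return result


def sds_table(first: int, second: int, prop: int, n: int) -> Tuple[int, ...]:
    """Six counts: (first, second, outside) x (property present, absent)."""
    full = (1 << n) - 1
    outside = full & ~(first | second)  # NOT, masked to n bits
    not_prop = full & ~prop
    return tuple(popcount(group & column)
                 for group in (first, second, outside)
                 for column in (prop, not_prop))


# Toy data table with six objects (made-up values).
rows = [
    {"salary": "low", "sex": "male", "manager": "yes"},
    {"salary": "low", "sex": "male", "manager": "no"},
    {"salary": "mid", "sex": "male", "manager": "yes"},
    {"salary": "mid", "sex": "female", "manager": "yes"},
    {"salary": "high", "sex": "male", "manager": "no"},
    {"salary": "low", "sex": "female", "manager": "no"},
]
n = len(rows)
bits = build_bit_strings(rows)
first = conjunction(bits, [("salary", "low")], n)
second = conjunction(bits, [("salary", "mid")], n)
prop = conjunction(bits, [("manager", "yes")], n)
table = sds_table(first, second, prop, n)
```

Each frequency is obtained with one AND and one bit count, so evaluating a candidate rule never rescans the rows of the data table.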

Application to Medical Data
• STULONG project (“longitudinal study”)
  • studied the prevalence of risk factors of atherosclerosis
  • 1400 middle-aged men
  • detailed entry examination, 20 years of checkups
• Among many other analytical questions:
  • Are there strong relations between the entry examination and the cause of death?
  • Are there differences in the entry examination between men of the risk group who came down with an observed cardiovascular disease (during control examinations) and those who stayed healthy?

Results (1)
• If we compare the group of patients who are divorced, have reached apprentice school education and have other responsibility in their jobs
• with the second group of patients, who are already pensioners,
• there is a 53.8% difference in the presence of other cause of death.

Results (2)
• If we compare the group of patients who came down with some cardiovascular disease during the control checks
• with those who stayed healthy,
• we see that in the second group there were 3.97% more patients working in a managerial position.

Conclusion
• A new method of describing potentially interesting patterns by SDS-rules was presented
• The method was inspired by and applied to medical data; other application domains can benefit as well
• The method is computationally efficient
• Drawback: result sets are usually large, and the SDS-rules produced are similar in certain domains (“nuggets”)
  • an additional software tool for “online result browsing”
• Development of statistical SDS-quantifiers is in progress