Introduction of Data Mining and Association Rules
cs157 Spring 2009
Instructor: Dr. Sin-Min Lee
Student: Dongyi Jia
What is data mining?
• The automated extraction of hidden predictive information from databases.
• Allows users to analyze large databases to solve business decision problems.
• An extension of statistics, with a few artificial intelligence and machine learning twists thrown in.
• Attempts to discover rules and patterns from data.
Data Mining - On What Kind of Data
• In principle, data mining should be applicable to any kind of information repository:
● relational databases
● data warehouses
● transactional and advanced databases
● flat files
● World Wide Web
Data Mining Functionalities - What Kinds of Patterns Can Be Mined?
• Association Analysis
• Classification and Prediction
• Cluster Analysis
• Evolution Analysis
Applications of data mining
• Some applications require prediction: for example, when a person applies for a credit card, the credit-card company wants to predict whether the person is a good credit risk.
• Others look for associations: for example, if a customer buys a book, an online bookstore may suggest other associated books.
Association Rule Discovery
• Task: discovering association rules among items in a transaction database.
• How are association rules mined from a large database?
1. Find all frequent itemsets: each of these itemsets must occur at least as frequently as a predetermined minimum support count.
2. Generate strong association rules from the frequent itemsets: these rules must satisfy minimum support and minimum confidence.
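The two steps above can be sketched in Python. This is a minimal brute-force illustration, not the method from the lecture or an optimized algorithm such as Apriori: the function names `frequent_itemsets` and `strong_rules`, the grocery transactions, and the threshold values are all assumptions made for this example. Confidence is computed as the count of the whole itemset divided by the count of the antecedent.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support_count):
    """Step 1: brute-force search for itemsets contained in at least
    min_support_count transactions. Returns {frozenset: count}."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support_count:
                frequent[candidate] = count
                found_any = True
        if not found_any:
            break  # no frequent itemset of this size, so none larger either
    return frequent

def strong_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent => consequent
    and keep the rules whose confidence meets the threshold."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its count is already stored in `frequent`.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Hypothetical transactions and thresholds, chosen only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk", "eggs"},
]
frequent = frequent_itemsets(transactions, min_support_count=2)
rules = strong_rules(frequent, min_confidence=0.7)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "=>", sorted(consequent), confidence)
# prints: ['eggs'] => ['milk'] 1.0
```

Enumerating every candidate itemset is exponential in the number of items, so this sketch is only practical for very small examples.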
Association Rules (cont.)
• Retail shops are often interested in associations between items that people buy.
• Someone who buys bread is quite likely also to buy milk.
association rule: bread => milk
• A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts.
association rule: DSC => OSC
Association Rules (cont.)
• Each rule has two associated numbers:
• Support: a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule.
• Confidence: a measure of how often the consequent is true when the antecedent is true.
Association Rules (cont.)
• Let $I = \{i_1, i_2, \ldots, i_m\}$ be the total set of items.
• $D$ is a set of transactions; each transaction $d$ is a set of items with $d \subseteq I$.
• Association rule: $X \Rightarrow Y$, where $X \subseteq I$, $Y \subseteq I$, and $X \cap Y = \emptyset$.
• support = (# of transactions containing $X \cup Y$) / $|D|$
• confidence = (# of transactions containing $X \cup Y$) / (# of transactions containing $X$)
Example
• Example of transaction data:
• CD player, music CD, music book
• CD player, music CD
• music CD, music book
• CD player
• I = {CD player, music CD, music book}
• |D| = 4 (number of transactions)
• # of transactions containing both CD player and music CD = 2
• # of transactions containing CD player = 3
• CD player => music CD (sup = 2/4, conf = 2/3)
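The numbers in this example can be checked with a short Python sketch. The helper names `count_containing`, `support`, and `confidence` are assumptions introduced here for illustration; they follow the definitions of support and confidence given earlier, and `Fraction` is used so the results stay exact.

```python
from fractions import Fraction

# Transactions copied from the example above.
transactions = [
    {"CD player", "music CD", "music book"},
    {"CD player", "music CD"},
    {"music CD", "music book"},
    {"CD player"},
]

def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(x, y, transactions):
    """Fraction of all transactions that contain both X and Y."""
    return Fraction(count_containing(x | y, transactions), len(transactions))

def confidence(x, y, transactions):
    """Fraction of the transactions containing X that also contain Y.
    Raises ZeroDivisionError if no transaction contains X."""
    return Fraction(count_containing(x | y, transactions),
                    count_containing(x, transactions))

x, y = {"CD player"}, {"music CD"}
print(support(x, y, transactions))     # prints 1/2 (that is, 2/4)
print(confidence(x, y, transactions))  # prints 2/3
```

The same functions can be applied to other rules over this data, for instance `{"music CD"}` as antecedent and `{"music book"}` as consequent.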
Association Rules (cont.)
• Rule support and confidence reflect the usefulness and certainty of discovered rules.
• A support of 50% for the association rule means that a CD player and a music CD are purchased together in 50% of all the transactions under analysis.
• A confidence of 67% means that 67% of the customers who purchased a CD player also bought a music CD.
Strong Association Rule
• User sets support and confidence thresholds.
• Rules above support threshold have LARGE support.
• Rules above confidence threshold have HIGH confidence.
• Rules satisfying both are said to be STRONG.
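The threshold test can be written as a small Python function. This is a sketch under stated assumptions: the name `classify_rule` and the threshold values are hypothetical, and "above the threshold" is read as "greater than or equal to", matching the "at least as frequent as" wording used for the minimum support count.

```python
def classify_rule(support, confidence, min_support, min_confidence):
    """Label a rule against user-chosen thresholds."""
    large = support >= min_support        # LARGE support
    high = confidence >= min_confidence   # HIGH confidence
    return {"large_support": large, "high_confidence": high,
            "strong": large and high}

# Rule CD player => music CD from the example: support 2/4, confidence 2/3.
# The thresholds 0.4, 0.6 and 0.7 are hypothetical.
strict = classify_rule(2 / 4, 2 / 3, min_support=0.4, min_confidence=0.7)
loose = classify_rule(2 / 4, 2 / 3, min_support=0.4, min_confidence=0.6)
print(strict["strong"])  # prints False: confidence 0.667 is below 0.7
print(loose["strong"])   # prints True: both thresholds are met
```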
References
• Professor Lee's lectures, http://www.cs.sjsu.edu/~lee/cs157b/cs157b.html
• Rui Zhao, SJSU, http://www.cs.sjsu.edu/~lee/cs157b/cs157b.html
• Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers