210 likes | 232 Views
Decision Support and Business Intelligence Systems (9 th Ed., Prentice Hall). Chapter 5: Data Mining for Business Intelligence. Learning Objectives. Define data mining as an enabling technology for business intelligence
E N D
Decision Support and Business Intelligence Systems(9th Ed., Prentice Hall) Chapter 5: Data Mining for Business Intelligence
Learning Objectives • Define data mining as an enabling technology for business intelligence • Understand the objectives and benefits of business analytics and data mining • Recognize the wide range of applications of data mining • Understand the steps involved in data preprocessing for data mining
Introduction • Data is produced at a phenomenal rate • Our ability to store has grown • Users expect more sophisticated information • How? UNCOVER HIDDEN INFORMATION DATA MINING
Examples: What is (not) Data Mining? • What is not Data Mining? • Look up phone number in phone directory • Query a Web search engine for information about “Amazon” • What is Data Mining? • Certain names are more prevalent in certain US locations (e.g. in Boston area,…) • Group together similar documents returned by search engine according to their context (e.g. Amazon.com, …) • A customer with income between 10,000 and 20,000 and age between 20 and 25 who purchased milk and bread is likely to purchase diapers within 5 years. • The amount of fish sold to people living in a certain area and have income between 20,000 and 35,000 is increasing.
Data Mining Data Mining: the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions. • Involves analysis of data and use of software techniques for finding hidden and unexpected patterns and relationships in sets of data. • Potential Result: Higher-level meta information that may not be obvious when looking at raw data • Similar terms • Exploratory data analysis • Data driven discovery • Deductive learning
Decisions in Data Mining • Databases to be mined • Relational, transactional, object-oriented, object-relational, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW, etc. • Knowledge to be mined • Association, classification, clustering, etc. • Techniques utilized • Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc. • Applications adapted • Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.
Data Mining Tasks • Prediction Tasks • Use some variables to predict unknown or future values of other variables • Description Tasks • Find human-interpretable patterns that describe the data. Common data mining tasks • Classification [Predictive] • Find all credit applicants who are poor credit risks. (classification) • Clustering [Descriptive] • Identify customers with similar buying habits.(Clustering) • Association Rule Discovery [Descriptive] • Find all items which are frequently purchased with milk • Sequential Pattern Discovery [Descriptive]
Classification: Definition • Given a collection of records (training set ) • Each record contains a set of attributes, one of the attributes is the class. • Find a model for class attribute as a function of the values of other attributes. • Goal: previously unseen records should be assigned a class as accurately as possible. • A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it.
Test Set Model Classification Example Learn Classifier Training Set
Classification: Application Example • Direct Marketing • Goal: Reduce cost of mailing by targeting a set of consumers likely to buy a new cell-phone product. • Approach: • Use the data for a similar product introduced before. • We know which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute. • Collect various demographic, lifestyle, and company-interaction related information about all such customers. • Type of business, where they stay, how much they earn, etc. • Use this information as input attributes to learn a classifier model.
Clustering Definition • Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that • Data points in one cluster are more similar to one another. • Data points in separate clusters are less similar to one another.
Clustering: Application Example • Market Segmentation: • Goal: subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix. • Approach: • Collect different attributes of customers based on their geographical and lifestyle related information. • Find clusters of similar customers.
Association Rule :Application Example • Supermarket shelf management. • Goal: To identify items that are bought together by sufficiently many customers. • Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items. • A classic rule -- • If a customer buys diaper and milk, then he is very likely to buy beer:
Examples of Data Mining applications: • Retail / Marketing • Identifying buying patterns of customers. • Predicting response to mailing campaigns. • Banking • Detecting patterns of CC fraud • Identifying loyal customers. • Medicine • Identifying successful medical therapies. • Banking and Other Financial • Automate the loan application process • Detecting fraudulent transactions • Maximize customer value (cross-, up-selling)
Examples of Data Mining applications: • Customer Relationship Management • Maximize return on marketing campaigns • Improve customer retention • Maximize customer value (cross-, up-selling) • Identify and treat most valued customers • Manufacturing and Maintenance • Predict/prevent machinery failures • Identify anomalies in production systems to optimize the use manufacturing capacity • Discover novel patterns to improve product quality
Data Mining Applications • Brokerage and Securities Trading • Predict changes on certain bond prices • Forecast the direction of stock fluctuations
Data Mining Software • Commercial • SPSS - PASW (formerly Clementine) • SAS - Enterprise Miner • IBM - Intelligent Miner • StatSoft – Statistical Data Miner • … many more • Free and/or Open Source • Weka • RapidMiner… Source: KDNuggets.com, May 2009