MIS 451 Building Business Intelligence Systems
Introduction to Data Mining
Why data mining?
• OLAP provides only shallow data analysis: it answers "what" questions
• Ex: sales distribution by product
Why data mining?
• Shallow data analysis is not sufficient to support business decisions, which depend on "how" questions
• Ex: how to boost sales of other products
• Ex: when people buy product 6, what other products are they likely to buy? (cross-selling)
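The cross-selling question above can be made concrete with a simple co-purchase count. This is only a minimal sketch: the transaction data and the helper name `co_purchased` are hypothetical, and real association rule mining goes well beyond raw counts.

```python
from collections import Counter

# Hypothetical transactions: each set holds the product IDs in one basket.
transactions = [
    {6, 2, 9},
    {6, 2},
    {1, 3},
    {6, 9, 4},
    {2, 4},
    {6, 2, 4},
]

def co_purchased(transactions, product):
    """Count how often each other product appears in baskets containing `product`."""
    counts = Counter()
    for basket in transactions:
        if product in basket:
            # Count every other item bought together with `product`.
            counts.update(basket - {product})
    return counts

counts = co_purchased(transactions, 6)
ranked = counts.most_common()  # most frequently co-purchased products first
print(ranked)
```

In this toy data, the top-ranked product is the strongest cross-selling candidate for buyers of product 6.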
Why data mining?
• OLAP can only do shallow data analysis
• OLAP is based on SQL:
    SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
    FROM DBSR.PRODUCTS PRODUCTS, DBSR.SALESFACTS SALESFACTS
    WHERE PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY
    GROUP BY PRODUCTS.PNAME;
• By its nature, SQL is not suited to implementing complicated algorithms.
• Deep data analysis requires more complicated algorithms: this is data mining.
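To see what the query on this slide returns, it can be run against a toy database. The sketch below uses Python's built-in `sqlite3` module. The table contents are made up, the `DBSR` schema prefix is dropped because the in-memory database has no such schema, and an `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# In-memory database with hypothetical rows; only the table and column
# names come from the slide's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCTS (PRODUCT_KEY INTEGER PRIMARY KEY, PNAME TEXT);
    CREATE TABLE SALESFACTS (PRODUCT_KEY INTEGER, SALES_AMT REAL);
    INSERT INTO PRODUCTS VALUES (1, 'Product 1'), (2, 'Product 2');
    INSERT INTO SALESFACTS VALUES (1, 100.0), (1, 50.0), (2, 30.0);
""")

# Same join and aggregation as on the slide (schema prefix removed,
# ORDER BY added for a stable result order).
query = """
    SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
    FROM PRODUCTS, SALESFACTS
    WHERE PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY
    GROUP BY PRODUCTS.PNAME
    ORDER BY PRODUCTS.PNAME
"""
sales_by_product = conn.execute(query).fetchall()
print(sales_by_product)
conn.close()
```

The result is one total per product, which is exactly the "what" level of analysis: it reports the sales distribution but says nothing about how to change it.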
Why data mining?
• OLAP results generated from data sets with a large number of attributes are difficult to interpret
• Ex: cluster my company's customers for target marketing
• Pick two attributes related to a customer: income level and sales amount
Why data mining?
• Ex: cluster my company's customers for target marketing
• Pick three attributes related to a customer: income level, education level, and sales amount
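Once three or more attributes are involved, grouping customers by eye becomes impractical, and a clustering algorithm can do it instead. The slides do not name an algorithm, so the sketch below assumes plain k-means as one common choice. The customer values are hypothetical and are assumed to be already scaled to the range 0..1.

```python
import math

# Hypothetical customers: (income level, education level, sales amount),
# each already scaled to 0..1 so no single attribute dominates the distance.
customers = [
    (0.10, 0.20, 0.10),
    (0.90, 0.80, 0.90),
    (0.15, 0.10, 0.20),
    (0.85, 0.90, 0.80),
    (0.20, 0.15, 0.15),
    (0.80, 0.85, 0.95),
]

def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = list(points[:k])  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

The same function works unchanged for two, three, or more attributes, because the distance is computed over whatever tuple length the points have.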
What is data mining?
• Data mining is the process of extracting hidden and interesting patterns from data.
• Data mining is one step in the process of Knowledge Discovery in Databases (KDD).
Steps of the KDD Process (diagram)
Data → [Step 1: Selection] → Target Data → [Step 2: Cleaning] → Preprocessed Data → [Step 3: Transformation] → Transformed Data → [Step 4: Data Mining] → Patterns → [Step 5: Interpretation & Evaluation] → Knowledge
Steps of the KDD Process
• Step 1: select the columns (attributes) and rows (records) of interest to be mined
• Step 2: clean errors from the selected data
• Step 3: transform the data into a form suitable for high-performance data mining
• Step 4: data mining
• Step 5: filter out uninteresting patterns from the data mining results
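The five steps above can be sketched as a small pipeline. Everything here is illustrative: the records, the function names, the min-max scaling in step 3, and the "average sales per income segment" stand-in for a real mining algorithm in step 4 are all assumptions, not part of the original slides.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"customer": "A", "income": 40000, "sales": 120.0, "phone": "555-0101"},
    {"customer": "B", "income": None, "sales": 80.0, "phone": "555-0102"},
    {"customer": "C", "income": 90000, "sales": 400.0, "phone": "555-0103"},
    {"customer": "D", "income": 60000, "sales": -5.0, "phone": "555-0104"},
    {"customer": "E", "income": 30000, "sales": 60.0, "phone": "555-0105"},
    {"customer": "F", "income": 80000, "sales": 300.0, "phone": "555-0106"},
]

def select(records, columns):
    """Step 1: keep only the attributes of interest."""
    return [{c: r[c] for c in columns} for r in records]

def clean(records):
    """Step 2: drop records with missing or impossible values."""
    return [r for r in records if r["income"] is not None and r["sales"] >= 0]

def transform(records):
    """Step 3: rescale income to 0..1 (min-max scaling)."""
    lo = min(r["income"] for r in records)
    hi = max(r["income"] for r in records)
    span = (hi - lo) or 1  # avoid dividing by zero if all incomes are equal
    return [{**r, "income": (r["income"] - lo) / span} for r in records]

def mine(records):
    """Step 4: stand-in for a mining algorithm: average sales per income segment."""
    segments = {}
    for r in records:
        name = "high income" if r["income"] >= 0.5 else "low income"
        segments.setdefault(name, []).append(r["sales"])
    return {name: sum(v) / len(v) for name, v in segments.items()}

def evaluate(patterns, threshold):
    """Step 5: keep only patterns judged interesting (average above threshold)."""
    return {name: avg for name, avg in patterns.items() if avg > threshold}

target = select(raw, ["income", "sales"])        # Data -> Target Data
preprocessed = clean(target)                     # -> Preprocessed Data
transformed = transform(preprocessed)            # -> Transformed Data
patterns = mine(transformed)                     # -> Patterns
overall = sum(r["sales"] for r in transformed) / len(transformed)
knowledge = evaluate(patterns, threshold=overall)  # -> Knowledge
print(knowledge)
```

Each intermediate variable corresponds to one of the data stages in the KDD diagram, which makes it easy to inspect what every step contributes.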
Data mining – on what kinds of data?
• Transactional database
• Data warehouse
• Flat file
• Web data
  • Web content
  • Web structure
  • Web log
Major data mining tasks
• Association rule mining – cross-selling
• Clustering – target marketing
• Classification – potential customer identification, fraud detection
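For the first task above, association rules are usually scored by support and confidence. The sketch below computes both for a single rule. The baskets and the rule "bread implies milk" are hypothetical, and the code assumes the antecedent appears in at least one basket (otherwise the division would fail).

```python
# Hypothetical baskets of product names.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent).

    Assumes the antecedent occurs in at least one basket.
    """
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# Score the rule "bread -> milk".
rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(rule_support, rule_confidence)
```

A rule with high support and high confidence is a natural cross-selling candidate. Mining algorithms such as Apriori search for all such rules rather than scoring one chosen by hand.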