NCSA/IBSE Knowledge Discovery in Databases
September 22, 1999
Peter Mulhall, Raul Zaritsky, Michael Welge
Why Data Mining? -- Potential Applications
• Database analysis, decision support, and automation
• Survey Results Analysis
• Inappropriate Practices
• Fraud Detection
• Manufacturing Process Analysis
• Risk Analysis and Management
• Market and Sales Analysis
• Scientific Data Analysis
• Text Document Analysis
• Solutions to all human problems
Data Mining: Confluence of Multiple Disciplines
• Database Systems, Data Warehouses, and OLAP
• Machine Learning
• Statistics
• Mathematical Programming
• Visualization
• High Performance Computing
Data Mining: On What Kind of Data?
• Relational Databases
• Data Warehouses
• Transactional Databases
• Advanced Database Systems: Object-Relational; Spatial; Temporal; Text; Heterogeneous, Legacy, and Distributed; WWW
Why Do We Need Data Mining?
• Leverage the organization’s data assets
• Only a small portion (typically 5%-10%) of the collected data is ever analyzed
• Data that may never be analyzed continues to be collected, at great expense, out of fear that something that may prove important in the future will be missed
• The growth rate of data precludes the traditional “manually intensive” approach
Why Do We Need Data Mining?
• As databases grow, supporting the decision process with traditional query languages becomes infeasible
• Many queries of interest are difficult to state in a query language (the query formulation problem):
• “find all cases of fraud”
• “find all individuals likely to buy a FORD Expedition”
• “find all documents that are similar to this customer’s problem”
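The last query above has no exact predicate, so it is usually answered by ranking rather than filtering. A minimal sketch, assuming a bag-of-words representation and cosine similarity (one common choice, not a method named in these slides); the documents, the problem text, and all function names are hypothetical:

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count lowercase, whitespace-separated words in a text."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical support documents and customer problem, for illustration only.
documents = {
    "doc1": "printer will not power on",
    "doc2": "invoice shows wrong billing address",
    "doc3": "printer shows paper jam error",
}
problem = "printer does not power on"

# Rank document names from most to least similar to the problem text.
query = bag_of_words(problem)
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query, bag_of_words(documents[name])),
    reverse=True,
)
```

The result is an ordering, not a yes/no answer; where to cut the list off is a separate decision that a query language does not express directly.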
Data Mining - A Working Definition
• Data Mining is a “decision support” process in which we search for patterns of information in data.
• Data Mining is a process of discovering advantageous patterns in data. A pattern is a conservative statement about a probability distribution.
Fiscal Examples: Increasing Customer Value
Cross/Up Selling
• What financial instruments are purchased together?
• What products are purchased, or transactions executed, by customers on a single visit to your bank or website?
• What services are purchased sequentially?
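The "purchased together" question can be sketched as counting how often each pair of products appears in the same visit. This is a minimal illustration, not the presenters' method; the transactions, product names, and the `pair_support` helper are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the products one customer
# bought in a single visit.
transactions = [
    {"checking", "savings", "credit_card"},
    {"checking", "credit_card"},
    {"savings", "cd"},
    {"checking", "savings"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each unordered product pair."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair always appears in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}

support = pair_support(transactions)

# Pairs ordered by descending support, ties broken alphabetically.
top_pairs = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
```

Pairs with high support are candidates for cross-selling. Counting every pair is only practical for small product catalogs; larger ones need pruning of infrequent items first.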
Customer Value Analysis - Profitability/Loyalty
• Predict a response to a marketing campaign
• Predict which customers will leave for a competitor in the next six months
• Classify loan applicants as high/medium/low risk
Product and Sales Analysis
• How do my sales compare by region, and with respect to other stores and states?
• How does my product perform by region, elevation, and climate?
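Questions of the "compare by region" kind reduce to grouping and aggregating sales records. A minimal sketch, assuming each record carries a store, a region, and an amount; the data and function names are hypothetical:

```python
from collections import defaultdict

# Hypothetical sales records: (store, region, amount).
sales = [
    ("store_a", "north", 120.0),
    ("store_b", "north", 80.0),
    ("store_c", "south", 200.0),
    ("store_d", "south", 100.0),
    ("store_e", "west", 50.0),
]

def totals_by_region(records):
    """Sum sales amounts per region."""
    totals = defaultdict(float)
    for _store, region, amount in records:
        totals[region] += amount
    return dict(totals)

def share_by_region(records):
    """Return each region's fraction of total sales."""
    totals = totals_by_region(records)
    grand_total = sum(totals.values())
    return {region: total / grand_total for region, total in totals.items()}

region_totals = totals_by_region(sales)
region_share = share_by_region(sales)
```

The same grouping applies to other dimensions such as state, elevation, or climate by changing the key used in the loop.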