Data Mining Week 9 Introduction to Data Mining Fox MIS Spring 2011
[Diagram labels: Competitive Advantage; Performance; Better Understanding; Good Business Decision; Data Mining; Data Warehouse; External Source; MySQL; ERD]
Defining User Communities • Information user • Generally requires standard reports, which often include charts and tables • Wants to scan consistently structured reports without needing to slice and dice to find the desired values • Static or simple interactive reports • Information consumer • Requires the ability to dynamically query the database, without becoming an expert at database design or the query tool • Ad hoc multidimensional analysis • Many business people cross the line between information users and information consumers • Power analyst • Requires the full analytical power of the data mart in order to perform free-form ad hoc analysis
Some Questions Analysts Need to Answer • Sales analysis: • What are the sales by quarter and geography? • How do sales compare in two different stores in the same state? • Profitability analysis: • Which is the most profitable store in the state of CA? • Which product lines are the highest revenue producers this year? • Which products and product lines are the most profitable this quarter? • Sales force analysis: • Which salesperson is the best revenue producer this year? • Did salesperson X meet his or her sales target this quarter?
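Questions like these reduce to grouping and summing fact records. The following is a minimal sketch, assuming hypothetical sales records of the form (quarter, state, store, revenue). The data and function names are invented for illustration, and revenue stands in for profit because the slides do not give cost figures.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, state, store, revenue in dollars).
sales = [
    ("Q1", "CA", "Store A", 1200.0),
    ("Q1", "CA", "Store B", 900.0),
    ("Q1", "PA", "Store C", 700.0),
    ("Q2", "CA", "Store A", 1500.0),
    ("Q2", "PA", "Store C", 800.0),
]

def sales_by_quarter_and_state(records):
    """Total revenue for each (quarter, state) pair."""
    totals = defaultdict(float)
    for quarter, state, _store, revenue in records:
        totals[(quarter, state)] += revenue
    return dict(totals)

def top_store(records, state):
    """Store with the highest total revenue in the given state."""
    by_store = defaultdict(float)
    for _quarter, record_state, store, revenue in records:
        if record_state == state:
            by_store[store] += revenue
    return max(by_store, key=by_store.get)

totals = sales_by_quarter_and_state(sales)
best_ca_store = top_store(sales, "CA")
```

In a data warehouse the same questions would normally be asked with a GROUP BY query; the loop above only makes the grouping step explicit.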
Finding a Pattern from Data • Tenure and sick days by department • Average tenure for each department: 9.0 • Average number of sick days for each department: 7.5
Data Mining • The application of specific algorithms for extracting patterns from data • Data mining tools automatically search data for patterns and relationships • Data mining tools • Analyze data • Uncover problems or opportunities • Form computer models based on findings • Predict business behavior with models • Require minimal end-user intervention
Data Mining • Goal • Simplification and automation of the overall statistical process, from data source(s) to model application • Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature: • Massive data collection • Powerful multiprocessor computers • Data mining algorithms
Data Mining and Knowledge Discovery in the Real World • Marketing • If a customer bought X, he or she is also likely to buy Y and Z • Investment • Stock investment • Fraud detection • Identify financial transactions that might indicate money-laundering activity
A Problem... • You are a marketing manager for a brokerage company • Problem: Churn is too high • Turnover (after the six-month introductory period ends) is 40% • Customers receive incentives (average cost: $160) when an account is opened • Giving new incentives to everyone who might leave is very expensive (as well as wasteful) • Bringing back a customer after they leave is both difficult and costly
… A Solution • One month before the introductory period ends, predict which customers will leave • If you want to keep a customer who is predicted to churn, offer them something based on their predicted value • The ones that are not predicted to churn need no attention
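The targeting rule above can be sketched in a few lines. This is only an illustration: the churn probabilities and predicted values are assumed to come from some model that the slides do not specify, and the threshold of 0.5 and the offer size of 10% of predicted value are hypothetical parameters.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    churn_probability: float  # hypothetical model output between 0 and 1
    predicted_value: float    # hypothetical expected future value in dollars

def plan_offers(customers, churn_threshold=0.5, offer_fraction=0.1):
    """Offer an incentive only to customers predicted to churn.

    The offer is sized as a fraction of the customer's predicted value;
    customers below the churn threshold get no offer at all.
    """
    offers = {}
    for customer in customers:
        if customer.churn_probability >= churn_threshold:
            offers[customer.name] = round(customer.predicted_value * offer_fraction, 2)
    return offers

customers = [
    Customer("A", 0.80, 2000.0),
    Customer("B", 0.10, 5000.0),
    Customer("C", 0.65, 500.0),
]
offers = plan_offers(customers)
```

Customer B has the highest predicted value but is not predicted to churn, so no incentive is spent there.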
Benefits of Data Mining • New business opportunities by providing these capabilities: • Automated prediction of trends and behaviors • Targeted marketing • Use data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings • Forecasting bankruptcy and other forms of default • Automated discovery of previously unknown patterns • Data mining tools sweep through databases and identify previously hidden patterns in one step • Analysis of retail sales data to identify seemingly unrelated products that are often purchased together
Descriptive Data Mining • Seeks to describe new patterns in the data and requires human interaction to determine the significance and meaning of these patterns • Affinity grouping • Which items go together • Clustering • Divides data into smaller groups based on similarity, without predefining the groups • Customers with similar buying habits • Visualization • Graphical representation of data
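Affinity grouping can be illustrated by counting how often items appear in the same basket. The sketch below uses hypothetical baskets and two illustrative helpers: one counts co-occurring pairs, the other computes the confidence of a rule "if X then Y" as the share of baskets containing X that also contain Y. It is a toy version of the idea, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "chips"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def confidence(baskets, antecedent, consequent):
    """Share of baskets with the antecedent that also hold the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

counts = pair_counts(baskets)
bread_butter_conf = confidence(baskets, "bread", "butter")
```

A human analyst still has to decide whether a frequent pair is meaningful, which is why this kind of mining is called descriptive.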
Predictive Data Mining • Predicts the likelihood of a particular outcome • Mathematical algorithms are used to create models • Classification • A new record is assigned to a specific category defined by the model • New credit applicants as low risk, medium risk, or high risk • Estimation • A new record is assigned a predicted value • Length of time a customer will stay
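The difference between classification and estimation can be shown with one simple technique, nearest neighbors, used as a stand-in for whatever model is actually built. The records, features (income in thousands of dollars and debt ratio), and function names below are hypothetical. The features are not rescaled, so income dominates the distance; a real application would normalize them first.

```python
import math
from collections import Counter

# Hypothetical applicants: (income in $1000s, debt ratio) -> risk category.
risk_records = [
    ((90.0, 0.10), "low"),
    ((85.0, 0.15), "low"),
    ((55.0, 0.35), "medium"),
    ((50.0, 0.40), "medium"),
    ((25.0, 0.70), "high"),
    ((20.0, 0.80), "high"),
]

# Same hypothetical customers -> months they stayed (a numeric target).
tenure_records = [
    ((90.0, 0.10), 60.0),
    ((85.0, 0.15), 48.0),
    ((55.0, 0.35), 30.0),
    ((50.0, 0.40), 24.0),
    ((25.0, 0.70), 8.0),
    ((20.0, 0.80), 6.0),
]

def nearest(records, point, k):
    """The k records whose features are closest to the given point."""
    return sorted(records, key=lambda record: math.dist(record[0], point))[:k]

def classify(records, point, k=3):
    """Classification: majority category among the k nearest records."""
    labels = [label for _features, label in nearest(records, point, k)]
    return Counter(labels).most_common(1)[0][0]

def estimate(records, point, k=2):
    """Estimation: mean target value of the k nearest records."""
    values = [value for _features, value in nearest(records, point, k)]
    return sum(values) / len(values)

new_applicant = (52.0, 0.38)
risk = classify(risk_records, new_applicant)
expected_months = estimate(tenure_records, new_applicant)
```

Classification returns one of a fixed set of categories, while estimation returns a number; the rest of the procedure is identical.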
Defining Data Mining • The automated extraction of predictive information from (large) databases • Two key words: • Automated • Predictive • Data mining lets you be proactive • Prospective rather than Retrospective
How Data Mining Works: Modeling • Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't • Some models are better than others • Accuracy • Understandability • Models range from “easy to understand” to incomprehensible • Decision trees • Rule induction • Regression models • Neural networks
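The modeling idea can be made concrete with the simplest possible decision tree, a single threshold (a "decision stump"). In this sketch the model is fitted on hypothetical records where the outcome is known, then checked on separate holdout records to measure accuracy. The feature (months of inactivity), the data, and the function names are assumptions made for illustration.

```python
# Hypothetical history where the answer is known: (months_inactive, churned).
training = [
    (0, False), (1, False), (2, False), (3, False),
    (4, True), (5, True), (6, True), (7, True),
]
# Separate hypothetical records used only to check the fitted model.
holdout = [(1, False), (5, True), (3, True), (8, True)]

def fit_stump(records):
    """Pick the threshold t for which the rule 'x > t' is right most often."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _label in records}):
        correct = sum((x > threshold) == label for x, label in records)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def accuracy(threshold, records):
    """Fraction of records for which the rule 'x > threshold' is correct."""
    return sum((x > threshold) == label for x, label in records) / len(records)

threshold = fit_stump(training)
training_accuracy = accuracy(threshold, training)
holdout_accuracy = accuracy(threshold, holdout)
```

The rule is easy to read ("more than 3 inactive months means churn"), which illustrates the understandability end of the range; the lower holdout accuracy shows why a model should be judged on data it was not built from.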
Techniques in Data Mining • Decision Trees • Nearest Neighbor Classification • Neural Networks • Rule Induction • K-means Clustering
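As one example from this list, K-means clustering can be sketched as a loop that alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The version below is a minimal sketch with assumptions: it uses the first k points as starting centroids (real implementations usually choose them more carefully), runs a fixed number of iterations, and works on hypothetical two-dimensional data.

```python
import math

def k_means(points, k, iterations=10):
    """Basic K-means: returns (centroids, clusters).

    Assumption: the first k points serve as the initial centroids.
    The returned clusters are the assignments from the last iteration.
    """
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            index = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[index].append(point)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters

# Hypothetical customers described by two numeric features.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, clusters = k_means(points, k=2)
```

No group labels are supplied in advance; the two groups emerge from similarity alone, which matches the description of clustering as grouping without predefined categories.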