Welcome! Knowledge Discovery and Data Mining


Presentation Transcript


  1. Welcome! Knowledge Discovery and Data Mining
     Qiang Yang
     Hong Kong University of Science and Technology
     qyang@cs.ust.hk
     http://www.cs.ust.hk
     Course Introduction

  2. Data Mining: An Example
     • You are a marketing manager for a brokerage company
     • Problem: churn (also known as attrition) is too high
     • Turnover after the six-month introductory period ends is 40%
     • Customers receive incentives (average cost: $160) when an account is opened
     • Giving new incentives to everyone who might leave is very expensive (as well as wasteful)
     • Bringing back a customer after they leave is both difficult and costly
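To make the scale of the problem concrete, here is a small worked calculation in Python. The 40% turnover and the $160 average incentive come from the slide; the cohort size of 1,000 new accounts is an assumption chosen only for illustration.

```python
def incentive_spend_on_churners(customers: int, churn_rate: float,
                                incentive_cost: float) -> float:
    """Sign-up incentive money spent on customers who later leave."""
    churners = customers * churn_rate
    return churners * incentive_cost


# Hypothetical cohort size; churn rate and incentive cost are from the slide.
cohort = 1000
lost = incentive_spend_on_churners(cohort, churn_rate=0.40, incentive_cost=160.0)
print(f"Incentives spent on customers who churn: ${lost:,.0f}")
```

Under these assumptions, 400 of the 1,000 customers leave, so $64,000 of sign-up incentives goes to accounts that do not stay.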

  3. …A Solution
     • One month before the introductory period ends, predict which customers will leave
     • If you want to keep a customer who is predicted to churn, offer them something based on their predicted value
     • Customers who are not predicted to churn need no attention
     • If you don’t want to keep the customer, do nothing
     • How can you predict future behavior?
        • Build models
        • Test models
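The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration: the `Customer` fields, the 0.5 churn threshold, and the "offer 10% of predicted value" policy are all assumptions, not part of the slides, and the churn probability is taken as given rather than produced by a real model.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    churn_probability: float  # assumed output of some predictive model
    predicted_value: float    # assumed expected future value, in dollars


def retention_offer(c: Customer, churn_threshold: float = 0.5,
                    offer_fraction: float = 0.1) -> float:
    """Return the dollar value of the offer; 0.0 means take no action."""
    if c.churn_probability < churn_threshold:
        return 0.0  # not predicted to churn: needs no attention
    if c.predicted_value <= 0.0:
        return 0.0  # not a customer we want to keep: do nothing
    # Predicted to churn and worth keeping: scale the offer to their value.
    return offer_fraction * c.predicted_value


# Hypothetical customers for illustration only.
customers = [
    Customer("A", churn_probability=0.8, predicted_value=1000.0),
    Customer("B", churn_probability=0.2, predicted_value=5000.0),
    Customer("C", churn_probability=0.9, predicted_value=0.0),
]
offers = {c.name: retention_offer(c) for c in customers}
print(offers)
```

Only customer A receives an offer here: B is not predicted to churn, and C is predicted to churn but has no predicted value.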

  4. Convergence of Three Technologies

  5. Why Now? 1. Increasing Computing Power
     • Moore’s law: computing power doubles roughly every 18 months
     • Powerful workstations have become common
     • Cost-effective servers (SMPs) provide parallel processing to the mass market
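As a rough illustration of how quickly such doubling compounds, the sketch below takes the slide's 18-month doubling period at face value and computes the growth factor after a given number of years. The function name and the sample horizons are assumptions made for the example.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Multiplicative growth after `years`, doubling every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# Sample horizons chosen for illustration.
for years in (1.5, 3, 6, 15):
    print(f"{years:>4} years -> about x{growth_factor(years):,.0f}")
```

With an 18-month doubling period, fifteen years corresponds to ten doublings, which is a factor of about a thousand.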

  6. 2. Improved Data Collection
     • Data Collection → Access → Navigation → Mining
     • The more data the better (usually)

  7. 3. Improved Algorithms (AI + Database)
     • Techniques have often been waiting for computing technology to catch up
     • Statisticians were already doing “manual data mining”
     • Good machine learning = intelligent application of statistical processes
     • A lot of data mining research has focused on tweaking existing techniques to get small percentage gains

  8. Definition: Predictive Model
     • A “black box” that makes predictions about the future based on information from the past and present
     • A large number of inputs is usually available
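One way to picture the "black box" is as an object with two operations: fit it on past data, then ask it to predict new cases. The sketch below uses a deliberately trivial baseline (always predict the most common past outcome) and checks it on held-out data. The class name, the feature tuples, and the labels are hypothetical and are not taken from the course.

```python
from collections import Counter


class MajorityClassModel:
    """Baseline black box: always predicts the most common training outcome."""

    def fit(self, inputs, outcomes):
        # The inputs are ignored by this baseline; a real model would use them.
        self.prediction_ = Counter(outcomes).most_common(1)[0][0]
        return self

    def predict(self, inputs):
        return [self.prediction_ for _ in inputs]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual outcomes."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical past customers: (months_active, trades_per_month) -> outcome.
past_inputs = [(6, 1), (6, 9), (5, 0), (6, 7), (6, 8)]
past_outcomes = ["stay", "stay", "churn", "stay", "stay"]

# Held-out customers the model has not seen, used to test it.
new_inputs = [(6, 2), (6, 0)]
new_outcomes = ["stay", "churn"]

model = MajorityClassModel().fit(past_inputs, past_outcomes)
predictions = model.predict(new_inputs)
held_out_accuracy = accuracy(predictions, new_outcomes)
print(predictions, held_out_accuracy)
```

A baseline like this is useful mainly as a yardstick: any model that actually uses the inputs should beat it on held-out data.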

  9. How are Models Built and Used?
     • View from 20,000 feet:

  10. The Data Mining Process

  11. What the Real World Looks Like

  12. Predictive Models are…
     • Decision Trees
     • Nearest Neighbor Classification
     • Neural Networks
     • Rule Induction
     • K-means Clustering
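As a taste of one technique from this list, here is a minimal sketch of nearest neighbor classification: a new case gets the label of the closest known case. Euclidean distance, the two features, and the toy data points are assumptions made for the example; practical versions usually scale the features and vote over several neighbors.

```python
import math


def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query`."""
    best_index = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    return train_labels[best_index]


# Hypothetical features: (account balance in $1000s, trades per month).
train_points = [(1.0, 0.0), (2.0, 1.0), (8.0, 6.0), (9.0, 7.0)]
train_labels = ["churn", "churn", "stay", "stay"]

low_activity = nearest_neighbor_predict(train_points, train_labels, (1.2, 0.3))
high_activity = nearest_neighbor_predict(train_points, train_labels, (8.5, 6.0))
print(low_activity, high_activity)
```

The first query lies next to the low-balance, low-activity accounts and is labeled "churn"; the second lies next to the active accounts and is labeled "stay".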

  13. Data Mining is Not ...
     • Data warehousing
     • SQL / Ad Hoc Queries / Reporting
     • Software Agents
     • Online Analytical Processing (OLAP)
     • Data Visualization

  14. Common Uses of Data Mining
     • Marketing
        • Direct mail marketing
        • Web site personalization
     • Fraud Detection
        • Credit card fraud detection
     • Science
        • Bioinformatics
        • Gene analysis
     • Web & Text analysis
        • Google

  15. Course Description
     • Data Mining and Knowledge Discovery
     • Focus:
        • Focus 1: Theoretical foundations in Pattern Recognition and Machine Learning
           • Algorithms: how do they differ, and where do they apply?
        • Focus 2: Broad survey of recent research
        • Focus 3: Hands-on work applying algorithms to KDD data sets

  16. Topic 1: Foundations
     • Classification algorithms
     • Clustering algorithms
     • Association algorithms
     • Sequential Data Mining
     • Novel Applications
        • Web
        • Customer Relationship Management
        • Biological Data

  17. Topic 2: Hands On
     • Apply learned algorithms to selected data sets
     • Get familiar with existing software packages and libraries
     • Final Project will involve working with some datasets

  18. Prerequisites
     • Statistics and Probability would help, but are not necessary
     • Pattern Recognition would help, but is not necessary
     • Databases
        • Knowledge of SQL and relational algebra would help, but is not necessary
     • One programming language
        • One of Java, C++, Perl, Matlab, etc.
        • You will need to be able to read a Java library

  19. Grading
     • Grade Distribution:
        • Assignments: 30%
        • Midterm Exam: 30%
        • Paper Presentation and Presentation: 40%
