Welcome! MSCIT 521: Knowledge Discovery and Data Mining • Qiang Yang, Hong Kong University of Science and Technology • qyang@cs.ust.hk • http://www.cs.ust.hk • Course Introduction
Data Mining: An Example • KDD Cup tasks from past years • 2007: Predict whether a user will rate a given movie; predict how many users will rate a given movie • 2006: Predict whether a patient has cancer from medical images • 2005: Given a web query (“Apple”), predict its categories (IT, Food) • 1998: Given a person, predict whether this person will donate money • In general: the input is data; the output is a model, which we build and then apply to future data
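The "build a model, then apply it to future data" pattern can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hypothetical donation data set (in the spirit of the 1998 task) with a single numeric feature, `past_gifts`; the learner is a one-threshold rule (a decision stump), chosen because it is short, not because any KDD Cup entry used it.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]

def build_model(data: List[Tuple[Record, int]], feature: str) -> Callable[[Record], int]:
    """Build step: learn a one-threshold rule (a decision stump) on one feature.

    Every observed value of `feature` is tried as a threshold; the rule
    "predict 1 if record[feature] >= threshold" with the fewest training
    errors is kept. Assumes `data` is non-empty.
    """
    best_threshold, best_errors = data[0][0][feature], len(data) + 1
    for candidate, _ in data:
        threshold = candidate[feature]
        errors = sum(int((x[feature] >= threshold) != bool(y)) for x, y in data)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors

    def model(record: Record) -> int:
        """Apply step: the learned rule, usable on records not seen in training."""
        return int(record[feature] >= best_threshold)

    return model

# Hypothetical past records: number of past gifts -> donated this time (1) or not (0).
past = [({"past_gifts": 0}, 0), ({"past_gifts": 1}, 0),
        ({"past_gifts": 3}, 1), ({"past_gifts": 5}, 1)]
model = build_model(past, "past_gifts")

# Apply the model to "future" records that were not in the training data.
future = [{"past_gifts": 4}, {"past_gifts": 0}]
print([model(r) for r in future])  # [1, 0]
```

Here the learned threshold is 3 gifts, which separates the four hypothetical training records with no errors; real tasks use many features and far richer models, but the build/apply split stays the same.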
Data Mining: Convergence of Three Technologies
Definition: Predictive Model • A “black box” that makes predictions about the future based on information from the past and present • A large number of inputs is usually available
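The black-box view can be made concrete as an interface that exposes only "fit on past data" and "predict on new inputs". The sketch below is hypothetical: `PredictiveModel`, `MajorityClassModel`, and `evaluate` are illustrative names, and the `fit`/`predict` naming follows a common convention (scikit-learn uses it) rather than any API from this course. The point is that calling code never looks inside the model.

```python
from collections import Counter
from typing import List, Protocol, Sequence

class PredictiveModel(Protocol):
    """Hypothetical black-box contract: callers see only inputs and predictions."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "PredictiveModel": ...

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]: ...

class MajorityClassModel:
    """Trivial baseline: ignores its inputs and predicts the most frequent past label."""

    def __init__(self) -> None:
        self.label = 0

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "MajorityClassModel":
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [self.label for _ in X]

def evaluate(model: PredictiveModel, X_train, y_train, X_test, y_test) -> float:
    """Accuracy on held-out data; works for any object that satisfies the contract."""
    predictions = model.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)

# Hypothetical data: each row has three numeric inputs; real tasks often have many more.
X_train = [[1.0, 0.0, 3.0], [0.0, 2.0, 1.0], [4.0, 1.0, 0.0]]
y_train = [1, 1, 0]
X_test = [[2.0, 2.0, 2.0], [0.0, 0.0, 1.0]]
y_test = [1, 0]
print(evaluate(MajorityClassModel(), X_train, y_train, X_test, y_test))  # 0.5
```

Any model that honors the same two methods, from a decision tree to a neural network, could be passed to `evaluate` unchanged; the majority-class baseline is included only because it is the simplest possible model to compare against.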
How Are Models Built and Used? • High-level view
The Data Mining Process
What Does the Real World Look Like?
Predictive Models Are… • Decision Trees • Nearest-Neighbor Classification • Neural Networks • Rule Induction • Clustering
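Of the model types listed above, nearest-neighbor classification is the easiest to show end to end. The sketch below is a minimal k-nearest-neighbor classifier under stated assumptions: Euclidean distance, majority voting, and hypothetical two-dimensional training points labeled "A" and "B".

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def knn_predict(train: List[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int = 3) -> str:
    """k-nearest-neighbor classification with Euclidean distance.

    No model is fitted in advance: the stored training points are the model.
    The query gets the majority label among its k closest points; on a vote
    tie, the tied label that appears first among the sorted neighbors wins.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training points with class labels "A" and "B".
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.5), "B"), ((5.5, 6.0), "B")]
print(knn_predict(train, (1.2, 1.5), k=3))  # A
print(knn_predict(train, (5.2, 5.1)))       # B
```

The design trade-off is typical of nearest-neighbor methods: there is no training cost, but every prediction scans the stored data, so large data sets usually call for an index structure or sampling.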
Course Description • Data Mining and Knowledge Discovery • Focus 1: Theoretical foundations in pattern recognition and machine learning • Algorithms: How do they differ? Where do they apply? • Focus 2: Broad survey of recent research • Focus 3: Hands-on practice applying algorithms to KDD data sets
Topic 1: Foundations • Classification algorithms • Clustering algorithms • Association algorithms • Sequential data mining • Novel applications: the Web, customer relationship management, biological data
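Association-rule mining typically scores a candidate rule such as {diapers} → {beer} by two measures: support (how often the items occur together) and confidence (how often the right-hand side appears when the left-hand side does). The sketch below computes both on hypothetical market-basket data; it shows only the scoring step, not a full rule-search algorithm.

```python
from typing import List, Set

def count(transactions: List[Set[str]], itemset: Set[str]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of the rule antecedent -> consequent:
    count(antecedent | consequent) / count(antecedent)."""
    return count(transactions, antecedent | consequent) / count(transactions, antecedent)

# Hypothetical market-basket data.
baskets = [{"bread", "milk"},
           {"bread", "diapers", "beer"},
           {"milk", "diapers", "beer"},
           {"bread", "milk", "diapers", "beer"},
           {"bread", "milk", "diapers"}]
print(support(baskets, {"diapers", "beer"}))       # 0.6
print(confidence(baskets, {"diapers"}, {"beer"}))  # 0.75
```

Here 3 of the 5 baskets contain both diapers and beer (support 0.6), and 3 of the 4 baskets with diapers also contain beer (confidence 0.75). Computing confidence from counts rather than from two support fractions avoids an extra floating-point division.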
Topic 2: Hands-On • Apply the algorithms learned to selected data sets • Homework assignments • Become familiar with existing system packages and libraries • In-class workshops • Programming assignments
Important Sites • Instructor website: http://www.cse.ust.hk/~qyang/521 • TA: Kaixiang Mo • Assignment hand-in: online (csit5210@ust.hk) • Course discussion site: check the course website
Prerequisites • Statistics and probability: helpful, but not required • Pattern recognition: helpful, but not required • Databases: knowledge of SQL and relational algebra; helpful, but not required • One programming language, such as Java, C++, Perl, or Matlab • You will need to read Java libraries
Grading • Assignments: 20% • Course project: 20% • Exams: 60% (midterm 20%, final 40%)
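The grade weights combine into a single weighted sum. A worked version, assuming each component is scored on a 0 to 100 scale (the slide does not say) and writing A, P, M, F for the assignment, project, midterm, and final scores; the example scores are hypothetical:

```latex
\text{Grade} = 0.20\,A + 0.20\,P + 0.20\,M + 0.40\,F,
\qquad 0.20 + 0.20 + (0.20 + 0.40) = 1

\text{e.g. } 0.20(80) + 0.20(90) + 0.20(70) + 0.40(85) = 16 + 18 + 14 + 34 = 82
```

Because the final alone carries 40%, it outweighs the midterm, the assignments, or the project individually.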
More Info • Textbooks (for reference only) • Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson International Edition, 2005. • Data Mining by Ian Witten and Eibe Frank (available on Google Books). • Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers. • Available in our bookstore