CS499/699-10 Data Mining
Fall 2003
Professor Guozhu Dong
Computer Science & Engineering, WSU
Data Mining – Introduction, G Dong (WSU)
Introduction
• Introduction to this Course
• Introduction to Data Mining
Introduction to the Course
• First, about you - why take this course?
  • Your background and strengths
    • AI, DBMS, Statistics, Biology, Business, …
  • Your interests and requests
• What is this course about?
  • Problem solving
  • Handling data
    • transform data to workable data
  • Mining data
    • turn data to knowledge
    • validation and presentation of knowledge
This course
• What can you expect from this course?
  • Knowledge and experience about DM
  • Problem solving skills
• How is this course conducted?
  • Homework, projects, exams, classes
• Course Format
  • Individual Projects: 30%
  • Exams and/or quizzes: 60%
  • Homework: 10%
Course Web Site
• cs.wright.edu/~gdong/mining03/WSUCS499DataMining.htm
• My office and office hours
  • RC 430
  • 4:30-5:30, T Th
• My email: gdong@cs.wright.edu
• Slides and relevant information will be made available at the course web site
Any questions and suggestions?
• Your feedback is most welcome!
  • I need it to adapt the course to your needs.
  • Please feel free to provide yours anytime.
• Share your questions and concerns with the class – others very likely have the same ones.
• No pain no gain – no magic for data mining.
  • The more you put in, the more you get.
  • Your grades are proportional to your efforts.
Introduction to Data Mining
• Definitions
• Motivations of DM
• Interdisciplinary Links of DM
What is DM?
• Or more precisely KDD (knowledge discovery from databases)?
• Many definitions
• An iterative process, not plug-and-play
  raw data → transformed data → preprocessed data → data mining → post-processing → knowledge
• One definition is
  • A non-trivial process of identifying valid, novel, useful and ultimately understandable patterns in data
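
The stages named on this slide can be chained as plain functions. The following is only a toy sketch, not the course's method: the function names, the sample records, and the choice of "count co-occurring item pairs" as the mining step are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# All names and data below are hypothetical, chosen only to illustrate the stages.

def transform(raw_records):
    """Raw data -> transformed data: split comma-separated strings into item lists."""
    return [[item.strip().lower() for item in rec.split(",")] for rec in raw_records]

def preprocess(records):
    """Transformed -> preprocessed data: drop empty items and in-record duplicates."""
    return [sorted({item for item in rec if item}) for rec in records]

def mine(transactions):
    """Data mining step (assumed here): count how often each item pair co-occurs."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(t, 2))
    return counts

def postprocess(pair_counts, min_count):
    """Post-processing: keep only pairs frequent enough to report as knowledge."""
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}

raw = ["Bread, milk", "bread, Butter, milk", "milk, , bread", "butter"]
knowledge = postprocess(mine(preprocess(transform(raw))), min_count=2)
print(knowledge)
```

The process is iterative in the sense that a result which looks uninformative usually sends you back to an earlier stage, for example to clean the data differently or to rerun `postprocess` with another `min_count`.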
Need for Data Mining
• Data accumulate and double every 9 months
• There is a big gap from stored data to knowledge, and the transition won’t occur automatically.
• Manual data analysis is not new, but it is a bottleneck
• Fast developing Computer Science and Engineering generates new demands
  • Seeking knowledge from massive data
• Any personal experience?
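
To get a feel for the doubling claim, the growth factor after a given number of months is 2 raised to (months / 9). A minimal sketch, taking the 9-month doubling period from the slide as given; the function name is an assumption:

```python
def growth_factor(months, doubling_period=9):
    """Factor by which the data volume grows if it doubles every `doubling_period` months."""
    return 2 ** (months / doubling_period)

# Three years is four doubling periods, so the volume grows 16-fold.
print(growth_factor(36))
```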
When is DM useful?
• Data rich world
• Large data (dimensionality and size)
  • Image data (size)
  • Gene chip data (dimensionality)
• Little knowledge about data (exploratory data analysis)
• What if we have some knowledge?
DM perspectives
• KDD “goals”: Prediction, description, explanation, optimization, and exploration
• Knowledge forms: patterns vs. models
• Understandability and representation of knowledge
• Some applications
  • Business intelligence (CRM)
  • Security (Info, Comp Systems, Networks, Data, Privacy)
  • Scientific discovery (bioinformatics, medicine)
Challenges
• Increasing data dimensionality and data size
• Various data forms
• New data types
  • Streaming data, multimedia data
• Efficient search and access to data/knowledge
• Intelligent update and integration
Interdisciplinary Links of DM
• Statistics
• Databases
• AI
• Machine Learning
• Visualization
• High Performance Computing
  • supercomputers, distributed/parallel/cluster computing
Statistics
• Discovery of structures or patterns in data sets
  • hypothesis testing, parameter estimation
• Optimal strategies for collecting data
  • efficient search of large databases
• Static data
  • constantly evolving data
• Models play a central role
  • algorithms are a major concern
  • patterns are sought
Relational Databases
• A relational database can contain several tables
  • Tables and schemas
• The goal in data organization is to maintain data and quickly locate the requested data
  • Queries and index structures
• Query execution and optimization
  • Query optimization is to find the “best” possible evaluation method for a given query
• Providing fast, reliable access to data for data mining
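
A small illustration of tables, index structures, and query planning, using Python's built-in `sqlite3` module with an in-memory database. The `sales` table, its columns, and the index name are hypothetical; the slides do not prescribe any particular DBMS.

```python
import sqlite3

# In-memory database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5), ("carol", 8.0)],
)

# An index structure that lets the engine locate rows by customer quickly.
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer)")

query = "SELECT SUM(amount) FROM sales WHERE customer = ?"
total = conn.execute(query, ("alice",)).fetchone()[0]
print(total)

# Ask the optimizer which evaluation method it chose for this query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("alice",)).fetchall()
for row in plan:
    print(row[-1])
```

The last column of each `EXPLAIN QUERY PLAN` row is a human-readable description of the chosen strategy; its exact wording varies between SQLite versions.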
AI
• Intelligent agents
  • Perception-Action-Goal-Environment
• Search
  • Uniform cost and informed search algorithms
• Knowledge representation
  • FOL, production rules, frames with semantic networks
• Knowledge acquisition
• Knowledge maintenance and application
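
As a reminder of what uniform cost search does, here is a minimal sketch that always expands the cheapest frontier node first, using a priority queue from `heapq`. The graph, its node names, and its edge costs are made up for illustration.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) of a cheapest path from start to goal, or None if unreachable.

    `graph` maps each node to a list of (neighbor, step_cost) pairs with
    non-negative step costs.
    """
    frontier = [(0, start, [start])]  # priority queue ordered by path cost
    best_cost = {start: 0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was already found
        for neighbor, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return None

# Hypothetical example graph.
graph = {
    "A": [("B", 1), ("C", 5)],
    "B": [("C", 1), ("D", 6)],
    "C": [("D", 2)],
    "D": [],
}
result = uniform_cost_search(graph, "A", "D")
print(result)
```

An informed search such as A* would differ only in the priority used for the queue: path cost plus a heuristic estimate of the remaining cost.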
Machine Learning
• Focusing on complex representations, data-intensive problems, and search-based methods
• Flexibility with prior knowledge and collected data
• Generalization from data and empirical validation
  • statistical soundness and computational efficiency
  • constrained by finite computing & data resources
• Challenges from KDD
  • scaling up, cost info, auto data preprocessing, more knowledge types
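
"Generalization from data and empirical validation" can be illustrated with a holdout test: fit on one part of the data and measure accuracy on the part that was held back. This is a toy sketch under stated assumptions: the one-dimensional data set, the 1-nearest-neighbor classifier, and the deterministic every-third-record split are all chosen for illustration, not taken from the course.

```python
def nearest_neighbor_predict(train, x):
    """Predict the label of x as the label of the closest training point (1-NN)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def holdout_accuracy(data, test_every=3):
    """Hold out every `test_every`-th record for testing and train on the rest."""
    train = [p for i, p in enumerate(data) if i % test_every != 0]
    test = [p for i, p in enumerate(data) if i % test_every == 0]
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical labelled measurements: (value, label).
data = [
    (1.0, "low"), (1.5, "low"), (2.0, "low"), (2.5, "low"), (3.0, "low"),
    (7.0, "high"), (7.5, "high"), (8.0, "high"), (8.5, "high"), (9.0, "high"),
]
accuracy = holdout_accuracy(data)
print(accuracy)
```

In practice the split is usually randomized and repeated (for example, cross-validation), since a single small holdout set gives a noisy estimate.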
Visualization
• Producing a visual display with insights into the structure of the data with interactive means
  • zoom in/out, rotating, displaying detailed info
• Various types of visualization methods
  • show summary properties and explore relationships between variables
  • investigate large DBs and convey lots of information
  • analyze data with geographic/spatial location
• A pre- and post-processing tool for KDD
Bibliography
• J. Han and M. Kamber. Data Mining – Concepts and Techniques. 2001. Morgan Kaufmann.
• D. Hand, H. Mannila, P. Smyth. Principles of Data Mining. 2001. MIT.
• W. Klosgen and J.M. Zytkow (eds.). Handbook of Data Mining and Knowledge Discovery. 2001.