Speakers:
• Prof Y V Hui, CityU
• Dr H P Lo, CityU
• Dr Sammy Yuen, CityU
• Dr K W Cheng, SAS Institute
• Mr Steven Parker, Standard Chartered
Knowledge Discovery Centre: CityU-SAS Partnership
The Art and Science of Data Mining
Y V Hui
City University of Hong Kong
The Driving Forces
• Specialization and focus in business
  - To satisfy the needs of customers
  - To improve and develop specific business strategies and processes
  - Personalization through mass customization
The Driving Forces
• Challenges
  - local and global competition
  - distributed business operations
  - product innovation
• Technology development
• Benefit, cost and risk on a product or customer basis
Data Mining
• Also known as knowledge discovery in databases. Data mining digs out valuable information from large and messy data. (Computer scientist’s definition)
• Data mining is a knowledge discovery process. It’s the integration of business knowledge, people, information, statistics and computing technology.
Data Mining is Hot
• Ten Hottest Jobs, Time, 22 May 2000
• 10 emerging areas of technology, MIT’s Magazine of Technology Review, Jan/Feb 2001
Data Mining Philosophy
• A powerful enabler of competitive advantage.
• Data mining is driven by business knowledge.
• Data mining is about enabling people to discover actionable information about their business.
• Return of profit isn’t about algorithms.
Scope of Data Mining
• Management’s Decision World
  - Business outlook
  - Industry conditions
  - Product offering
  - Customer analysis
  - Strategic options
  - Competitive actions, etc.
• Interface
  - Problem development and management
  - Reporting and evaluations
• Data Miner’s Analytical World
  - Project design
  - Data collection and preparation
  - Model building
  - Validation
Project Management
• Cross-functional team
• System architecture
Successful applications
• Business transaction
  - risks and opportunities
• Customer relationship management
  - personalization, target marketing
• Electronic commerce & web
  - web mining
Successful applications
• Science & engineering
• Health care
• Multi-media
• Others
Data Mining Process
• Understanding of business
• Problem identification
Understanding Your Business
• Do we have a problem?
  - What is the current situation? Are there any undesirable situations that need attention?
  - Are there any conditions, processes, etc., that could be improved?
  - Are any problems foreseeable that could affect the business?
  - Are there any potential opportunities that the company may capitalize on?
A problem is a learning opportunity.
Understanding Your Problem
• Operational or analytical
• Conventional rule or knowledge discovery
• Product based or customer based
• Market research or data mining
• Ownership of the information
• Privacy
• Added value
Data Mining Process
• Understanding of business
• Problem identification
• Collecting relevant information
Collecting Relevant Information
• Data Search
• Data Collection
• Data Preparation
• Data Mining Database
Data Search
• Exploring the problem space. Don’t let the data drive the problem.
• Measurement
• Exploring the data sources
Data Collection
• Data retrieval
• Data audit
• Data set assembly and data warehouse
• Survey
Data Preparation
• Data representation
• Data exploration
• Data normalization
• Data transformation
• Imputation of missing data
• Data tuning
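Two of the preparation steps listed above, imputation of missing data and normalization, can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the slides do not say which imputation or scaling method is meant, so mean imputation, min-max scaling and z-scores are chosen here as common options, and the income figures and function names are hypothetical.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Centre on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical monthly incomes with one missing record.
incomes = [20000, None, 30000, 40000]
filled = impute_mean(incomes)        # the gap is filled with the observed mean
scaled = min_max_normalize(filled)   # values now lie between 0 and 1
standardized = z_score(filled)       # values now have mean 0
```

Mean imputation is the simplest choice and shrinks the variance of the column; whether that is acceptable depends on how many records are missing and why.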
Data Mining Database
• Variable selection
• Record selection
• Data set partition
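Data set partition usually means holding some records back so that a model can be checked on data it has not seen. The sketch below assumes a random 60/20/20 split into training, validation and test sets; the proportions, the seed and the function name `partition` are illustrative choices, not something the slides specify.

```python
import random

def partition(records, train=0.6, validation=0.2, seed=0):
    """Shuffle records and split them into train, validation and test lists."""
    if not 0 < train + validation < 1:
        raise ValueError("train + validation must leave room for a test set")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

# Hypothetical record identifiers.
records = list(range(100))
train_set, valid_set, test_set = partition(records)
```

Every record lands in exactly one of the three lists, so no record used for fitting is reused for evaluation.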
Data Mining Process
• Understanding of business
• Problem identification
• Collecting relevant information
• Model building
• Learning
Model Building
• Model based vs non-model based
  $(y_1, y_2, \dots, y_p) = f(x_1, \dots, x_q)$
  Inputs: $x_1, \dots, x_q$
  Outputs: $y_1, \dots, y_p$
Model Building
• Parametric vs nonparametric
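To make the parametric versus nonparametric contrast concrete, the sketch below fits the same hypothetical data two ways. The parametric route assumes a form for f (a straight line) and estimates its two parameters by least squares; the nonparametric route assumes no form and predicts by averaging the outputs of the nearest training inputs. Both methods and all names are illustrative choices, not taken from the slides.

```python
def fit_line(xs, ys):
    """Parametric: estimate intercept a and slope b of y = a + b*x by least squares."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def knn_predict(xs, ys, x_new, k=2):
    """Nonparametric: average the outputs of the k nearest training inputs."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
line_pred = a + b * 2.5                    # prediction from the fitted line
knn_pred = knn_predict(xs, ys, 2.5, k=2)   # prediction from the two nearest points
```

On data this clean the two predictions coincide. They diverge when the assumed form is wrong (the line misleads) or when the data are sparse (the neighbours are far away).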
Model Building
• Estimation vs trial and error
• Directed vs undirected
• Multidimensional analysis
• Large data set vs small data set
Data Mining Algorithms
• Online Analytical Processing
  - SQL
  - Query Tools
• Discovery Driven Methods
  - Description: Visualization, Clustering, Association, Sequential Analysis
  - Prediction: Classification, Regressions, Decision Trees, Neural Networks
Online Analytical Processing
• Query and reporting
  Example of SQL query: How many credit-card customers who made purchases of over $1,000 on sporting goods in December have at least $20,000 of available credit?
• Manual and validation driven
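The question on this slide can be written as an actual SQL query. The sketch below runs it against an in-memory SQLite database. The table names, column names and sample rows are all hypothetical, and "purchases of over $1,000" is read here as the customer's December total on sporting goods, which is one of two plausible readings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, available_credit REAL);
    CREATE TABLE purchases (customer_id INTEGER, category TEXT,
                            purchase_date TEXT, amount REAL);
    INSERT INTO customers VALUES (1, 25000), (2, 15000), (3, 30000);
    INSERT INTO purchases VALUES
        (1, 'sporting goods', '2000-12-05', 700),
        (1, 'sporting goods', '2000-12-20', 450),
        (2, 'sporting goods', '2000-12-11', 1500),
        (3, 'sporting goods', '2000-11-30', 2000),
        (3, 'groceries',      '2000-12-02', 1200);
""")

# Count customers whose December sporting-goods total exceeds 1,000
# and whose available credit is at least 20,000.
query = """
    SELECT COUNT(*) FROM (
        SELECT c.id
        FROM customers AS c
        JOIN purchases AS p ON p.customer_id = c.id
        WHERE p.category = 'sporting goods'
          AND strftime('%m', p.purchase_date) = '12'
          AND c.available_credit >= 20000
        GROUP BY c.id
        HAVING SUM(p.amount) > 1000
    )
"""
(matching_customers,) = conn.execute(query).fetchone()
conn.close()
```

In the sample rows only customer 1 qualifies: customer 2 lacks the credit, and customer 3 bought sporting goods in November rather than December. The analyst has to think of the question first, which is what "manual and validation driven" refers to.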
Estimation and Prediction
• Statistical models
• Neural network
Example: Housing price valuation model
Classification Algorithms
• Statistical techniques
• Neural networks
• Genetic algorithms
• Nearest neighbor method
• Rule induction and decision tree
Example: Customer segmentation and buying behavior description
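Of the classification techniques listed, the nearest neighbor method is the shortest to write down. The sketch below labels a new customer by majority vote among the k closest labelled customers. The two features, the segment names and the sample records are hypothetical, and Euclidean distance is an assumed choice.

```python
import math
from collections import Counter

def knn_classify(training, point, k=3):
    """Label `point` by majority vote among its k nearest training records."""
    by_distance = sorted(training, key=lambda rec: math.dist(rec[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical customers: (visits per month, average spend) -> segment.
training = [
    ((1, 50), "occasional"), ((2, 60), "occasional"), ((1, 40), "occasional"),
    ((8, 300), "frequent"), ((9, 280), "frequent"), ((10, 320), "frequent"),
]
segment_a = knn_classify(training, (2, 55))
segment_b = knn_classify(training, (9, 310))
```

Because spend is measured on a much larger scale than visits, it dominates the distance here; normalizing the inputs first would let both features contribute.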
Association Rules
• Apriori algorithm
Example: Market basket analysis, cross selling analysis
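The core idea of the Apriori algorithm is that an itemset can only be frequent if all of its subsets are frequent, so candidates are grown one item at a time and pruned early. The sketch below is a compact, unoptimized version of that idea for market basket data. The baskets are hypothetical, and `min_support` is taken as an absolute count of baskets rather than a fraction.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: unite frequent itemsets into candidates one item larger.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if every subset one item smaller is frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent_itemsets = apriori(baskets, min_support=3)

# Confidence of the rule bread -> milk: support(bread, milk) / support(bread).
bread_to_milk_confidence = (frequent_itemsets[frozenset({"bread", "milk"})]
                            / frequent_itemsets[frozenset({"bread"})])
```

With these baskets the only frequent pair is bread and milk, and three of the four bread baskets also contain milk, which is the kind of figure a cross-selling analysis would start from.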
Sequential Analysis
• Count-all algorithm
• Count-some algorithm
Example: Attached mailing, add-on sales
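Sequential analysis looks for items that tend to be bought in a particular order, which is what makes it useful for add-on sales. The sketch below is not the count-all or count-some algorithm; it is a much simpler illustration of the quantity those algorithms search for, namely how many customers bought one item and later another. The purchase histories and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def sequential_pair_support(sequences):
    """Count, per ordered pair (a, b), the customers who bought a and later b."""
    support = Counter()
    for seq in sequences:
        # combinations keeps positional order, so a always precedes b in seq.
        pairs = {(a, b) for a, b in combinations(seq, 2) if a != b}
        support.update(pairs)  # a set, so each customer counts once per pair
    return support

# Hypothetical purchase histories, one time-ordered list per customer.
histories = [
    ["printer", "ink", "paper"],
    ["printer", "paper", "ink"],
    ["ink", "printer"],
    ["printer", "ink"],
]
support = sequential_pair_support(histories)
```

Here three customers bought ink after a printer but only one bought a printer after ink, so order carries information that an unordered association rule would miss.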
Algorithms Comparison
• No single data mining algorithm outperforms all the others in every situation. Try different algorithms and draw conclusions from the results. Use your business knowledge.
• Neural networks do no better than statistical models when the underlying structure is known. However, neural networks can detect hidden interactions and nonlinearity. Use the prior information if available.
Algorithms Comparison
• Data mining algorithms generally cannot handle dependent records. Use the prior information. Statistical models help.
• Data tuning and dimension reduction enhance data mining before and after the analysis. Statistical techniques help.
Data Mining Process
• Understanding of business
• Problem identification
• Collecting relevant data
• Model building
• Learning
• Business strategy and evaluation
• Action
Trends that Affect Data Mining
• Data trends
  - data explosion
  - data types
Trends that Affect Data Mining
• Hardware trends
  - memory
  - processing speed
  - storage
Trends that Affect Data Mining
• Network trends
  - network connectivity
  - distributed databases
• Wireless communication
Trends that Affect Data Mining
• Scientific computing trends
  - theory, experiment and simulation
Trends that Affect Data Mining
• Business trends
  - total quality management
  - customer relationship management
  - business process reengineering
  - enterprise resource planning
  - supply chain management
  - business intelligence and knowledge management
  - e-business and m-business
Trends that Affect Data Mining
• Privacy and Security
Pot of Gold
• The benefits of knowing one’s business and customers have become so critical that technologies are coming together to support data mining.
• Data mining is not cybernetic magic that will turn your data into gold. It’s the process and result of knowledge production, knowledge discovery and knowledge management.