
Overview of SQL Server Data Mining




  1. Overview of SQL Server Data Mining
     CSD305 Advanced Databases

  2. Data Mining Definition
     • Mining
       • Act of excavation in the earth from which ore or minerals can be extracted
     • Data Mining
       • Act of excavation in the data from which patterns can be extracted
     • Alternative name: Knowledge discovery in databases (KDD)
     • Multiple disciplines: database, statistics, artificial intelligence
     • Fast maturing technology
     • Unlimited applicability

  3. Data Mining Process
     [Diagram: a mining model inside a Data Mining Management System (DMMS)]
     • Define a model
     • Train the model (Training Data)
     • Test the model (Test Data)
     • Prediction using the model (Prediction Input Data)
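The define / train / test / predict cycle named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how a DMMS or SQL Server implements it: the "model" is a hypothetical majority-vote lookup on one attribute, and the function names and sample rows are invented for the example (the column names echo the college example used later in the deck).

```python
from collections import Counter, defaultdict

def train(rows, attribute, target):
    """Train: for each attribute value, remember the most common target value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[target]] += 1
    return {value: tally.most_common(1)[0][0] for value, tally in counts.items()}

def predict(model, row, attribute):
    """Predict: look up the learned answer for this row's attribute value."""
    return model.get(row[attribute])

def evaluate(model, rows, attribute, target):
    """Test: fraction of held-out rows the model predicts correctly."""
    hits = sum(predict(model, row, attribute) == row[target] for row in rows)
    return hits / len(rows)

# Hypothetical training and test data (not from the slides).
training_data = [
    {"Encouragement": "Encouraged", "College": "Yes"},
    {"Encouragement": "Encouraged", "College": "Yes"},
    {"Encouragement": "Encouraged", "College": "No"},
    {"Encouragement": "Not Encouraged", "College": "No"},
    {"Encouragement": "Not Encouraged", "College": "No"},
]
test_data = [
    {"Encouragement": "Encouraged", "College": "Yes"},
    {"Encouragement": "Not Encouraged", "College": "No"},
    {"Encouragement": "Not Encouraged", "College": "Yes"},
    {"Encouragement": "Encouraged", "College": "Yes"},
]

model = train(training_data, "Encouragement", "College")            # train the model
accuracy = evaluate(model, test_data, "Encouragement", "College")   # test the model
prediction = predict(model, {"Encouragement": "Encouraged"}, "Encouragement")  # predict
print(model, accuracy, prediction)
```

Keeping the test rows separate from the training rows is the point of the middle step: accuracy measured on data the model has already seen would overstate how well it predicts new input.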

  4. Data Mining Tasks

  5. Data Mining Problems: Classification (prediction)
     • Is this student going to go to college?
       • Based on Gender, ParentIncome, ParentEncouragement, IQ, etc.
       • E.g., if ParentEncouragement=Yes and IQ>100, College=Yes
     • Similar questions:
       • Is this a spam email? (spam filtering)
       • How good/bad is your credit? (credit scoring)
       • Recognition of hand-written letters (pen recognition)
       • What is this gene like? (bioinformatics)
       • Does this person behave like a terrorist? (TIA)

  6. Decision Tree
     • Attend College: 55% Yes, 45% No
       • IQ <= 100 → Attend College: 35% Yes, 65% No
       • IQ > 100 → Attend College: 79% Yes, 21% No
         • Encouragement = Encouraged → Attend College: 94% Yes, 6% No
         • Encouragement = Not Encouraged → Attend College: 69% Yes, 31% No
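The tree on this slide can be read as nested conditions. The sketch below is one reading of it, with assumptions worth stating: the slide text is flattened, so the placement of the nodes is inferred (the Encouragement split is put under the IQ > 100 branch, since 94% and 69% can only average to that branch's 79%, and this matches the earlier rule "ParentEncouragement=Yes and IQ>100, College=Yes"). The function names and the 0.5 decision cut-off are hypothetical.

```python
def attend_college_probability(iq, encouraged):
    """Walk the tree from the root and return the leaf's 'Yes' share."""
    if iq <= 100:
        return 0.35          # leaf: 35% Yes, 65% No
    if encouraged:
        return 0.94          # leaf: 94% Yes, 6% No
    return 0.69              # leaf: 69% Yes, 31% No

def will_attend_college(iq, encouraged, cutoff=0.5):
    """Classify as 'Yes' when the leaf probability exceeds the cut-off (an assumption)."""
    return "Yes" if attend_college_probability(iq, encouraged) > cutoff else "No"

print(will_attend_college(iq=120, encouraged=True))
print(will_attend_college(iq=95, encouraged=True))
```

Each path from the root to a leaf corresponds to one if-then rule, which is why decision trees are often chosen when the result has to be explained to people.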

  7. Data Mining Problems: Regression (prediction)
     • What is the age of a person?
       • Based on Hobby, MaritalStatus, NumberOfChildren, Income, HouseOwnership, NumberOfCars, …
       • E.g., if MaritalStatus=Yes, Age = 20+4*NumberOfChildren+0.0001*Income+…
     • Similar questions:
       • What’s the sales amount of ice cream next month? (sales prediction)
       • What’s the stock price of MSFT next week? (stock prediction)
       • What’s the income of a customer? (marketing)
       • What’s the life-time of a software bug? (bug tracking)
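As a rough illustration of the regression example on this slide, the function below evaluates only the terms the slide actually spells out. The slide's formula ends in "…", so any further terms are unknown and are simply left out here; the function name and the sample inputs are hypothetical.

```python
def predict_age_if_married(number_of_children, income):
    """Evaluate the visible part of the slide's formula for MaritalStatus=Yes.

    Age = 20 + 4*NumberOfChildren + 0.0001*Income + ... (remaining terms elided on the slide)
    """
    return 20 + 4 * number_of_children + 0.0001 * income

# Hypothetical customer: two children, income of 50,000.
estimated_age = predict_age_if_married(number_of_children=2, income=50_000)
print(estimated_age)
```

Unlike classification, the output is a number on a continuous scale, and each coefficient says how much the prediction moves when that input changes by one unit.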

  8. Data Mining Problems: Segmentation (clustering)
     • Who are my Web visitors?
       • Identify similar groups based on demographics, visiting patterns
       • E.g., daily news readers, email users, shoppers, short-stayers, etc.
     • Similar questions:
       • Identify groups of genes (bioinformatics)
       • Identify groups of locations of Cholera incidents in London (spatial data mining)
       • Identify groups of customers of merchants (Amazon, E-Bay, MSN, WalMart, etc.) (target marketing)
       • Identify groups of documents (text categorisation)
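To make "identify similar groups" concrete, here is a minimal k-means sketch in plain Python. It is an assumption that k-means is a suitable stand-in; the slides do not say which clustering method is meant. The visitor data (pages viewed, minutes on site) is invented, and the initial centroids are just the first k points so that the run is deterministic.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each point to its
    nearest centroid and moving each centroid to the mean of its members."""
    centroids = list(points[:k])          # deterministic start: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:                   # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters

# Hypothetical visitors: (pages viewed, minutes on site).
visitors = [(1, 1), (2, 1), (1, 2), (20, 30), (22, 28), (21, 31)]
centroids, clusters = kmeans(visitors, k=2)
print(clusters)
```

No labels are supplied here: the groups (short-stayers versus heavy users in this toy data) emerge from the distances alone, which is what separates segmentation from classification.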

  9. Data Mining Problems: Association Analysis (recommendation, market basket analysis)
     • What other products are purchased together with a digital camera?
       • Based on previous purchases (shopping cart)
       • E.g., if a digital camera is purchased, flash memory, battery, printer are also purchased.
     • Similar questions:
       • What products should be recommended in on-line stores such as Amazon.com?
       • What items should a merchant display together?
       • What genes appear together in toxic mushrooms?
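The "purchased together" question is usually quantified with two standard measures, support and confidence, which the slide does not name but which underlie association rules. The sketch below computes both over a handful of invented shopping carts; the basket contents and function names are hypothetical.

```python
# Hypothetical shopping carts (not from the slides).
baskets = [
    {"digital camera", "flash memory", "battery"},
    {"digital camera", "flash memory"},
    {"digital camera", "printer"},
    {"battery"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

camera_support = support({"digital camera"})
rule_confidence = confidence({"digital camera"}, {"flash memory"})
print(camera_support, rule_confidence)
```

A rule such as "digital camera → flash memory" is typically kept only when both numbers clear chosen thresholds, so that it is neither rare nor unreliable.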

  10. Data Mining Problems: Anomaly detection (outlier detection)
      • Could this network packet be from a virus attack?
        • Predict likelihood of the network packet pattern
      • Similar questions:
        • Are the hospital lab results normal? (adverse drug effect detection)
        • Is this credit transaction fraudulent? (fraud detection)
        • Does this person behave unusually, perhaps warranting a high level of security clearance? (TIA)
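One simple way to score how unlikely an observation is, offered here as an assumption rather than the method the slide has in mind, is to measure how many standard deviations it lies from the mean of previously seen normal traffic. The packet sizes, the function names, and the threshold of 3 are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical sizes (in bytes) of packets observed during normal operation.
normal_sizes = [500, 520, 480, 510, 490, 505, 495]

def anomaly_score(size, history):
    """Distance from the historical mean, in standard deviations (a z-score)."""
    return abs(size - mean(history)) / pstdev(history)

def is_anomalous(size, history, threshold=3.0):
    """Flag the observation when its score exceeds the threshold (an assumed cut-off)."""
    return anomaly_score(size, history) > threshold

print(is_anomalous(1500, normal_sizes))
print(is_anomalous(515, normal_sizes))
```

The model is built only from normal behaviour; anything it considers sufficiently improbable is reported, which suits cases like fraud where examples of the bad class are scarce.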

  11. Data Mining Tasks - Summary
      • Classification
      • Regression
      • Segmentation
      • Association Analysis
      • Anomaly detection
      • Sequence Analysis
      • Time-series Analysis
      • Text categorization
      • Others

  12. Data Mining Algorithms
      • Decision Trees
      • Naïve Bayesian
      • Clustering
      • Sequence Clustering
      • Association Rules
      • Neural Network
      • Time Series
      • Support Vector Machines
      • …

  13. Data Mining Algorithms
      [Table: algorithms against tasks, each applicable cell marked √ as a first choice or a second choice; the individual cell markings did not survive extraction]
      • Algorithms: Association rules, Seq. Clustering, Neural Network, Decision Trees, Naïve Bayes, Time Series, Clustering
      • Tasks: Classification, Regression, Segmentation, Assoc. Analysis, Anomaly Detect., Seq. Analysis, Time series

  14. Data Mining Vendors
      • SAS (Analytics)
        • http://www.sas.com/technologies/analytics/index.html
      • IBM (DB2 InfoSphere Warehouse)
        • http://www-01.ibm.com/software/data/infosphere/warehouse/mining.html
      • Oracle (ODM option to Oracle 11g)
        • http://www.oracle.com/technetwork/database/options/odm/index.html
      • SPSS (Clementine)
      • Insightful (Insightful Miner)
      • KXEN (Analytic Framework)
      • Prudsys (Discoverer and its family)
      • Microsoft (SQL Server 2012)
      • Angoss (KnowledgeServer and its family)
      • DBMiner (DBMiner)
      • … and many others

  15. References
      • Pang-Ning Tan, Michael Steinbach, Vipin Kumar, “Introduction to Data Mining”, Pearson Education, 2006
      • T. Marakas, “Modern Data Warehousing, Mining and Visualization: Core Concepts”, Prentice Hall, 2003
      • http://msdn.microsoft.com/en-us/library/ms132058.aspx
        • Data mining extensions to SQL Server
      • http://troels.arvin.dk/db/rdbms/
        • Comparison of different SQL implementations
      • Also see the wikibook at http://en.wikibooks.org/wiki/SQL_dialects_reference
