
Data Mining The Art and Science of Obtaining Knowledge from Data

Data Mining The Art and Science of Obtaining Knowledge from Data. Dr. Saed Sayad. Agenda. Explosion of data Introduction to data mining Examples of data mining in science and engineering Challenges and opportunities. Explosion of Data. Data in the world doubles every 20 months!


Presentation Transcript


  1. Data Mining: The Art and Science of Obtaining Knowledge from Data. Dr. Saed Sayad

  2. Agenda • Explosion of data • Introduction to data mining • Examples of data mining in science and engineering • Challenges and opportunities

  3. Explosion of Data • Data in the world doubles every 20 months! • NASA’s Earth Observing System: 46 megabytes of data per second, about 4,000,000,000,000 bytes a day • FBI fingerprint image library: 200,000,000,000,000 bytes • In-line image analysis for particle detection: 1 megabyte per second
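The NASA figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes 1 megabyte = 1,000,000 bytes (the slide does not say which convention it uses), and the function names are illustrative only. It also converts the "doubles every 20 months" claim into a yearly growth factor.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def bytes_per_day(megabytes_per_second):
    """Convert a rate in MB/s to bytes per day (assumes 1 MB = 1,000,000 bytes)."""
    return megabytes_per_second * 1_000_000 * SECONDS_PER_DAY


def growth_factor(months, doubling_months=20.0):
    """Growth over `months` if the volume doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


# 46 MB/s sustained for a full day, close to the 4,000,000,000,000 bytes quoted.
print(f"{bytes_per_day(46):,} bytes per day")  # 3,974,400,000,000 bytes per day

# Doubling every 20 months is roughly a 1.5x increase per year.
print(f"{growth_factor(12):.2f}x per year")
```

The daily total rounds to the 4 trillion bytes on the slide, so the two NASA numbers are consistent with each other.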

  4. Explosion of Data (cont.)

  5. Explosion of Data (cont.)

  6. Explosion of Data (cont.)

  7. Explosion of Data (cont.)

  8. What do we need? Fast, accurate, and scalable data analysis techniques to extract useful knowledge. The answer is Data Mining.

  9. What is Data Mining? “Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.” [Diagram: Data → Data Mining → Knowledge]

  10. Data Mining [Diagram labels: Database, Data Warehouse, OLAP, Data Analysis, Statistics, AI, Machine Learning]

  11. Data Mining [Diagram labels: Database, Data Warehouse, OLAP, Data Analysis, Statistics, Machine Learning]

  12. Database

  13. Data Analysis • Classification • Regression • Clustering • Association • Sequence Analysis

  14. Data Analysis [Diagram: Input Variables or Attributes → Model (Linear Models or Decision Trees, weights W1, W2) → Output Variables or Targets] • Regression: numeric input X1 (age, income, …), numeric output Y1 (0,1) • Classification: categorical input X2 (gender, occupation, …), categorical output Y2 (good, bad)

  15. Data Analysis (cont.) • Clustering [Scatter plot: Income vs. Age] • Association: transactions 1: chips, coke, chocolate; 2: gum, chips; 3: chips, coke; 4: …; Probability (chips, coke)? • Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA… [Plot labels: Xt-1, Xt, T]
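The association question on this slide, the probability of chips and coke appearing together, can be worked out directly from the listed baskets. The sketch below uses only the three complete transactions shown (the fourth is elided on the slide and is left out); the helper names `support` and `confidence` are illustrative, not taken from the presentation.

```python
# The three complete baskets listed on the slide.
transactions = [
    {"chips", "coke", "chocolate"},  # transaction 1
    {"gum", "chips"},                # transaction 2
    {"chips", "coke"},               # transaction 3
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) from the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Chips and coke occur together in 2 of the 3 baskets.
print(support({"chips", "coke"}, transactions))
# Every basket with coke also contains chips.
print(confidence({"coke"}, {"chips"}, transactions))
```

On these three baskets the joint support is 2/3, and the rule "coke implies chips" holds in every basket that contains coke.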

  16. Data Mining in Research Life Cycle [Diagram labels: Questions, Needs, Library Search, Experiment, Research Data, Database, Data Analysis, Modeling, Report]

  17. Data Mining – Modeling Steps • Problem Definition • Data Preparation • Exploration • Modeling • Evaluation • Deployment

  18. Agenda • Explosion of data • Introduction to data mining • Examples of data mining in science and engineering • Challenges and opportunities

  19. Examples of data mining in science & engineering • 1. Data mining in Biomedical Engineering • “Robotic Arm Control Using Data Mining Techniques” • 2. Data mining in Chemical Engineering • “Data Mining for In-line Image Monitoring of Extrusion Processing”

  20. 1. Problem Definition: “Control a robotic arm by means of EMG signals from biceps and triceps muscles.” [Figure labels: Supination, Pronation, Flexion, Extension]

  21. 2. Data Preparation • The dataset includes 80 records. • There are two input variables: biceps signal and triceps signal. • There is one output variable with four possible values: Supination, Pronation, Flexion and Extension.

  22. 3. Exploration [Scatter plot: Triceps vs. Record#; legend: Flexion, Extension, Supination, Pronation]

  23. 3. Exploration (cont.) [Scatter plot: Biceps vs. Record#; legend: Flexion, Extension, Supination, Pronation]

  24. 4. Modeling • Classification • OneR • Decision Tree • Naïve Bayesian • K-Nearest Neighbors • Neural Networks • Linear Discriminant Analysis • Support Vector Machines • …
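To make one of the listed classifiers concrete, here is a minimal k-nearest-neighbors sketch for a two-input, four-class problem shaped like the EMG example. The training values are invented for illustration and are not the 80 records from the study; the function name `knn_predict` is likewise hypothetical. The slides report that a neural network was the model eventually deployed, so this shows only how one of the candidate methods works.

```python
import math
from collections import Counter

# Hypothetical (biceps, triceps) signal amplitudes with movement labels.
# These are made-up values, NOT data from the presentation.
training = [
    ((0.9, 0.2), "Flexion"), ((0.8, 0.3), "Flexion"),
    ((0.2, 0.9), "Extension"), ((0.3, 0.8), "Extension"),
    ((0.8, 0.8), "Supination"), ((0.9, 0.9), "Supination"),
    ((0.1, 0.1), "Pronation"), ((0.2, 0.2), "Pronation"),
]


def knn_predict(point, data, k=3):
    """Label `point` by majority vote among its k nearest training records."""
    nearest = sorted(data, key=lambda rec: math.dist(point, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# A new reading with a strong biceps signal and a weak triceps signal.
print(knn_predict((0.85, 0.25), training))  # Flexion
```

With k=3 the two closest records are both labeled Flexion, so the vote is decided regardless of the third neighbor.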

  25. 6. Model Deployment A neural network model was successfully implemented inside the robotic arm.

  26. Examples of data mining in science & engineering 1. Data mining in Biomedical Engineering “Robotic Arm Control Using Data Mining Techniques” 2. Data mining in Chemical Engineering “Data Mining for In-line Image Monitoring of Extrusion Processing”

  27. Plastics Extrusion Plastic pellets Plastic melt

  28. Film Extrusion Defect due to particle contaminant Extruder Plastic Film

  29. Transition Piece In-Line Monitoring Window Ports

  30. In-Line Monitoring Optical Assembly Light Extruder and Interface Light Source Imaging Computer

  31. Melt Without Contaminant Particles (WO)

  32. Melt With Contaminant Particles (WP)

  33. 1. Problem Definition: Classify images into those with particles (WP) and those without particles (WO). [Example images: WO, WP]

  34. 2. Data Preparation • 2000 images • 54 input variables, all numeric • One output variable with two possible values • With Particle • Without Particle

  35. 2. Data Preparation (cont.) • Pre-processed images to remove noise • Dataset 1 with sharp images: 1350 images including 1257 without particles and 91 with particles • Dataset 2 with sharp and blurry images: 2000 images including 1909 without particles or with blurry particles and 91 with particles • 54 input variables, all numeric • One output variable, with two possible values (WP and WO)
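The class counts for Dataset 2 show how unbalanced the problem is, which matters when reading any accuracy figure. The sketch below computes the share of WP images and the accuracy of a trivial model that always predicts the larger class; the function name `majority_baseline` is an illustrative choice, not something from the presentation.

```python
def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


# Dataset 2 counts as given on the slide.
dataset2 = {"WO": 1909, "WP": 91}

wp_share = dataset2["WP"] / sum(dataset2.values())
print(f"WP share: {wp_share:.2%}")                         # WP share: 4.55%
print(f"Baseline accuracy: {majority_baseline(dataset2):.2%}")  # Baseline accuracy: 95.45%
```

A classifier on this dataset therefore needs to beat roughly 95% accuracy before it is doing better than ignoring the images entirely.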

  36. 3. Exploration Demo!

  37. 4. Modeling • Classification: • OneR • Decision Tree • 3-Nearest Neighbors • Naïve Bayesian

  38. 5. Evaluation • 10-fold cross-validation • If pixel_density_max < 142 then WP
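The rule on this slide and the 10-fold scheme can both be sketched in a few lines. The code below assumes that any image not matching the rule is labeled WO, which the slide implies but does not state, and it builds the folds by shuffling and striding, which is one common way to do it rather than necessarily the procedure used in the study. The names `classify` and `k_fold_indices` are hypothetical.

```python
import random

THRESHOLD = 142  # cut-off value taken from the rule on the slide


def classify(pixel_density_max):
    """Apply the slide's rule; the WO fallback is an assumption."""
    return "WP" if pixel_density_max < THRESHOLD else "WO"


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes into the same fold.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With 2000 images, each fold holds out 200 and trains on the other 1800.
for train, test in k_fold_indices(2000):
    assert len(train) == 1800 and len(test) == 200
```

Each record is held out exactly once, so the ten test-fold results together cover the whole dataset.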

  39. 6. Deploy model • A Visual Basic program will be developed to implement the model.

  40. Agenda • Explosion of data • Introduction to data mining • Examples of data mining in science & engineering • Challenges and opportunities

  41. Challenges and Opportunities • Data mining is a ‘top ten’ emerging technology. • High-paying jobs in the financial, medical and engineering sectors. • Faster, more accurate and more scalable techniques. • Incremental, on-line and real-time learning algorithms. • Parallel and distributed data processing techniques.

  42. Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems. You can be part of the solution!
