Data Mining: The Art and Science of Obtaining Knowledge from Data
Dr. Saed Sayad
Agenda • Explosion of data • Introduction to data mining • Examples of data mining in science and engineering • Challenges and opportunities
Explosion of Data • Data in the world doubles every 20 months! • NASA’s Earth Observing System: • 46 megabytes of data per second • 4,000,000,000,000 bytes (about 4 TB) a day • FBI fingerprint image library: • 200,000,000,000,000 bytes (200 TB) • In-line image analysis for particle detection: • 1 megabyte per second
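The figures on this slide can be cross-checked with a little arithmetic. This is a minimal sketch that assumes decimal units (1 megabyte = 10^6 bytes) and reads "doubles every 20 months" as simple exponential growth; the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def bytes_per_day(megabytes_per_second: int, bytes_per_mb: int = 10**6) -> int:
    """Daily volume for a constant data rate (assumes decimal megabytes)."""
    return megabytes_per_second * bytes_per_mb * SECONDS_PER_DAY


def growth_factor(months: float, doubling_months: float = 20) -> float:
    """How many times larger the data becomes if it doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


print(bytes_per_day(46))    # NASA rate from the slide
print(bytes_per_day(1))     # in-line particle-detection rate from the slide
print(growth_factor(120))   # ten years at one doubling per 20 months
```

At 46 MB/s the daily total is 3,974,400,000,000 bytes, which the slide rounds to 4,000,000,000,000; at the stated doubling rate, the volume grows 64-fold over ten years (120 months).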
What do we need? Fast, accurate, and scalable data analysis techniques to extract useful knowledge. The answer is data mining.
What is Data Mining? [Diagram: Data → Data Mining → Knowledge] “Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”
[Diagram: Data Mining in relation to Statistics, AI and Machine Learning, Data Analysis, Database, Data Warehouse and OLAP]
[Diagram: Data Mining drawing on Data Analysis (Statistics, Machine Learning) and Database (Data Warehouse, OLAP)]
Data Analysis • Classification • Regression • Clustering • Association • Sequence Analysis
Data Analysis [Diagram: Input variables or attributes X1 (numeric: age, income, …) and X2 (categorical: gender, occupation, …) enter a model (linear models or decision trees) with weights W1, W2, which predicts output variables or targets. A numeric target Y1 is a regression problem; a categorical target Y2 (e.g. good, bad) is a classification problem.]
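The structure of this diagram can be sketched in code: the same encoded inputs feed a linear score, which is used directly for a numeric target (regression) or cut at a threshold for a categorical target (classification). All weights, categories and the threshold below are hypothetical placeholders for illustration; a real model would learn them from data, and a decision tree would replace the weighted sum with a sequence of splits.

```python
from typing import Dict

# Hypothetical parameters (illustration only, not taken from the slides).
W1 = 0.02                      # weight for the numeric input X1 (age, in years)
W2: Dict[str, float] = {       # weights for the categorical input X2 (occupation):
    "engineer": 0.4,           # one weight per category, i.e. one-hot encoding
    "teacher": 0.3,
    "student": -0.2,
}
BIAS = -0.5


def linear_score(age: float, occupation: str) -> float:
    """Weighted sum of the encoded inputs; unseen categories contribute 0."""
    return BIAS + W1 * age + W2.get(occupation, 0.0)


def predict_numeric(age: float, occupation: str) -> float:
    """Regression: for a numeric target the score itself is the prediction."""
    return linear_score(age, occupation)


def predict_category(age: float, occupation: str, threshold: float = 0.0) -> str:
    """Classification: the same score is cut at a (hypothetical) threshold."""
    return "good" if linear_score(age, occupation) >= threshold else "bad"


print(predict_numeric(40, "engineer"), predict_category(40, "engineer"))
print(predict_numeric(20, "student"), predict_category(20, "student"))
```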
Data Analysis (cont.) • Clustering: [Plot: groups of points by Income vs. Age] • Association: transactions 1: chips, coke, chocolate; 2: gum, chips; 3: chips, coke; 4: …; Probability(chips, coke)? • Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA… [Diagram: X_{t-1} → X_t along time T]
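Two of these panels can be made concrete with a few lines of code. This is a minimal sketch, not the author's method: for association it reads "Probability(chips, coke)" as the support of the itemset {chips, coke} and also computes rule confidence, using only transactions 1 to 3 (transaction 4 is elided on the slide); for sequence analysis it estimates first-order transition probabilities P(X_t | X_{t-1}) from the DNA fragment as shown, which is one simple choice among many sequence-analysis methods. Function names are illustrative.

```python
from collections import Counter, defaultdict

# Transactions 1-3 from the slide (transaction 4 is elided there).
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# DNA fragment exactly as shown on the slide (the ellipses are dropped).
dna = "ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA"


def transition_probs(seq):
    """First-order estimate of P(X_t | X_{t-1}) from adjacent symbol pairs."""
    counts = defaultdict(Counter)
    for prev, cur in zip(seq, seq[1:]):
        counts[prev][cur] += 1
    probs = {}
    for prev, following in counts.items():
        total = sum(following.values())
        probs[prev] = {cur: n / total for cur, n in following.items()}
    return probs


print(support({"chips", "coke"}, transactions))
print(confidence({"chips"}, {"coke"}, transactions))
print(transition_probs(dna)["A"])
```

On these three baskets, {chips, coke} has support 2/3; the rule chips → coke has confidence 2/3, while coke → chips has confidence 1.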
Data Mining in Research Life Cycle [Diagram: a research cycle with the stages Questions and Needs, Library Search, Experiment, Research Data, Database, Data Analysis, Modeling and Report]
Data Mining – Modeling Steps • Problem Definition • Data Preparation • Exploration • Modeling • Evaluation • Deployment
Agenda • Explosion of data • Introduction to data mining • Examples of data mining in science and engineering • Challenges and opportunities
Examples of data mining in science & engineering • 1. Data mining in Biomedical Engineering • “Robotic Arm Control Using Data Mining Techniques” • 2. Data mining in Chemical Engineering • “Data Mining for In-line Image Monitoring of Extrusion Processing”
1. Problem Definition “Control a robotic arm by means of EMG signals from biceps and triceps muscles.” [Figure: the four arm movements: Supination, Pronation, Flexion, Extension]
2. Data Preparation • The dataset includes 80 records. • There are two input variables: biceps signal and triceps signal. • There is one output variable with four possible values: Supination, Pronation, Flexion and Extension.
3. Exploration [Scatter plot: triceps signal by record #, by class: Flexion, Extension, Supination, Pronation]
3. Exploration (cont.) [Scatter plot: biceps signal by record #, by class: Flexion, Extension, Supination, Pronation]
4. Modeling • Classification: • OneR • Decision Tree • Naïve Bayesian • K-Nearest Neighbors • Neural Networks • Linear Discriminant Analysis • Support Vector Machines • …
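To illustrate one of the listed classifiers (not the neural network that was eventually deployed), here is a minimal k-nearest-neighbors sketch for two inputs (biceps signal, triceps signal) and the four movement classes. The training points are hypothetical toy values, not the study's 80 records, and no signal scaling or preprocessing is shown.

```python
import math
from collections import Counter

# Hypothetical toy training data: ((biceps, triceps), movement class).
train = [
    ((0.9, 0.1), "Flexion"), ((0.8, 0.2), "Flexion"),
    ((0.1, 0.9), "Extension"), ((0.2, 0.8), "Extension"),
    ((0.6, 0.6), "Supination"), ((0.7, 0.5), "Supination"),
    ((0.2, 0.2), "Pronation"), ((0.3, 0.1), "Pronation"),
]


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest (Euclidean) to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # Counter.most_common orders equal counts by first appearance,
    # so a tie goes to the label of the nearest tied neighbor.
    return votes.most_common(1)[0][0]


print(knn_predict(train, (0.85, 0.15)))
print(knn_predict(train, (0.25, 0.15)))
```

Choosing k is a design trade-off: a small k follows local structure closely but is sensitive to noisy EMG readings, while a larger k smooths the decision boundary.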
6. Model Deployment A neural network model was successfully implemented inside the robotic arm.
Examples of data mining in science & engineering 1. Data mining in Biomedical Engineering “Robotic Arm Control Using Data Mining Techniques” 2. Data mining in Chemical Engineering “Data Mining for In-line Image Monitoring of Extrusion Processing”
Plastics Extrusion [Figure: plastic pellets → plastic melt]
Film Extrusion [Figure: extruder producing plastic film; defect due to a particle contaminant]
Transition Piece [Figure: in-line monitoring window ports]
In-Line Monitoring [Figure: optical assembly with light source, extruder and interface, and imaging computer]
1. Problem Definition Classify images into those with particles (WP) and those without particles (WO). [Example images: WO, WP]
2. Data Preparation • 2000 images • 54 input variables, all numeric • One output variable with two possible values: • With Particle • Without Particle
2. Data Preparation (cont.) • Images were pre-processed to remove noise • Dataset 1, sharp images: 1350 images including 1257 without particles and 91 with particles • Dataset 2, sharp and blurry images: 2000 images including 1909 without particles and blurry particles and 91 with particles • 54 input variables, all numeric • One output variable with two possible values (WP and WO)
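The class split in Dataset 2 is heavily skewed, which matters when reading an accuracy figure. A quick check of the arithmetic, using only the Dataset 2 counts stated on the slide (the helper name is hypothetical): a trivial rule that always answers WO is already right on 1909 of 2000 images.

```python
def majority_baseline(class_counts):
    """Accuracy of a rule that always predicts the most frequent class."""
    total = sum(class_counts.values())
    label, count = max(class_counts.items(), key=lambda kv: kv[1])
    return label, count / total


dataset2 = {"WO": 1909, "WP": 91}  # counts from the slide
print(majority_baseline(dataset2))
```

That is about 95.5% accuracy while missing every particle, so the fraction of WP images actually detected says more about a model here than overall accuracy alone.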
3. Exploration Demo!
4. Modeling • Classification: • OneR • Decision Tree • 3-Nearest Neighbors • Naïve Bayesian
5. Evaluation • 10-fold cross-validation • If pixel_density_max < 142 then WP
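The evaluation slide pairs 10-fold cross-validation with a single-attribute rule of the form “if pixel_density_max < threshold then WP”. A minimal sketch of that combination follows. It is a simplified OneR-style learner (a one-threshold decision stump on a single numeric feature, with the rule direction fixed as on the slide) rather than full OneR, which discretizes every attribute and keeps the best one. The feature values are hypothetical toy numbers and the function names are illustrative.

```python
import random


def predict(x, t):
    """Rule form from the slide: below the threshold -> WP, otherwise WO."""
    return "WP" if x < t else "WO"


def fit_threshold(xs, ys):
    """Choose the cut point (midpoint between neighboring values) with fewest training errors."""
    values = sorted(set(xs))
    candidates = [values[0]] + [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    # min() keeps the first candidate among equally good ones.
    return min(candidates, key=lambda t: sum(predict(x, t) != y for x, y in zip(xs, ys)))


def cross_validate(xs, ys, k=10, seed=0):
    """k-fold cross-validated accuracy: fit on k-1 folds, score the held-out fold."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        t = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(xs[i], t) == ys[i] for i in fold)
    return correct / len(xs)


# Hypothetical, deliberately separable toy values of a pixel_density_max-like feature.
wp = list(range(100, 121, 2))  # 11 images with particles
wo = list(range(160, 201, 4))  # 11 images without particles
xs = wp + wo
ys = ["WP"] * len(wp) + ["WO"] * len(wo)

print(fit_threshold(xs, ys), cross_validate(xs, ys))
```

On this toy data the learned cut point is 140.0, midway between the two classes, and every held-out value is classified correctly; overlapping feature values, as real images may produce, would lower the cross-validated accuracy.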
6. Deploy model • A Visual Basic program will be developed to implement the model.
Agenda • Explosion of data • Introduction to data mining • Examples of data mining in science & engineering • Challenges and opportunities
Challenges and Opportunities • Data mining is a ‘top ten’ emerging technology. • High-paying jobs in the financial, medical and engineering sectors. • Faster, more accurate and more scalable techniques. • Incremental, on-line and real-time learning algorithms. • Parallel and distributed data processing techniques.
Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems. You can be part of the solution!